Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)

E2FGVI (CVPR 2022)

English | 简体中文

This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

[Paper] [Demo Video (Youtube)] [演示视频 (B站)] [Project Page (TBD)] [Poster (TBD)]

You can try our colab demo here: Open In Colab

News

  • 2022.05.15: We release E2FGVI-HQ, which can handle videos with arbitrary resolution. The model generalizes well to much higher resolutions even though it was trained only on 432x240 videos, and it also outperforms our original model on both PSNR and SSIM. 🔗 Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [Youtube] [B站]

  • 2022.04.06: Our code is publicly available.

Demo

teaser

More examples (click for details):

Coco (click me)
Tennis
Space
Motocross

Overview

overall_structure

🚀 Highlights:

  • SOTA performance: The proposed E2FGVI achieves significant improvements on all quantitative metrics compared with state-of-the-art methods.
  • High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, which is nearly 15× faster than previous flow-based methods. Moreover, our method has the lowest FLOPs among all compared SOTA methods.

Work in Progress

  • Update website page
  • Hugging Face demo
  • Efficient inference

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/E2FGVI.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate e2fgvi
    • Python >= 3.7
    • PyTorch >= 1.5
    • CUDA >= 9.2
    • mmcv-full (please follow the official installation instructions)

    If the environment.yml file does not work for you, please follow this issue to solve the problem.
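
    If the installation succeeds, a quick sanity check such as the following minimal sketch (an illustrative snippet, not part of the official code) confirms that the required packages are importable inside the e2fgvi environment:

    # Illustrative environment sanity check (not part of the official repo).
    import torch
    import mmcv

    print("PyTorch:", torch.__version__)               # expected >= 1.5
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA (torch build):", torch.version.cuda)   # expected >= 9.2
    print("mmcv-full:", mmcv.__version__)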

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

Model     | 🔗 Download Links           | Supports Arbitrary Resolution? | PSNR / SSIM / VFID (DAVIS)
E2FGVI    | [Google Drive] [Baidu Disk] | No                             | 33.01 / 0.9721 / 0.116
E2FGVI-HQ | [Google Drive] [Baidu Disk] | Yes                            | 33.06 / 0.9722 / 0.117

Then, unzip the file and place the models in the release_model directory.

The directory structure will be arranged as:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md

Quick test

We provide two examples in the examples directory.

Run the following command to enjoy them:

# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)

The inpainted video will be saved in the results directory. Please prepare your own mp4 video (or split frames) and frame-wise masks if you want to test more cases (a sketch for creating frame-wise masks from a single static mask is shown after the example below).

Note: E2FGVI always rescales the input video to a fixed resolution (432x240), while E2FGVI-HQ keeps the original resolution of the input video. If you want to customize the output resolution, please use the --set_size flag and set the values of --width and --height.

Example:

# Using this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
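
If your own test case uses a static (non-moving) mask, you still need one mask image per frame. Below is a minimal sketch (an illustrative helper, not part of the repo; the paths my_frames, my_mask.png, and my_frames_mask are hypothetical) that replicates a single mask into frame-wise masks following the 00000.png naming pattern:

# Illustrative helper (assumption): copy one static mask once per frame.
import os, shutil

frames_dir = "examples/my_frames"       # hypothetical folder containing the video frames
static_mask = "examples/my_mask.png"    # hypothetical single mask image
masks_dir = "examples/my_frames_mask"   # folder to pass to test.py via --mask

os.makedirs(masks_dir, exist_ok=True)
frames = sorted(f for f in os.listdir(frames_dir) if f.lower().endswith((".png", ".jpg")))
for i, _ in enumerate(frames):
    shutil.copy(static_mask, os.path.join(masks_dir, f"{i:05d}.png"))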

Prepare dataset for training and evaluation

Dataset | YouTube-VOS                               | DAVIS
Details | For training (3,471) and evaluation (508) | For evaluation (50 in 90)
Images  | [Official Link] (Download train and test all frames) | [Official Link] (2017, 480p, TrainVal)
Masks   | [Google Drive] [Baidu Disk] (For reproducing paper results)

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, you should place JPEGImages in datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (Note: please edit the folder path accordingly) to compress each video in datasets/<dataset_name>/JPEGImages.
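
If the shell script is inconvenient in your setup, the same compression step can be sketched in Python as below (an illustrative equivalent, not the official datasets/zip_dir.sh; please double-check that the resulting archive layout matches what the script produces):

# Illustrative equivalent of the zip_dir.sh step (assumption): pack each video
# folder under JPEGImages into <video_name>.zip containing its frames.
import os
import zipfile

jpeg_root = "datasets/davis/JPEGImages"  # or datasets/youtube-vos/JPEGImages
for video_name in sorted(os.listdir(jpeg_root)):
    video_dir = os.path.join(jpeg_root, video_name)
    if not os.path.isdir(video_dir):
        continue
    with zipfile.ZipFile(video_dir + ".zip", "w") as zf:
        for frame in sorted(os.listdir(video_dir)):
            zf.write(os.path.join(video_dir, frame), arcname=frame)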

Unzip downloaded mask files to datasets.

The datasets directory structure will be arranged as: (Note: please check it carefully)

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_dir.sh

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

Evaluating E2FGVI will reproduce the scores reported in the paper. The scores of E2FGVI-HQ can be found in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Please add --save_results if you want to further evaluate the temporal warping error.
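
For reference, the PSNR/SSIM numbers are standard per-frame image metrics; a minimal sketch with scikit-image is shown below (an illustrative computation only; evaluate.py contains the actual implementation, and VFID additionally requires the i3d_rgb_imagenet.pt model):

# Illustrative per-frame PSNR/SSIM (assumption; not the repo's evaluate.py).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_psnr_ssim(gt: np.ndarray, pred: np.ndarray):
    """gt, pred: HxWx3 uint8 frames of the same size."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    # channel_axis requires scikit-image >= 0.19; older versions use multichannel=True.
    ssim = structural_similarity(gt, pred, data_range=255, channel_axis=-1)
    return psnr, ssim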

Training

Our training configurations are provided in train_e2fgvi.json (for E2FGVI) and train_e2fgvi_hq.json (for E2FGVI-HQ).

Run one of the following commands for training:

 # For training E2FGVI
 python train.py -c configs/train_e2fgvi.json
 # For training E2FGVI-HQ
 python train.py -c configs/train_e2fgvi_hq.json

You could run the same command if you want to resume your training.

The training loss can be monitored by running:

tensorboard --logdir release_model                                                   

You could follow this pipeline to evaluate your model.

Results

Quantitative results

quantitative_results

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{liCvpr22vInpainting,
   title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
   author={Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

Contact

If you have any questions, please feel free to contact us via zhenli1031ATgmail.com or czlu919AToutlook.com.

License

This project is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is maintained by Zhen Li and Cheng-Ze Lu.

This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.

Owner

Media Computing Group @ Nankai University, led by Prof. Ming-Ming Cheng.
Comments
  • Solving environment: failed

    I tried installing on both Windows and Linux; however, I get "Solving environment: failed".

    conda env create -f environment.yml
    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:

    • libffi==3.3=he6710b0_2
    • lcms2==2.12=h3be6417_0
    • matplotlib-base==3.4.2=py37hab158f2_0
    • tornado==6.1=py37h27cfd23_0
    • brotli==1.0.9=he6710b0_2
    • scipy==1.6.2=py37had2a1c9_1
    • bzip2==1.0.8=h7b6447c_0
    • locket==0.2.1=py37h06a4308_1
    • libpng==1.6.37=hbc83047_0
    • ffmpeg==4.2.2=h20bf706_0
    • freetype==2.10.4=h5ab3b9f_0
    • expat==2.4.1=h2531618_2
    • xz==5.2.5=h7b6447c_0
    • ncurses==6.2=he6710b0_1
    • openh264==2.1.0=hd408876_0
    • qt==5.9.7=h5867ecd_1pt150_0
    • pywavelets==1.1.1=py37h7b6447c_2
    • libgfortran-ng==7.5.0=ha8ba4b0_17
    • libwebp-base==1.2.0=h27cfd23_0
    • pcre==8.45=h295c915_0
    • jpeg==9d=h7f8727e_0
    • ca-certificates==2022.2.1=h06a4308_0
    • certifi==2021.10.8=py37h06a4308_2
    • gstreamer==1.14.0=h28cd5cc_2
    • lame==3.100=h7b6447c_0
    • libtiff==4.2.0=h85742a9_0
    • tk==8.6.11=h1ccaba5_0
    • glib==2.69.1=h5202010_0
    • pillow==8.3.1=py37h2c7a002_0
    • libgcc-ng==9.3.0=h5101ec6_17
    • openssl==1.1.1m=h7f8727e_0
    • libstdcxx-ng==9.3.0=hd4cf53a_17
    • fontconfig==2.13.1=h6c09931_0
    • zstd==1.4.9=haebb681_0
    • zlib==1.2.11=h7b6447c_3
    • _openmp_mutex==4.5=1_gnu
    • pyqt==5.9.2=py37h05f1152_2
    • libvpx==1.7.0=h439df22_0
    • libgomp==9.3.0=h5101ec6_17
    • python==3.7.11=h12debd9_0
    • dbus==1.13.18=hb2f20db_0
    • x264==1!157.20191217=h7b6447c_0
    • openjpeg==2.4.0=h3ad879b_0
    • libtasn1==4.16.0=h27cfd23_0
    • lz4-c==1.9.3=h295c915_1
    • cytoolz==0.11.0=py37h7b6447c_0
    • mkl_fft==1.3.0=py37h42c9631_2
    • sqlite==3.36.0=hc218d9a_0
    • gnutls==3.6.15=he1e5248_0
    • icu==58.2=he6710b0_3
    • pytorch==1.5.1=py3.7_cuda9.2.148_cudnn7.6.3_0
    • libgfortran4==7.5.0=ha8ba4b0_17
    • yaml==0.2.5=h7b6447c_0
    • ninja==1.10.2=hff7bd54_1
    • nettle==3.7.3=hbbd107a_1
    • kiwisolver==1.3.1=py37h2531618_0
    • setuptools==58.0.4=py37h06a4308_0
    • libopus==1.3.1=h7b6447c_0
    • libunistring==0.9.10=h27cfd23_0
    • matplotlib==3.4.2=py37h06a4308_0
    • sip==4.19.8=py37hf484d3e_0
    • gmp==6.2.1=h2531618_2
    • pip==21.2.2=py37h06a4308_0
    • numpy-base==1.20.3=py37h74d4b33_0
    • libidn2==2.3.2=h7f8727e_0
    • pyyaml==5.4.1=py37h27cfd23_1
    • libxcb==1.14=h7b6447c_0
    • gst-plugins-base==1.14.0=h8213a91_2
    • ld_impl_linux-64==2.35.1=h7274673_9
    • mkl-service==2.4.0=py37h7f8727e_0
    • libuuid==1.0.3=h7f8727e_2
    • mkl_random==1.2.2=py37h51133e4_0
    • mkl==2021.3.0=h06a4308_520
    • libxml2==2.9.12=h03d6c58_0
    • intel-openmp==2021.3.0=h06a4308_3350
    • numpy==1.20.3=py37hf144106_0
  • Question about learning rate

    Hello, and thank you for your work. I have a question about the learning rate. I noticed that your paper states the initial learning rate is 0.0001, reduced by a factor of 10 at 400k iterations, whereas the compared work FuseFormer uses an initial learning rate of 0.01, reduced by a factor of 10 at 200k, 400k, and 450k iterations. Have you tested the difference between these two schedules? What made you decide not to follow FuseFormer's configuration? Looking forward to your reply!

  • Demo videos to contribute

    Hi,

    Thanks for this great repo and project.

    Not really an issue, more of a question: I see the demo video section is TBD. Would you be interested in some in-the-wild inference results on test videos for the README? I am planning to run some anyway, hopefully in the next week or so; let me know and I'll share.

    It would also be great to have a model trained at higher resolution to produce better-quality demo videos, but I see that is already on the work-in-progress list.

  • Output encoding settings

    Hello. After a long while of trial and error, I managed to get this software running. It still doesn't run well, giving me OOM with more than 250 frames of 120x144 video. I have an 8 GB 3060 Ti, which should be fine for this, in my opinion. Needing to split tasks many times is a pain, but it might be manageable.

    What isn't manageable is the output settings. H.263 is outdated, and with tiny input sizes and lengths, lossy compression is a baffling pick. Maybe I missed a customization option somewhere? I would like to have lossless H.264 or FFV1. In addition, I would like to decide the video's framerate (very important for syncing) and not have the video resized, since that causes distortions that look bad.

    Thank you. Looking forward to the high-resolution model.

  • Object masks generation for custom recordings

    Amazing paper and results, thanks for this work! I can't wait to see the future updates described in the Work in Progress section! I'm interested in testing your method for the object removal task on my custom videos outside of the popular benchmarks. I was wondering if you could recommend any method for producing these object masks, ideally generating one mask per object in the video?

  • About Training

    Hello, I have recently been reading the paper for this model and debugging the code, and I have two questions I would like to ask you.

    1. About the dataset: the following is my path [screenshot 1]

    When I run the command :/data/team10/cai/E2FGV$ sh datasets/zip_dir.sh, I get a file-not-found message: "[./datasets/davis/JPEGImages] is not exist. Please check the directory. Done!" I would like to know whether my path is set up incorrectly. 2. Second, during training I cannot get any GPU (there are 6 GPUs on the server, and 2 were free on my most recent attempt). [screenshot 2022-05-18 110907] config['world_size'] = get_world_size() returns 0.

  • RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found

    I am getting the following error when I try to run the test script on the following docker image. Am I missing something?

    FROM nvidia/cuda:11.6.0-base-ubuntu18.04
    LABEL maintainer="Ayush Saraf"
    ARG CONDA_PYTHON_VERSION=3
    ARG CONDA_DIR=/opt/conda
    ARG USERNAME=docker
    ARG USERID=1000
    
    # Install basic utilities
    RUN apt-get update && \
        apt-get install -y --no-install-recommends git wget unzip bzip2 sudo build-essential ca-certificates ffmpeg libsm6 libxext6 && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    # Install miniconda
    ENV PATH $CONDA_DIR/bin:$PATH
    RUN wget --quiet \
        https://repo.continuum.io/miniconda/Miniconda$CONDA_PYTHON_VERSION-latest-Linux-x86_64.sh && \
        echo 'export PATH=$CONDA_DIR/bin:$PATH' > /etc/profile.d/conda.sh && \
        /bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p $CONDA_DIR && \
        rm -rf /tmp/*
    # Create the user
    RUN useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME && \
        chown $USERNAME $CONDA_DIR -R && \
        adduser $USERNAME sudo && \
        echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
    USER $USERNAME
    WORKDIR /home/$USERNAME
    
    # Conda env
    RUN wget https://raw.githubusercontent.com/MCG-NKU/E2FGVI/master/environment.yml
    RUN conda env create -f environment.yml
    
  • Environment configuration

    Hi, I tried to modify the environment for training to resolve the problem caused by DCN v2 as mentioned in #6, but things did not work out. I could not use your original configuration, and I haven't figured out why; it is probably due to the CUDA driver version or something else. I tried to resume from the checkpoint saved before the loss curve went up, but that did not work either. I would like to follow your original configuration. Can you give some information about your system and CUDA driver version? Thanks a lot.

  • Not able to reproduce the results listed in the paper with my trained model

    I met a problem of mode collapse when the step number is larger than 300K, and with the final model I got, I am not able to reproduce the results shown in the paper. Could you share your loss curve? @Paper99

  • add __init__ for setup

    Add an __init__.py file to each folder so that the package can be distributed with the Python setup tools. For example, I can import core.utils in my own Python code when I build E2FGVI as a module:

    import setuptools
    
    setuptools.setup(
        name='e2fgvi',
        version='1.0',
        description='An End-to-End Framework for Flow-Guided Video Inpainting',
        url='https://github.com/MCG-NKU/E2FGVI', 
        packages = setuptools.find_packages('./')
    )
    

    So that I can use it like this:

    from E2FGVI.core.utils import to_tensors
    
  • Can I replace ModulatedDeformConv2dFunction with another function? I'm having trouble converting the model to another format to save it with torch.jit.save.

    Looks like it's caused by not being able to export ModulatedDeformConv2dFunction

    Traceback (most recent call last):
      File "test6.py", line 370, in <module>
        main_worker()
      File "test6.py", line 294, in main_worker
        traced_model.save("traced_model3.pt")
      File "/Users/mac/opt/anaconda3/envs/e2fgvi36/lib/python3.6/site-packages/torch/jit/_script.py", line 487, in save
        return self._c.save(*args, **kwargs)
    RuntimeError: 
    Could not export Python function call 'ModulatedDeformConv2dFunction'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
    

    The save code was added at this place:

    ...
    with torch.no_grad():
        masked_imgs = selected_imgs * (1 - selected_masks)
        mod_size_h = 60
        mod_size_w = 108
        h_pad = (mod_size_h - h % mod_size_h) % mod_size_h
        w_pad = (mod_size_w - w % mod_size_w) % mod_size_w
        masked_imgs = torch.cat(
            [masked_imgs, torch.flip(masked_imgs, [3])],
            3)[:, :, :, :h + h_pad, :]
        masked_imgs = torch.cat(
            [masked_imgs, torch.flip(masked_imgs, [4])],
            4)[:, :, :, :, :w + w_pad]

        # jit.trace expects tensor inputs, so pass the second (integer) argument as a tensor.
        # In the inpaint forward pass, use: l_t = num_local_frames.item()
        ids = torch.randint(10, (1,))
        print(ids.shape)
        ids[0] = len(neighbor_ids)
        print(ids.item())
        traced_model = torch.jit.trace(model, (masked_imgs, ids))

        # torch.save(traced_model, "traced_model2.pt")
        traced_model.save("traced_model3.pt")
        exit()
    
  • Lossless output + more common default FPS + option to disable preview

    I'm making this pull request to add lossless-quality output (using the FFV1 video codec) and to offer the option to skip the video preview. I also changed the default FPS to the more common 24000/1001 and fixed a typo (datset --> dataset).
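
    For context, writing lossless output with OpenCV might look roughly like the following sketch (an illustrative assumption; the actual change lives in the pull request, and FFV1 support depends on your OpenCV/FFmpeg build):

    # Illustrative lossless writer setup (assumption; not the PR's exact code).
    import cv2
    import numpy as np

    width, height = 432, 240
    fps = 24000 / 1001  # the more common default proposed here
    fourcc = cv2.VideoWriter_fourcc(*"FFV1")  # lossless, if the OpenCV/FFmpeg build supports it
    writer = cv2.VideoWriter("result.mkv", fourcc, fps, (width, height))
    for _ in range(10):  # dummy frames for illustration; write the inpainted frames instead
        writer.write(np.zeros((height, width, 3), dtype=np.uint8))
    writer.release()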

  • VRAM limitations, static mask, output?

    Hello. Thank you for publishing your new resolution-unlocked model. I have a couple issues to report, though.

    1. I have 8 GB of VRAM to use, but get OOM with just 500 frames of 144x120 video. Is your software trying to load all of them into VRAM at once? After all, 200 frames work. Could I set a cap on the number of frames loaded onto VRAM at once, so that it doesn't return OOM? I'm trying to process an input that isn't a GIF.

    2. When I have a non-moving mask, could I just point to one PNG? It's an extra step of work to duplicate the mask into thousands of files.

    3. Are you planning on making the output settings more advanced, for example by using FFmpeg output instead of cv2 video writer? Even without the distortion from resizing to 432x240, the outputs aren't the quality I'd like them to be.

  • add web demo/model to Huggingface

    Hi, would you be interested in adding E2FGVI to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/spaces (web demos) can be added to a user account or organization, similar to GitHub.

    Examples from other organizations:
    Keras: https://huggingface.co/keras-io
    Microsoft: https://huggingface.co/microsoft
    Facebook: https://huggingface.co/facebook

    Example spaces with repos: github: https://github.com/salesforce/BLIP Spaces: https://huggingface.co/spaces/salesforce/BLIP

    github: https://github.com/facebookresearch/omnivore Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    and here are guides for adding spaces/models/datasets to your org

    How to add a Space: https://huggingface.co/blog/gradio-spaces
    How to add models: https://huggingface.co/docs/hub/adding-a-model
    Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.
