[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer

The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention

[Models and Raw results] (Google Drive) | [Models and Raw results] (Baidu Drive, code: hmuv)

(Figure: MixFormer framework overview)

News

[Mar 21, 2022]

  • MixFormer is accepted by CVPR 2022.
  • We release the code, models, and raw results.

[Mar 29, 2022]

  • Our paper is selected for an oral presentation.

Highlights

New transformer tracking framework

MixFormer is composed of a backbone based on target-search mixed attention modules (MAM) and a simple corner head, yielding a compact tracking pipeline without an explicit integration module.

End-to-end, positional-embedding-free, multi-feature-aggregation-free

MixFormer is an end-to-end tracking framework without post-processing. Compared with other transformer trackers, MixFormer does not use positional embeddings, attention masks, or a multi-layer feature aggregation strategy.
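
For intuition, here is a minimal sketch of the mixed-attention idea: template and search tokens are concatenated and attended over jointly, so self-attention and cross-attention happen in a single operation. This is illustrative only; the actual MAM in this repo lives inside a CvT-style backbone with convolutional projections, and nn.MultiheadAttention below is just a stand-in.

import torch
import torch.nn as nn

class MixedAttentionSketch(nn.Module):
    """Illustrative stand-in for mixed attention (not the repo's MAM)."""
    def __init__(self, dim=384, num_heads=6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, template_tokens, search_tokens):
        # Attend over the union of target and search tokens, so
        # self- and cross-attention happen in one operation.
        x = torch.cat([template_tokens, search_tokens], dim=1)
        out, _ = self.attn(x, x, x)
        n_t = template_tokens.shape[1]
        return out[:, :n_t], out[:, n_t:]  # split back into the two streams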

Strong performance

Tracker                   VOT2020 (EAO)   LaSOT (NP)   GOT-10K (AO)   TrackingNet (NP)
MixFormer                 0.555           79.9         70.7           88.9
ToMP101* (CVPR 2022)      -               79.2         -              86.4
SBT-large* (CVPR 2022)    0.529           -            70.4           -
SwinTrack* (arXiv 2021)   -               78.6         69.4           88.2
Sim-L/14* (arXiv 2022)    -               79.7         69.8           87.4
STARK (ICCV 2021)         0.505           77.0         68.8           86.9
KeepTrack (ICCV 2021)     -               77.2         -              -
TransT (CVPR 2021)        0.495           73.8         67.1           86.7
TrDiMP (CVPR 2021)        -               -            67.1           83.3
Siam R-CNN (CVPR 2020)    -               72.2         64.9           85.4
TREG (arXiv 2021)         -               74.1         66.8           83.8

Install the environment

Use Anaconda:

conda create -n mixformer python=3.6
conda activate mixformer
bash install_pytorch17.sh

Data Preparation

Put the tracking datasets in ./data. It should look like:

${MixFormer_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST

Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
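
For reference, the generated files assign dataset and result paths to a settings object. Below is a hypothetical excerpt of lib/test/evaluation/local.py; the import path and attribute names are illustrative, so check your generated file for the exact ones.

from lib.test.evaluation.environment import EnvSettings  # illustrative import

def local_env_settings():
    settings = EnvSettings()
    # Dataset locations (attribute names are illustrative; mirror your generated file)
    settings.lasot_path = '/path/to/MixFormer/data/lasot'
    settings.got10k_path = '/path/to/MixFormer/data/got10k'
    settings.trackingnet_path = '/path/to/MixFormer/data/trackingnet'
    # Where raw tracking results will be written
    settings.results_path = '/path/to/MixFormer/test/tracking_results'
    return settings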

Train MixFormer

Train with multiple GPUs using DDP. More details of the training settings can be found in tracking/train_mixformer.sh.

# MixFormer
bash tracking/train_mixformer.sh
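
If you prefer to launch training directly, the shell script wraps tracking/train.py. A sketch of a direct invocation (the flags mirror the SPM-training command quoted in the comments below; set --nproc_per_node to your GPU count and adjust --script/--config for your variant):

python tracking/train.py --script mixformer_online --config baseline \
    --save_dir ./output --mode multiple --nproc_per_node 8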

Test and evaluate MixFormer on benchmarks

  • LaSOT/GOT10k-test/TrackingNet/OTB100/UAV123. More details of the test settings can be found in tracking/test_mixformer.sh:
bash tracking/test_mixformer.sh
  • VOT2020
    Before evaluating "MixFormer+AR" on VOT2020, please install the extra packages following external/AR/README.md. The VOT toolkit is also required to evaluate our tracker; to download and install it, you can follow this tutorial. For convenience, you can use our example VOT toolkit workspaces under external/vot20/ by setting trackers.ini (see the sketch after the commands below).
cd external/vot20/<workspace_dir>
vot evaluate --workspace . MixFormerPython
# generating analysis results
vot analysis --workspace . --nocache
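
For reference, a trackers.ini entry for a Python TraX tracker generally has the following shape. The command module and paths below are placeholders, not the exact values; copy the real entry from the example workspace.

[MixFormerPython]
label = MixFormerPython
protocol = traxpython
command = <entry_point_module>
paths = <MixFormer_ROOT>/external/vot20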

Run MixFormer on your own video

bash tracking/run_video_demo.sh

Compute FLOPs/Params and test speed

bash tracking/profile_mixformer.sh

Visualize attention maps

bash tracking/vis_mixformer_attn.sh

(Figure: visualized attention maps)

Model Zoo and raw results

The trained models and the raw tracking results are provided in [Models and Raw results] (Google Drive) or [Models and Raw results] (Baidu Drive, code: hmuv).

Contact

Yutao Cui: [email protected]

Cheng Jiang: [email protected]

Acknowledgments

  • Thanks to the PyTracking and STARK libraries, which helped us quickly implement our ideas.
  • We use the CvT implementation from the official CvT repo.
Owner

Multimedia Computing Group, Nanjing University
Comments
  • multi-layer feature aggregation strategy and long-term tracking

    Thanks for sharing this excellent work! I have two small questions. First, as mentioned in your paper, the multi-layer feature aggregation strategy is commonly used in other trackers (e.g., SiamRPN++, STARK). The one in SiamRPN++ is understandable, but the one in STARK is confusing: STARK seems to use only the last stride-16 features for prediction. What is the main difference between MixFormer and STARK in this regard? Second, have you tested MixFormer on the VOT long-term dataset? STARK performs well on long-term tracking, and it feels like MixFormer could work even better.

  • Is this a typo?

    In line 751, shouldn't it be named online_template rather than template, or am I misunderstanding? https://github.com/MCG-NJU/MixFormer/blob/0c2663d3afbce0da138d5b42bc7f28667d077ba3/lib/models/mixformer/mixformer.py#L746-L756

  • Repeated tracker initialization?

    First of all, thanks for your clean and high-quality code. But in https://github.com/MCG-NJU/MixFormer/blob/219bd14704ec217919c3b1eb310940769546c2d6/external/AR/pytracking/VOT2020_super_only_mask_384_HP/mixformer_alpha_seg_class.py#L32-L43 I find tracker.initialize called twice. I think initialize is just a setup step (not an online update step), so why do we need to call it twice?

  • An error was encountered while testing

    Thank you for your outstanding work. I reproduced your code, and there is an error:

    {'model': 'mixformer_online_22k.pth.tar', 'search_area_scale': 4.5, 'max_score_decay': 1.0, 'vis_attn': 1}

    test config: {'MODEL': {'HEAD_TYPE': 'CORNER', 'HIDDEN_DIM': 384, 'NUM_OBJECT_QUERIES': 1, 'POSITION_EMBEDDING': 'sine', 'PREDICT_MASK': False, 'BACKBONE': {'PRETRAINED': True, 'PRETRAINED_PATH': '', 'INIT': 'trunc_norm', 'NUM_STAGES': 3, 'PATCH_SIZE': [7, 3, 3], 'PATCH_STRIDE': [4, 2, 2], 'PATCH_PADDING': [2, 1, 1], 'DIM_EMBED': [64, 192, 384], 'NUM_HEADS': [1, 3, 6], 'DEPTH': [1, 4, 16], 'MLP_RATIO': [4.0, 4.0, 4.0], 'ATTN_DROP_RATE': [0.0, 0.0, 0.0], 'DROP_RATE': [0.0, 0.0, 0.0], 'DROP_PATH_RATE': [0.0, 0.0, 0.1], 'QKV_BIAS': [True, True, True], 'CLS_TOKEN': [False, False, False], 'POS_EMBED': [False, False, False], 'QKV_PROJ_METHOD': ['dw_bn', 'dw_bn', 'dw_bn'], 'KERNEL_QKV': [3, 3, 3], 'PADDING_KV': [1, 1, 1], 'STRIDE_KV': [2, 2, 2], 'PADDING_Q': [1, 1, 1], 'STRIDE_Q': [1, 1, 1], 'FREEZE_BN': True}, 'PRETRAINED_STAGE1': True, 'NLAYER_HEAD': 3, 'HEAD_FREEZE_BN': True}, 'TRAIN': {'TRAIN_SCORE': True, 'SCORE_WEIGHT': 1.0, 'LR': 0.0001, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 30, 'LR_DROP_EPOCH': 20, 'BATCH_SIZE': 32, 'NUM_WORKER': 8, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'DEEP_SUPERVISION': False, 'FREEZE_STAGE0': False, 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 5, 'GRAD_CLIP_NORM': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'trident_pro', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': [200], 'TRAIN': {'DATASETS_NAME': ['GOT10K_vottrain', 'LASOT', 'COCO17', 'TRACKINGNET'], 'DATASETS_RATIO': [1, 1, 1, 1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 320, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5}, 'TEMPLATE': {'SIZE': 128, 'FACTOR': 2.0, 'NUMBER': 2, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 320, 'EPOCH': 40, 'UPDATE_INTERVALS': {'LASOT': [200], 'GOT10K_TEST': [10], 'TRACKINGNET': [25], 'VOT20': [10], 'VOT20LT': [200], 'OTB': [6], 'UAV': [200]}, 'ONLINE_SIZES': {'LASOT': [2], 'GOT10K_TEST': [2], 'TRACKINGNET': [1], 'VOT20': [5], 'VOT20LT': [3], 'OTB': [3], 'UAV': [1]}}}

    search_area_scale: 4.5
    Evaluating 1 trackers on 1 sequences
    Tracker: mixformer_online baseline None , Sequence: Basketball
    Warning: Pretrained CVT weights are not loaded
    head channel: 384
    Online size is: 3
    Update interval is: 6
    max score decay = 1.0
    Error while processing rearrange-reduction pattern "b (h w) c -> b c h w".
    Input tensor shape: torch.Size([1, 1, 2048, 64]). Additional info: {'h': 32, 'w': 32}.
    Expected 3 dimensions, got 4
    Done

    How to solve this problem?
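
    For context, the einops failure means the tensor reaching that rearrange call has four dimensions instead of the expected three (and 2048 tokens rather than the 32×32 = 1024 the pattern implies). A minimal reproduction of this error class, independent of the repo's code path:

    import torch
    from einops import rearrange

    ok = torch.randn(1, 1024, 64)  # [b, h*w, c], as the pattern expects
    rearrange(ok, 'b (h w) c -> b c h w', h=32, w=32)  # works

    bad = torch.randn(1, 1, 2048, 64)  # extra leading dim and wrong token count
    rearrange(bad, 'b (h w) c -> b c h w', h=32, w=32)  # raises: expected 3 dims, got 4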

  • Cannot compile Precise RoI Pooling library

    {'model': 'mixformer_online_22k.pth.tar', 'update_interval': 25, 'online_sizes': 3, 'search_area_scale': 4.5, 'max_score_decay': 1.0, 'vis_attn': 0}

    test config: {'MODEL': {'HEAD_TYPE': 'CORNER', 'HIDDEN_DIM': 384, 'NUM_OBJECT_QUERIES': 1, 'POSITION_EMBEDDING': 'sine', 'PREDICT_MASK': False, 'BACKBONE': {'PRETRAINED': True, 'PRETRAINED_PATH': '', 'INIT': 'trunc_norm', 'NUM_STAGES': 3, 'PATCH_SIZE': [7, 3, 3], 'PATCH_STRIDE': [4, 2, 2], 'PATCH_PADDING': [2, 1, 1], 'DIM_EMBED': [64, 192, 384], 'NUM_HEADS': [1, 3, 6], 'DEPTH': [1, 4, 16], 'MLP_RATIO': [4.0, 4.0, 4.0], 'ATTN_DROP_RATE': [0.0, 0.0, 0.0], 'DROP_RATE': [0.0, 0.0, 0.0], 'DROP_PATH_RATE': [0.0, 0.0, 0.1], 'QKV_BIAS': [True, True, True], 'CLS_TOKEN': [False, False, False], 'POS_EMBED': [False, False, False], 'QKV_PROJ_METHOD': ['dw_bn', 'dw_bn', 'dw_bn'], 'KERNEL_QKV': [3, 3, 3], 'PADDING_KV': [1, 1, 1], 'STRIDE_KV': [2, 2, 2], 'PADDING_Q': [1, 1, 1], 'STRIDE_Q': [1, 1, 1], 'FREEZE_BN': True}, 'PRETRAINED_STAGE1': True, 'NLAYER_HEAD': 3, 'HEAD_FREEZE_BN': True}, 'TRAIN': {'TRAIN_SCORE': True, 'SCORE_WEIGHT': 1.0, 'LR': 0.0001, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 30, 'LR_DROP_EPOCH': 20, 'BATCH_SIZE': 32, 'NUM_WORKER': 8, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'DEEP_SUPERVISION': False, 'FREEZE_STAGE0': False, 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 5, 'GRAD_CLIP_NORM': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'trident_pro', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': [200], 'TRAIN': {'DATASETS_NAME': ['GOT10K_vottrain', 'LASOT', 'COCO17', 'TRACKINGNET'], 'DATASETS_RATIO': [1, 1, 1, 1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 320, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5}, 'TEMPLATE': {'SIZE': 128, 'FACTOR': 2.0, 'NUMBER': 2, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 320, 'EPOCH': 40, 'UPDATE_INTERVALS': {'LASOT': [200], 'GOT10K_TEST': [10], 'TRACKINGNET': [25], 'VOT20': [10], 'VOT20LT': [200], 'OTB': [6], 'UAV': [200]}, 'ONLINE_SIZES': {'LASOT': [2], 'GOT10K_TEST': [2], 'TRACKINGNET': [1], 'VOT20': [5], 'VOT20LT': [3], 'OTB': [3], 'UAV': [1]}}}

    search_area_scale: 4.5
    Warning: Pretrained CVT weights are not loaded
    head channel: 384
    Online size is: 3
    Update interval is: 25
    max score decay = 1.0
    Using C:\Users\210\AppData\Local\torch_extensions\torch_extensions\Cache as PyTorch extensions root...
    C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py:274: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
    warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
    Detected CUDA files, patching ldflags
    Emitting ninja build file C:\Users\210\AppData\Local\torch_extensions\torch_extensions\Cache_prroi_pooling\build.ninja...
    Building extension module _prroi_pooling...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    1.10.2
    Loading extension module _prroi_pooling...
    Traceback (most recent call last):
      File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 33, in _import_prroi_pooling
        verbose=True
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 980, in load
        keep_intermediates=keep_intermediates)
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 1196, in _jit_compile
        return _import_module_from_library(name, build_directory, is_python_module)
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\utils\cpp_extension.py", line 1543, in _import_module_from_library
        file, path, description = imp.find_module(module_name, [path])
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\imp.py", line 297, in find_module
        raise ImportError(_ERR_MSG.format(name), name=name)
    ImportError: No module named '_prroi_pooling'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tracking/video_demo.py", line 53, in <module>
        main()
      File "tracking/video_demo.py", line 49, in main
        args.save_results, tracker_params=tracker_params)
      File "tracking/video_demo.py", line 21, in run_video
        tracker.run_video(videofilepath=videofile, optional_box=optional_box, debug=debug, save_results=save_results)
      File "tracking..\lib\test\evaluation\tracker.py", line 228, in run_video
        out = tracker.track(frame)
      File "tracking..\lib\test\tracker\mixformer_online.py", line 135, in track
        out_dict, _ = self.network.forward_test(search, run_score_head=True)
      File "tracking..\lib\models\mixformer\mixformer_online.py", line 850, in forward_test
        out, outputs_coord_new = self.forward_head(search, template, run_score_head, gt_bboxes)
      File "tracking..\lib\models\mixformer\mixformer_online.py", line 875, in forward_head
        out_dict.update({'pred_scores': self.score_branch(search, template, gt_bboxes).view(-1)})
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "tracking..\lib\models\mixformer\mixformer_online.py", line 798, in forward
        search_box_feat = rearrange(self.search_prroipool(search_feat, target_roi), 'b c h w -> b (h w) c')
      File "C:\Users\210\anaconda3\envs\mixformer1\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\prroi_pool.py", line 28, in forward
        return prroi_pool2d(features, rois, self.pooled_height, self.pooled_width, self.spatial_scale)
      File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 44, in forward
        _prroi_pooling = _import_prroi_pooling()
      File "tracking..\external\PreciseRoIPooling\pytorch\prroi_pool\functional.py", line 36, in _import_prroi_pooling
        raise ImportError('Can not compile Precise RoI Pooling library.')
    ImportError: Can not compile Precise RoI Pooling library.

    Please help me! Thanks very much

  • About Score Prediction Module (SPM) and MixFormer-1k

    Hi, thanks for your work. I have some questions about your paper:

    1. Have you ever tried using the score prediction module of STARK (an MLP) instead of the SPM proposed in your paper? I am curious about the performance difference between the SPM and a plain MLP.
    2. The MixFormer-1k model seems to be trained with all datasets, not just GOT10k, which differs from your paper (it would be unreasonable for MixFormer-1k to outperform MixFormer-GOT if MixFormer-1k were also trained on GOT10k only). Is it fair to use it for comparison on the GOT10k test set?
  • How to train the SPM in stage 2?

    Thank you for your excellent work. I have some questions about the training process of the SPM.

    I encounter a problem when I use the script in train_mixformer.sh to train the SPM module:

    python tracking/train.py --script mixformer_online --config baseline --save_dir /mysavepath --mode multiple --nproc_per_node 1 --stage1_model <my latest checkpoint trained in the first stage>

    But the logs suggest that the program has loaded the wrong checkpoint, because there are so many missing keys:

    missing keys: ['score_branch.score_token', 'score_branch.score_head.layers.0.weight', 'score_branch.score_head.layers.0.bias', 'score_branch.score_head.layers.1.weight', 'score_branch.score_head.layers.1.bias', 'score_branch.score_head.layers.2.weight', 'score_branch.score_head.layers.2.bias', 'score_branch.proj_q.0.weight', 'score_branch.proj_q.0.bias', 'score_branch.proj_q.1.weight', 'score_branch.proj_q.1.bias', 'score_branch.proj_k.0.weight', 'score_branch.proj_k.0.bias', 'score_branch.proj_k.1.weight', 'score_branch.proj_k.1.bias', 'score_branch.proj_v.0.weight', 'score_branch.proj_v.0.bias', 'score_branch.proj_v.1.weight', 'score_branch.proj_v.1.bias', 'score_branch.proj.0.weight', 'score_branch.proj.0.bias', 'score_branch.proj.1.weight', 'score_branch.proj.1.bias', 'score_branch.norm1.weight', 'score_branch.norm1.bias', 'score_branch.norm2.0.weight', 'score_branch.norm2.0.bias', 'score_branch.norm2.1.weight', 'score_branch.norm2.1.bias'] unexpected keys: [] Loading pretrained mixformer weights done.

    I am really confused about how to train the SPM module correctly.

    I would appreciate it if you could give me some advice.

    The whole log shows below:

    error logs.txt

  • About the updated MixedAttention operation

    Thank you for open-sourcing such excellent work!

    The original version before the update separated Q, K, and V into template, online-template, and search parts, and attention for the template and the online template was computed separately, as shown in the following:

    # template attention
    k1 = torch.cat([k_t, k_ot], dim=2)
    v1 = torch.cat([v_t, v_ot], dim=2)
    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_t, k1]) * self.scale
    attn = F.softmax(attn_score, dim=-1)
    attn = self.attn_drop(attn)
    x_t = torch.einsum('bhlt,bhtv->bhlv', [attn, v1])
    x_t = rearrange(x_t, 'b h t d -> b t (h d)')

    # online template attention
    k2 = torch.cat([k_t, k_ot], dim=2)
    v2 = torch.cat([v_t, v_ot], dim=2)
    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_ot, k2]) * self.scale
    attn = F.softmax(attn_score, dim=-1)
    attn = self.attn_drop(attn)
    x_ot = torch.einsum('bhlt,bhtv->bhlv', [attn, v2])
    x_ot = rearrange(x_ot, 'b h t d -> b t (h d)')

    Note especially the calculation of attn_score: it is computed from q_t with k1 and from q_ot with k2 (both k1 and k2 are the template keys concatenated with the online-template keys):

    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_t, k1]) * self.scale

    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_ot, k2]) * self.scale

    The updated version instead merges the template and the online template together and executes attention on them jointly from the beginning, as shown below:

    attn_score = torch.einsum('bhlk,bhtk->bhlt', [q_mt, k_mt]) * self.scale

    I would like to ask: are the two ways of computing attention over the template and online template, before and after the update, equivalent?
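
    A quick numerical check suggests they are equivalent (dropout aside): softmax normalizes each query row independently, so attending with the merged q_mt against k_mt gives the same rows as running the two attentions separately. A minimal sketch, with made-up shapes mirroring the snippet above:

    import torch
    import torch.nn.functional as F

    def attend(q, k, v, scale):
        # same einsum/softmax pattern as in the snippet above
        attn = F.softmax(torch.einsum('bhlk,bhtk->bhlt', q, k) * scale, dim=-1)
        return torch.einsum('bhlt,bhtv->bhlv', attn, v)

    b, h, n_t, n_ot, d = 2, 3, 4, 4, 8
    scale = d ** -0.5
    q_t, q_ot = torch.randn(b, h, n_t, d), torch.randn(b, h, n_ot, d)
    k_t, k_ot = torch.randn(b, h, n_t, d), torch.randn(b, h, n_ot, d)
    v_t, v_ot = torch.randn(b, h, n_t, d), torch.randn(b, h, n_ot, d)

    k_mt = torch.cat([k_t, k_ot], dim=2)
    v_mt = torch.cat([v_t, v_ot], dim=2)

    # old version: template and online template attended separately
    x_sep = torch.cat([attend(q_t, k_mt, v_mt, scale),
                       attend(q_ot, k_mt, v_mt, scale)], dim=2)

    # new version: queries merged first, a single attention call
    q_mt = torch.cat([q_t, q_ot], dim=2)
    x_mix = attend(q_mt, k_mt, v_mt, scale)

    print(torch.allclose(x_sep, x_mix, atol=1e-6))  # True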

  • Do you get stuck on the first sequence when you run the test? It has been running all night

    When I tested it, it got stuck at the output below, and it was still the same after running for a whole night. How can I solve it?

    Evaluating 1 trackers on 1 sequences
    Tracker: mixformer_online baseline None , Sequence: Basketball
    Warning: Pretrained CVT weights are not loaded
    head channel: 384

  • "trident_pro" sample mode

    Hi, why can template_frame_ids_extra be invisible (line 316) when the sample mode is set to "trident_pro"?

    https://github.com/MCG-NJU/MixFormer/blob/90a6a9c9a9c874f56904796bab1ddf158948d4e3/lib/train/data/sampler.py#L300-L325

  • Can I get a guide for setting paths?

    I want to test the model, but I got this error:

    RuntimeError: YOU HAVE NOT SETUP YOUR local.py!!!

    If I only want to test the pretrained model, do I still need to set up the paths in local.py?

  • MixFormer trained on GOT10k without pretrained weights seems to collapse?

    Hi, we ran the MixFormer experiments on GOT10k without pretrained CvT weights, using the default configuration. The results show that after 200 epochs, the AO on the GOT10k test set is only 0.096. It is not clear where the problem occurred. Do you have any advice?
