MultiMAE: Multi-modal Multi-task Masked Autoencoders

Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir

Website | arXiv | BibTeX

Open in Colab | Hugging Face Spaces

Official PyTorch implementation and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders.

We introduce Multi-modal Multi-task Masked Autoencoders (MultiMAE), an efficient and effective pre-training strategy for Vision Transformers. Given a small random sample of visible patches from multiple modalities, the MultiMAE pre-training objective is to reconstruct the masked-out regions. Once pre-trained, a single MultiMAE encoder can then be used for both single-modal and multi-modal downstream transfer, yielding results that are competitive with or significantly better than the baselines.
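
As a toy illustration, here is a minimal sketch of the multi-modal masking idea under our own assumptions (it is not the repository's actual sampling code, which splits the token budget across modalities with its own strategy): given the patch tokens of several modalities, only a small random subset stays visible, and the reconstruction loss is computed on the masked-out patches. The 98 visible tokens below simply echo the number in the provided pre-training config name.

    import torch

    def random_multimodal_mask(num_patches_per_modality, num_visible, generator=None):
        """Per modality, return a boolean mask where True marks a masked-out patch."""
        total = sum(num_patches_per_modality)
        # Randomly pick which patch positions across all modalities stay visible.
        perm = torch.randperm(total, generator=generator)
        visible = torch.zeros(total, dtype=torch.bool)
        visible[perm[:num_visible]] = True
        masks, offset = [], 0
        for n in num_patches_per_modality:
            masks.append(~visible[offset:offset + n])
            offset += n
        return masks

    # RGB, depth and semseg each contribute e.g. 14x14 = 196 patch tokens;
    # only 98 tokens in total remain visible to the encoder.
    rgb_mask, depth_mask, semseg_mask = random_multimodal_mask([196, 196, 196], num_visible=98)
    print(int(rgb_mask.sum()), int(depth_mask.sum()), int(semseg_mask.sum()))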

Catalog

  • Pre-trained models
  • MultiMAE pre-training code
  • ImageNet-1K classification fine-tuning code
  • Semantic segmentation fine-tuning code (single-modal & multi-modal)
  • Depth estimation fine-tuning code
  • Taskonomy fine-tuning code
  • Colab & Hugging Face demos

Pre-trained models

We provide the weights of our pre-trained MultiMAE ViT-B model, in MultiViT (multi-modal) format and timm (RGB-only) format.

For comparison, we also provide the weights of a MAE ViT-B model that we pre-trained using the official MAE codebase following the recommended settings.

| Method   | Arch. | Pre-training modalities | Pre-training epochs | Weights (MultiViT) | Weights (timm) | Config  |
|----------|-------|-------------------------|---------------------|--------------------|----------------|---------|
| MAE      | ViT-B | RGB                     | 1600                | download           | download       | See MAE |
| MultiMAE | ViT-B | RGB+D+S                 | 1600                | download           | download       | link    |

These pre-trained models can then be fine-tuned using this codebase to reach the following performance:

| Method      | ImageNet-1K Classif. (@1) (RGB) | ADE20K Sem. Seg. (mIoU) (RGB) | Hypersim Sem. Seg. (mIoU) (RGB / D / RGB+D) | NYUv2 Sem. Seg. (mIoU) (RGB / D / RGB+D) | NYUv2 Depth (δ1) (RGB) |
|-------------|---------------------------------|-------------------------------|---------------------------------------------|------------------------------------------|------------------------|
| Sup. (DeiT) | 81.8                            | 45.8                          | 33.9 / - / -                                | 50.1 / - / -                             | 80.7                   |
| MAE         | 83.3                            | 46.2                          | 36.5 / - / -                                | 50.8 / - / -                             | 85.1                   |
| MultiMAE    | 83.3                            | 46.2                          | 37.0 / 38.5 / 47.6                          | 52.0 / 41.4 / 56.0                       | 86.4                   |

Model formats

We provide pre-trained weights in two different formats: the single-modal ViT / timm format, which is compatible with other popular ViT repositories (e.g., timm, DINO, MAE), and the multi-modal MultiMAE / MultiViT format, which is used throughout this codebase for multi-modal pre-training and fine-tuning. See multimae/multimae.py for the documentation and implementation of MultiMAE / MultiViT.

You can convert between these formats using the provided vit2multimae_converter.py and multimae2vit_converter.py scripts.
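
As a rough example, the RGB-only timm-format weights can be loaded into a standard timm ViT-B/16 roughly as follows. This is a sketch rather than the repo's own loading code: the checkpoint path is a placeholder and the top-level key layout of the file is an assumption, so adjust both to the file you actually downloaded.

    import torch
    import timm

    # Build a plain ViT-B/16 backbone without a classification head.
    model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=0)

    # Placeholder path; use the downloaded timm-format checkpoint.
    ckpt = torch.load('path/to/multimae_timm_weights.pth', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)  # some checkpoints nest the weights under a 'model' key

    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print('missing keys:', missing)
    print('unexpected keys:', unexpected)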

Usage

Set-up

See SETUP.md for set-up instructions.

Pre-training

See PRETRAINING.md for pre-training instructions.

Fine-tuning

See FINETUNING.md for fine-tuning instructions.

Demo & visualizations

For interactive demos, please see our website. Open our Colab notebook to play around with the visualization code, or simply upload an image to our Hugging Face Spaces demo.

Acknowledgement

This repository is built using the timm, DeiT, DINO, MoCo v3, BEiT, MAE-priv, and MAE repositories.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

If you find this repository helpful, please consider citing our work:

@article{bachmann2022multimae,
  author    = {Roman Bachmann and David Mizrahi and Andrei Atanov and Amir Zamir},
  title     = {{MultiMAE}: Multi-modal Multi-task Masked Autoencoders},
  journal   = {arXiv preprint arXiv:2204.01678},
  year      = {2022},
}
Owner

VILAB (Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology, EPFL)
Comments
  • Why should the depth map be divided by 2**16?

    Thank you for your great MultiMAE. We observed that in https://github.com/EPFL-VILAB/MultiMAE/blob/main/utils/datasets.py line 96, you use img = torch.Tensor(np.array(task_dict[task]) / 2 ** 16). Can you tell us why the depth map should be divided by 2**16? Are there any problems without this operation?
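
    For illustration only, and assuming the depth maps are stored as 16-bit PNGs (an assumption, i.e. uint16 values in [0, 65535]), here is a small numeric sketch of what the quoted line does: dividing by 2**16 rescales the values to roughly [0, 1).

    import numpy as np
    import torch

    # Simulate a depth map as a 16-bit PNG would load: uint16 values in [0, 65535].
    depth_uint16 = (np.random.rand(64, 64) * 65535).astype(np.uint16)
    depth = torch.Tensor(depth_uint16 / 2 ** 16)  # same operation as in utils/datasets.py line 96
    print(depth.dtype, float(depth.min()), float(depth.max()))  # float32, values in [0, 1)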

  • Add web demo/model to Hugging Face

    Hi, would you be interested in adding MultiMAE to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.

    Examples from other organizations:
    Keras: https://huggingface.co/keras-io
    Microsoft: https://huggingface.co/microsoft
    Facebook: https://huggingface.co/facebook

    Example Spaces with repos:
    GitHub: https://github.com/salesforce/BLIP | Spaces: https://huggingface.co/spaces/salesforce/BLIP
    GitHub: https://github.com/facebookresearch/omnivore | Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    And here are guides for adding Spaces, models, and datasets to your org:

    How to add a Space: https://huggingface.co/blog/gradio-spaces
    How to add models: https://huggingface.co/docs/hub/adding-a-model
    Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

  • Linear probing results

    Hey, Thank you for providing the code for the paper. The paper is really interesting and the project page is very well done!

    I was wondering whether you've tested the performance of linear probing on the RGB image when trained with all 3 modalities. The linear probing results of the original MAE paper were not very good; it would be interesting to understand whether the additional supervision creates better representations that translate into better linear probing scores.

    Thanks, Eliahu

  • Query about semseg domain in pre-training

    Hi, I have successfully made the pseudo labels and trained the 'rgb' in/out-domain MultiMAE model.

    But when I trained the model with 'rgb-semseg' in/out-domain, I got an error at multimae/input_adapters.py line 232:

    # Create patches [B, C, H, W] -> [B, (H*W), C]
    x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d')
    

    The full log is in log.txt. x.size() is [batchsize, 64, 56, 56] before line 232. I can't figure out what's wrong.

    What's more, I don't know why the pseudo semseg label image is resized to 1/4 size (that is, 224*224 -> 56*56) in utils/datasets.py line 105:

    # Convert to Tensor
    for task in task_dict:
        if task in ['depth']:
            img = torch.Tensor(np.array(task_dict[task]) / 2 ** 16)
            img = img.unsqueeze(0)  # 1 x H x W
        elif task in ['rgb']:
            img = TF.to_tensor(task_dict[task])
            img = TF.normalize(img, mean=self.rgb_mean, std=self.rgb_std)
        elif task in ['semseg', 'semseg_coco']:
            # TODO: add this to a config instead
            # Rescale to 0.25x size (stride 4)
            scale_factor = 0.25
            img = task_dict[task].resize((int(self.input_size * scale_factor), int(self.input_size * scale_factor)))
            # Using pil_to_tensor keeps it in uint8, to_tensor converts it to float (rescaled to [0, 1])
            img = TF.pil_to_tensor(img).to(torch.long).squeeze(0)
    

    and then projected with nn.Conv2d in multimae/input_adapters.py line 198:

    if self.interpolate_class_emb:
        self.proj = nn.Sequential(
            nn.Upsample(scale_factor=(1 / self.P_H, 1 / self.P_W),
                        mode='bilinear'),  # Actually a downsample operation
            nn.Conv2d(in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
                        kernel_size=1, stride=1),
        )
    else:
        self.proj = nn.Conv2d(
            in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
            kernel_size=(self.P_H, self.P_W), stride=(self.P_H, self.P_W)
        )
    

    Thank you for any help.
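
    A quick arithmetic check of how the shapes above line up (assuming a 4x4 patch size for the semseg adapter, as suggested by the conv shapes reported elsewhere on this page; this is an observation, not an official answer): the 0.25 rescale plus a stride-4 projection gives the same 14x14 token grid as 16x16 patches on the 224x224 RGB input.

    input_size = 224        # RGB input resolution
    rgb_patch = 16          # RGB patch size: 224 / 16 = 14 tokens per side
    semseg_scale = 0.25     # semseg maps are rescaled to 56 x 56
    semseg_patch = 4        # assumed P_H = P_W = 4 for the semseg adapter
    rgb_tokens = input_size // rgb_patch
    semseg_tokens = int(input_size * semseg_scale) // semseg_patch
    print(rgb_tokens, semseg_tokens)  # 14 14 -> both modalities give a 14 x 14 token grid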

  • Example usage of regular MAE Weights

    Hey, awesome work! I am trying to figure out how to modify the demo notebook to use the regular MAE instead of MultiMAE. In particular, I commented out all depth and semseg info, but the resulting image infilling looks corrupted. Could you by chance share an example of proper usage of the regular MAE weights? Thanks so much for the help!

  • Some doubts about pseudo labels

    Hi, I am pseudo-labeling ImageNet-1K and encountering some difficulties.

    Firstly, what would happen if there are more than 255 semseg classes? How can a single-channel PNG image (like the one used for depth) represent them? (Although the COCO dataset has only 80 classes, ImageNet has more than 255 classes when fine-tuning.)

    Secondly, in the Colab notebook example, the rgb2depth DPT model cannot take ImageNet pictures of arbitrary size. How can we save all the pseudo labels before the data augmentation crops them to 224*224? We need to keep the original images aligned with the pseudo-labeled images, don't we?

    Thank you for any help.

  • Making the 'mask_valid' folder for evaluation

    Hello! I just want to set up this nice work quickly, but I ran into a problem with inference. I used 'run_finetuning_depth.py' to check that the code runs, and followed 'SETUP.md' to download the NYUv2 dataset and structure the folders. But I got a not-found error for the 'mask_valid' folder. How can I make the 'mask_valid' folder for evaluation only?

  • Facing issues in pre-training the code on a custom dataset

    Hi,

    I am trying to pre-train the code on the Celeb-HQ dataset, and I successfully created the respective grayscale depth maps (PNG) and grayscale segmentations (PNG) for pre-training. However, when I try to train with "OMP_NUM_THREADS=1 torchrun --nproc_per_node=8 run_pretraining_multimae.py --config cfgs/pretrain/multimae-b_98_rgb+-depth-semseg_1600e.yaml --data_path /home/gargatik/gargatik/Datasets/copy/multimae/train"

    I am facing the issue:


    Start_______________________

    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [92,0,0], thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed. ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [92,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed. Traceback (most recent call last): File "run_pretraining_multimae.py", line 585, in main(opts) File "run_pretraining_multimae.py", line 414, in main train_stats = train_one_epoch( File "run_pretraining_multimae.py", line 501, in train_one_epoch preds, masks = model( File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward output = self._run_ddp_forward(*inputs, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward return module_to_run(*inputs[0], **kwargs[0]) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 312, in forward input_task_tokens = { File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 313, in domain: self.input_adaptersdomain File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/input_adapters.py", line 232, in forward x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d') File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, *kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 457, in forward return self._conv_forward(input, self.weight, self.bias) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward return F.conv2d(input, weight, bias, self.stride, RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR terminate called after throwing an instance of 'c10::CUDAError' what(): CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:166 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f1c685031ee in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so) frame #1: + 0xf3c2d (0x7f1caad91c2d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so) frame #2: + 0xf6f6e (0x7f1caad94f6e in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so) frame #3: + 0x463418 (0x7f1cba0f6418 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f1c684ea7a5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so) frame #5: + 0x35f2f5 (0x7f1cb9ff22f5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #6: + 0x679288 (0x7f1cba30c288 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #7: THPVariable_subclass_dealloc(_object) + 0x2d5 (0x7f1cba30c655 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #8: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ccad3] frame #9: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5d270c] frame #10: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ec780] frame #11: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5441f8] frame #12: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a] frame #13: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a] frame #14: PyDict_SetItemString + 0x536 (0x5d1686 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #15: PyImport_Cleanup + 0x79 (0x684619 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #16: Py_FinalizeEx + 0x7f (0x67f8af in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #17: Py_RunMain + 0x32d (0x6b70fd in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #18: Py_BytesMain + 0x2d (0x6b736d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #19: __libc_start_main + 0xf3 (0x7f1cd8fc10b3 in /lib/x86_64-linux-gnu/libc.so.6) frame #20: _start + 0x2e (0x5fa5ce in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)

    Traceback (most recent call last): File "run_pretraining_multimae.py", line 585, in main(opts) File "run_pretraining_multimae.py", line 414, in main train_stats = train_one_epoch( File "run_pretraining_multimae.py", line 501, in train_one_epoch preds, masks = model( File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward output = self._run_ddp_forward(*inputs, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward return module_to_run(*inputs[0], **kwargs[0]) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 312, in forward input_task_tokens = { File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 313, in domain: self.input_adaptersdomain File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/input_adapters.py", line 232, in forward x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d') File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 457, in forward return self._conv_forward(input, self.weight, self.bias) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward return F.conv2d(input, weight, bias, self.stride, RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

    import torch torch.backends.cuda.matmul.allow_tf32 = False torch.backends.cudnn.benchmark = True torch.backends.cudnn.deterministic = False torch.backends.cudnn.allow_tf32 = True data = torch.randn([256, 64, 56, 56], dtype=torch.half, device='cuda', requires_grad=True).to(memory_format=torch.channels_last) net = torch.nn.Conv2d(64, 768, kernel_size=[4, 4], padding=[0, 0], stride=[4, 4], dilation=[1, 1], groups=1) net = net.cuda().half().to(memory_format=torch.channels_last) out = net(data) out.backward(torch.randn_like(out)) torch.cuda.synchronize()

    ConvolutionParams data_type = CUDNN_DATA_HALF padding = [0, 0, 0] stride = [4, 4, 0] dilation = [1, 1, 0] groups = 1 deterministic = false allow_tf32 = true input: TensorDescriptor 0xc853ff10 type = CUDNN_DATA_HALF nbDims = 4 dimA = 256, 64, 56, 56, strideA = 200704, 1, 3584, 64, output: TensorDescriptor 0xc8540270 type = CUDNN_DATA_HALF nbDims = 4 dimA = 256, 768, 14, 14, strideA = 150528, 1, 10752, 768, weight: FilterDescriptor 0x819c34f0 type = CUDNN_DATA_HALF tensor_format = CUDNN_TENSOR_NHWC nbDims = 4 dimA = 768, 64, 4, 4, Pointer addresses: input: 0x7f12aa000000 output: 0x7f12ca000000 weight: 0x7f13d9200c00

    terminate called after throwing an instance of 'c10::CUDAError' what(): CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:166 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f147c2811ee in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so) frame #1: + 0xf3c2d (0x7f14beb0fc2d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so) frame #2: + 0xf6f6e (0x7f14beb12f6e in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so) frame #3: + 0x463418 (0x7f14cde74418 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f147c2687a5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so) frame #5: + 0x35f2f5 (0x7f14cdd702f5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #6: + 0x679288 (0x7f14ce08a288 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f14ce08a655 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame #8: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ccad3] frame #9: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5d270c] frame #10: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ec780] frame #11: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5441f8] frame #12: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a] frame #13: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a] frame #14: PyDict_SetItemString + 0x536 (0x5d1686 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #15: PyImport_Cleanup + 0x79 (0x684619 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #16: Py_FinalizeEx + 0x7f (0x67f8af in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #17: Py_RunMain + 0x32d (0x6b70fd in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #18: Py_BytesMain + 0x2d (0x6b736d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python) frame #19: __libc_start_main + 0xf3 (0x7f14ecd3f0b3 in /lib/x86_64-linux-gnu/libc.so.6) frame #20: _start + 0x2e (0x5fa5ce in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)

    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182198 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182199 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182200 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182202 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182203 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182204 closing signal SIGTERM WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182205 closing signal SIGTERM ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 3 (pid: 182201) of binary: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python Traceback (most recent call last): File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/torchrun", line 8, in sys.exit(main()) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 345, in wrapper return f(*args, **kwargs) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main run(args) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run elastic_launch( File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    run_pretraining_multimae.py FAILED

    Failures: <NO_OTHER_FAILURES>

    Root Cause (first observed failure): [0]: time : 2022-07-09_16:11:15 host : Norwalk rank : 3 (local_rank: 3) exitcode : -6 (pid: 182201) error_file: <N/A> traceback : Signal 6 (SIGABRT) received by PID 182201


    END______________

    Thanks for the help
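
    One hedged sanity check that may help here (not from the repo): the "srcIndex < srcSelectDimSize" assertion above typically fires when an index passed to an embedding/index_select lookup exceeds the table size, for example a semseg label id larger than the number of classes the semseg adapter expects. The path and class count below are placeholders for your own setup.

    import numpy as np
    from PIL import Image

    num_semseg_classes = 133  # placeholder: use the class count from your pre-training config
    label = np.array(Image.open('path/to/one/semseg_label.png'))  # placeholder path to one custom label
    print('min/max label id:', label.min(), label.max())
    assert label.max() < num_semseg_classes, 'label ids exceed the adapter class count'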

  • Query about data preparation for fine-tuning on NYUv2 depth

    Hi, I think the following correction holds: Line 357 in https://github.com/EPFL-VILAB/MultiMAE/blob/main/run_finetuning_depth.py should be dataset_train = build_regression_dataset(args, data_path=args.train_data_path, transform=train_transform) instead of dataset_train = build_regression_dataset(args, data_path=args.data_path, transform=train_transform), or the argument train_data_path should be changed to data_path.

    Apart from that, I am trying to recreate your results on NYUv2 for depth, but the dataset preparation is not clear from the instructions in SETUP.md. Regarding the folder structure, where should the ground truth be when fine-tuning for depth and evaluating? And is mask_valid needed for fine-tuning? RuntimeError: Found 0 logs in subfolders of: /tmp-network/user/varora/multimae/multimae_data/train/mask_valid

  • Query regarding the output adapter heads

    Hi, thank you for the interesting work and the extensive experiments. In the paper, your depth results are based on the DPT head, while in the Colab you use the spatial adapter head for inference. I was wondering whether your fine-tuning results with the spatial adapter head were better or worse than with the DPT head? Was the intention in implementing this spatial head more to test a pure transformer-based head (compared to DPT's convolution-based, RefineNet-like approach)?

    Thank you.

  • About run_finetuning_semseg.py

    Hi, I find that in MultiMAE/run_finetuning_semseg.py line 735:

    seg_pred_argmax = seg_pred[:num_classes].argmax(dim=1) 
    

    I think it should be

    seg_pred_argmax = seg_pred[:,:num_classes,:,:].argmax(dim=1) 
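
    A tiny shape check (not from the repo) of why the two slices differ for a [B, C, H, W] prediction tensor: the first indexes the batch dimension, the second the class dimension.

    import torch

    num_classes = 3
    seg_pred = torch.randn(8, num_classes + 1, 4, 4)            # e.g. one extra channel beyond the classes
    print(seg_pred[:num_classes].shape)                         # torch.Size([3, 4, 4, 4]) -> slices batches
    print(seg_pred[:, :num_classes, :, :].shape)                # torch.Size([8, 3, 4, 4]) -> slices classes
    print(seg_pred[:, :num_classes, :, :].argmax(dim=1).shape)  # torch.Size([8, 4, 4])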
    
  • How to download and prepare NYUv2

    Thanks for releasing this code.

    I'm trying to reproduce the fine-tuning on NYUv2, but the downloaded data doesn't match the extensions in IMG_EXTENSIONS.

    I need to know what preprocessing I should apply.

  • Problem when evaluating the pre-trained model

    Problem

    Hi! I encountered a problem while just trying to evaluate this model with the same config as the Colab demo.

    Environment

    Ubuntu 22.04, CUDA Kernel 10.1, CUDA Runtime 11.3, PyTorch 1.12.0

    Terminal

    /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [0,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [1,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [2,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [3,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [4,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [5,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [6,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [7,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [8,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [9,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [10,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [11,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [12,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [13,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [14,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. 
/opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [15,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [16,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [17,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [18,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [19,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [20,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [21,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [22,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [23,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [24,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [25,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [26,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [27,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [28,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [29,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. 
/opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [30,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. /opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [31,0,0] Assertionidx_dim >= 0 && idx_dim < index_size && "index out of bounds"failed. Traceback (most recent call last): File "/home/jxr/anaconda3/envs/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/jxr/anaconda3/envs/python/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module> cli.main() File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main run() File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file runpy.run_path(target, run_name="__main__") File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path return _run_module_code(code, init_globals, run_name, File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code _run_code(code, mod_globals, init_globals, File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code exec(code, run_globals) File "/home/jxr/3D-MultiMAE/MultiMAE/try_model.py", line 118, in <module> preds, masks = multimae.forward( File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae.py", line 350, in forward encoder_tokens = self.encoder(input_tokens) File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward input = module(input) File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 230, in forward x = x + self.drop_path(self.attn(self.norm1(x))) File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 175, in forward attn = (q @ k.transpose(-2, -1)) * self.scale RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
