Official MegEngine implementation of CREStereo (CVPR 2022 Oral).

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

This repository contains the MegEngine implementation of our paper:

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
CVPR 2022

arXiv | BibTeX

Datasets

The Proposed Dataset

Download

There are two ways to download the dataset (~400GB) proposed in our paper:

  • Download using the shell script dataset_download.sh:
sh dataset_download.sh

The dataset will be downloaded and extracted into ./stereo_trainset/crestereo

  • Download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually.

Disparity Format

The disparity is saved in uint16 .png format and can be loaded with OpenCV's imread function:

import cv2
import numpy as np

def get_disp(disp_path):
    disp = cv2.imread(disp_path, cv2.IMREAD_UNCHANGED)
    return disp.astype(np.float32) / 32  # stored values are disparity * 32
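The division by 32 implies that each stored integer is the disparity multiplied by 32, so values are quantized in steps of 1/32 pixel. A minimal pure-Python sketch of that round trip follows. `encode_disp` and `decode_disp` are hypothetical helper names, not functions from this repository, and rounding to the nearest integer on encode is an assumption.

```python
SCALE = 32          # scale factor taken from get_disp above
UINT16_MAX = 65535  # largest value a uint16 PNG sample can hold


def encode_disp(disp_px: float) -> int:
    """Quantize a disparity in pixels to a uint16 sample (assumed: round to nearest)."""
    value = int(round(disp_px * SCALE))
    if not 0 <= value <= UINT16_MAX:
        raise ValueError(f"disparity {disp_px} does not fit in uint16 at scale {SCALE}")
    return value


def decode_disp(sample: int) -> float:
    """Recover the disparity in pixels, mirroring disp.astype(np.float32) / 32."""
    return sample / SCALE


if __name__ == "__main__":
    print(encode_disp(10.5))        # 336
    print(decode_disp(336))         # 10.5
    print(decode_disp(UINT16_MAX))  # 2047.96875
```

Under these assumptions the format can represent disparities up to just below 2048 pixels.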

Other Public Datasets

Other public datasets we use include:

Dependencies

CUDA Version: 10.1, Python Version: 3.6.9

  • MegEngine v1.8.2
  • opencv-python v3.4.0
  • numpy v1.18.1
  • Pillow v8.4.0
  • tensorboardX v2.1
python3 -m pip install -r requirements.txt

We also provide a Docker image to run the code quickly:

docker run --gpus all -it -v /tmp:/tmp ylmegvii/crestereo
shotwell /tmp/disparity.png

Inference

Download the pretrained MegEngine model from here and run:

python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png
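The --size value is written as HEIGHTxWIDTH: the demo script quoted further down this page splits it on "x" and reads the height first, then passes (width, height) to cv2.resize. A small sketch of that parsing follows. `parse_size` is a hypothetical helper name (the script does this inline), and the validation is an addition for illustration.

```python
from typing import Tuple


def parse_size(size: str) -> Tuple[int, int]:
    """Parse 'HEIGHTxWIDTH' into (height, width)."""
    parts = size.lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"expected HEIGHTxWIDTH, got {size!r}")
    height, width = (int(p) for p in parts)
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return height, width


if __name__ == "__main__":
    eval_h, eval_w = parse_size("1024x1536")
    print(eval_h, eval_w)    # 1024 1536
    # cv2.resize expects dsize as (width, height), so the order is swapped there.
    print((eval_w, eval_h))  # (1536, 1024)
```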

Training

Modify the configurations in cfgs/train.yaml and run the following command:

python3 train.py

You can launch TensorBoard to monitor the training process:

tensorboard --logdir ./train_log

and navigate to the page at http://localhost:6006 in your browser.

Acknowledgements

Part of the code is adapted from previous works:

We thank all the authors for their awesome repos.

Citation

If you find the code or datasets helpful in your research, please cite:

@misc{Li2022PracticalSM,
      title={Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation},
      author={Jiankun Li and Peisen Wang and Pengfei Xiong and Tao Cai and Ziwei Yan and Lei Yang and Jiangyu Liu and Haoqiang Fan and Shuaicheng Liu},
      year={2022},
      eprint={2203.11483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
MEGVII Research
Power Human with AI. Continuous innovation expands the boundaries of cognition; extraordinary technology realizes product value.
Comments
  • Is CUDA 11.6 supported?

    This is a really promising project, congratulations and thanks for releasing it!

    I'm trying to run the test script with your Eth3d model and this command: python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

    But the code hangs and never returns from this line in extractor.py:82: self.conv2 = M.Conv2d(128, output_dim, kernel_size=1)

    which is called from load_model in test.py:15: model = Model(max_disp=256, mixed_precision=False, test_mode=True)

    My GPU is NVIDIA RTX A6000 and the CUDA version on the system is v11.6

  • Results on Holopix50k dataset

    Hello! Thank you for sharing the code and the model. I tested the pre-trained model on the Holopix50k test set, but did not get results similar to those shown in the paper. If I want to run the crestereo_eth3d.mge model on this dataset, does it require different parameter settings or pre-processing? How can I get similar results on Holopix50k? Any advice would be very helpful. Thank you in advance!

  • Did you obtain results on Holopix50k with published model?

    I tried running the published model on a few images from Holopix50k and got awful results. Could you please explain how to obtain results similar to those in the paper? Is a different model or different preprocessing needed?

  • TypeError: pad() got an unexpected keyword argument 'pad_witdth' in test.py

    Good job! May I ask a question?

    I tried to run test.py on a V100 with CUDA 10.2. The data is from ./img, and I set the size to 1280*720, the same as the original size. But I get the following error:

    File "CREStereo/nets/corr.py", line 42, in get_correlation
      (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate")
    TypeError: pad() got an unexpected keyword argument 'pad_witdth'

    This suggests I may be passing the wrong type, but I checked the code and could not find the problem:

    def pad(
        src: Tensor,
        pad_width: Tuple[Tuple[int, int], ...],
        mode: str = "constant",
        constant_value: float = 0.0,
    ) -> Tensor:
        r"""Pads the input tensor.

        Args:
            pad_width: A tuple. Each element in the tuple is the tuple of 2-elements,
                the 2 elements represent the padding size on both sides of the current dimension, ``(front_offset, back_offset)``
            mode: One of the following string values. Default: ``'constant'``

                * ``'constant'``: Pads with a constant value.
                * ``'reflect'``: Pads with the reflection of the tensor mirrored on the first and last values of the tensor along each axis.
                * ``'replicate'``: Pads with the edge values of tensor.
            constant_val: Fill value for ``'constant'`` padding. Default: 0

        Examples:
            >>> import numpy as np
            >>> inp = Tensor([[1., 2., 3.],[4., 5., 6.]])
            >>> inp
            Tensor([[1. 2. 3.]
             [4. 5. 6.]], device=xpux:0)
            >>> F.nn.pad(inp, pad_width=((1, 1),), mode="constant")

    I used the right Tuple type, but something still went wrong.
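The traceback names the keyword pad_witdth, while the signature quoted above spells it pad_width, so the call site and the installed MegEngine appear to disagree on the parameter name rather than on the tuple type. One hedged way to tolerate both spellings is to inspect the callable's signature before calling it. The sketch below uses stand-in functions instead of MegEngine itself: `call_with_supported_kwarg`, `pad_old`, and `pad_new` are hypothetical names, and which MegEngine release uses which spelling is an assumption to verify against your installed version.

```python
import inspect
from typing import Any, Callable, Sequence


def call_with_supported_kwarg(
    func: Callable[..., Any],
    candidates: Sequence[str],
    value: Any,
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Call func, passing value under the first keyword name func accepts."""
    params = inspect.signature(func).parameters
    for name in candidates:
        if name in params:
            return func(*args, **{name: value}, **kwargs)
    raise TypeError(f"{func.__name__} accepts none of {list(candidates)}")


# Stand-ins for two library versions that spell the parameter differently.
def pad_old(src, pad_witdth, mode="constant"):
    return ("old", src, pad_witdth, mode)


def pad_new(src, pad_width, mode="constant"):
    return ("new", src, pad_width, mode)


if __name__ == "__main__":
    widths = ((0, 0), (0, 0), (1, 1), (2, 2))
    names = ("pad_width", "pad_witdth")
    print(call_with_supported_kwarg(pad_old, names, widths, "x", mode="replicate")[0])  # old
    print(call_with_supported_kwarg(pad_new, names, widths, "x", mode="replicate")[0])  # new
```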

  • MegEngine 1.9.0 causes test.py error

    I have been playing around a bit with the code (thank you so much, by the way. Having heaps of fun with it) and found out that MegEngine 1.9.0 causes test.py to die with the following output:

    Images resized: 1024x1536
    Model Forwarding...
    Traceback (most recent call last):
      File "test.py", line 94, in <module>
        pred = inference(left_img, right_img, model_func, n_iter=20)
      File "test.py", line 45, in inference
        pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
      File "/usr/local/lib/python3.6/dist-packages/megengine/module/module.py", line 149, in __call__
        outputs = self.forward(*inputs, **kwargs)
      File "/home/dgxmartin/workspace/CREStereo/nets/crestereo.py", line 210, in forward
        align_corners=True,
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/vision.py", line 663, in interpolate
        [wscale, Tensor([0, 0], dtype="float32", device=inp.device)], axis=0
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/tensor.py", line 405, in concat
        (result,) = apply(builtin.Concat(axis=axis, comp_node=device.to_c()), *inps)
    TypeError: py_apply expects tensor as inputs
    

    For the time being, the MegEngine version should be pinned to exactly 1.8.2.

  • What datasets are used for pretraining?

    The pretrained model works amazingly well on real-life photos! What datasets were used for pretraining? Could you please provide the training details of the pretrained model? Thanks!

  • Update requirements.txt to MegEngine v1.9.1

    function.Pad may produce unexpected NaN values in MegEngine v1.8.2. MegEngine v1.9.0 resolves this but introduces other problems, as pointed out in https://github.com/megvii-research/CREStereo/pull/14 .

    The most recent release, v1.9.1, resolves all of these problems, so this change updates the MegEngine version constraint to v1.9.1 or later.
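Taken together, the two reports above suggest that v1.8.2 runs test.py, v1.9.0 breaks it, and v1.9.1 or later is the proposed constraint. A small sketch of a startup check that encodes this follows. It only compares version strings: `is_supported_megengine` is a hypothetical helper, not part of the repository, and treating 1.8.2 as still acceptable is an assumption based on the README's dependency list.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.9.1' into (1, 9, 1); ignores any local suffix such as '+cu111'."""
    core = version.split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_supported_megengine(version: str) -> bool:
    """Accept exactly 1.8.2, or anything from 1.9.1 upward; reject 1.9.0."""
    parsed = parse_version(version)
    return parsed == (1, 8, 2) or parsed >= (1, 9, 1)


if __name__ == "__main__":
    for v in ("1.8.2", "1.9.0", "1.9.1", "1.11.1"):
        print(v, is_supported_megengine(v))
    # 1.8.2 True
    # 1.9.0 False
    # 1.9.1 True
    # 1.11.1 True
```

Tuple comparison is used rather than string comparison so that 1.11.1 sorts after 1.9.1.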

  • WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}

    Thank you for the excellent work! I ran into a problem when fine-tuning the model on my own data. After one call to optimizer.step().clear_grad(), it gets stuck at step 2, flow_predictions = model(left, right), and the network can no longer run inference on any image. Debugging with gdb shows that it hangs at random layers in the network's forward pass.

    I checked that my data is correct. Even with the same data, the model gets stuck after one optimizer.step().clear_grad(). Do you have any suggestions?

    After upgrading MegEngine from 1.9.1 to 1.11.1, the model trains without getting stuck. However, it prints the following the first time optimizer.step().clear_grad() runs:

    WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}, (49342:49342) Handle{ptr=0x5616b860dd58, name="update_block.encoder.conv.bias"}

    The parameter updates are abnormal and the results are worse. Has anyone encountered the same problem, or does anyone have a suggestion?

  • the GPU memory is too large

    @zsc Thank you for sharing! According to your paper, you can train with batch size 16 on 8 2080 Ti GPUs using the PyTorch framework. But when I train your network, GPU memory usage is as large as 8.5G with batch size 1. What is the problem?

  • CREStereo not able to run inside thread with Python

    I do not seem to be able to run inference with CREStereo inside a thread using Python's threading module. Below is a minimal example based on the test.py script from this repo. It loads the pretrained model and runs inference in a child thread (lines 96-98). The error that appears when this is run is attached as CREStereo_thread_error.

    import os
    
    import megengine as mge
    import megengine.functional as F
    import argparse
    import numpy as np
    import cv2
    
    from nets import Model
    
    #NOTE: added threading import statement
    import threading
    
    def load_model(model_path):
        print("Loading model:", os.path.abspath(model_path))
        pretrained_dict = mge.load(model_path)
        model = Model(max_disp=256, mixed_precision=False, test_mode=True)
    
        model.load_state_dict(pretrained_dict["state_dict"], strict=True)
    
        model.eval()
        return model
    
    
    def inference(left, right, model, n_iter=20):
        imgL = left.transpose(2, 0, 1)
        imgR = right.transpose(2, 0, 1)
        imgL = np.ascontiguousarray(imgL[None, :, :, :])
        imgR = np.ascontiguousarray(imgR[None, :, :, :])
    
        imgL = mge.tensor(imgL).astype("float32")
        imgR = mge.tensor(imgR).astype("float32")
    
        imgL_dw2 = F.nn.interpolate(
            imgL,
            size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
            mode="bilinear",
            align_corners=True,
        )
        imgR_dw2 = F.nn.interpolate(
            imgR,
            size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
            mode="bilinear",
            align_corners=True,
        )
        pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
    
        pred_flow = model(imgL, imgR, iters=n_iter, flow_init=pred_flow_dw2)
        pred_disp = F.squeeze(pred_flow[:, 0, :, :]).numpy()
    
        return pred_disp
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="A demo to run CREStereo.")
        parser.add_argument(
            "--model_path",
            default="crestereo_eth3d.mge",
            help="The path of pre-trained MegEngine model.",
        )
        parser.add_argument(
            "--left", default="img/test/left.png", help="The path of left image."
        )
        parser.add_argument(
            "--right", default="img/test/right.png", help="The path of right image."
        )
        parser.add_argument(
            "--size",
            default="1024x1536",
            help="The image size for inference. The default setting is 1024x1536. \
                            To evaluate on ETH3D Benchmark, use 768x1024 instead.",
        )
        parser.add_argument(
            "--output", default="disparity.png", help="The path of output disparity."
        )
        args = parser.parse_args()
    
        assert os.path.exists(args.model_path), "The model path does not exist."
        assert os.path.exists(args.left), "The left image path does not exist."
        assert os.path.exists(args.right), "The right image path does not exist."
    
        model_func = load_model(args.model_path)
        left = cv2.imread(args.left)
        right = cv2.imread(args.right)
    
        assert left.shape == right.shape, "The input images have inconsistent shapes."
    
        in_h, in_w = left.shape[:2]
    
        print("Images resized:", args.size)
        eval_h, eval_w = [int(e) for e in args.size.split("x")]
        left_img = cv2.resize(left, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
        right_img = cv2.resize(right, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
    
        #NOTE: put inference in a thread here
        inference_thread = threading.Thread(target=inference, args=(left_img, right_img, model_func,))
        inference_thread.start()
        inference_thread.join()
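Note that threading.Thread discards the target's return value and only prints exceptions raised in the child thread, which can make failures like this harder to inspect. As a diagnostic aid only (it does not address the MegEngine error itself), the sketch below runs a callable in a worker thread through concurrent.futures so that the result or the exception reaches the caller. `fake_inference` and `run_in_thread` are hypothetical names; `fake_inference` stands in for the inference function above.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_inference(left, right, n_iter=20):
    """Stand-in for inference(left, right, model, n_iter); returns a dummy value."""
    if left is None or right is None:
        raise ValueError("both images are required")
    return {"n_iter": n_iter, "pair": (left, right)}


def run_in_thread(func, *args, **kwargs):
    """Run func in a single worker thread and return its result.

    Future.result() re-raises any exception from the worker in the caller.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(func, *args, **kwargs).result()


if __name__ == "__main__":
    print(run_in_thread(fake_inference, "left.png", "right.png", n_iter=5)["n_iter"])  # 5
    try:
        run_in_thread(fake_inference, None, "right.png")
    except ValueError as exc:
        print("worker failed:", exc)  # worker failed: both images are required
```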
    
  • Model size and number of params?

    Hey, great job!

    Have you compared the model size and number of parameters with other SOTA works, such as LEAStereo and RAFT-Stereo? Your model seems very smart.

  • testing result is better in RGB format than default BGR format?

    I am testing the provided model. By default, the input is in BGR format because it is read with cv2.imread. I found that if the images are converted to RGB with cv2.COLOR_BGR2RGB, the depth map is even better. I checked the training code, and it also reads images using cv2.imread, so I am wondering why this is the case. Has the author or anyone else seen a similar phenomenon?
