TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision problems. We'll do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

  • Image Classification

    • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
    • ResNet: Deep Residual Learning for Image Recognition
    • DenseNet: Densely Connected Convolutional Networks
    • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
    • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
    • Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
  • Semantic Segmentation

    • DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
    • PSPNet: Pyramid Scene Parsing Network
    • DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
    • Asymmetric Non-local Neural Networks for Semantic Segmentation
    • Semantic Flow for Fast and Accurate Scene Parsing
  • Object Detection

    • SSD: Single Shot MultiBox Detector
    • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
    • YOLOv3: An Incremental Improvement
    • FPN: Feature Pyramid Networks for Object Detection
  • Pose Estimation

    • CPM: Convolutional Pose Machines
    • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  • Instance Segmentation

    • Mask R-CNN
  • Generative Adversarial Networks

    • Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
    • CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh

Performances with TorchCV

All results shown below fully reproduce the results reported in the papers.

Image Classification

  • ImageNet (Center Crop Test): 224x224
| Model | Train | Test | Top-1 | Top-5 | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| ResNet50 | train | val | 77.54 | 93.59 | 512 | 30W | ResNet50 |
| ResNet101 | train | val | 78.94 | 94.56 | 512 | 30W | ResNet101 |
| ShuffleNetV2x0.5 | train | val | 60.90 | 82.54 | 1024 | 40W | ShuffleNetV2x0.5 |
| ShuffleNetV2x1.0 | train | val | 69.71 | 88.91 | 1024 | 40W | ShuffleNetV2x1.0 |
| DFNetV1 | train | val | 70.99 | 89.68 | 1024 | 40W | DFNetV1 |
| DFNetV2 | train | val | 74.22 | 91.61 | 1024 | 40W | DFNetV2 |

Semantic Segmentation

  • Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769
| Model | Backbone | Train | Test | mIOU | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res101 | train | val | 78.20 | 8 | 4W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 79.13 | 8 | 4W | DeepLabV3 |
  • ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520
| Model | Backbone | Train | Test | mIOU | PixelACC | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res50 | train | val | 41.52 | 80.09 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res50 | train | val | 42.16 | 80.36 | 16 | 15W | DeepLabV3 |
| PSPNet | 3x3-Res101 | train | val | 43.60 | 81.30 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 44.13 | 81.42 | 16 | 15W | DeepLabV3 |

Object Detection

  • Pascal VOC2007/2012 (Single Scale Test): 20 Classes
| Model | Backbone | Train | Test | mAP | BS | Epochs | Scripts |
|---|---|---|---|---|---|---|---|
| SSD300 | VGG16 | 07+12_trainval | 07_test | 0.786 | 32 | 235 | SSD300 |
| SSD512 | VGG16 | 07+12_trainval | 07_test | 0.808 | 32 | 235 | SSD512 |
| Faster R-CNN | VGG16 | 07_trainval | 07_test | 0.706 | 1 | 15 | Faster R-CNN |

Pose Estimation

  • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

  • Mask R-CNN

Generative Adversarial Networks

  • Pix2pix
  • CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task; you can check them in the subdirectories of data. The following is an example dataset directory tree for training semantic segmentation. You can preprocess the public datasets with the scripts in the folder data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
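The layout above can be sanity-checked before training. The following is a minimal sketch, not part of TorchCV: the function name `pair_images_with_labels` is hypothetical, and it assumes that each image is matched to a label file with the same stem and a `.png` extension, as the tree suggests.

```python
from pathlib import Path

# Image extensions shown in the directory tree above.
IMAGE_EXTS = {".jpg", ".png"}


def pair_images_with_labels(root, split):
    """Return (pairs, missing) for one split such as 'train' or 'val'.

    pairs   -- sorted list of (image_path, label_path) tuples
    missing -- names of images that have no label file
    Assumption: a label shares the image's stem and ends in .png.
    """
    image_dir = Path(root) / split / "image"
    label_dir = Path(root) / split / "label"
    pairs, missing = [], []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip files that are not images
        label_path = label_dir / (image_path.stem + ".png")
        if label_path.is_file():
            pairs.append((image_path, label_path))
        else:
            missing.append(image_path.name)
    return pairs, missing
```

Calling `pair_images_with_labels("Dataset", "train")` and checking that `missing` is empty is a quick way to catch an incomplete preprocessing run.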

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

  • Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Resume Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Validate
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag
  • Testing
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag

Demos with TorchCV

Example output of VGG19-OpenPose

Comments
  • mIoU of SFNet is much lower than 78.9%

    I trained SFNet with the default settings of torchcv/scripts/seg/cityscapes/run_sfnet_res18_cityscapes.sh, except that I changed NGPUS from 8 to 4 and --train_batch_size from 2 to 4. But I got only about 73% mIoU for single-scale inference. This may be because the pretrained model, 3x3resnet18-imagenet.pth, is not provided. I also trained SFNet (ResNet-101) using the pretrained model 3x3resnet101-imagenet.pth provided by this repository, and got about 78% mIoU for multi-scale inference.

    How can I reproduce the paper results?

  • yolov3 fails when run on the coco2017 dataset

    Hello author. Judging from your profile you are at Peking University, so Chinese should be fine and I wrote this in Chinese. I want to run yolov3 on the coco2017 data. From the project structure it looks like a script file is needed, so I wrote one. But after adding it, an error occurred when training started. The script file:

    #!/usr/bin/env bash

    nvidia-smi
    PYTHON="python"

    export PYTHONPATH="/home/dezheng/work/torchcv":$PYTHONPATH

    cd ../../../

    DATA_DIR="/home/dezheng/work/torchcv/data/torchcv_coco"
    MODEL_NAME="yolov3"
    LOSS_TYPE="yolov3loss"
    CHECKPOINTS_NAME="yolov3_darknet_coco_det"$2
    PRETRAINED_MODEL="./pretrained_models/yolov3_darknet_caffe_pretrained.pth"
    HYPES_FILE='hypes/det/coco/yolov3_darknet_coco_det.json'

    LOG_DIR="./log/det/coco/"
    LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

    if [[ ! -d ${LOG_DIR} ]]; then
        echo ${LOG_DIR}" not exists!!!"
        mkdir -p ${LOG_DIR}
    fi

    if [[ "$1"x == "train"x ]]; then
        ${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n \
            --data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME} \
            --checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

    The final error is as follows:

    Traceback (most recent call last):
      File "main.py", line 186, in <module>
        runner = method_selector.select_det_method()
      File "/home/dezheng/work/torchcv/methods/method_selector.py", line 93, in select_det_method
        return DET_METHOD_DICTkey
      File "/home/dezheng/work/torchcv/methods/det/yolov3.py", line 35, in __init__
        self.det_data_loader = DataLoader(configer)
      File "/home/dezheng/work/torchcv/datasets/det/data_loader.py", line 23, in __init__
        self.aug_train_transform = pil_aug_trans.PILAugCompose(self.configer, split='train')
      File "/home/dezheng/work/torchcv/datasets/tools/pil_aug_transforms.py", line 1187, in __init__
        self.transforms[trans] = PIL_AUGMENTATIONS_DICTtrans
    TypeError: __init__() got an unexpected keyword argument 'center_jitter'

    How should I fix this?

  • Question about the inference speed of SFNet

    Hi, thanks for your great work and contribution. I'm interested in your work SFNet. I trained a model (SFNet with ResNet18) and achieved a comparable validation result. However, I'm confused about how to reach the inference speed presented in your paper (18/26 FPS) with this codebase. I inserted a timer into segmentation_test.py and measured the inference speed of SFNet (single scale, ResNet18) at 0.143 s/image, which is much slower than reported in the paper. Could you give some hints on how to reach the inference speed presented in the paper (without TensorRT)?
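One common source of pessimistic timings is including the first few calls, which pay one-off setup costs, or reading the clock while GPU work is still queued. Below is a minimal sketch of a timer that handles both. It is not code from this repository, the name `measure_latency` is hypothetical, and it makes no claim about reproducing the paper's numbers.

```python
import time


def measure_latency(run_once, warmup=10, iters=50, sync=None):
    """Return the average seconds per call of run_once, excluding warm-up.

    sync is an optional zero-argument callable that blocks until pending
    device work has finished (for CUDA, torch.cuda.synchronize).
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()  # make sure warm-up work is finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / iters
```

With PyTorch on a GPU, a call might look like `measure_latency(lambda: model(batch), sync=torch.cuda.synchronize)`, where `model` and `batch` are placeholders for an already-loaded network in eval mode and a preprocessed input tensor. Timing only the forward pass this way also excludes image loading and result saving, which a timer wrapped around a whole test loop would include.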

  • assert isinstance(dc, DataContainer), type(dc) AssertionError: <class 'torch.Tensor'> when running test

    Traceback (most recent call last):
      File "main.py", line 185, in <module>
        Controller.test(runner)
      File "/home/jerry/torchcv/runner/tools/controller.py", line 81, in test
        runner.test(test_dir, out_dir)
      File "/home/jerry/torchcv/runner/seg/fcn_segmentor_test.py", line 48, in test
        total_logits = self.ss_test(data_dict)
      File "/home/jerry/torchcv/runner/seg/fcn_segmentor_test.py", line 85, in ss_test
        data_dict = self.blob_helper.get_blob(in_data_dict, scale=1.0)
      File "/home/jerry/torchcv/runner/tools/blob_helper.py", line 25, in get_blob
        for image, meta in zip(DCHelper.tolist(data_dict['img']), DCHelper.tolist(data_dict['meta'])):
      File "/home/jerry/torchcv/tools/helper/dc_helper.py", line 19, in tolist
        assert isinstance(dc, DataContainer), type(dc)
    AssertionError: <class 'torch.Tensor'>

    When I run test on my own dataset, the error is thrown.

  • About DenseASPP

    Hi, I want to try training denseaspp_model with your code. Can I just change the .sh file? Or do you have a script for DenseASPP on Cityscapes? Thank you.

  • Can't download Semantic Segmentation trained models

    When I try to download the trained DeepLabv3 models, I get a 404 error. Is there something wrong with the hyperlink? I hope this can be fixed, thanks :D

  • KeyError when loading ssd512 pretrained model

    Hi, donnyyou. After downloading the pretrained .pth ssd512 model, I ran bash run_ssd512_vgg16_voc_det.sh train tag. Then I got:

    2019-05-23 10:23:17,616 INFO [vgg512_ssd.py, 60] Loading pretrained model:./pretrained_models/ssd_vgg512_voc_0.808.pth
    2019-05-23 10:23:20,996 INFO [vgg512_ssd.py, 63] Pretrained Keys: dict_keys(['config_dict', 'state_dict'])
    2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 65] Model Keys: odict_keys(['features.0.weight', 'features.0.bias', 'features.2.weight', 'features.2.bias', 'features.5.weight', 'features.5.bias', 'features.7.weight', 'features.7.bias', 'features.10.weight', 'features.10.bias', 'features.12.weight', 'features.12.bias', 'features.14.weight', 'features.14.bias', 'features.17.weight', 'features.17.bias', 'features.19.weight', 'features.19.bias', 'features.21.weight', 'features.21.bias', 'features.24.weight', 'features.24.bias', 'features.26.weight', 'features.26.bias', 'features.28.weight', 'features.28.bias', 'features.31.weight', 'features.31.bias', 'features.33.weight', 'features.33.bias'])
    2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 75] Matched Keys: dict_keys([])
    2019-05-23 10:23:21,067 ERROR [configer.py, 69] ssd_detection_layer.py, 16 KeyError: ('gt', 'num_anchor_list').

    Since your SSD script returns 404 Not Found, I modified the Faster R-CNN script for SSD as below:

    #!/usr/bin/env bash

    # check the environment info
    nvidia-smi
    PYTHON="python"

    export PYTHONPATH="/home/ruijin/Work/python/torchcv-master":$PYTHONPATH

    cd ../../../

    DATA_DIR="/home/donny/DataSet/VOC07_DET"
    MODEL_NAME="vgg512_ssd"
    LOSS_TYPE="ssd_multibox_loss"
    CHECKPOINTS_NAME="ssd_vgg16_voc_det"$2
    PRETRAINED_MODEL="./pretrained_models/ssd_vgg512_voc_0.808.pth"
    HYPES_FILE='hypes/det/voc/ssd512_vgg16_voc_det.json'

    LOG_DIR="./log/det/voc/"
    LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

    if [[ ! -d ${LOG_DIR} ]]; then
        echo ${LOG_DIR}" not exists!!!"
        mkdir -p ${LOG_DIR}
    fi

    if [[ "$1"x == "train"x ]]; then
        ${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n \
            --data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME} \
            --checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

    elif [[ "$1"x == "resume"x ]]; then
        ${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n \
            --data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME} \
            --resume_continue y --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth \
            --checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee -a ${LOG_FILE}

    elif [[ "$1"x == "debug"x ]]; then
        ${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase debug --gpu 0 --log_to_file n 2>&1 | tee ${LOG_FILE}

    elif [[ "$1"x == "val"x ]]; then
        ${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase test --log_to_file n --model_name ${MODEL_NAME} \
            --phase test --gpu 0 --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth \
            --test_dir ${DATA_DIR}/val/image --out_dir val 2>&1 | tee -a ${LOG_FILE}
        cd metrics/det/
        ${PYTHON} -u voc_evaluator.py --hypes "../../../"${HYPES_FILE} \
            --json_dir ../../../out/results/voc/test_dir/${CHECKPOINTS_NAME}/val/label \
            --gt_dir ${DATA_DIR}/val/label 2>&1 | tee -a "../../"${LOG_FILE}

    else
        echo "$1"x" is invalid..."
    fi

    Hope you can help me. Thanks!
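The log above reports top-level checkpoint keys of `config_dict` and `state_dict` and an empty set of matched keys, which is what one would see if the wrapper dictionary itself were compared against the model's parameter names. The following is a minimal sketch of a more tolerant matching step, written with plain dictionaries so it runs without PyTorch. It is not TorchCV's loader: `match_pretrained_keys` is a hypothetical helper, and the `module.` prefix handling is an assumption covering weights saved from an `nn.DataParallel` model. It does not address the separate `KeyError: ('gt', 'num_anchor_list')`, which is raised from the configuration lookup.

```python
def match_pretrained_keys(checkpoint, model_keys, prefixes=("module.",)):
    """Split checkpoint weights into matched entries and unmatched names.

    checkpoint -- either a flat {name: tensor} mapping or a wrapper such as
                  {'config_dict': ..., 'state_dict': {name: tensor}}
    model_keys -- collection of parameter names the model expects
    prefixes   -- name prefixes to strip when looking for a match (assumption)
    """
    # Unwrap the weights if they are stored under 'state_dict'.
    weights = checkpoint.get("state_dict", checkpoint)
    matched, unmatched = {}, []
    for name, value in weights.items():
        # Try the name as saved, then with each known prefix removed.
        candidates = [name] + [name[len(p):] for p in prefixes if name.startswith(p)]
        target = next((c for c in candidates if c in model_keys), None)
        if target is None:
            unmatched.append(name)
        else:
            matched[target] = value
    return matched, unmatched
```

Printing `unmatched` before training gives a quick indication of whether the downloaded file belongs to the architecture being built.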

  • No module named 'extensions.layers.nms.src.cython_nms'

    Hi,

    Great job! It is a great framework to learn pytorch and latest networks.

    But when I train the OpenPose network, it reports No module named 'extensions.layers.nms.src.cython_nms'. I checked the files and found that this file is missing.

    Is there anything I am missing?

  • yolov3 loss error

    in utils/layers/det/yolo_detection_layer.py : I believe line 42(if self.configer.get('phase') != 'debug':) should be removed, because values passed to BCE loss should be in the range of (0, 1) in the training phase.

  • Question about "ignore_index": 19

    In hypes/seg/cityscape/*_cityscape_seg.json, "ignore_index" is set to 19. But datasets/seg/fs_data_loader.py contains

        encoded_labelmap = np.ones(shape=(shape[0], shape[1]), dtype=np.float32) * 255

    so labels treated as void will be equal to 255. My question is: should ignore_index be equal to 255? Looking forward to your reply.
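The concern can be illustrated with a toy example. This is a minimal sketch in plain Python, not the repository's data loader: the helper names and the two-class mapping are made up for illustration, and it assumes the 255 fill value reaches the metric unchanged. Whether TorchCV remaps 255 to 19 at a later step is not visible from the quoted line.

```python
VOID = 255  # fill value for unmapped pixels, taken from the quoted line


def encode_labels(raw_row, id_to_train_id):
    """Map raw label ids to training ids; ids without a mapping become VOID."""
    return [id_to_train_id.get(raw, VOID) for raw in raw_row]


def pixel_accuracy(pred_row, label_row, ignore_index):
    """Fraction of correctly predicted pixels, skipping ignore_index labels."""
    kept = [(p, t) for p, t in zip(pred_row, label_row) if t != ignore_index]
    if not kept:
        return 0.0
    return sum(p == t for p, t in kept) / len(kept)


# Toy row: two labeled pixels followed by two void pixels (raw id 0).
encoded = encode_labels([7, 8, 0, 0], {7: 0, 8: 1})
prediction = [0, 1, 0, 0]

# Void pixels are skipped only when ignore_index matches the fill value.
acc_ignoring_255 = pixel_accuracy(prediction, encoded, ignore_index=255)
acc_ignoring_19 = pixel_accuracy(prediction, encoded, ignore_index=19)
```

Here `acc_ignoring_255` is 1.0 while `acc_ignoring_19` is 0.5, because with `ignore_index=19` the two void pixels are counted as errors. The same mismatch would affect a loss function that skips targets equal to its ignore index.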

  • where is the pretrained.pth file?

    In the .sh file, I found a parameter PRETRAINED_MODEL referring to a pretrained file, but I can't find this file in the repo. So where is the pretrained.pth file?

  • RuntimeError: Given groups=1, weight of size [512, 1024, 3, 3], expected input[1, 256, 64, 64] to have 1024 channels, but got 256 channels instead

    How can this problem be solved during training?

  • AttributeError: 'RandomSampler' object has no attribute 'num_samples'

    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, size), bs=batch_size)
    try:
        learn = ConvLearner.pretrained(arch, data, precompute=True)
    except:
        learn = ConvLearner.pretrained(arch, data, precompute=True)

    This was the code. Could anyone help me address this issue?

  • SFNET: what's difference between x,fpn_dsn?

    https://github.com/donnyyou/torchcv/blob/1875b7905f4aa0aed7d6cf49ae0ed4512d5f6ae7/model/seg/nets/sfnet.py#L189

    Also, both ResSFNet and AlignHead have self.fpn_out. Is this redundant? (Although the two fpn_out differ slightly.)

  • train error

    RuntimeError: Given groups=1, weight of size [512, 1024, 3, 3], expected input[8, 256, 64, 64] to have 1024 channels, but got 256 channels instead
    
    

    Command:

    python main.py --config_file configs/seg/cityscapes/sfnet_res18_cityscapes_seg.conf --gpu 0
    