CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Rethinking Semantic Segmentation: A Prototype View,
Tianfei Zhou, Wenguan Wang, Ender Konukoglu and Luc Van Gool
CVPR 2022 (Oral) (arXiv 2203.15102)

News

  • [2022-04-19] Released the code, based on openseg.pytorch!
  • [2022-03-31] Paper link updated!
  • [2022-03-12] Repo created. Paper and code will come soon.

Abstract

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle an arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
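
To make "nearest prototype retrieving" concrete, here is a minimal NumPy sketch, not the released implementation. It assumes cosine similarity as the matching score and that a class scores as well as its best-matching prototype; the function names and the toy data are made up for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def nearest_prototype_predict(pixels, prototypes):
    """Assign each pixel the class of its most similar prototype.

    pixels:     (N, D) pixel embeddings
    prototypes: (C, K, D) K prototypes for each of C classes
    returns:    (N,) predicted class indices
    """
    pixels = l2_normalize(pixels)
    prototypes = l2_normalize(prototypes)
    # Cosine similarity between every pixel and every prototype: (N, C, K).
    sim = np.einsum("nd,ckd->nck", pixels, prototypes)
    # Assumption: a class scores as well as its best-matching prototype.
    class_score = sim.max(axis=2)
    return class_score.argmax(axis=1)

# Toy data (hypothetical): 2 classes, 2 prototypes each, 2-D embeddings.
prototypes = np.array([
    [[1.0, 0.0], [0.9, 0.1]],    # class 0
    [[0.0, 1.0], [-1.0, 0.0]],   # class 1
])
pixels = np.array([[2.0, 0.1], [0.1, 3.0], [-5.0, 0.2]])
pred = nearest_prototype_predict(pixels, prototypes)
print(pred)  # [0 1 1]
```

Nothing in this prediction step is learned: adding a class means adding rows to `prototypes`, and the embedding network stays the same size.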

Installation

This implementation is built on openseg.pytorch. Many thanks to the authors for their efforts.

Please follow the openseg.pytorch Getting Started guide for installation and dataset preparation.

Performance

Cityscapes

Method  Train Set  Val Set  Iters  Batch Size  mIoU  Log  CKPT  Script
HRNet   train      val      80K    8           79.0  log  ckpt  scripts/cityscapes/hrnet/run_h_48_d_4.sh
Ours    train      val      80K    8           80.1  log  ckpt  scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh

More results will come soon

Citation

@inproceedings{zhou2022rethinking,
    author    = {Zhou, Tianfei and Wang, Wenguan and Konukoglu, Ender and Van Gool, Luc},
    title     = {Rethinking Semantic Segmentation: A Prototype View},
    booktitle = {CVPR},
    year      = {2022}
}

Relevant Projects

Please also see our work [1] for a novel training paradigm with a cross-image, pixel-to-pixel contrastive loss, and [2] for a novel hierarchy-aware segmentation learning scheme for structured scene parsing.

[1] Exploring Cross-Image Pixel Contrast for Semantic Segmentation - ICCV 2021 (Oral) [arXiv][code]

[2] Deep Hierarchical Semantic Segmentation - CVPR 2022 [arXiv][code]

Comments
  • Question about seed

    if args_parser.seed is not None:
    	random.seed(args_parser.seed)
    	torch.manual_seed(args_parser.seed)
    

    With the code above, each GPU is set to the same seed. A possible fix is to offset the seed by the process rank:

    # fix the seed for reproducibility
    if args_parser.seed is not None:
    	from lib.utils.distributed import get_rank
    	# offset by the process rank so that each GPU gets a different seed
    	seed = args_parser.seed + get_rank()
    	torch.manual_seed(seed)
    	np.random.seed(seed)
    	random.seed(seed)
    

    Reference

  • Question regarding IoUs of pretrained HRNet Proto

    Hi, I downloaded the checkpoint, prepared the data and ran the evaluation script:

    bash scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh val hrnet_proto_80k
    

    I had to apply a small fix in tester.py, replacing label_img_ = Image.fromarray(label_img_, 'P') with label_img_ = Image.fromarray(label_img_), because the labels in the output directory were all black. With that change, the command above gives an mIoU of 85.7, much higher than the 81.1 reported for HRNet in Table 2 of your paper. This is the output:

    classes          IoU      nIoU
    --------------------------------
    road          : 0.978194      nan
    sidewalk      : 0.817676      nan
    building      : 0.966470      nan
    wall          : 0.586037      nan
    fence         : 0.650901      nan
    pole          : 0.858629      nan
    traffic light : 0.865478      nan
    traffic sign  : 0.904490      nan
    vegetation    : 0.977709      nan
    terrain       : 0.664823      nan
    sky           : 0.973433      nan
    person        : 0.950269    0.000000
    rider         : 0.802941    0.000000
    car           : 0.985926    0.000000
    truck         : 0.818478    0.000000
    bus           : 0.939901    0.000000
    train         : 0.811736    0.000000
    motorcycle    : 0.806717    0.000000
    bicycle       : 0.919648    0.000000
    --------------------------------
    Score Average : 0.856814    0.000000
    --------------------------------
    

    I also used my own evaluation script on the generated labels that are in the label directory and I get exactly the same results. Could you check?

  • parameter numbers of the entire model?

    Thanks for the great work! One thing confuses me. Table 4 in the paper shows that the number of parameters of the entire model does not increase. However, in the code the prototype tensor grows with class_num, as follows: self.prototypes = nn.Parameter(torch.zeros(self.num_classes, self.num_prototype, in_channels), requires_grad=True). If class_num increases, the number of prototype parameters increases as well. Did I get it wrong?
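
The size of the tensor in question is simple arithmetic, sketched below. The helper name and the values of K and D are illustrative assumptions, not figures from the paper or the code. The sketch only shows how storage scales with the number of classes; it does not settle whether these entries should be counted as learnable parameters, which depends on whether they are updated by gradient descent.

```python
def prototype_entries(num_classes, num_prototype, in_channels):
    """Number of scalars in a (num_classes, num_prototype, in_channels) tensor."""
    return num_classes * num_prototype * in_channels

# Illustrative sizes only (assumed, not taken from the paper or the code).
K, D = 10, 256
for C in (19, 150, 1000):
    print(C, prototype_entries(C, K, D))
# 19 48640
# 150 384000
# 1000 2560000
```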

  • Question about the prototype initialization?

    Hi, thanks for the impressive work.

    After reading the paper, I have a question: how are the micro-prototypes initialized? It seems they must be properly initialized for Eq. (10) to yield a reasonable solution.

    Cheers

  • Layernorm in Prototype learning

    Thanks for your great work! I notice that you apply LayerNorm to the final features before the classifier and also to the predictions. I think this is quite uncommon in prototype learning (correct me if I am wrong).

    Could you please explain this choice? And would performance degrade if the two LayerNorm layers were removed?

    https://github.com/tfzhou/ProtoSeg/blob/1c4a7784bbce96c06fe72d55255af15e6cf1ca96/lib/models/nets/hrnet.py#L81

  • Questions about K prototypes

    Hi, I'm interested in your work. I wonder whether the K prototypes of some classes can become similar after training. For example, when I visualize them as in Figure 3 of the paper, the activation areas of the prototypes of a class are roughly the same. I found this problem while running your code. I suspect it is caused by some classes in my dataset not having meaningful parts.

    Hope to receive your reply, thanks!
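
One way to quantify this observation is to measure how similar the K prototypes of each class are to one another. The sketch below is a generic diagnostic, not part of the repository; it assumes the prototypes are available as a (C, K, D) array, and the function name is hypothetical. Values near 1 suggest that the prototypes of that class point in nearly the same direction.

```python
import numpy as np

def within_class_similarity(prototypes):
    """Mean off-diagonal cosine similarity among each class's K prototypes.

    prototypes: (C, K, D) array with K >= 2; returns a (C,) array.
    """
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    sim = np.einsum("ckd,cjd->ckj", p, p)          # (C, K, K) cosine matrices
    k = sim.shape[1]
    # Sum all entries, then remove the diagonal (self-similarity is always 1).
    off_diag = sim.sum(axis=(1, 2)) - np.trace(sim, axis1=1, axis2=2)
    return off_diag / (k * (k - 1))

# Made-up prototypes: class 0 is well spread, class 1 has collapsed.
protos = np.array([
    [[1.0, 0.0], [0.0, 1.0]],     # orthogonal directions
    [[1.0, 1.0], [2.0, 2.0]],     # same direction
])
scores = within_class_similarity(protos)
print(scores)  # approximately [0. 1.]
```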

  • Question about Within-Class Online Clustering

    Hi, I'm interested in your work. After reading the paper, I'm confused about Within-Class Online Clustering, whose goal is to map the pixels Ic to the K prototypes of class c. How do you know that the pixels Ic belong to class c? Did you use the ground truth in this step? If so, how is this set up at test time?

    Hope to receive your reply, thanks!
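
As a rough illustration of the kind of clustering step being asked about, here is a generic Sinkhorn-Knopp style balanced assignment in NumPy. As far as I understand the paper, its online clustering is solved with a few such iterations, but this sketch is not the authors' code. It assumes the input rows are already the pixels of a single class (for example, selected with ground-truth labels during training), and `eps` and `n_iters` are illustrative values.

```python
import numpy as np

def sinkhorn_assign(scores, eps=0.05, n_iters=3):
    """Soft pixel-to-prototype assignment with balanced prototype usage.

    scores:  (N, K) similarities between N pixels of one class and its K prototypes
    returns: (N, K) matrix whose rows each sum to 1
    """
    q = np.exp((scores - scores.max()) / eps).T   # (K, N), shifted for stability
    q /= q.sum()
    k, n = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True) * k     # each prototype gets total mass 1/K
        q /= q.sum(axis=0, keepdims=True) * n     # each pixel gets total mass 1/N
    return (q * n).T                              # rescale so each pixel's row sums to 1

rng = np.random.default_rng(0)
scores = rng.standard_normal((8, 2))   # made-up similarities: 8 pixels, 2 prototypes
q = sinkhorn_assign(scores)
hard = q.argmax(axis=1)                # hard prototype index for each pixel
print(q.sum(axis=1))
```

The row normalization inside the loop is what discourages all pixels from collapsing onto a single prototype. A step like this is only needed to build prototypes; prediction by nearest-prototype lookup requires no labels.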

  • Question about paper [# model parameter]

    Dear author, Thank you so much for your work and code. I have a question about the number of model parameters.

    As I understand the paper, pixels are classified as the closest prototype among CK prototypes at inference time. In the end, we have to store CK prototypes, then I wonder why we don't interpret them as model parameters. Also, the number of prototypes to be stored is proportional to the number of classes. Is it just convention?

    Thank you.

  • Question about loss

    Hi, I'm interested in your work. After reading the paper, I'm confused about the PPC loss. The paper describes it as a contrastive learning strategy, but in the code the PPC loss uses a cross-entropy loss. Hope to receive your reply, thanks.
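
A general observation that may help here: a softmax-based contrastive loss with a single positive is algebraically identical to cross-entropy over the temperature-scaled similarities, with the positive's index as the target. The NumPy sketch below checks this numerically. It is a statement about the two formulas, not about how the repository implements PPC; the similarity values and the temperature are made up.

```python
import numpy as np

def info_nce(sim, pos, tau):
    """-log( exp(s_pos / tau) / sum_j exp(s_j / tau) ) for one pixel."""
    e = np.exp(sim / tau)
    return -np.log(e[pos] / e.sum())

def cross_entropy(logits, target):
    """Standard softmax cross-entropy for one sample."""
    shifted = logits - logits.max()   # subtract the max for numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum())
    return -log_prob[target]

sim = np.array([0.9, 0.2, -0.4, 0.1])   # hypothetical pixel-to-prototype similarities
pos, tau = 0, 0.1                       # index of the assigned prototype, temperature
a = info_nce(sim, pos, tau)
b = cross_entropy(sim / tau, pos)
print(np.isclose(a, b))  # True
```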

  • Questions about code

    Dear Author, I'm very interested in this wonderful work of yours, but my coding skills are weak and I can't find the code for the online clustering part. Could you please tell me which part of the code I should look at?
