Pre-Trained Image Processing Transformer (IPT)

By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao. [arXiv]

We study low-level computer vision tasks (such as denoising, super-resolution and deraining) and develop a new pre-trained model, the image processing transformer (IPT). We propose to use the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.

MindSpore Code

Requirements

  • python 3
  • pytorch == 1.4.0
  • torchvision

Dataset

The benchmark datasets can be downloaded as follows:

For super-resolution:

Set5, Set14, B100, Urban100.

For denoising:

CBSD68, Urban100.

For deraining:

Rain100L.

The result images are converted into YCbCr color space. The PSNR is evaluated on the Y channel only.
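
As a rough illustration of the evaluation convention described above, the following sketch computes PSNR on the Y channel only. It is not the repository's implementation: the helper names `rgb_to_y` and `psnr_y` are hypothetical, the luma weights are the ITU-R BT.601 values for 8-bit RGB (an assumption about which conversion is intended), and images are represented as flat lists of (R, G, B) tuples to keep the example dependency-free.

```python
import math


def rgb_to_y(pixel):
    """Luma (Y) of an 8-bit RGB pixel using ITU-R BT.601 weights (range 16..235)."""
    r, g, b = pixel
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(restored, reference, peak=255.0):
    """PSNR in dB between two equally sized lists of RGB pixels, on Y only.

    `peak` is the assumed maximum signal value; 255 is a common convention.
    """
    if len(restored) != len(reference) or not restored:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over the luma values only; chroma is ignored.
    mse = sum(
        (rgb_to_y(p) - rgb_to_y(q)) ** 2 for p, q in zip(restored, reference)
    ) / len(restored)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Because chroma errors are ignored, a Y-channel PSNR is generally not comparable with a PSNR computed over all three RGB channels, so the two should not be mixed when comparing against reported numbers.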

Script Description

This is the inference script of IPT. Follow the steps below to test image processing tasks such as SR, denoising and deraining with the corresponding pretrained models.

Script Parameter

For details about hyperparameters, see option.py.

Evaluation

Pretrained models

The pretrained models are available on Google Drive.

Evaluation Process

Inference examples. For SR x2, x3, x4:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --data_test Set5+Set14+B100+Urban100 --scale $SCALE

For denoising with noise level 30 or 50:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --data_test CBSD68+Urban100 --scale 1 --denoise --sigma $NOISY_LEVEL

For derain:

python main.py --dir_data $DATA_PATH --pretrain $MODEL_PATH --scale 1 --derain
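
The three commands above differ only in a few flags, so they can be assembled programmatically when several tasks are evaluated in a row. The sketch below is a convenience wrapper, not part of the repository: `build_command` is a hypothetical helper, and the paths in the usage line are placeholders. It uses only the flags shown in the commands above.

```python
import shlex


def build_command(data_dir, model_path, task, scale=1, sigma=None):
    """Return the argument list for one of the inference commands above."""
    cmd = ["python", "main.py", "--dir_data", data_dir, "--pretrain", model_path]
    if task == "sr":
        cmd += ["--data_test", "Set5+Set14+B100+Urban100", "--scale", str(scale)]
    elif task == "denoise":
        if sigma is None:
            raise ValueError("denoising needs a noise level (sigma)")
        cmd += ["--data_test", "CBSD68+Urban100", "--scale", "1",
                "--denoise", "--sigma", str(sigma)]
    elif task == "derain":
        cmd += ["--scale", "1", "--derain"]
    else:
        raise ValueError(f"unknown task: {task}")
    return cmd


# Placeholder paths; pass the checkpoint that matches the task and scale.
print(shlex.join(build_command("data", "model.pt", "sr", scale=2)))
```

The returned list can be handed to `subprocess.run` directly. Keeping the checkpoint path an explicit argument makes it harder to pair a checkpoint with the wrong `--scale`.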

Results

  • Detailed results on the image super-resolution task (PSNR in dB).

Method       Scale   Set5    Set14   B100    Urban100
VDSR         X2      37.53   33.05   31.90   30.77
EDSR         X2      38.11   33.92   32.32   32.93
RCAN         X2      38.27   34.12   32.41   33.34
RDN          X2      38.24   34.01   32.34   32.89
OISR-RK3     X2      38.21   33.94   32.36   33.03
RNAN         X2      38.17   33.87   32.32   32.73
SAN          X2      38.31   34.07   32.42   33.10
HAN          X2      38.27   34.16   32.41   33.35
IGNN         X2      38.24   34.07   32.41   33.23
IPT (ours)   X2      38.37   34.43   32.48   33.76

Method       Scale   Set5    Set14   B100    Urban100
VDSR         X3      33.67   29.78   28.83   27.14
EDSR         X3      34.65   30.52   29.25   28.80
RCAN         X3      34.74   30.65   29.32   29.09
RDN          X3      34.71   30.57   29.26   28.80
OISR-RK3     X3      34.72   30.57   29.29   28.95
RNAN         X3      34.66   30.52   29.26   28.75
SAN          X3      34.75   30.59   29.33   28.93
HAN          X3      34.75   30.67   29.32   29.10
IGNN         X3      34.72   30.66   29.31   29.03
IPT (ours)   X3      34.81   30.85   29.38   29.49

Method       Scale   Set5    Set14   B100    Urban100
VDSR         X4      31.35   28.02   27.29   25.18
EDSR         X4      32.46   28.80   27.71   26.64
RCAN         X4      32.63   28.87   27.77   26.82
SAN          X4      32.64   28.92   27.78   26.79
RDN          X4      32.47   28.81   27.72   26.61
OISR-RK3     X4      32.53   28.86   27.75   26.79
RNAN         X4      32.49   28.83   27.72   26.61
HAN          X4      32.64   28.90   27.80   26.85
IGNN         X4      32.57   28.85   27.77   26.84
IPT (ours)   X4      32.64   29.01   27.82   27.26

  • Super-resolution result

  • Denoising result

  • Derain result

Citation

@misc{chen2020pre,
      title={Pre-Trained Image Processing Transformer}, 
      author={Chen, Hanting and Wang, Yunhe and Guo, Tianyu and Xu, Chang and Deng, Yiping and Liu, Zhenhua and Ma, Siwei and Xu, Chunjing and Xu, Chao and Gao, Wen},
      year={2021},
      eprint={2012.00364},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Owner
HUAWEI Noah's Ark Lab
Working with and contributing to the open source community in data mining, artificial intelligence, and related fields.
Comments
  • confused about the function `forward_chop`

    Thank you for providing the wonderful code. However, I am confused about the function forward_chop in model/__init__.py. It seems that this function unfolds the input image into several patches and then feeds those patches into the IPT model, but I didn't find any detailed explanation in the paper or in the code comments. For example, what does shave mean here? If I want to unfold the input image into non-overlapping patches, how should I do it?

    Thank you.
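
One generic way to enumerate non-overlapping patches is sketched below. It is offered only as an illustration of the tiling idea raised in the question and does not describe what forward_chop actually does; `tile_boxes` is a hypothetical helper, and how edge tiles smaller than the patch size should be handled (padding, for instance) is left open.

```python
def tile_boxes(height, width, patch):
    """Return (top, left, bottom, right) boxes that tile an image without overlap.

    Tiles on the bottom and right edges are clipped to the image bounds, so
    they may be smaller than `patch`; a fixed-size model would need padding.
    """
    if patch <= 0:
        raise ValueError("patch size must be positive")
    boxes = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            boxes.append((top, left, min(top + patch, height), min(left + patch, width)))
    return boxes


# Each box can be used to slice a tensor, e.g. x[..., top:bottom, left:right].
for box in tile_boxes(5, 4, 2):
    print(box)
```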

  • When will the code be released?

    Dear Noah team: Thank you very much for your work! I read your paper and was struck by its beauty. When will the code be released? Thank you!

  • Beginner reproducing the results gets PSNR: nan (Best: nan @epoch 1)

    I ran: python main.py --dir_data C:/Users/Lenovo/Desktop/Pretrained-IPT-main/data_test/benchmark/ --pretrain C:/Users/Lenovo/Desktop/Pretrained-IPT-main/pretrained_model/IPT_sr2.pt --data_test Set5+Set14+B100+Urban100 --scale 2

    The output was: Making model... Preparing loss function: 1.000 * L1

    Evaluation: 0it [00:00, ?it/s]

    [Set5 x2] PSNR: nan (Best: nan @epoch 1) 0it [00:00, ?it/s] 0it [00:12, ?it/s]

    [Set14 x2] PSNR: nan (Best: nan @epoch 1) 0it [00:00, ?it/s]

    [B100 x2] PSNR: nan (Best: nan @epoch 1) 0it [00:00, ?it/s]

    [Urban100 x2] PSNR: nan (Best: nan @epoch 1) Forward: 49.39s

  • For all tasks, PSNR is calculated only for y, rather than RGB.

    Thank you for the nice work and for sharing the pre-trained model!

    I found an error in the implementation of PSNR calculation.

    "For all tasks, PSNR is calculated only for y, rather than RGB values."

    The below function has "y" as the default argument for cal_type, and for all tasks, no argument for cal_type is given.

    https://github.com/huawei-noah/Pretrained-IPT/blob/a8f63ddda41498b38c5322be48308dc8d005e526/utility.py#L168-L184

    For example, this is the way to calculate PSNR for denoising task.

    https://github.com/huawei-noah/Pretrained-IPT/blob/a8f63ddda41498b38c5322be48308dc8d005e526/trainer.py#L63-L75

    If we use cal_type="y", it gives 32.415 dB for Gaussian denoising (sigma=30) on the CBSD68 dataset.

    But, if we give the other argument, it gives 30.748 dB which is much lower than the reported value in the paper.

    Furthermore, if we save the result and calculate PSNR using skimage library, it gives around 30.75 dB, too.

    Could you check the PSNR values?

  • Dataset links

    Thank you for sharing this great repo!

    I found that some of the dataset links are no longer valid, and some point to project pages rather than to direct downloads.

    It would be great if you can update the links and let users directly download the datasets from those links. Thank you.

  • About training data, ImageNet

    Hello,

    Thanks for your great work on pre-training for low-level vision.

    Could you please tell me which subset of ImageNet is used for pre-training? Is it ImageNet-2012?

  • Cannot use the --save_results option

    Using the save_results option produces an error:

    Evaluation:
    Traceback (most recent call last):
      File "main.py", line 37, in <module>
        main()
      File "main.py", line 33, in main
        t.test()
      File "C:\Users\Lenovo\Desktop\Pretrained-IPT-main\trainer.py", line 34, in test
        if self.args.save_results: self.ckp.begin_background()
      File "C:\Users\Lenovo\Desktop\Pretrained-IPT-main\utility.py", line 144, in begin_background
        for p in self.process: p.start()
      File "E:\python3.7\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "E:\python3.7\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "E:\python3.7\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "E:\python3.7\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "E:\python3.7\lib\multiprocessing\reduction.py", line 59, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'checkpoint.begin_background.<locals>.bg_target'

    C:\Users\Lenovo\Desktop\Pretrained-IPT-main>WARNING: Ignoring invalid distribution -ip (e:\python3.7\lib\site-packages)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "E:\python3.7\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "E:\python3.7\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input

  • RuntimeError: CUDA out of memory. Tried to allocate 118.00 MiB (GPU 0; 2.00 GiB total capacity; 1.19 GiB already allocated; 0 bytes free; 1.27 GiB reserved in total by PyTorch)

    Running python main.py --dir_data test_data --pretrain model/IPT_sr2.pt --data_test Set5 --scale 1 produced:

    Making model...
    Preparing loss function: 1.000 * L1
    Evaluation: 0%| | 0/5 [00:00<?, ?it/s]
    RuntimeError: CUDA out of memory. Tried to allocate 118.00 MiB (GPU 0; 2.00 GiB total capacity; 1.19 GiB already allocated; 0 bytes free; 1.27 GiB reserved in total by PyTorch)

    I only have one GPU. How should I handle this?

  • For what input image size are the 33G FLOPs calculated?

    Hi: Your work is really amazing, and I have a question after reading the paper. You mention in the article, "The whole IPT has 114M parameters and 33G FLOPs". For what input image size are the 33G FLOPs calculated?

  • Problem when testing the SR task

    When testing the SR x2 task I set scale to 2, and the PSNR result was correct. But when testing SR x3 with scale set to 3, I got: RuntimeError: Error(s) in loading state_dict for ipt: size mismatch for tail.0.0.0.weight: copying a param with shape torch.Size([256, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([576, 64, 3, 3]). size mismatch for tail.0.0.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([576]). When testing SR x4 with scale set to 4, the PSNR was only 12.448.

    Why is this?
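
The shapes in the error message are consistent with a sub-pixel (pixel-shuffle) upsampler whose first convolution outputs `n_feats * r * r` channels for an upscaling step `r`. The sketch below only works through that arithmetic. It assumes an EDSR-style tail (x3 as a single step, powers of two as stacked x2 steps) and takes `n_feats = 64` from the input-channel dimension in the message; neither assumption is confirmed by the text above, and `first_upsampler_conv_out_channels` is a hypothetical helper.

```python
def first_upsampler_conv_out_channels(n_feats, scale):
    """Output channels of the first conv in an assumed EDSR-style upsampler."""
    if scale == 3:
        # A single x3 pixel-shuffle step needs 3 * 3 channel groups.
        return 9 * n_feats
    if scale >= 2 and (scale & (scale - 1)) == 0:
        # Powers of two are built from x2 steps, each needing 2 * 2 groups.
        return 4 * n_feats
    raise ValueError(f"unsupported scale: {scale}")


for scale in (2, 3, 4):
    print(scale, first_upsampler_conv_out_channels(64, scale))
```

Under these assumptions the values are 256 for x2, 576 for x3 and 256 for x4. A checkpoint holding 256 channels would then be an x2 checkpoint, which matches the mismatch reported for scale 3. It might also explain why scale 4 raised no mismatch on that layer, although the error message alone does not establish this.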

  • The code may not be consistent with the formula

    In the paper, the description of the encoder in Chapter 3.1. IPT architecture is:

    $y_0 = [ E_{p1} + f_{p1} , E_{p2} + f_{p2} , \dots, E_{pN} + f_{pN} ],$
    $q_i = k_i = v_i = LN(y_{i-1}),$
    $y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$
    $\cdots$

    However, in the class TransformerEncoderLayer in the Python file model/ipt.py, the code reads:

    class TransformerEncoderLayer(nn.Module):
    
        def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, no_norm = False,
                     activation="relu"):
            ...
    
        def with_pos_embed(self, tensor, pos):
            return tensor if pos is None else tensor + pos
        
        def forward(self, src, pos = None):
            src2 = self.norm1(src)                                         # here
            q = k = self.with_pos_embed(src2, pos)             # here
            src2 = self.self_attn(q, k, src2)                            # here
            src = src + self.dropout1(src2[0])
            src2 = self.norm2(src)
            src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
            src = src + self.dropout2(src2)
            return src
    

    It seems more likely that the formula should be:

    $v_i=LN([ f_{p1} , f_{p2} , \dots, f_{pN} ]),$
    $y_0 = [ E_{p1} + f_{p1} , E_{p2} + f_{p2} , \dots, E_{pN} + f_{pN} ],$
    $q_i = k_i = LN(y_{i-1}),$

    Or are the two formulations equivalent? I'm confused. Thanks.
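
Read literally, the quoted forward method can be transcribed into the paper's notation as follows. This is a sketch rather than an authoritative restatement: it assumes that `pos` corresponds to the position encoding $E_p$ (the snippet does not show where `pos` comes from), and it writes FFN for `linear2(dropout(activation(linear1(...))))`.

```latex
\begin{aligned}
q_i = k_i &= \mathrm{LN}(y_{i-1}) + E_p, \\
v_i &= \mathrm{LN}(y_{i-1}), \\
y_i' &= y_{i-1} + \mathrm{Dropout}\big(\mathrm{MSA}(q_i, k_i, v_i)\big), \\
y_i &= y_i' + \mathrm{Dropout}\big(\mathrm{FFN}(\mathrm{LN}(y_i'))\big).
\end{aligned}
```

Under this reading the position encoding is added to the queries and keys inside every layer, after normalization, and is not added to the values. Whether it is also added once to $y_0$ cannot be determined from the snippet alone.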

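  • Training process

    Stage 1: pre-training. Stage 2: fine-tuning on the specific task.

    In stage 1, the multi-heads and multi-tails are trained. During training, each batch randomly selects paired data for a single task and feeds it to the model, and backpropagation updates the corresponding head, tail, and body. Does this require that, while training on task A, the heads and tails of the other tasks stay unchanged (are not updated)?

    In stage 2, only the head and tail of the target task are kept; the other heads and tails are simply discarded.

    I would like to confirm that this is the process.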
