U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

U2-Net: U Square Net

The official repo for our paper U2-Net (U square net), published in Pattern Recognition 2020:

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand

Contact: xuebin[at]ualberta[dot]ca

Updates !!!

(2021-May-5) Thanks to AK391 for sharing his Gradio Web Demo of U2-Net.

(2021-Apr-29) Thanks to Jonathan Benavides Vallejo for releasing his app LensOCR: Extract Text & Image, which uses U2-Net for extracting the image foreground.

(2021-Apr-18) Thanks to Andrea Scuderi for releasing his app Clipping Camera, which is a U2-Net-driven real-time camera app and "is able to detect relevant object from the scene and clip them to apply fancy filters".

(2021-Mar-17) Dennis Bappert re-trained the U2-Net model for human portrait matting. The results look very promising, and he also provided the details of the training process and the data generation (and augmentation) strategy, which are inspiring.

(2021-Mar-11) Dr. Tim developed a video version of rembg for removing video backgrounds using U2-Net. The awesome demo results can be found on YouTube.

(2021-Mar-02) We found some other interesting applications of our U2-Net, including MOJO CUT, Real-Time Background Removal on iPhone, Video Background Removal, Another Online Portrait Generation Demo on AWS, and AI Scissor.

(2021-Feb-15) We just released an online demo http://profu.ai for the portrait generation. Please feel free to give it a try and provide any suggestions or comments.

(2021-Feb-06) Recently, some people asked about using U2-Net for human segmentation, so we trained another example model for human segmentation based on the Supervisely Person Dataset.

(1) To run the human segmentation model, first download the u2net_human_seg.pth model weights into ./saved_models/u2net_human_seg/.
(2) Put the images to be segmented into the corresponding directory, e.g. ./test_data/test_human_images/.
(3) Run inference with the command python u2net_human_seg_test.py; the results will be written to the corresponding directory, e.g. ./test_data/u2net_test_human_images_results/.
Notes: Due to the limited labeling accuracy of the Supervisely Person Dataset, the human segmentation model (u2net_human_seg.pth) here won't give you hair-level accuracy. But it should be more robust than u2net trained with the DUTS-TR dataset on general human segmentation tasks. It can be used for human portrait segmentation, human body segmentation, etc.
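
Before running step (3), it can help to confirm that the weights and input images are where the steps above expect them. The following is a minimal sketch, not part of the repository: the helper name preflight and the set of accepted image suffixes are assumptions made for illustration, while the two paths come from steps (1) and (2).

```python
from pathlib import Path

# Paths taken from steps (1) and (2); the helper itself is hypothetical.
WEIGHTS = Path("saved_models/u2net_human_seg/u2net_human_seg.pth")
INPUT_DIR = Path("test_data/test_human_images")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types


def preflight(repo_root):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []
    if not (root / WEIGHTS).is_file():
        problems.append(f"missing weights: {WEIGHTS}")
    input_dir = root / INPUT_DIR
    if not input_dir.is_dir():
        problems.append(f"missing input directory: {INPUT_DIR}")
    elif not any(p.suffix.lower() in IMAGE_SUFFIXES for p in input_dir.iterdir()):
        problems.append(f"no images found in: {INPUT_DIR}")
    return problems
```

If preflight(".") returns an empty list when called from the repository root, the layout matches the steps above and python u2net_human_seg_test.py can be run.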

(2020-Dec-28) Some interesting applications and useful tools based on U2-Net:
(1) Xiaolong Liu developed several very interesting applications based on U2-Net, including Human Portrait Drawing (as far as I know, Xiaolong is the first to use U2-Net for portrait generation), image matting and so on.
(2) Vladimir Seregin developed an interesting tool, NN based lineart, for comparing the portrait results of U2-Net with those of another popular model, ArtLine, developed by Vijish Madhavan.
(3) Daniel Gatis built a Python tool, Rembg, for image background removal based on U2-Net. I think this tool will greatly facilitate the application of U2-Net in different fields.

(2020-Nov-21) Recently, we found an interesting application of U2-Net for human portrait drawing. Therefore, we trained another model for this task based on the APDrawingGAN dataset.

Sample Results: Kids

Sample Results: Ladies

Sample Results: Men

Usage for portrait generation

  1. Clone this repo locally
git clone https://github.com/NathanUA/U-2-Net.git
  2. Download the u2net_portrait.pth model from GoogleDrive or Baidu Pan (extraction code: chgd) and put it into the directory: ./saved_models/u2net_portrait/.

  3. Run on the testing set.
    (1) Download the train and test set from APDrawingGAN. These images and their ground truth are stitched side-by-side (512x1024). You need to split each of these images into two 512x512 images and put them into ./test_data/test_portrait_images/portrait_im/. You can also download the split testing set on GoogleDrive.
    (2) Running the inference with the command python u2net_portrait_test.py will output the results into ./test_data/test_portrait_images/portrait_results.

  4. Run on your own dataset.
    (1) Prepare your images and put them into ./test_data/test_portrait_images/your_portrait_im/. To obtain enough details of the portrait, the human head region in the input image should be close to or larger than 512x512. The head background should be relatively clear.
    (2) Running the prediction with the command python u2net_portrait_demo.py will output the results to ./test_data/test_portrait_images/your_portrait_results/.
    (3) The difference between python u2net_portrait_demo.py and python u2net_portrait_test.py is that we added a simple face detection step before the portrait generation in u2net_portrait_demo.py. The testing set of APDrawingGAN is normalized and cropped to 512x512 so that it includes only human heads, while your own dataset may vary in resolution and content. Therefore, u2net_portrait_demo.py detects the biggest face in the given image and then crops, pads and resizes the ROI to 512x512 before feeding it to the network. The accompanying figure shows how to take your own photos for generating high-quality portraits.
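
The splitting of the stitched APDrawingGAN images described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the stitched image is assumed to be 1024 pixels wide and 512 high, and Pillow is assumed to be installed for the file-based helper. Which half holds the photo and which holds the ground-truth drawing is not stated here, so check a sample before copying files into portrait_im.

```python
def side_by_side_boxes(width, height):
    """Return (left, upper, right, lower) crop boxes for the two halves."""
    if width % 2 != 0:
        raise ValueError("width must be even to split into equal halves")
    half = width // 2
    return (0, 0, half, height), (half, 0, width, height)


def split_stitched(in_path, left_path, right_path):
    """Split one stitched image file into two files using Pillow."""
    from PIL import Image  # imported lazily so the box helper has no dependencies

    with Image.open(in_path) as im:
        # Image.size is (width, height); Image.crop takes a 4-tuple box.
        left_box, right_box = side_by_side_boxes(*im.size)
        im.crop(left_box).save(left_path)
        im.crop(right_box).save(right_path)
```

For a 1024x512 input, side_by_side_boxes yields two 512x512 boxes, one per half.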

(2020-Sep-13) Our U2-Net based model ranked 6th in the MICCAI 2020 Thyroid Nodule Segmentation Challenge.

(2020-May-18) The official paper of our U2-Net (U square net) (PDF on Elsevier (free until July 5, 2020), PDF on arXiv) is now available. If you are not able to access it, please feel free to drop me an email.

(2020-May-16) We fixed the upsampling issue of the network. Now, the model should be able to handle arbitrary input size. (Tips: This modification is to facilitate the retraining of U2-Net on your own datasets. When using our pre-trained model on SOD datasets, please keep the input size as 320x320 to guarantee the performance.)

(2020-May-16) We highly appreciate Cyril Diagne for building this fantastic AR project: AR Copy and Paste, using our U2-Net (Qin et al, PR 2020) and BASNet (Qin et al, CVPR 2019). The demo video on Twitter has achieved over 5M views, which is phenomenal and shows us more application possibilities of SOD.

U2-Net Results (176.3 MB)

Our previous work: BASNet (CVPR 2019)

Required libraries

Python 3.6
numpy 1.15.2
scikit-image 0.14.0
python-opencv
PIL 5.2.0
PyTorch 0.4.0
torchvision 0.2.1
glob

Usage for salient object detection

  1. Clone this repo
git clone https://github.com/NathanUA/U-2-Net.git
  2. Download the pre-trained model u2net.pth (176.3 MB) from GoogleDrive or Baidu Pan (extraction code: pf9k) or u2netp.pth (4.7 MB) from GoogleDrive or Baidu Pan (extraction code: 8xsi) and put it into the directory './saved_models/u2net/' or './saved_models/u2netp/', respectively.

  3. Cd to the directory 'U-2-Net' and run the training or inference process with the command python u2net_train.py or python u2net_test.py, respectively. The 'model_name' in both files can be changed to 'u2net' or 'u2netp' to use different models.
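
The relationship between 'model_name' and the weights location described above can be sketched as below. This is a hypothetical helper written for illustration, not the code in u2net_train.py or u2net_test.py; it only assumes the directory layout and file names given in the download step.

```python
import os

# Model names and download sizes as listed in the download step.
KNOWN_MODELS = {"u2net": "176.3 MB", "u2netp": "4.7 MB"}


def weights_path(model_name, repo_root="."):
    """Build the expected path <repo_root>/saved_models/<model_name>/<model_name>.pth."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(
            f"unknown model_name {model_name!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return os.path.join(repo_root, "saved_models", model_name, model_name + ".pth")
```

Checking os.path.isfile(weights_path("u2netp")) from the repository root is a quick way to confirm the download landed in the right place before switching 'model_name'.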

We also provide the predicted saliency maps (u2net results, u2netp results) for the datasets SOD, ECSSD, DUT-OMRON, PASCAL-S, HKU-IS and DUTS-TE.

U2-Net Architecture

Quantitative Comparison

Qualitative Comparison

Citation

@article{Qin_2020_PR,
title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
journal = {Pattern Recognition},
volume = {106},
pages = {107404},
year = {2020}
}
Owner
Xuebin Qin
Postdoctoral Fellow at the University of Alberta, Canada, studying object detection, segmentation, visual tracking, etc.
Comments
  • RuntimeError: expected dtype Half but got dtype Long

    I am trying to use this model for binary segmentation.

    When I pass the mask as a Tensor to muti_bce_loss_fusion, I get this error:

        547 def muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v):
        548     print(d0.shape)
    --> 549     loss0 = bce_loss(d0, labels_v)
        550     loss1 = bce_loss(d1, labels_v)
        551     loss2 = bce_loss(d2, labels_v)
    
    ~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        556             result = self._slow_forward(*input, **kwargs)
        557         else:
    --> 558             result = self.forward(*input, **kwargs)
        559         for hook in self._forward_hooks.values():
        560             hook_result = hook(self, input, result)
    
    ~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
        518 
        519     def forward(self, input, target):
    --> 520         return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
        521 
        522 
    
    ~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
       2415 
       2416     return torch._C._nn.binary_cross_entropy(
    -> 2417         input, target, weight, reduction_enum)
       2418 
       2419 
    
    RuntimeError: expected dtype Half but got dtype Long
    

    What is the format of the output of the model? What is the expected format for the label?

    How could I use this model for binary segmentation?

  • Differences between original model and CoreML model

    Hi!

    Thank you for your great work with u2net, it's awesome!

    I have converted the original u2netp model to CoreML to use as .mlmodel file in iOS. It works great but when I test some images I see some differences from executing the original model in python:

    Original image: [image]

    Original model: [image]

    iOS model: [image]

    I have tested both models with friends' pictures and sometimes the difference is huge: the Python model detects the whole body but the iOS model detects only the face.

    Do you know what is happening? Thank you!

  • This repo is amazing!

    I am working on a website that runs this in AWS Lambda. The front end is really simple, built with Vue.js: upload image --> send to backend --> get it back after the transformation. I have it 100% working locally with Flask, and most of the problems with AWS Lambda are solved.

    I will post a link to the website here once I am ready (hopefully this week).

  • Results without fringe

    Hi @NathanUA,

    I have a library that makes use of your model.

    @alfonsmartinez opened an issue about the model result; please take a look here: https://github.com/danielgatis/rembg/issues/14

    Can you figure out how I can achieve this result without the black fringe?

    thanks.

  • How to increase model capacity for training on a larger dataset?

    First of all, thanks for the amazing work on U-2-Net. I am now trying to train the model from scratch on my own dataset of 60k images, which is larger than your dataset. I would like to know how I can increase the model capacity to be able to train on such a dataset.

    I have considered replacing the standard rebnconv blocks with residuals, as suggested in another issue. What other options could I try? I understand that I need to make the architecture deeper; does this mean that I should make RSU-8 or RSU-9 blocks by adding more convolution layers?

  • input size and crop

    Thanks a lot for your awesome performing model! I'm wondering about scaling and random crop: for training you first scale and then crop to 288x288, so the tensor has this size (288). What role does the scaling play here, and why do you talk about 320x320 as the input size instead of 288x288?

    RescaleT(320), 
    RandomCrop(288),
    
    

    With your latest model update, upscaling supports different ratios, as it looks to me. Is only square-ish input supported, or e.g. 640x480 as well?
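
One way to read the two transforms quoted in this question is sketched below. It assumes that RescaleT(320) resizes the image to 320x320 and that RandomCrop(288) then cuts a random 288x288 window out of it; under that reading the scaling fixes the canvas and the crop shifts the content by up to 32 pixels in each direction as augmentation. The helper random_crop_window is hypothetical and is not the repository's transform code.

```python
import random


def random_crop_window(scaled_size=320, crop_size=288, rng=random):
    """Pick a crop window inside a scaled_size x scaled_size image.

    Returns (top, left, bottom, right) with bottom - top == crop_size.
    """
    if crop_size > scaled_size:
        raise ValueError("crop_size must not exceed scaled_size")
    margin = scaled_size - crop_size      # 32 pixels for 320 -> 288
    top = rng.randint(0, margin)          # randint is inclusive on both ends
    left = rng.randint(0, margin)
    return top, left, top + crop_size, left + crop_size
```

If no crop is applied at test time, the network would see the full 320x320 image, which would be consistent with the 320x320 input size recommended for the pre-trained model in the updates above.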

  • The reproduction results can not meet the paper standard

    Following the paper, I flipped the DUTS-TR dataset horizontally offline and then trained. The code was not modified. With epoch_num=450 and 600k to 800k iterations, I tested on the DUTS-TE test set. However, using your evaluation code, the MAE and maxF results are not as good as those in the paper.

    What's my problem?

  • Reproducing results

    I am trying to reproduce your results but am seeing some uninspiring signs at the start. I started with your model with all the settings as stated in the paper, except that I use 15% of the training data for validation every epoch and my batch size is 8. The validation loss stops decreasing after 50 epochs or so, and a noticeable gap emerges between train and validation. I trained for 40 more epochs but the validation loss did not fall lower; it is almost twice the train loss.

    The model seems to be overfitting to me. A lower batch size than yours should cause more regularisation, so that should not be the issue.

    Can you please give me some advice on how to interpret this and whether I should keep going? I know I am not using 100% of the data like you, but 85% should be suboptimal yet similar. Can you share your training curves or anything similar?

  • False Result in Prediction

    @xuebinqin Thanks for the nice work. I have trained the u2net model on a custom person dataset. It works perfectly on images that contain a person. But it also masks some parts of images that contain only background, with no person in them. How can I tackle these false predictions? I have more than 5k person images with a variety of backgrounds in the training set. Should I add plain background images (without any person) to the training set? Please suggest some way to remove the false results. Thanks.

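  • Could the solution for the MICCAI 2020 Thyroid Nodule Segmentation Challenge be open-sourced?

    First of all, thank you for open-sourcing U^2-Net! After using it for a while, I found that U^2-Net segments small objects less well than large ones. Do you have any ideas for improving this? I would also like to ask whether you have considered open-sourcing your solution for the MICCAI 2020 Thyroid Nodule Segmentation Challenge. These days competitions mostly just stack EfficientNet backbones without much thought, so I am glad to see U^2-Net, a product of independent thinking, achieve good results, and I would like to learn from your approach in order to modify the model myself. Thank you!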

  • train and test on more than one images at once?

    for i, data in enumerate(salobj_dataloader) tells us that the model is being trained on one image at a time. How can we train and test on multiple images at once, for removing the background from a large number of images?
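
In PyTorch, each item produced by iterating a DataLoader is a batch whose size is set by the loader's batch_size argument, so the loop quoted above does not by itself imply one image per step. The plain-Python sketch below only illustrates the batching idea; iter_batches is a hypothetical helper and is neither part of the repository nor of PyTorch.

```python
def iter_batches(items, batch_size):
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Each loop iteration receives a whole batch, mirroring
# `for i, data in enumerate(dataloader)` with batch_size > 1.
image_paths = [f"img_{k}.jpg" for k in range(5)]  # placeholder file names
for i, batch in enumerate(iter_batches(image_paths, batch_size=2)):
    print(i, batch)
```

The last batch may be smaller than batch_size when the number of items is not a multiple of it.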

  • Bump pillow from 8.1.1 to 9.3.0

    Bumps pillow from 8.1.1 to 9.3.0.

    Release notes

    Sourced from pillow's releases.

    9.3.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.3.0 (2022-10-29)

    • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

    • Initialize libtiff buffer when saving #6699 [radarhere]

    • Inline fname2char to fix memory leak #6329 [nulano]

    • Fix memory leaks related to text features #6330 [nulano]

    • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

    • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

    • Fixed set_variation_by_name offset #6445 [radarhere]

    • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

    • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

    • Added ExifTags enums #6630 [radarhere]

    • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

    • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

    • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

    • Added GPS TIFF tag info #6661 [radarhere]

    • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

    • Do not attempt normalization if mode is already normal #6644 [radarhere]

    ... (truncated)

  • How to get the evaluation metrics value?

    Hello, I'm doing research on salient object detection. However, I'm new to the field and to deep learning in general. I trained u2net but I don't know how to plot the training curve or how to get the values of the evaluation metrics (PR curve, F-measure, MAE, Sm, etc.). Could you provide the code to plot the training curve and to compute the evaluation metrics, please? Thank you.
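
Two of the metrics named in this question, MAE and the F-measure, can be sketched in plain Python as below. This is not the repository's evaluation code: the function names are hypothetical, inputs are assumed to be flattened saliency maps and masks with values in [0, 1], and beta_sq = 0.3 is the weighting commonly used in the salient object detection literature. A maximum F-measure is typically obtained by evaluating f_measure over a sweep of thresholds and taking the largest value.

```python
def mae(pred, gt):
    """Mean absolute error between two equally sized flat sequences in [0, 1]."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-beta score of the map binarized at `threshold` against a binary mask."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must be the same length")
    tp = sum(1 for p, g in zip(pred, gt) if p >= threshold and g >= 0.5)
    fp = sum(1 for p, g in zip(pred, gt) if p >= threshold and g < 0.5)
    fn = sum(1 for p, g in zip(pred, gt) if p < threshold and g >= 0.5)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

For a training curve, logging the loss value at each iteration to a list or file and plotting it afterwards is usually enough.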

  • TensorRT conversion produces different outputs

    Hi, I tried to convert the pretrained model to TensorRT with both torch/TensorRT and torch2trt from Nvidia. Despite using the same precision (FP32), the model outputs differ significantly on large images.

    Has anyone done this successfully yet?

  • Image to image translation

    Awesome model. I am surprised by its segmentation performance. Is it possible to do image-to-image translation with the U-2-Net model, like pix2pix? It seems portrait drawing is what I am looking for, but I couldn't find any img2img translation information.

  • Improving results of foreground segmentation by using U2net

    Hey, Thanks for developing U-2-Net. It works super fast and pretty well. I am currently training it with 50k images ( DUTS- + COCO-dataset). My train loss is around 0.26 at the moment. Although the foreground objects get extracted efficiently. The results are far from perfect. Do you think I can increase quality by further training and reducing the train loss? How can I get perfect results, such as the people from remove.bg? I tried using CascadePSP as a post process step. The results are perfect, but CascadePSP is just way too slow. I need faster processing. I wonder how remove.bg can achieve perfect results with super quick processing. Are there other deep learning models that are faster than CascadePSP? Or is it possible to get better results from U-2-Net? Thank you so much for your insane amount of work and your amazing achievements. If you want to see some of my results, please check my post on stackoverflow. Thank you so much :)
