[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

[Project] [PDF] Hugging Face Spaces

This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets"

by Axel Sauer, Katja Schwarz, and Andreas Geiger.

If you find our code or paper useful, please cite

@InProceedings{Sauer2021ARXIV,
  author    = {Axel Sauer and Katja Schwarz and Andreas Geiger},
  title     = {StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets},
  journal   = {arXiv.org},
  volume    = {abs/2201.00273},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.00273},
}
Rank on Papers With Code  

Related Projects

  • Projected GANs Converge Faster (NeurIPS'21)  -  Official Repo  -  Projected GAN Quickstart
  • StyleGAN-XL + CLIP (Implemented by CasualGANPapers)  -  StyleGAN-XL + CLIP
  • StyleGAN-XL + CLIP (Modified by Katherine Crowson to optimize in W+ space)  -  StyleGAN-XL + CLIP

ToDos

  • Initial code release
  • Add pretrained models (ImageNet{16,32,64,128,256,512,1024}, FFHQ{256,512,1024}, Pokemon{256,512,1024})
  • Add StyleMC for editing
  • Add PTI for inversion

Requirements

  • 64-bit Python 3.8 and PyTorch 1.9.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.1 or later.
  • GCC 7 or later compilers. The recommended GCC version depends on your CUDA version; see, for example, the CUDA 11.4 system requirements.
  • If you run into problems when setting up the custom CUDA kernels, refer to the Troubleshooting docs of the original StyleGAN3 repo and issue #23.
  • Windows users struggling to install the environment might find #10 helpful.
  • Use the following commands with Miniconda3 to create and activate your PG Python environment:
    • conda env create -f environment.yml
    • conda activate sgxl

Data Preparation

For a quick start, you can download the few-shot datasets provided by the authors of FastGAN here. To prepare a dataset at the respective resolution, run

python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon256.zip \
  --resolution=256x256 --transform=center-crop

For best results, you need to follow our progressive growing scheme, so prepare a separate zip for each training resolution. You can get the datasets we used in our paper from their respective websites (FFHQ, ImageNet).
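
If you want to script this step, a small helper along the following lines can build every zip in one go (a sketch, not part of the repo; it assumes you run it from the repository root and that ./data/pokemon exists):

# Sketch: build one zip per training resolution for progressive growing.
# Assumes dataset_tool.py sits in the current directory and ./data/pokemon exists.
import subprocess

SOURCE = "./data/pokemon"
RESOLUTIONS = [16, 32, 64, 128, 256]

for res in RESOLUTIONS:
    subprocess.run([
        "python", "dataset_tool.py",
        f"--source={SOURCE}",
        f"--dest=./data/pokemon{res}.zip",
        f"--resolution={res}x{res}",
        "--transform=center-crop",
    ], check=True)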

Training

For progressive growing, we first train a stem at low resolution, e.g., 16² pixels. Once the stem is finished, i.e., its FID saturates, you can start training the upper stages; we refer to these as superresolution stages.

Training the stem

Training StyleGAN-XL on Pokemon using 8 GPUs:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon16.zip \
    --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10

--batch specifies the overall batch size, while --batch-gpu specifies the batch size per GPU. If you use fewer GPUs, the training loop automatically accumulates gradients until the overall batch size is reached.
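
To illustrate how the two flags interact (our own arithmetic sketch, not the repo's exact bookkeeping):

# Illustration only: how --batch, --gpus, and --batch-gpu determine gradient accumulation.
def accumulation_rounds(batch: int, gpus: int, batch_gpu: int) -> int:
    assert batch % (gpus * batch_gpu) == 0, "overall batch must be divisible by gpus * batch_gpu"
    return batch // (gpus * batch_gpu)

print(accumulation_rounds(batch=64, gpus=8, batch_gpu=8))  # 1 -> no accumulation needed
print(accumulation_rounds(batch=64, gpus=2, batch_gpu=8))  # 4 -> gradients accumulated over 4 passes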

Samples and metrics are saved in outdir. If you don't want to track metrics, set --metrics=none. You can inspect fid50k_full.json or run tensorboard in training-runs/ to monitor the training progress.

For a class-conditional dataset (ImageNet, CIFAR-10), add the flag --cond True. The dataset needs to contain the class labels; see the StyleGAN2-ADA repo on how to prepare class-conditional datasets.
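
For reference, the labels are stored in a dataset.json inside the zip; the sketch below shows the expected layout (file names here are hypothetical, and the StyleGAN2-ADA repo remains the authoritative spec):

# Sketch of the dataset.json layout for class-conditional training.
# File names are hypothetical; consult the StyleGAN2-ADA repo for the exact format.
import json

labels = {
    "labels": [
        ["00000/img00000000.png", 0],  # [path relative to the zip root, class index]
        ["00000/img00000001.png", 0],
        ["00001/img00000002.png", 1],
    ]
}

with open("dataset.json", "w") as f:
    json.dump(labels, f)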

Training the super-resolution stages

Continuing with pretrained stem:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon32.zip \
  --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

--up_factor allows you to train several stages at once, i.e., with --up_factor=4 and a 16² stem you can directly train at resolution 64².

If you have enough compute, a good tactic is to train several stages in parallel and then restart the superresolution stage training once in a while. The current stage will then reload its previous stem's best_model.pkl. Performance can sometimes drop at first because of domain shift, but the superresolution stage quickly recovers and improves further.

Training recommendations for datasets other than ImageNet

The default settings are tuned for ImageNet. For smaller datasets (<50k images) or well-curated datasets (FFHQ), you can significantly decrease the model size, which enables much faster training. Recommended settings are --cbase 128 --cmax 128 --syn_layers 4 and, for superresolution stages, --head_layers 4.

If you want to train as few stages as possible, we recommend training a 32x32 or 64x64 stem and then scaling directly to the final resolution (as described above, adjust --up_factor accordingly). In general, however, progressive growing yields better results faster because throughput is much higher at lower resolutions, as shown in the corresponding figure by Karras et al., 2017.
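
To make the stage planning concrete, the following sketch (our own illustration, not repo code) lists the superresolution stages implied by a stem resolution, a target resolution, and the --up_factor used per stage:

# Sketch: enumerate the superresolution stages for a given stem, target, and up factor.
def stages(stem_res, target_res, up_factor=2):
    res, schedule = stem_res, []
    while res < target_res:
        res *= up_factor
        schedule.append(res)
    return schedule

print(stages(16, 256, up_factor=2))  # [32, 64, 128, 256] -> four stages on top of a 16x16 stem
print(stages(64, 256, up_factor=4))  # [256]              -> a single stage on top of a 64x64 stem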

Generating Samples & Interpolations

To generate samples and interpolation videos, run

python gen_images.py --outdir=out --trunc=0.7 --seeds=10-15 --batch-sz 1 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

and

python gen_video.py --output=lerp.mp4 --trunc=0.7 --seeds=0-31 --grid=4x2 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

For class-conditional models, you can pass the class index via --class; an index-to-label dictionary for ImageNet can be found here. For interpolation between classes, provide, e.g., --cls=0-31 to gen_video.py. The list of classes has to be the same length as --seeds.

To generate a conditional sample sheet, run

python gen_class_samplesheet.py --outdir=sample_sheets --trunc=1.0 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --samples-per-class 4 --classes 0-32 --grid-width 32

For ImageNet models, we enable multi-modal truncation (proposed by Self-Distilled GAN). We generated 600k samples and computed 10k cluster centroids via k-means. For a given sample, multi-modal truncation finds the closest centroid and interpolates towards it. To switch from uni-modal to multi-modal truncation, pass

--centroids-path=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet_centroids.npy

No truncation | Uni-modal truncation | Multi-modal truncation
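
Conceptually, multi-modal truncation truncates towards the nearest centroid instead of a single mean latent. A minimal sketch of the idea (assuming the centroids file has been downloaded locally and w is a mapped latent of shape [batch, num_ws, w_dim]; the actual implementation lives in the generation scripts):

# Minimal sketch of multi-modal truncation: interpolate towards the closest centroid.
# Assumes imagenet_centroids.npy was downloaded locally; not the repo's exact code.
import numpy as np
import torch

centroids = torch.from_numpy(np.load("imagenet_centroids.npy")).float()  # [num_centroids, w_dim]

def multimodal_truncate(w, centroids, psi=0.7):
    centroids = centroids.to(w)                              # match device/dtype of w
    dists = torch.cdist(w[:, 0, :], centroids)               # distance of each sample to every centroid
    nearest = centroids[dists.argmin(dim=1)].unsqueeze(1)    # [batch, 1, w_dim], closest centroid
    return nearest + psi * (w - nearest)                     # psi=1.0 reproduces w unchanged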

Image Editing

To use our reimplementation of StyleMC and generate the example above, run

python run_stylemc.py --outdir=stylemc_out \
  --text-prompt "a chimpanzee | laughter | happyness| happy chimpanzee | happy monkey | smile | grin" \
  --seeds 0-256 --class-idx 367 --layers 10-30 --edit-strength 0.75 --init-seed 49 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --bigger-network https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl

Recommended workflow:

  • Sample images via gen_images.py.
  • Pick a sample and use it as the initial image for run_stylemc.py by providing --init-seed and --class-idx.
  • Find a direction in style space via --text-prompt.
  • Finetune --edit-strength, --layers, and the number of --seeds.
  • Once you have found a good setting, provide a larger model via --bigger-network. The script still optimizes the direction for the smaller model, but uses the bigger model for the final output.

Pretrained Models

We provide the following pretrained models (pass the url as PATH_TO_NETWORK_PKL):

Dataset Res FID PATH
ImageNet 16² 0.73 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet16.pkl
ImageNet 32² 1.11 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet32.pkl
ImageNet 64² 1.52 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet64.pkl
ImageNet 128² 1.77 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl
ImageNet 256² 2.26 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet256.pkl
ImageNet 512² 2.42 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet512.pkl
ImageNet 1024² 2.51 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl
CIFAR10 32² 1.85 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/cifar10.pkl
FFHQ 256² 2.19 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq256.pkl
FFHQ 512² 2.23 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq512.pkl
FFHQ 1024² 2.02 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq1024.pkl
Pokemon 256² 23.97 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl
Pokemon 512² 23.82 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon512.pkl
Pokemon 1024² 25.47 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon1024.pkl
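
A minimal loading sketch for using these checkpoints outside the provided scripts (it assumes the repository root is on the Python path so that dnnlib and legacy are importable; gen_images.py remains the reference implementation):

# Minimal sketch: load a pretrained pickle and synthesize one image.
# Assumes the repository's dnnlib and legacy modules are importable.
import torch
import dnnlib
import legacy

url = "https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

with dnnlib.util.open_url(url) as f:
    G = legacy.load_network_pkl(f)["G_ema"].eval().requires_grad_(False).to(device)

z = torch.randn([1, G.z_dim], device=device)
c = torch.zeros([1, G.c_dim], device=device)    # zero-sized label for unconditional models
ws = G.mapping(z, c, truncation_psi=0.7)
img = G.synthesis(ws)                            # NCHW tensor, values roughly in [-1, 1]
img = (img.clamp(-1, 1) + 1) * 127.5             # map to [0, 255] for saving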

Quality Metrics

By default, train.py tracks FID50k during training. To calculate metrics for a specific network snapshot, run

python calc_metrics.py --metrics=fid50k_full --network=PATH_TO_NETWORK_PKL

To see the available metrics, run

python calc_metrics.py --help

We provide precomputed FID statistics for all pretrained models:

wget https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/gan-metrics.zip
unzip gan-metrics.zip -d dnnlib/

Further Information

This repo builds on the codebase of StyleGAN3 and our previous project Projected GANs Converge Faster.

Comments
  • Error Running Demo

    After following the installation instructions, I get the following error running Cuda 11.6 on an RTX 2080ti

    Traceback (most recent call last):
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 332, in <module>
        main()  # pylint: disable=no-value-for-parameter
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 317, in main
        launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
      File "/home/alex/Spring-2022/CV/DogeGAN/resources/stylegan_xl/train.py", line 104, in launch_training
        subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 49, in subprocess_fn
        training_loop.training_loop(rank=rank, **c)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/training_loop.py", line 339, in training_loop
        loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/loss.py", line 121, in accumulate_gradients
        loss_Gmain.backward()
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
      File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
        return user_fn(self, *args)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
        grad_weight = Conv2dGradWeight.apply(grad_output, input)
      File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
        return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
    RuntimeError: No such operator aten::cudnn_convolution_transpose_backward_weight
    
  • Hello, may I ask you how much data you used in your Pokemon image generation project

    Hello, could you share the Pokemon dataset? I've been working on a Pokemon image generation project recently, but the quality of the generated Pokemon is so poor (FID 93) that they look like a growling Cthulhu.

  • how to prepare imagenet dataset

    Hello, your project mentions using the ImageNet dataset, but I have problems reproducing it because dataset_tool.py gives no instructions for ImageNet. I would also like to know how to train the SOTA results you report on Papers With Code. Do you have a training plan, e.g., which settings are used for the 16, 32, and 64 stages (the kind of configuration usually written in a YAML file), and how much time training ImageNet takes with these settings in V100-days? It may be quite long.

  • code and pretrained models

    Dear StyleGAN_xl team,

    Thank you for your great work. The results are amazing.

    Do you have plan to release the code and pretrained models? When you will release them?

    Thank you for your help.

    Best Wishes,

    Zongze

  • a question

    When I try to train your great project, something goes wrong. Do you know the reason for RuntimeError: No such operator aten::cudnn_convolution_backward_weight? Thank you very much~

  • Question about normalization

    Hi, I notice that the normalization of the input images differs from the traditional normalization x = (x - mean) / std:

    def norm_with_stats(x, stats):
        x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.5 / stats['mean'][0]) + (0.5 - stats['std'][0]) / stats['mean'][0]
        x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.5 / stats['mean'][1]) + (0.5 - stats['std'][1]) / stats['mean'][1]
        x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.5 / stats['mean'][2]) + (0.5 - stats['std'][2]) / stats['mean'][2]
        x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
        return x
    

    Is there any paper or previous work that introduces this kind of normalization method?

  • a question about expected training time

    Hello, xl-sr. Thank you for your great project. When I train the stem at resolution 16 on a single V100 with batch_size = 8, I get 120 sec/kimg. Is that right? When I train the same zip using StyleGAN2-ADA with batch_size = 32, I get 8 sec/kimg. Maybe StyleGAN-XL is just slow? Can you answer my question? Thank you very much, I really need your help.

  • Not resize image to 224 for the ViT-based discriminator

    https://github.com/autonomousvision/stylegan_xl/blob/819c225e7fd60114cde0bad79f3cb5a4ba8621cd/pg_modules/discriminator.py#L180

    Thank you for the great work! I noticed that in pg_modules/discriminator.py there is a line `bb_name += f"_{i}"` that changes bb_name from "deit_base_distilled_patch16_224" to "deit_base_distilled_patch16_224_0". Then, in the forward pass, the input image is not resized to 224 for the ViT model. I'm not sure whether that is intentional or whether the image should be resized to 224 (especially for higher resolutions such as 512x512 or 1024x1024).

    Thanks again!

  • encoder for stylegn-xl

    Dear stylegan-xl group,

    Thank you for sharing this great work, I really like it.

    Have you tried training an encoder for the StyleGAN-XL model pretrained on ImageNet? Maybe the pSp or the e4e encoder?

    Thank you for your help.

    Best Wishes,

    Zongze

  • Order of Data Augmentation and Normalization

    Hi, first of all, thank you for your clean code. Second, I think I found a (probably not that important) bug. As far as I understand from the DA paper, DiffAugment should be applied to images in the (-1, 1) range; in their codebase they first scale the input from (0, 255) to (-1, 1) here and then apply their DA function here. In this repository, however, you first scale to (-1, 1) here, then scale to (0, 1) here, then normalize based on the pretrained feature extractor (which is correct and should be applied to (0, 1) images), and only then apply DA. Wouldn't it be more accurate to first apply DA, then move to (0, 1), and then normalize?

  • how to prepare imagenet dataset

    Hello, thank you for your great work. I can't find instructions for preparing the ImageNet dataset in dataset_tool.py. If I have an ImageNet zip, how can I prepare the data? Thank you for your help.

  • Discriminator pre-train models missing parameters

    I am using both the generator and discriminator. Generator works well but the discriminator doesn't. Here is part of my code:

    # load D and G
    with dnnlib.util.open_url(network_pkl) as f:
        D = legacy.load_network_pkl(f)['D']
    D = D.eval().requires_grad_(False).to(device)

    with dnnlib.util.open_url(network_pkl) as f:
        G = legacy.load_network_pkl(f)['G_ema']
    G = G.eval().requires_grad_(False).to(device)

    img = G.synthesis(w, update_emas=False)
    D(img, torch.empty([1, G.c_dim], device=device))

    The error happens in the last line. I tried different pretrained models; sometimes the error is AttributeError: 'Mlp' object has no attribute 'drop1', and sometimes "mean" and "std" are missing for the normalization layer.

    Here is one error message:

    /home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "/home/kaixuan/Projects/MyStyleGAN_XL/D_run_tmp.py", line 136, in <module>
        generate_images()  # pylint: disable=no-value-for-parameter
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/kaixuan/Projects/MyStyleGAN_XL/D_run_tmp.py", line 129, in generate_images
        D(img1, torch.empty([1, G.c_dim], device=device))
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/kaixuan/Projects/MyStyleGAN_XL/pg_modules/discriminator.py", line 213, in forward
        features = feat(x_n)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/kaixuan/Projects/MyStyleGAN_XL/pg_modules/projector.py", line 114, in forward
        out0, out1, out2, out3 = forward_vit(self.pretrained, x)
      File "/home/kaixuan/Projects/MyStyleGAN_XL/feature_networks/vit.py", line 59, in forward_vit
        _ = pretrained.model.forward_flex(x)
      File "/home/kaixuan/Projects/MyStyleGAN_XL/feature_networks/vit.py", line 149, in forward_flex
        x = blk(x)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/timm/models/vision_transformer.py", line 230, in forward
        x = x + self.drop_path(self.mlp(self.norm2(x)))
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/timm/models/layers/mlp.py", line 28, in forward
        x = self.drop1(x)
      File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
    AttributeError: 'Mlp' object has no attribute 'drop1'

  • How can I change the output of the network?

    Thank you for your excellent work. I am trying to get my hands on StyleganXL and I am trying to get the intermediate features of the pretrained StyleganXL. However, I am a little confused about the loading procedure and the @persistence decorator.

    As I understand it, the source code saved in the pickle is used when loading a persistent class. I can refer to the following code to get the pretrained intermediate features: first load the pretrained StyleGAN-XL, then define a new network that modifies the source code and outputs the features.

    with open('old_pickle.pkl', 'rb') as f:
        old_net = pickle.load(f)
    new_net = MyNetwork(*old_net.init_args, **old_net.init_kwargs)
    misc.copy_params_and_buffers(old_net, new_net, require_all=True)
    

    However, I noticed that in networks_stylegan3.py, initializing the Generator relies on in_embeddings/tf_efficientnet_lite0.pkl. What does this pkl do? And how can I implement the above function?

  • multiple nodes

    Hello, I would like to ask whether training can use multiple machines and GPUs, i.e., multiple nodes, because it looks like only single-node multi-GPU training is supported.

  • Possible to use transfer learning?

    Is it possible to transfer learn from the largest resolution on a new dataset? This was a common trick that worked pretty well in other versions of StyleGAN and saved a ton of compute time.

    I tried the following (but may have gotten some of the arguments wrong):

    !python train.py --outdir=./training-runs/test --cfg=stylegan3-t --data=./data/my-dataset-256.zip \
      --gpus=1 --batch=64 --mirror=1 --snap 1 --batch-gpu 4 --kimg 10000 --syn_layers 10 --head_layers 4 \
      --resume=./pretrained/pokemon256.pkl --metrics=None
    

    but I got the following error: RuntimeError: output with shape [12] doesn't match the broadcast shape [12, 12]
