RAVE: Realtime Audio Variational autoEncoder

Official implementation of RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (article link) by Antoine Caillon and Philippe Esling.

If you use RAVE as part of a music performance or installation, be sure to cite either this repository or the article!

Installation

RAVE requires Python 3.9. Install the dependencies using

pip install -r requirements.txt

Detailed instructions to set up a training station for this project are available here.

Preprocessing

RAVE comes with two command line utilities, resample and duration. resample pre-processes (silence removal, loudness normalization) and augments (compression) an entire directory of audio files (.mp3, .aiff, .opus, .wav, .aac). duration prints out the total duration of a .wav folder.
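As a point of reference, here is a minimal Python sketch of the computation duration performs, i.e. an illustration of the idea rather than the utility's actual code (the folder name is a placeholder):

    import pathlib
    import soundfile as sf

    # Sum the length in seconds of every .wav file under a folder.
    total = sum(sf.info(str(p)).duration for p in pathlib.Path("my_wavs").rglob("*.wav"))
    print(f"total duration: {total:.1f} s")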

Training

Both RAVE and the prior model are available in this repo. For most users we recommend the cli_helper.py script, since it generates a set of instructions for training and exporting both RAVE and the prior model on a specific dataset.

python cli_helper.py

However, if you want to customize your training further, you can use the provided train_{rave, prior}.py and export_{rave, prior}.py scripts manually.

Reconstructing audio

Once trained, you can reconstruct an entire folder of .wav files using

python reconstruct.py --ckpt /path/to/checkpoint --wav-folder /path/to/wav/folder

You can also export RAVE to a torchscript file using export_rave.py and use the encode and decode methods on tensors.
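For instance, a minimal sketch of using such an export (the file name and tensor sizes are placeholder assumptions):

    import torch

    torch.set_grad_enabled(False)
    model = torch.jit.load("rave.ts")  # torchscript file produced by export_rave.py
    x = torch.randn(1, 1, 2**16)       # mono audio as a (batch, channel, time) tensor
    z = model.encode(x)                # audio -> latent representation
    y = model.decode(z)                # latent representation -> audio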

Realtime usage

UPDATE

If you want to use the realtime mode, you should update your dependencies!

pip install -r requirements.txt

RAVE and the prior model can be used in realtime on live audio streams, allowing creative interactions with both models.

nn~

RAVE is compatible with the nn~ external for Max/MSP and PureData.

An audio example of the prior sampling patch is available in the docs/ folder.

RAVE vst

You can also use RAVE as a VST audio plugin using the RAVE vst!

Discussion

If you have questions, want to share your experience with RAVE, or want to share musical pieces made with the model, you can use the Discussion tab!

Owner
ACIDS (Artificial Creative Intelligence and Data Science)

Comments
  • Error trying to launch train_rave.py

    Hi, I was trying to launch train_rave.py with a dataset for testing. I am using 310 .wav files with cli_helper.py, which returned an error like this:

    $ python train_rave.py --name training1 --wav ./dataset/1 --preprocessed /tmp/rave/training1/rave
    ...
    5_38_26.wav:  99%|███████████████████████████▊| 308/310 [00:46<00:00, 6.69it/s]
    /home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn("PySoundFile failed. Trying audioread instead.")
    2_38_13.wav: 100%|███████████████████████████▉| 309/310 [00:46<00:00, 6.60it/s]
    /home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn("PySoundFile failed. Trying audioread instead.")
    2_38_13.wav: 100%|████████████████████████████| 310/310 [00:46<00:00, 6.61it/s]
    Traceback (most recent call last):
      File "/home/pablo/RAVE/train_rave.py", line 77, in <module>
        dataset = SimpleDataset(
      File "/home/pablo/.local/lib/python3.9/site-packages/udls/simple_dataset.py", line 83, in __init__
        raise Exception("No data found !")
    Exception: No data found !

    Have you seen this error before? I am trying to use it with a GPU, but launching only with CPU throws the same error. Maybe there is a problem with the dataset?

    Thanks!

  • Kernel size error when rave.decode(z)

    I get the following error when I try to generate from the prior:

    Traceback (most recent call last):
      File "/home/syrinx/RAVE/ravezeke1-generate.py", line 43, in <module>
        y = rave.decode(z)
      File "/home/syrinx/RAVE/rave/model.py", line 582, in decode
        y = self.decoder(z, add_noise=True)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/syrinx/RAVE/rave/model.py", line 235, in forward
        x = self.net(x)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/cached_conv/convs.py", line 74, in forward
        return nn.functional.conv1d(
    RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

    This is the code:

    ################ PRIOR GENERATION ################

    # STEP 1: CREATE DUMMY INPUT TENSOR
    generation_length = 2**18                 # approximately 6 s at 48 kHz
    x = torch.randn(1, 1, generation_length)  # dummy input
    z = rave.encode(x)                        # dummy latent representation
    z = torch.zeros_like(z)

    # STEP 2: AUTOREGRESSIVE GENERATION
    z = prior.quantized_normal.encode(prior.diagonal_shift(z))
    z = prior.generate(z)
    z = prior.diagonal_shift.inverse(prior.quantized_normal.decode(z))

    # STEP 3: SYNTHESIS AND EXPORT
    y = rave.decode(z)
    sf.write("output_audio.wav", y.reshape(-1).numpy(), sr)

    When I change generation_length to a smaller size, I get the error:

    RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    My audio is at 44.1 kHz.

  • Training tip for complex datasets

    Hi,

    Just wanted to share some insights into training RAVE with complex datasets (full songs, heterogeneous sounds).

    One change I've found that gives good convergence is extending the latent size parameter to a full 128. Another, which I think had an even greater impact, is extending the Phase 1 (warmup) training to about 5 million steps before switching to Phase 2.

    Obviously, more data gives better results, but so far I've found that the more 'detailed' features in my dataset (melodies, textures of individual mid-to-high-frequency instruments) start converging better beyond the suggested warmup of 1 million steps, as the loss keeps going down consistently.
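    In config terms, using the field names that appear in a config posted later in this thread (treat the exact names as an assumption for your version of the code), that corresponds to something like:

    LATENT_SIZE : 128
    WARMUP : 5000000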

    Hope this helps anyone having trouble with this.

    Good luck!

  • Error when trying to train the prior

    Hey again,

    I just finished training and exporting a new model, but I can't seem to get it to train the prior. I am getting the following error when exporting the model:

    /home/user/code/RAVE/env3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:217: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['decoder.net.2.net.0.aligned.paddings.0.pad', 'decoder.net.2.net.0.aligned.paddings.1.pad', 'decoder.net.2.net.1.aligned.paddings.0.pad', 'decoder.net.2.net.1.aligned.paddings.1.pad', 'decoder.net.2.net.2.aligned.paddings.0.pad', 'decoder.net.2.net.2.aligned.paddings.1.pad', 'decoder.net.4.net.0.aligned.paddings.0.pad', 'decoder.net.4.net.0.aligned.paddings.1.pad', 'decoder.net.4.net.1.aligned.paddings.0.pad', 'decoder.net.4.net.1.aligned.paddings.1.pad', 'decoder.net.4.net.2.aligned.paddings.0.pad', 'decoder.net.4.net.2.aligned.paddings.1.pad', 'decoder.net.6.net.0.aligned.paddings.0.pad', 'decoder.net.6.net.0.aligned.paddings.1.pad', 'decoder.net.6.net.1.aligned.paddings.0.pad', 'decoder.net.6.net.1.aligned.paddings.1.pad', 'decoder.net.6.net.2.aligned.paddings.0.pad', 'decoder.net.6.net.2.aligned.paddings.1.pad', 'decoder.net.8.net.0.aligned.paddings.0.pad', 'decoder.net.8.net.0.aligned.paddings.1.pad', 'decoder.net.8.net.1.aligned.paddings.0.pad', 'decoder.net.8.net.1.aligned.paddings.1.pad', 'decoder.net.8.net.2.aligned.paddings.0.pad', 'decoder.net.8.net.2.aligned.paddings.1.pad', 'decoder.synth.paddings.0.pad', 'decoder.synth.paddings.1.pad', 'decoder.synth.paddings.2.pad']
      rank_zero_warn(

    Any ideas?

  • Model Size Mismatch

    Not sure what's going on here, but after I pulled the latest master, I am getting this error:

    RuntimeError: Error(s) in loading state_dict for RAVE:
    	size mismatch for decoder.net.2.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.2.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    

    Any insights would be welcome!

  • Can't load pretrained state

    Hey, I used rave.ckpt in the example "loading pretrained models" from the readme and got the following error:

    Sincerely sorry for the painfully nooby issue

    Code:

    import torch

    torch.set_grad_enabled(False)
    from rave import RAVE
    from prior import Prior

    import librosa as li
    import soundfile as sf

    ################ LOADING PRETRAINED MODELS ################
    rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()

    Error:

    Traceback (most recent call last):
      File "/Users/krayyy/PycharmProjects/pythonProject11/rave39test.py", line 11, in <module>
        rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 153, in load_from_checkpoint
        model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 201, in _load_model_state
        keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/torch/nn/modules/module.py", line 1482, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for RAVE:
      Unexpected key(s) in state_dict: "decoder.net.2.net.0.aligned.paddings.0.pad", "decoder.net.2.net.0.aligned.paddings.1.pad", "decoder.net.2.net.1.aligned.paddings.0.pad", "decoder.net.2.net.1.aligned.paddings.1.pad", "decoder.net.2.net.2.aligned.paddings.0.pad", "decoder.net.2.net.2.aligned.paddings.1.pad", "decoder.net.4.net.0.aligned.paddings.0.pad", "decoder.net.4.net.0.aligned.paddings.1.pad", "decoder.net.4.net.1.aligned.paddings.0.pad", "decoder.net.4.net.1.aligned.paddings.1.pad", "decoder.net.4.net.2.aligned.paddings.0.pad", "decoder.net.4.net.2.aligned.paddings.1.pad", "decoder.net.6.net.0.aligned.paddings.0.pad", "decoder.net.6.net.0.aligned.paddings.1.pad", "decoder.net.6.net.1.aligned.paddings.0.pad", "decoder.net.6.net.1.aligned.paddings.1.pad", "decoder.net.6.net.2.aligned.paddings.0.pad", "decoder.net.6.net.2.aligned.paddings.1.pad", "decoder.net.8.net.0.aligned.paddings.0.pad", "decoder.net.8.net.0.aligned.paddings.1.pad", "decoder.net.8.net.1.aligned.paddings.0.pad", "decoder.net.8.net.1.aligned.paddings.1.pad", "decoder.net.8.net.2.aligned.paddings.0.pad", "decoder.net.8.net.2.aligned.paddings.1.pad", "decoder.synth.paddings.0.pad", "decoder.synth.paddings.1.pad", "decoder.synth.paddings.2.pad".

  • reconstruction error

    Hi caillonantoine,

    Thanks for your brilliant work, it is amazing. I just want to try your model with my data, but I think I got a wrong result. My training is now in the first stage (just the VAE part) and my epoch is 1220, so my global step is 1220 * 32 (batch); the curve looks like this:

    [image]

    My config is as follows:

    DATA_SIZE : 16
    CAPACITY : 64
    LATENT_SIZE : 128
    RATIOS : [4, 4, 4, 2]
    BIAS : True
    NO_LATENCY : False
    
    MIN_KL : 1e-1
    MAX_KL : 1e-1
    CROPPED_LATENT_SIZE : 0
    FEATURE_MATCH : True
    
    LOUD_STRIDE : 1
    
    USE_NOISE : True
    NOISE_RATIOS : [4, 4, 4]
    NOISE_BANDS : 5
    
    D_CAPACITY : 16
    D_MULTIPLIER : 4
    D_N_LAYERS : 4
    
    WARMUP : 1000000
    MODE : "hinge"
    
    
    SR : 22050
    N_SIGNAL : 655360
    MAX_STEPS : 3000000
    
    BATCH : 32
    NAME : 'rave'
    

    My original wav is: [image]

    and my reconstruction wav is: [image]

    Is that normal? I think there may be some issues with my training.

    Can you give me some suggestions about this issue, please?

    Thanks in advance!

    Best,

  • RuntimeError when training prior

    I've been getting issues with the prior side of things for a few weeks now. I've been in between local and colab since I noticed CUDA was maybe an issue (I'm using a 3060 card and torch is often tricky).

    Using torch==1.8.0+cu11.1, I successfully trained and exported my ts model. When attempting to train the prior I get this error:

    RuntimeError: cannot reshape tensor of 0 elements into shape [12, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    I think this is a symptom of compatibility and versioning issues, and I would appreciate suggestions on this training pipeline, preferably avoiding the need to retrain with specific code and torch versions.

    🙂

  • Discriminator training

    Hey Antoine,

    I've attempted to train RAVE on a ~200 h guitar dataset, trained for 2.5M steps with a batch size of 32. The results so far appear weaker than expected (top is GT, bottom is reconstruction): [image]

    Looking more closely at the situation, we see a few things:

    • the distance took a significant hit when D was introduced, and never recovered (13 -> 15).
    • While the distance metric converged, the feature distance keeps increasing. This seems like an unhealthy dynamic between the two. I will try tapering off the LR and some other GAN tricks to help there.
    • There is something suspicious in the latent codes: the informative part keeps oscillating. [image]

    Did you observe the same situation with D in your runs?

  • training logs/loss curves

    Hey Antoine, love this work! Lots of clever tricks :)

    Any chance you can share the training logs? I am trying to reproduce your findings (on a different dataset of similar difficulty), and they would help me check whether I am headed in the right direction.

    For example, I ended up tweaking a few things (I have more VRAM, so I quadrupled the batch size and adjusted the LR by the square root of the batch-size ratio, as proposed here).
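    As a worked example of that square-root scaling rule (numbers purely hypothetical):

    # Hypothetical numbers illustrating sqrt batch-size LR scaling.
    base_lr, base_bs = 1e-4, 8
    new_bs = 4 * base_bs                          # 4x the batch size
    new_lr = base_lr * (new_bs / base_bs) ** 0.5  # sqrt(4) = 2, so new_lr = 2e-4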

    P.S.: the distance metric sadly gets modified by the GAN feature-matching loss. It might be worth logging it separately; I almost had a heart attack haha when the distance metric started spiking :)

  • AssertionError when attempting to run train_rave.py

    Hi,

    I'm currently attempting to train using the provided cli_helper.py script.

    Unfortunately, I seem to come up with this error whenever I actually attempt to run train_rave.py:

    Traceback (most recent call last):
      File "/home/hexorcismos/Desktop/AI/RAVE/train_rave.py", line 109, in <module>
        assert len(CUDA)
    AssertionError
    

    A quick fix that made it work is to comment out these lines in train_rave.py:

        CUDA = gpu.getAvailable(maxMemory=.05)
        assert len(CUDA)
        environ["CUDA_VISIBLE_DEVICES"] = str(CUDA[0])
    

    Then it would detect my GPU card with no problems.
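    If you'd rather keep the check, a softer variant (a sketch, not code from the repo) could warn and continue instead of asserting:

        # Hypothetical fallback: warn instead of asserting when no GPU is free.
        CUDA = gpu.getAvailable(maxMemory=.05)
        if len(CUDA):
            environ["CUDA_VISIBLE_DEVICES"] = str(CUDA[0])
        else:
            print("No free GPU detected; leaving CUDA_VISIBLE_DEVICES unset")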

    Wanted to bring this up for anybody experiencing the same error.

    Best, and thanks for open-sourcing this exciting paper!

  • TypeError: cannot pickle 'Environment' object (Win10)

    Windows 10, Python 3.9 + conda, all dependencies seem to be installed.

    CUDA is installed and nvidia-smi works, but the script seems not to see my device.

    PS C:\Users\jesse\rave> python train_rave.py -c small --name kicks --wav C:\Users\jesse\RAVE\kicks\out_44100 --preprocessed C:\Users\jesse\RAVE\kicks_temp\kicks\rave --sr 44100
    Recursive search in C:\Users\jesse\RAVE\kicks\out_44100
    audio_00524_00000.wav: 100%|████████████████████████████████████████████████████████| 525/525 [00:03<00:00, 135.07it/s]
    C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4 (cpuset is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    No GPU found.
    GPU available: False, used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs

      | Name          | Type                | Params
    -------------------------------------------------
    0 | pqmf          | CachedPQMF          | 16.7 K
    1 | loudness      | Loudness            | 0
    2 | encoder       | Encoder             | 1.4 M
    3 | decoder       | Generator           | 3.4 M
    4 | discriminator | StackDiscriminators | 16.9 M
    -------------------------------------------------
    21.7 M    Trainable params
    0         Non-trainable params
    21.7 M    Total params
    86.972    Total estimated model params size (MB)

    Sanity Checking: 0it [00:00, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\jesse\rave\train_rave.py", line 171, in <module>
        trainer.fit(model, train, val, ckpt_path=run)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 768, in fit
        self._call_and_handle_interrupt(
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 721, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 809, in _fit_impl
        results = self._run(model, ckpt_path=self.ckpt_path)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1234, in _run
        results = self._run_stage()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1321, in _run_stage
        return self._run_train()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1343, in _run_train
        self._run_sanity_check()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1411, in _run_sanity_check
        val_loop.run()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
        self.advance(*args, **kwargs)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 153, in advance
        dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 199, in run
        self.on_run_start(*args, **kwargs)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 87, in on_run_start
        self._data_fetcher = iter(data_fetcher)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 178, in __iter__
        self.dataloader_iter = iter(self.dataloader)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
        return self._get_iterator()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
        w.start()
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: cannot pickle 'Environment' object

    PS C:\Users\jesse\rave> Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\Users\jesse\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input

  • question about prior model

    Hi, caillonantoine

    In your paper, you said 'To illustrate this, we train a WaveNet-inspired model at generating latent signals in an autoregressive fashion for performing audio synthesis with RAVE'. I don't understand your prior model. In your code:

    x = self.encode(batch) # here got the Z code
    x = self.quantized_normal.encode(self.diagonal_shift(x)) # here what did you mean? why do you need it? 
    pred = self.forward(x)
    
    x = torch.argmax(self.split_classes(x[..., 1:]), -1)
    pred = self.split_classes(pred[..., :-1]) # if here predicts next z code, why did you use cross_entropy loss function?
    
    loss = nn.functional.cross_entropy(
        pred.reshape(-1, self.quantized_normal.resolution),
        x.reshape(-1),
    )
    

    Could you give me a reference paper or code that you referred to? Best,

  • Error Exporting Converted Rave Model

    When trying to export a model after running the conversion script, I now get the following error:

    [1394.96] script model
    Traceback (most recent call last):
      File "/home/user/code/RAVE/export_rave.py", line 245, in <module>
        model = TraceModel(model, resample, args.FIDELITY)
      File "/home/user/code/RAVE/export_rave.py", line 70, in __init__
        latent_size = 2**math.ceil(math.log2(latent_size))
    ValueError: math domain error
    
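    For what it's worth, that particular ValueError is what math.log2 raises for a non-positive argument, so my reading (an assumption, not a confirmed diagnosis) is that latent_size was 0 when TraceModel was constructed:

    import math

    try:
        math.log2(0)   # log2 is undefined for non-positive inputs
    except ValueError as e:
        print(e)       # prints: math domain error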

    Any suggestions?

  • Error when training prior

    Hi,

    I'm experiencing trouble with training the prior, with a similar error to the one described in closed issue #33. I'm reopening the issue, as I don't think it was completely solved.

    I basically get the same RuntimeError as described here. Training of RAVE was done with custom parameters, but I can't seem to access them, as no instructions_*.txt was generated when working with cli_helper.py.

    RuntimeError: cannot reshape tensor of 0 elements into shape [8, 0, 128, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    Thanks!

  • Parallel Training

    Hey again,

    Thanks again for this awesome library. I have been playing around with it for about a month now, and I feel like I am finally starting to get the hang of it, haha.

    I noticed you have a parallel_traning.sh script. This seems to be for training multiple models in parallel, is that correct?

    I am training these models on my own hardware (which can take some time) and would love to make use of both of the GPUs I have. I tried to modify the script a few weeks ago but ran into errors with pytorch_lightning, specifically with a data class. Is this something you plan on supporting?

    If not, I would love to take a crack at it and make a PR.

    What are your thoughts?

  • Support for multi-channel audio

    Hi!

    I've added an option to train the network on stereo material. The hypothesis is that extra channels on just the first and last conv layers are enough, because the L/R channels are highly correlated.

    Here are some examples, trained on 60 minutes of drum loops for ~1.4M steps:

    • Original
    • Encoded
    • Random sampling from the z-space

    Training speed is:

    • 8it/sec for mono
    • 6it/sec for stereo

    That's on my 3090 with batch size 8.

    The code works with and without PQMF. Here is the same example encoded by a model trained for ~1.4M steps with --data-size 1: Encoded

    Exporting code for rave and prior training is also working.

    Note that I haven't modified export_prior/combine_models, because as I understand it that is not yet implemented. Also please note that the model is not backward compatible: it now expects and returns a 3-D tensor even for mono audio (see the sketch below).
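    To make that shape change concrete, a small sketch (tensor sizes are just for illustration):

    import torch

    x_mono   = torch.randn(1, 1, 65536)  # (batch, channels=1, time)
    x_stereo = torch.randn(1, 2, 65536)  # (batch, channels=2, time)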

Related projects

• Manga Filling Style Conversion with Screentone Variational Autoencoder (SIGGRAPH Asia 2020) - May 14, 2022
• Clockwork Variational Autoencoder (CW-VAE) - Apr 21, 2022
• VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - May 21, 2022
• Recurrent Variational Autoencoder that generates sequential data, implemented with pytorch - May 4, 2022
• Variational autoencoder for anime face reconstruction - Dec 11, 2021
• PyTorch Autoencoders: Implementing a Variational Autoencoder (VAE) Series in Pytorch - Oct 20, 2021
• 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces - Apr 26, 2022
• torchsynth: a GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers - May 13, 2022
• Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020) - May 11, 2022
• Realtime Sign Language Detection using video sequences (MediaPipe Holistic keypoints + LSTM) - Apr 27, 2022
• Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder (NeurIPS 2020) - May 2, 2022
• "Lung Segmentation from Chest X-rays using Variational Data Imputation", Raghavendra Selvan et al. 2020, official Pytorch implementation - May 4, 2022
• WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution (Interspeech 2021) - Apr 14, 2022
• Realtime Multi-Person Pose Estimation, PyTorch implementation - Oct 16, 2021
• Tacotron 2: PyTorch implementation with faster-than-realtime inference - May 18, 2022
• MADE (Masked Autoencoder Density Estimation) implementation in PyTorch - May 2, 2022
• BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation - May 17, 2022
• "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE (UAI 2021) - Apr 19, 2022
• HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference - Jan 5, 2022