Bayesian-Torch: Bayesian neural network layers for uncertainty estimation

Get started | Example usage | Documentation | License | Citing

Bayesian layers and utilities to perform stochastic variational inference in PyTorch

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks. Bayesian-Torch is designed to be flexible and seamless in extending a deterministic deep neural network architecture to corresponding Bayesian form by simply replacing the deterministic layers with Bayesian layers.

The repository has implementations for the following Bayesian layers:

  • Variational layers with reparameterized Monte Carlo estimators

    LinearReparameterization
    Conv1dReparameterization, Conv2dReparameterization, Conv3dReparameterization, ConvTranspose1dReparameterization, ConvTranspose2dReparameterization, ConvTranspose3dReparameterization
    LSTMReparameterization

  • Variational layers with Flipout Monte Carlo estimators

    LinearFlipout
    Conv1dFlipout, Conv2dFlipout, Conv3dFlipout, ConvTranspose1dFlipout, ConvTranspose2dFlipout, ConvTranspose3dFlipout
    LSTMFlipout

  • Variational layers with Gaussian mixture model (GMM) posteriors using reparameterized Monte Carlo estimators (in pre-alpha)

    LinearMixture
    Conv1dMixture, Conv2dMixture, Conv3dMixture, ConvTranspose1dMixture, ConvTranspose2dMixture, ConvTranspose3dMixture
    LSTMMixture

Please refer to the documentation of Bayesian layers for details.
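
As an illustration, here is a minimal sketch (not taken from the repository's examples) of replacing a deterministic linear layer with a Bayesian one and training it with a Monte Carlo ELBO-style loss. It assumes LinearReparameterization is importable from bayesian_torch.layers and that Bayesian layers return an (output, kl) tuple:

import torch
import torch.nn.functional as F
from bayesian_torch.layers import LinearReparameterization

# Bayesian counterpart of nn.Linear(10, 2); default prior/posterior settings are used here.
blinear = LinearReparameterization(in_features=10, out_features=2)

x = torch.randn(32, 10)              # toy batch
y = torch.randint(0, 2, (32,))

optimizer = torch.optim.Adam(blinear.parameters(), lr=1e-3)
num_mc = 5                           # Monte Carlo samples per training step
num_batches = 100                    # one common convention for scaling the KL term

optimizer.zero_grad()
loss_ce = 0
kl_sum = 0
for _ in range(num_mc):
    out, kl = blinear(x)             # each call draws a fresh weight sample
    loss_ce = loss_ce + F.cross_entropy(out, y)
    kl_sum = kl_sum + kl

loss = loss_ce / num_mc + kl_sum / num_mc / num_batches
loss.backward()
optimizer.step()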

Other features include:

  • dnn_to_bnn(): an API to convert an arbitrary deterministic torch.nn model into its Bayesian equivalent (see the Dnn2bnn pull request and examples/main_bayesian_cifar_dnn2bnn.py referenced below)
  • MOPED: initializing the priors and variational posteriors of the Bayesian layers from a pretrained deterministic model (the moped_enable option used in the prior-parameter examples below)

Installation

Install from source:

git clone https://github.com/IntelLabs/bayesian-torch
cd bayesian-torch
pip install .

This code has been tested with PyTorch v1.6.0 and torchvision v0.7.0 on Python 3.7.7.

Dependencies:

  • Create conda environment with python=3.7
  • Install PyTorch and torchvision packages within conda environment following instructions from PyTorch install guide
  • conda install -c conda-forge accimage
  • pip install tensorboard
  • pip install scikit-learn

Example usage

We have provided example model implementations using the Bayesian layers.

We also provide example usages and scripts to train and evaluate the models. The instructions for the CIFAR10 examples are provided below; similar scripts for ImageNet and MNIST are available.

cd bayesian_torch

Training

To train Bayesian ResNet on CIFAR10, run this command:

Mean-field variational inference (Reparameterized Monte Carlo estimator)

sh scripts/train_bayesian_cifar.sh

Mean-field variational inference (Flipout Monte Carlo estimator)

sh scripts/train_bayesian_flipout_cifar.sh

To train deterministic ResNet on CIFAR10, run this command:

Vanilla

sh scripts/train_deterministic_cifar.sh

Evaluation

To evaluate Bayesian ResNet on CIFAR10, run this command:

Mean-field variational inference (Reparameterized Monte Carlo estimator)

sh scripts/test_bayesian_cifar.sh

Mean-field variational inference (Flipout Monte Carlo estimator)

sh scripts/test_bayesian_flipout_cifar.sh

To evaluate deterministic ResNet on CIFAR10, run this command:

Vanilla

sh scripts/test_deterministic_cifar.sh

Citing

If you use this code, please cite as:

@misc{krishnan2020bayesiantorch,
    author = {Ranganath Krishnan and Piero Esposito},
    title = {Bayesian-Torch: Bayesian neural network layers for uncertainty estimation},
    year = {2020},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/IntelLabs/bayesian-torch}}
}

Please also cite the weight sampling methods: Blundell et al. 2015 (reparameterization) and Wen et al. 2018 (Flipout).

Contributors

  • Ranganath Krishnan
  • Piero Esposito

This code is intended for researchers and developers; it enables quantifying principled uncertainty estimates from deep neural network predictions using stochastic variational inference in Bayesian neural networks. Feedback, issues, and contributions are welcome. Email [email protected] with any questions.

Comments
  • The average should be taken over log probability rather than logits

    https://github.com/IntelLabs/bayesian-torch/blob/7abcfe7ff3811c6a5be6326ab91a8d5cb1e8619f/bayesian_torch/examples/main_bayesian_cifar.py#L363-L367

    I think the average across the MC runs should be taken over the log probability. However, the output here is the logits before the softmax operation. I think we could first run output = F.log_softmax(output, dim=1) and then take the average.

    There are two equivalent ways to take the average over log probabilities, which I think is more reasonable. The first way is:

    output_, kl_ = [], []
    for mc_run in range(args.num_mc):
        output, kl = model(input_var)
        output = F.log_softmax(output, dim=1)
        output_.append(output)
        kl_.append(kl)
    output = torch.mean(torch.stack(output_), dim=0)
    loss = F.nll_loss(output, target_var)  # this replaces the original cross_entropy loss
    

    Or equivalently, we can first take the cross-entropy loss for each MC run, and average the losses at the end:

    loss = 0
    kl_ = []
    for mc_run in range(args.num_mc):
        output, kl = model(input_var)
        loss = loss + F.cross_entropy(output, target_var)
        kl_.append(kl)
    loss = loss / args.num_mc  # this replaces the original cross_entropy loss
    
  • Pytorch version issue

    Hi, it seems that this software only supports PyTorch 1.8.1 LTS or newer. When I try to use this code in my own project (which is mainly based on PyTorch 1.4.0), some conflicts occur and seem unavoidable. Could you please update this software so that it also supports some older PyTorch versions, for example 1.4.0? Many thanks!

  • Let the models return prediction only, saving KL Divergence as an attribute

    Closes #7 .

    Lets the user, if they want, return predictions only from the forward method, while saving the KL divergence as an attribute. This is important to make it easier to integrate into PyTorch models.

    Also, it does not break the lib as it is: we added a new return_kl parameter on the forward method that defaults to True and, if manually set to False, makes the layer return predictions only.

    Performed the following changes, on all layers:

    • Added return_kl on all forward methods, defaulting to True. If set to False, the layer won't return kl.
    • Added a new kl attribute to each layer, updating it at every feedforward step. Useful when integrating with already-built PyTorch models.

    That should help integrate with PyTorch experiments while keeping backward compatibility with this lib.
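
    For example, assuming the new kl attribute described above, downstream code could aggregate the KL term after a forward pass with something like this sketch (illustrative, not part of the PR):

    import torch

    def total_kl(model: torch.nn.Module) -> torch.Tensor:
        # Sum the kl attribute cached by each Bayesian layer during the last forward pass.
        kl = torch.zeros(1)
        for module in model.modules():
            if getattr(module, "kl", None) is not None:
                kl = kl + module.kl
        return kl

    # usage: out = model(x); loss = criterion(out, y) + total_kl(model) / num_batches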

  • Fix KL computation

    Hello,

    I think there might be a few problems in your model definitions.

    In particular:

    • in resnet_flipout.py and resnet_variational.py you only sum the kl of the last block inside self.layerN
    • in resnet_flipout_large.py and resnet_variational_large.py you check for is None while you probably want is not None, or actually no check at all, since it can't be None in any reasonable setting. Also, the str(layer) check is odd, since layer is a BasicBlock or BottleNeck object (you're looping over an nn.Sequential of blocks). In fact, that string check is very likely superfluous (I didn't test this, but I did include it in this PR as an example)

    I hope you can confirm and perhaps fix these issues, which will help me (and maybe others) in building on your nice codebase :)
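
    To make the first point concrete, here is a minimal sketch (my own illustration, not the repository code) of accumulating KL across every block of an nn.Sequential layer group instead of keeping only the last block's value:

    # `blocks` stands in for self.layerN: an nn.Sequential of Bayesian BasicBlock/BottleNeck
    # modules, each returning (output, kl) from forward().
    def forward_layer_group(blocks, x):
        kl_sum = 0
        for block in blocks:
            x, kl = block(x)
            kl_sum = kl_sum + kl  # accumulate every block's KL; overwriting keeps only the last block's value
        return x, kl_sum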

  • conv_transpose2d parameter order problem

    https://github.com/IntelLabs/bayesian-torch/blob/f6f516e9b3721466aa0036c735475a421cc3ce80/bayesian_torch/layers/variational_layers/conv_variational.py#L784-L786

    When I try to convert a deterministic autoencoder with a ConvTranspose2d layer into a Bayesian one, I get the error "Exception has occurred: TypeError: conv_transpose2d(): argument 'groups' (position 7) must be int, not tuple", which I suspect comes from self.dilation and self.groups being swapped.
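
    For reference, a small runnable check of the expected argument order of torch.nn.functional.conv_transpose2d (groups is position 7, dilation position 8):

    import torch
    import torch.nn.functional as F

    # Positional order per the PyTorch docs: groups comes before dilation.
    x = torch.randn(1, 4, 8, 8)
    w = torch.randn(4, 2, 3, 3)   # (in_channels, out_channels // groups, kH, kW)
    out = F.conv_transpose2d(x, w, bias=None, stride=2, padding=1,
                             output_padding=1, groups=1, dilation=1)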

  • Enable the Bayesian layer to freeze the parameters to their mean values

    I think it would be good to provide an option to freeze the weights and biases to their mean values at inference time. The forward function could look something like this:

    def forward(self, input, sample=True):
        if sample:
            # do sampling and forward as in the current code
            ...
        else:
            # use the posterior means directly, e.g. for a Linear layer:
            weight = self.mu_weight
            bias = self.mu_bias
            out = F.linear(input, weight, bias)
            kl = 0  # (optional) no sampling, so the KL term is not needed here
        return out, kl
    
  • FR: Enable forward method of Bayesian Layers to return value only for smoother integration with PyTorch

    It would be nice if we could store the KL divergence value as an attribute of the Bayesian layers and return it from the forward method only if needed.

    With that, we have less friction when integrating with PyTorch, being able to "plug and play" bayesian-torch layers in deterministic models.

    It would be something like that:

    def forward(self, x, return_kl=False):
        ...
        self.kl = kl
    
        if return_kl:
            return out, kl
        return out   
    

    We can then get it from the Bayesian layers when calculating the loss, with no harm or hard changes to the code, which might encourage users to try the lib.

    I can work on that also.

  • Modifying the scripts for any arbitrary output size

    I am trying to use this repo for classification with 3 output nodes. However, in the fully connected layer I get a size-mismatch error. Can you help me with this issue?

  • Add mu_kernel to delta_kernel in flipout layers?

    Hi,

    Shouldn't mu_kernel be added to delta_kernel in the code below?

    Best, Lewis

    diff --git a/bayesian_torch/layers/flipout_layers/conv_flipout.py b/bayesian_torch/layers/flipout_layers/conv_flipout.py
    index 4b3e88d..719cfdc 100644
    --- a/bayesian_torch/layers/flipout_layers/conv_flipout.py
    +++ b/bayesian_torch/layers/flipout_layers/conv_flipout.py
    @@ -165,7 +165,7 @@ class Conv1dFlipout(BaseVariationalLayer_):
             sigma_weight = torch.log1p(torch.exp(self.rho_kernel))
             eps_kernel = self.eps_kernel.data.normal_()
     
    -        delta_kernel = (sigma_weight * eps_kernel)
    +        delta_kernel = (sigma_weight * eps_kernel) + self.mu_kernel 
     
             kl = self.kl_div(self.mu_kernel, sigma_weight, self.prior_weight_mu,
                              self.prior_weight_sigma)
    
  • dnn_to_bnn() with LSTM network

    I have a problem with the dnn_to_bnn() transformation of my LSTM network. I define my LSTM network as follows:

    model = []
    n_layers = self.suggest_hyperparam_to_optuna('n_layers')
    p = self.suggest_hyperparam_to_optuna('dropout_rate')
    n_feature = self.n_features
    lstm_hidden_dim = self.suggest_hyperparam_to_optuna('lstm_hidden_dim')
    
    model.append(PrepareForlstm())
    model.append(torch.nn.LSTM(input_size=n_feature, hidden_size=lstm_hidden_dim, num_layers=n_layers,
                               dropout=p))
    model.append(GetOutputZero())
    model.append(PrepareForDropout())
    model.append(torch.nn.Dropout(p))
    model.append(torch.nn.Linear(in_features=lstm_hidden_dim, out_features=self.n_outputs))
    model = torch.nn.Sequential(*model)
    

    The classes PrepareForlstm, GetOutputZero, and PrepareForDropout are defined as follows:

    # Class to only get the first output of a lstm layer
    class GetOutputZero(torch.nn.Module):
        def __init__(self):
            super(GetOutputZero, self).__init__()
    
        def forward(self, x):
            lstm_out, (hn, cn) = x
            return lstm_out
    
    
    # Class to reshape the data suitable for lstm layer
    class PrepareForlstm(torch.nn.Module):
        def __init__(self):
            super(PrepareForlstm, self).__init__()
    
        def forward(self, x):
            return x.view(x.shape[0], x.shape[1], -1)
    
    
    # Class to reshape data suitable for dropout layer
    class PrepareForDropout(torch.nn.Module):
        def __init__(self):
            super(PrepareForDropout, self).__init__()
    
        def forward(self, lstm_out):
            return lstm_out[:, -1, :]
    

    This network works fine for me. Now I wanted to try out the dnn_to_bnn() API and transform it to a bayesian lstm with:

    bnn_prior_parameters = {
                "prior_mu": self.suggest_hyperparam_to_optuna('prior_mu'),
                "prior_sigma": self.suggest_hyperparam_to_optuna('prior_sigma'),
                "posterior_mu_init": self.suggest_hyperparam_to_optuna('posterior_mu_init'),
                "posterior_rho_init": self.suggest_hyperparam_to_optuna('posterior_rho_init'),
                "type": self.suggest_hyperparam_to_optuna('type'),
                "moped_enable": self.suggest_hyperparam_to_optuna('moped_enable')
            }
    dnn_to_bnn(model, bnn_prior_parameters)
    

    When executing my code, I get the following traceback:

    File "/home/josef/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/josef/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/home/josef/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/josef/.local/lib/python3.8/site-packages/bayesian_torch/layers/variational_layers/rnn_variational.py", line 126, in forward
        ff_i, kl_i = self.ih(x_t)
      File "/home/josef/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/josef/.local/lib/python3.8/site-packages/bayesian_torch/layers/variational_layers/linear_variational.py", line 164, in forward
        out = F.linear(input, weight, bias)
      File "/home/josef/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1 and 23x180)
    

    So the size of the input does not match the size of weight in out = F.linear(input, weight, bias) at line 164 of linear_variational.py. I tried to trace back how these matrices came about, but have no clue why the network does not work anymore. Maybe someone of you has a clue?

  • Dnn2bnn

    This PR supports DNN to BNN auto conversion. One can create a torch.nn DNN model and call dnn_to_bnn() to convert it to a BNN model. An example script, examples/main_bayesian_cifar_dnn2bnn.py, is provided.
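
    As a rough illustration of the workflow this PR enables (a sketch under the assumption that dnn_to_bnn and a get_kl_loss helper are importable from bayesian_torch.models.dnn_to_bnn, as in the example script mentioned above):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from bayesian_torch.models.dnn_to_bnn import dnn_to_bnn, get_kl_loss

    # Plain deterministic model.
    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))

    bnn_prior_parameters = {
        "prior_mu": 0.0,
        "prior_sigma": 1.0,
        "posterior_mu_init": 0.0,
        "posterior_rho_init": -3.0,
        "type": "Reparameterization",   # or "Flipout"
        "moped_enable": False,
    }
    dnn_to_bnn(model, bnn_prior_parameters)  # replaces the deterministic layers in place

    x = torch.randn(8, 10)
    y = torch.randint(0, 2, (8,))
    out = model(x)
    # KL scaling convention varies (e.g. by number of batches); shown here per batch for brevity.
    loss = F.cross_entropy(out, y) + get_kl_loss(model) / x.size(0)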

  • How bayesian network be used in quantization?

    @peteriz @jpablomch @ranganathkrishnan We can get the $\mu$ and $\sigma$ of each weight, so how can this be used for quantization, which maps $w_{\text{float32}}$ to $w_{\text{int8}}$?
