F8Net
Fixed-Point 8-bit Only Multiplication for Network Quantization (ICLR 2022 Oral)

OpenReview | arXiv | PDF | Model Zoo | BibTeX

PyTorch implementation of neural network quantization with fixed-point 8-bit only multiplication.

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Qing Jin1,2, Jian Ren1, Richard Zhuang1, Sumant Hanumante1, Zhengang Li2, Zhiyu Chen3, Yanzhi Wang2, Kaiyuan Yang3, Sergey Tulyakov1
1Snap Inc., 2Northeastern University, 3Rice University
ICLR 2022 Oral.

Overview

Neural network quantization enables efficient inference by reducing the precision of weights and inputs. Previous quantization methods fall into three categories: simulated quantization, integer-only quantization, and fixed-point quantization, with the former two involving high-precision multiplications with 32-bit floating-point or integer scaling factors. In contrast, fixed-point models avoid such demanding operations but have shown inferior performance to the other two approaches. In this work, we study how to train such models. Specifically, we conduct statistical analysis of the values being quantized and propose to determine the fixed-point format from data during training using a semi-empirical formula. Our method demonstrates that high-precision multiplication is not necessary for quantized models to achieve performance comparable to their full-precision counterparts.
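
To make the fixed-point format concrete: an 8-bit fixed-point number is an integer code q in [-128, 127] interpreted as q * 2^(-FL), where FL is the fractional length that our method determines from data statistics. The sketch below is our illustration, not the repo's API (quantize_fixed_point is a hypothetical helper); it shows the quantizer and why products need no high-precision rescaling.

    # Minimal illustrative sketch, not the repo's implementation.
    import torch

    def quantize_fixed_point(x: torch.Tensor, fl: int, bits: int = 8) -> torch.Tensor:
        """Round x to the nearest 8-bit fixed-point value with fractional length fl."""
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        q = torch.clamp(torch.round(x * 2 ** fl), qmin, qmax)  # integer code in [qmin, qmax]
        return q * 2.0 ** (-fl)  # the value the integer code represents exactly

    # Multiplying two fixed-point numbers only needs an integer multiply:
    # (qa * 2^-FLa) * (qb * 2^-FLb) = (qa * qb) * 2^-(FLa + FLb),
    # so the product's fractional length is simply FLa + FLb (a shift, not a float scale).
    print(quantize_fixed_point(torch.randn(4), fl=5))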

Getting Started

Requirements
  1. Please check the requirements and install the required packages.

  2. Prepare the ImageNet-1k data following the pytorch example, and create a symlink named data pointing to the ImageNet data path under the code directory (ln -s /path/to/imagenet data).

Model Training
Conventional training
  • We train the model with the file distributed_run.sh and the command
    bash distributed_run.sh /path/to/yml_file batch_size
    
  • We set batch_size=2048 for conventional training of floating-/fixed-point ResNet18 and MobileNet V1/V2.
  • Before training, please update the dataset_dir and log_dir arguments in the yaml files for training the floating-/fixed-point models.
  • To train the floating-point model, please use the yaml file ***_floating_train.yml in the conventional subfolder under the corresponding folder of the model.
  • To train the fixed-point model, please first train the floating-point model as the initialization, then use the yaml file ***_fix_quant_train.yml in the conventional subfolder under the corresponding model folder. Please make sure the argument fp_pretrained_file points to the correct path of the corresponding floating-point checkpoint. We also provide our pretrained floating-point models in the Model Zoo below.
Tiny finetuning
  • We finetune the model with the file run.sh and the command

    bash run.sh /path/to/yml_file batch_size
    
  • We set batch_size=128 and use one GPU for tiny-finetuning of fixed-point ResNet18/50.

  • Before fine-tuning, please update the dataset_dir and log_dir arguments in the yaml files for finetuning the fixed-point models.

  • To finetune the fixed-point model, please use the yaml file ***_fix_quant_***_pretrained_train.yml in the tiny_finetuning subfolder under the corresponding model folder. For models pretrained with PytorchCV (the baseline of ResNet18 and Baseline #1 of ResNet50), the floating-point checkpoint is downloaded automatically when the code runs. For the model pretrained by Nvidia (Baseline #2 of ResNet50), please download the checkpoint first and make sure the argument nvidia_pretrained_file points to its path.

Model Testing
  • We test the model with the file run.sh and the command

    bash run.sh /path/to/yml_file batch_size
    
  • We set batch_size=128 and use one GPU for model testing.

  • Before testing, please update the dataset_dir and log_dir arguments in the yaml files. Also update the integize_file_path and int_op_only_file_path arguments in the yaml files ***_fix_quant_test***_integize.yml and ***_fix_quant_test***_int_op_only.yml, respectively, along with other arguments such as nvidia_pretrained_file if necessary (even if they are not used during testing).

  • We use the following yaml files for testing:
      ◦ ***_floating_test.yml: test the floating-point model.
      ◦ ***_fix_quant***_test.yml: test the fixed-point model with the same setting as during training/tiny-finetuning.
      ◦ ***_fix_quant***_test_int_model.yml: test the fixed-point model on GPU, with all quantized weights, biases and inputs implemented as integers (but stored in float dtype, as GPUs do not support integer operations; see the note below), using the original modules from training (e.g., with batch normalization layers).
      ◦ ***_fix_quant***_test_integize.yml: test the fixed-point model on GPU, with all quantized weights, biases and inputs implemented as integers (but stored in float dtype), using a new equivalent model with only convolution, pooling and fully-connected layers.
      ◦ ***_fix_quant***_test_int_op_only.yml: test the fixed-point model on CPU, with all quantized weights, biases and inputs implemented as integers (with int dtype), using a new equivalent model with only convolution, pooling and fully-connected layers.
    Note that the accuracies from the four fixed-point testing files can differ slightly due to numerical error.
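
  • A note on the "float dtype" parenthetical above: float32 represents every integer of magnitude up to 2^24 exactly, so running float kernels on integer-valued tensors loses nothing for 8-bit fixed-point values. A small illustrative check (ours, not part of the repo):

    import torch

    # int8-range values survive a round trip through float32, so GPU float
    # kernels compute on exact integer values for the quantized model.
    q = torch.randint(-128, 128, (1000,), dtype=torch.int32)
    f = q.to(torch.float32)
    assert torch.equal(f.to(torch.int32), q)  # lossless up to 2^24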

Model Export
  • We export the fixed-point model with integer weights, biases and inputs to run on GPU and CPU during model testing, using the ***_fix_quant_test_integize.yml and ***_fix_quant_test_int_op_only.yml files, respectively.

  • The exported onnx files are saved to the path given by the arguments integize_file_path and int_op_only_file_path.
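
  • To sanity-check an exported model, you can run it with onnxruntime. A minimal sketch, assuming onnxruntime is installed; "exported_model.onnx" is a placeholder for the path you set via integize_file_path or int_op_only_file_path, and the input shape/dtype should be adjusted to what the exported model expects:

    # Minimal sketch: run an exported onnx file with onnxruntime.
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("exported_model.onnx")      # placeholder path
    inp = sess.get_inputs()[0]                              # inspect expected name/shape/type
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # adjust to the model's input spec
    logits = sess.run(None, {inp.name: x})[0]
    print(inp.name, inp.shape, logits.argmax(axis=1))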

F8Net Model Zoo

All checkpoints and onnx files are available here.

Conventional

| Model | Type | Top-1 Acc.^a | Checkpoint |
|-------|------|--------------|------------|
| ResNet18 | FP | 70.3 | Res18_32 |
| ResNet18 | 8-bit | 71.0 | Res18_8 |
| MobileNet-V1 | FP | 72.4 | MBV1_32 |
| MobileNet-V1 | 8-bit | 72.8 | MBV1_8 |
| MobileNet-V2b | FP | 72.7 | MBV2b_32 |
| MobileNet-V2b | 8-bit | 72.6 | MBV2b_8 |

Tiny Finetuning

| Model | Type | Top-1 Acc.^a | Checkpoint |
|-------|------|--------------|------------|
| ResNet18 | FP | 73.1 | Res18_32p |
| ResNet18 | 8-bit | 72.3 | Res18_8p |
| ResNet50b (BL#1) | FP | 77.6 | Res50b_32p |
| ResNet50b (BL#1) | 8-bit | 77.6 | Res50b_8p |
| ResNet50b (BL#2) | FP | 78.5 | Res50b_32n |
| ResNet50b (BL#2) | 8-bit | 78.1 | Res50b_8n |

^a The accuracies are obtained from the inference step during training. Test accuracy for the final exported model may differ slightly due to numerical error.

Technical Details

The main techniques for neural network quantization with 8-bit fixed-point multiplication include the following:

  • Quantization methods/modules, including determining fixed-point formats from statistics or by grid search, fusing convolution and batch normalization layers (see the fusion sketch at the end of this list), and reformulating PACT with fixed-point quantization, are implemented in models/fix_quant_ops.
  • Clipping-level sharing and private fractional length for residual blocks are implemented in the ResNet (models/fix_resnet) and MobileNet V2 (models/fix_mobilenet_v2).
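  • For reference, the sketch below shows the standard convolution/batch-normalization fusion transform in isolation (our illustration; the repo's version in models/fix_quant_ops additionally handles the fixed-point formats). BN's per-channel scale gamma/sqrt(var + eps) and its shift are folded into the convolution's weight and bias, so inference needs only a single fixed-point convolution.

    import torch
    import torch.nn as nn

    # Standard conv + BN fusion: fold BN statistics into the conv weight/bias.
    def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                          conv.stride, conv.padding, conv.dilation, conv.groups,
                          bias=True)
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # gamma / sigma
        fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
        bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
        return fused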

Acknowledgement

This repo is based on AdaBits.

Citation

If our code or models help your work, please cite our paper:

@inproceedings{
  jin2022fnet,
  title={F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization},
  author={Qing Jin and Jian Ren and Richard Zhuang and Sumant Hanumante and Zhengang Li and Zhiyu Chen and Yanzhi Wang and Kaiyuan Yang and Sergey Tulyakov},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=_CfpJazzXT2}
}