๐ŸŽ๏ธ Accelerate training and inference of ๐Ÿค— Transformers with easy to use hardware optimization tools


Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.

The AI ecosystem evolves quickly, and more and more specialized hardware, each with its own optimizations, emerges every day. Optimum enables users to efficiently use any of these platforms with the same ease inherent to 🤗 Transformers.

Integration with Hardware Partners

🤗 Optimum aims to broaden the range of hardware on which users can train and fine-tune their models.

To achieve this, we collaborate with hardware manufacturers such as Graphcore, Habana, and Intel (see the installation table below) to provide the best Transformers integration.

Optimizing models for inference

Along with supporting dedicated AI hardware for training, Optimum also provides inference optimizations for various frameworks and platforms.

We currently support ONNX Runtime along with Intel Neural Compressor (INC).

Features                              ONNX Runtime      Intel Neural Compressor
Post-training Dynamic Quantization    ✔️                ✔️
Post-training Static Quantization     ✔️                ✔️
Quantization Aware Training (QAT)     Stay tuned! ⭐     ✔️
Pruning                               N/A               ✔️

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator                      Installation
ONNX Runtime                     python -m pip install optimum[onnxruntime]
Intel Neural Compressor (INC)    python -m pip install optimum[intel]
Graphcore IPU                    python -m pip install optimum[graphcore]
Habana Gaudi Processor (HPU)     python -m pip install optimum[habana]

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you can install the base library from source as follows:

python -m pip install git+https://github.com/huggingface/optimum.git

To install the accelerator-specific features from source, append #egg=optimum[accelerator_type] to the pip command, e.g.

python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]

Quickstart

At its core, 🤗 Optimum uses configuration objects to define parameters for optimization on different accelerators. These objects are then used to instantiate dedicated optimizers, quantizers, and pruners.

Quantization

For example, here's how you can apply dynamic quantization with ONNX Runtime:

from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer

# The model we wish to quantize
model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)

In this example, we've quantized a model from the Hugging Face Hub, but it could also be a path to a local model directory. The feature argument in the from_pretrained() method corresponds to the type of task that we wish to quantize the model for. The result from applying the export() method is a model-quantized.onnx file that can be used to run inference. Here's an example of how to load an ONNX Runtime model and generate predictions with it:

from functools import partial
from datasets import Dataset
from optimum.onnxruntime.model import ORTModel

# Load quantized model
ort_model = ORTModel("model-quantized.onnx", quantizer._onnx_config)
# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": ["I love burritos!"]})
# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
# Extract logits!
ort_outputs.predictions
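
From there, a quick way to turn the logits into class predictions is shown below; this is a minimal sketch, assuming the usual label convention of this checkpoint (0 = NEGATIVE, 1 = POSITIVE):

import numpy as np

# Convert the output logits into predicted class ids
predicted_ids = np.argmax(ort_outputs.predictions, axis=-1)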

Similarly, you can apply static quantization by simply setting is_static to True when instantiating the QuantizationConfig object:

qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)

Static quantization relies on feeding batches of data through the model to estimate the activation quantization parameters ahead of inference time. To support this, 🤗 Optimum allows you to provide a calibration dataset. The calibration dataset can be a simple Dataset object from the 🤗 Datasets library, or any dataset that's hosted on the Hugging Face Hub. For this example, we'll pick the sst2 dataset that the model was originally trained on:

from optimum.onnxruntime.configuration import AutoCalibrationConfig

# Create the calibration dataset
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=quantizer.tokenizer),
    num_samples=50,
    dataset_split="train",
)
# Create the calibration configuration containing the parameters related to calibration.
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
# Perform the calibration step: compute the activation quantization ranges
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    onnx_model_path="model.onnx",
    operators_to_quantize=qconfig.operators_to_quantize,
)
# Quantize the same way we did for dynamic quantization!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)

Graph optimization

Next, let's take a look at applying graph optimization techniques such as operator fusion and constant folding. As before, we load a configuration object, but this time we set the optimization level instead of the quantization approach:

from optimum.onnxruntime.configuration import OptimizationConfig

# optimization_level=99 enables all available graph optimizations
optimization_config = OptimizationConfig(optimization_level=99)

Next, we load an optimizer to apply these optimizations to our model:

from optimum.onnxruntime import ORTOptimizer

optimizer = ORTOptimizer.from_pretrained(
    model_checkpoint,
    feature="sequence-classification",
)

# Export the optimized model
optimizer.export(
    onnx_model_path="model.onnx",
    onnx_optimized_model_output_path="model-optimized.onnx",
    optimization_config=optimization_config,
)

And that's it - the model is now optimized and ready for inference!

As you can see, the process is similar in each case:

  1. Define the optimization / quantization strategies via an OptimizationConfig / QuantizationConfig object
  2. Instantiate an ORTQuantizer or ORTOptimizer class
  3. Apply the export() method
  4. Run inference

Training

Besides supporting ONNX Runtime inference, 🤗 Optimum also supports ONNX Runtime training, reducing the memory and compute needed during training. This can be achieved by using the ORTTrainer class, which behaves similarly to the Trainer of 🤗 Transformers:

-from transformers import Trainer
+from optimum.onnxruntime import ORTTrainer

# Step 1: Create your ONNX Runtime Trainer
-trainer = Trainer(
+trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    feature="sequence-classification",
)

# Step 2: Use ONNX Runtime for training and evaluation! 🤗
train_result = trainer.train()
eval_metrics = trainer.evaluate()

By replacing Trainer with ORTTrainer, you can leverage ONNX Runtime for your fine-tuning tasks.
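
For context, here is a minimal sketch of how the objects referenced above (model, training_args, train_dataset, eval_dataset, compute_metrics) might be prepared with the standard 🤗 Transformers and 🤗 Datasets APIs. The checkpoint, dataset, and hyperparameters below are illustrative, not taken from the official examples:

import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    default_data_collator,
)

# Illustrative checkpoint and dataset -- swap in your own
model_checkpoint = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

raw_datasets = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length", truncation=True)

train_dataset = raw_datasets["train"].map(tokenize, batched=True)
eval_dataset = raw_datasets["validation"].map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

training_args = TrainingArguments(output_dir="ort-sst2-finetuned", num_train_epochs=1)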

Check out the examples directory for more sophisticated usage.

Happy optimizing 🤗!

Comments
  • disable or remove non-implemented `entry_points`

    The code for the following three entry_points does not currently exist. Please either disable them (comment them out) until they are implemented, or remove them entirely.

    • optimum.onnxruntime.convert:main
    • optimum.onnxruntime.optimize_model:main
    • optimum.onnxruntime.convert_and_optimize:main

    https://github.com/huggingface/optimum/blob/1ac1f767815f1583a6228b8011e7eef54dd9cf4b/setup.py#L68-L73

    Current repo structure: [screenshot]

  • Allow onnxruntime quantization preprocessor for dynamic quantization

    What does this PR do?

    Currently, for the onnxruntime backend, QuantizationPreprocessor is usable only for static quantization to exclude nodes from quantization, because the ONNX model needs to already be saved when QuantizationPreprocessor is initialized, which was handled by the partial_fit method used during calibration.

    With this PR, it is possible to use QuantizationPreprocessor for dynamic quantization (if it happens to be relevant at some point -- at least I would like to test it), while making no change to the current workflow.

    Before submitting

    • QuantizationPreprocessor is largely (publicly) untested and undocumented; in a future PR we could improve that.

    This follows up https://github.com/huggingface/optimum/pull/166 , I messed up with my fork.

  • Installation issues

    I have Python 3.6.9 and I get the following installation issues.

    (venv_hf_optimum)$ pip install "optimum[onnxruntime]==1.2.0"
    Collecting optimum[onnxruntime]==1.2.0
      Cache entry deserialization failed, entry ignored
      Could not find a version that satisfies the requirement optimum[onnxruntime]==1.2.0 (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2a0, 0.1.2, 0.1.3, 1.0.0, 1.1.0, 1.1.1)
    No matching distribution found for optimum[onnxruntime]==1.2.0
    
    (venv_hf_optimum)$ python -m pip install optimum
    Collecting optimum
      Cache entry deserialization failed, entry ignored
      Downloading https://files.pythonhosted.org/packages/a5/05/4f31c8ff3b01f8d99a6352528d221210341bf4b38859e8747cfc19c5cd9d/optimum-1.1.1.tar.gz (66kB)
        100% |████████████████████████████████| 71kB 755kB/s
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-vwtwg4zk/optimum/setup.py", line 3, in <module>
            from setuptools import find_namespace_packages, setup
        ImportError: cannot import name 'find_namespace_packages'
        
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-vwtwg4zk/optimum/
    
  • Update Dockerfile for onnxruntime 1.11.0

    What does this PR do?

    Add a new Dockerfile for onnxruntime-training 1.11.0. This version solves the incompatibility between ONNX Runtime training and mixed-precision training.

  • add `optimum` to conda-forge channel on `conda`

    :memo: It would be nice to have optimum on conda-forge, both for installation and for downstream applications that use optimum and want to make themselves available on the conda-forge channel.

    :point_right: I have already started the work on this. The PR is very close to getting merged. :fire:

    • PR: https://github.com/conda-forge/staged-recipes/pull/18300

    :information_source: I will share updates on the availability of optimum on conda-forge, once it gets merged.

    :zap: With this addition, users will be able to install optimum with:

    conda install -c conda-forge optimum
    
  • Error: Expected shape from model of {} does not match actual shape of {1,1,1} for output

    Problem

    I'm getting the following error when trying to apply static quantization (ONNX) with the ORTQuantizer.

    [screenshot of the error message]

    Tests

    This error occurs for:

    • my custom script
    • the example code in the README.md
    • The example notebook in the huggingface/notebooks repository (https://github.com/huggingface/notebooks/blob/master/examples/text_classification_quantization_ort.ipynb)
    • a brand new project with only transformers, datasets and optimum[onnxruntime] installed
    • a brand new project with only transformers, datasets and optimum[onnxruntime] (with python -m pip install git+https://github.com/huggingface/optimum.git installed)

    More

    • The resulting model-quantized.onnx can be loaded but produces very bad results.
    • dynamic quantization works seamlessly
    • using:
      • Python 3.9
      • tested on two different devices with different operating systems:
        • MacOS Monterey (with Intel)
        • WSL for Windows 11 (Ubuntu)
  • add more examples[support dynamic, static and aware_training quantization]

    What does this PR do?

    This PR provides more tasks for users to apply. Tasks can be applied with dynamic, static, or aware_training quantization with INC.

    Tasks include:

    • [multiple-choice] Done
    • [summarization] Done
    • [language-modeling] CLM Done, MLM & PLM Done.
      • limitation: a seed is set in the MLM & PLM take_eval_step to verify loading success.
    • [translation] Done. Issue #14268, PR #14276 fixed the issue.
  • optimum inference for summarization

    With reference to the blog: https://huggingface.co/blog/optimum-inference, I am able to do this:

    from transformers import AutoTokenizer, pipeline
    -from transformers import AutoModelForQuestionAnswering
    +from optimum.onnxruntime import ORTModelForQuestionAnswering
    
    -model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2") # pytorch checkpoint
    +model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2") # onnx checkpoint
    tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
    
    optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
    
    question = "What's my name?"
    context = "My name is Philipp and I live in Nuremberg."
    pred = optimum_qa(question, context)
    

    I need to do similar inference for summarization, for the following code:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
    model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")
    
    

    I get the following error:

    >>> from optimum.onnxruntime import ORTModelForSeq2SeqLM
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: cannot import name 'ORTModelForSeq2SeqLM' from 'optimum.onnxruntime' (/datadrive/shilpa/work/virtual_environments/venv_hf_optimum/lib/python3.9/site-packages/optimum/onnxruntime/__init__.py)
    

    https://huggingface.co/docs/optimum/main/en/pipelines - mentions "text-generation" as one of the supported tasks; I assumed "summarization" comes under this category. Am I right?

  • Add a top level init

    What does this PR do?

    An __init__.py was, I think, missing at the optimum/ level to allow importing the optimum pipeline introduced in https://github.com/huggingface/optimum/commit/a31e59eddafc53f6146a4850ffe91d4cfc00c0a6 as from optimum import pipeline, as suggested in the docs: https://huggingface.co/docs/optimum/main/en/pipelines#optimum-pipelines-for-inference

    @philschmid (I cannot add reviewers, so I'm pinging you)

    Before submitting

    • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [Patch] Add loss for ORT inference

    What does this PR do?

    • Wrap OnnxConfig with wrap_onnx_config_for_loss to obtain the loss when using ORTTrainer with inference_with_ort=True.
    • Enable DeepSpeed for ONNX Runtime training. (Tested with ZeRO stage 2; full availability in progress.)
    • Clean up unused dependencies in ORTTrainer.
    • Update CI of onnxruntime training.
    • Update associated tests.
  • Optimum Pruning and Quantization Current Limitation

    I just added a topic on the Huggingface forum about limitations that I found while trying out the Huggingface Optimum on text classification and text summarization tasks.

    https://discuss.huggingface.co/t/optimum-pruning-and-quantization-current-limitation/13978


    The following is a copy of the text I wrote there:

    We are checking out the Huggingface Optimum. There are some issues that we would like to clarify:

    • Pruning does not always speed up the model, and it may increase the model's storage size, which is unexpected.

    • Dynamic quantization works only on CPU (running it on GPU raises an error about a conflict between CPU and GPU).

    Could someone working in this area explain this behavior? We have high hopes for Hugging Face Optimum for model compression.

    If some details are necessary, I would be glad to clarify more.

  • Possibility to load an ORTQuantizer or ORTOptimizer from ONNX

    First, thanks a lot for this library, it makes work so much easier.

    I was wondering if it's possible to quantize and then optimize a model (or the reverse), but looking at the docs, it seems possible to do so only by passing a vanilla Hugging Face model.

    Is it possible to do so with already compiled models?

    Like : MyFineTunedModel ---optimize----> MyFineTunedOnnxOptimizedModel -----quantize-----> MyFinalReallyLightModel

    # Note that self.model_dir is my local folder with my custom fine-tuned hugginface model
    onnx_path = self.model_dir.joinpath("model.onnx")
    onnx_quantized_path = self.model_dir.joinpath("quantized_model.onnx")
    onnx_chad_path = self.model_dir.joinpath("chad_model.onnx")
    onnx_path.unlink(missing_ok=True)
    onnx_quantized_path.unlink(missing_ok=True)
    onnx_chad_path.unlink(missing_ok=True)
    
    quantizer = ORTQuantizer.from_pretrained(self.model_dir, feature="token-classification")
    quantized_path = quantizer.export(
        onnx_model_path=onnx_path, onnx_quantized_model_output_path=onnx_quantized_path,
        quantization_config=AutoQuantizationConfig.arm64(is_static=False, per_channel=False),
    )
    quantizer.model.save_pretrained(quantized_path.parent) # To have the model config.json
    quantized_path.parent.joinpath("pytorch_model.bin").unlink() # To ensure that we're not loading the vanilla pytorch model
    
    # Load an Optimizer from an onnx path... 
    # optimizer = ORTOptimizer.from_pretrained(quantized_path.parent, feature="token-classification")  <-- this fails
    # optimizer.export(
    #     onnx_model_path=onnx_path,
    #     onnx_optimized_model_output_path=onnx_chad_path,
    #     optimization_config=OptimizationConfig(optimization_level=99),
    # )
    model = ORTModelForTokenClassification.from_pretrained(quantized_path.parent, file_name="quantized_model.onnx")
    # Ideally would load onnx_chad_path (with chad_model.onnx) if the commented section works.
    
    tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained(self.model_dir)
    self.pipeline = cast(TokenClassificationPipeline, pipeline(
        model=model, tokenizer=tokenizer,
        task="token-classification", accelerator="ort",
        aggregation_strategy=AggregationStrategy.SIMPLE,
        device=device_number(self.device),
    ))
    

    Note that optimization alone works perfectly fine, and quantization too, but I was hoping that chaining both would be feasible... unless optimization also produces some kind of quantized or lighter model?

    Thanks in advance. Have a great day

  • Fix wrong mapping between dataset label ids and model.config.label2id in text-classification

    What does this PR do?

    Fix a bug for certain text-classification models using a sentence pair input.

    In text classification, some models assign different ids to the same label names than the dataset does. For example, in glue/mnli, the model roberta-large-mnli has

      "label2id": {
        "CONTRADICTION": 0,
        "ENTAILMENT": 2,
        "NEUTRAL": 1
      },
    

    while the dataset labels have the order "label": {"num_classes": 3, "names": ["entailment", "neutral", "contradiction"]}, effectively swapping the "contradiction" and "entailment" labels.

    In the transformers examples, this is fixed using an additional piece of code, see https://github.com/huggingface/transformers/blob/d6b8e9cec7301ba02f642588a6f12e78ec3b9798/examples/pytorch/text-classification/run_glue.py#L395-L413 . This piece of code is missing from the optimization and quantization examples (although not from the training example), and this PR adds it back.

    Note that, for example for glue/mnli, the label_to_id ends up being {0: 2, 1: 1, 2: 0}, correcting the swap between the labels 0 and 2.
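
    For illustration, here is a minimal sketch of the kind of remapping the linked transformers code performs (the variable names below are mine, not the PR's):

    # The model's label2id (roberta-large-mnli) and the dataset's label order (glue/mnli)
    model_label2id = {"CONTRADICTION": 0, "ENTAILMENT": 2, "NEUTRAL": 1}
    dataset_label_names = ["entailment", "neutral", "contradiction"]

    # Map each dataset label id to the id the model expects, matching label names case-insensitively
    name_to_model_id = {name.lower(): idx for name, idx in model_label2id.items()}
    label_to_id = {i: name_to_model_id[name] for i, name in enumerate(dataset_label_names)}

    print(label_to_id)  # {0: 2, 1: 1, 2: 0}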

    Before submitting

    • [x] This PR fixes a bug
  • Add issue template

    What does this PR do?

    Add issue templates, largely copied from transformers.

    This follows up https://github.com/huggingface/optimum/pull/180 , I messed up with my repo.

  • Compare optimized models vs. transformers models

    Feedback welcome, notably for the design, code quality, etc.

    This PR aims at introducing a unified way to benchmark transformers models vs. optimized models that is backend-independent (any backend can be plugged in for inference and evaluation) and code-free (the user does not need to write code to start runs and evaluate them).

    The main contributions are helper classes and methods for data preprocessing, inference, and evaluation, spread across several files:

    • optimum/runs_base.py: general methods; this should be backend-agnostic.
    • optimum/utils/preprocessing/: handles loading and preprocessing datasets, running inference with pipelines, and running evaluation. This should be backend-agnostic.
    • optimum/onnxruntime/runs/: ONNX Runtime-specific methods.

    For now, dataset preprocessing and evaluation are task-specific; the supported tasks are:

    • text-classification
    • token-classification
    • question-answering

    As for the evaluation of transformers models, I believe there is some duplicate work with what exists in the AutoTrain backend and what is being done in https://github.com/huggingface/evaluate. However, since my understanding is that supporting Optimum-based inference within AutoTrain is not a priority, it makes sense to me to have a common implementation to evaluate transformers/optimized models so that they are comparable. I hope we can minimize duplicate effort.

    I used pipelines for inference for the general metrics, and ORTModel.forward() to measure latency/throughput.

    Tasks before (or after) merge

    • [x] Documentation
    • [x] Test on several datasets
    • [x] See if it would make sense to use Trainer.evaluate() instead of an explicit loop for evaluation --> I think it doesn't, there is a lot of abstraction in pipelines already, we should make use of it.
    • [ ] Make use of train-eval-index metadata from datasets to auto-infer data, label columns (see e.g. https://github.com/huggingface/datasets/pull/4234)
    • [x] Support multi-column data (2 would be sufficient I guess, see https://github.com/huggingface/transformers/issues/8573)
    • [ ] Document node exclusion
    • [x] Support node exclusion for dynamic quantization for OnnxRuntime (implemented in https://github.com/huggingface/optimum/pull/196)
    • [ ] Avoid tracking PyTorch to Numpy conversion for time measurements, in https://github.com/huggingface/optimum/blob/cf91bd7276714c7a39324c0bb3d2e57f820b0ad6/optimum/onnxruntime/modeling_ort.py#L323-L327
    • [x] Still some work to distinguish backend-agnostic code vs. backend-specific code
    • [x] Clean up all remaining # TODOs
    • [ ] Unit tests / github workflows

    This, with some additional work, should close #128 .

  • Not possible to configure GPU in pipelines nor leveraging batch_size parallelisation

    When setting the device variable in the pipeline function/class to >= 0 while running on GPU, an error appears: AttributeError: 'ORTModelForCausalLM' object has no attribute 'to'. This was initially reported in #161, so I'm opening this issue to cover supporting the device parameter in the ORT classes. This is important, as otherwise it won't be possible to configure CPU/GPU similarly to the normal transformers library.

    Is there currently a workaround to ensure that the class is run on GPU? By default it seems this would be set to CPU even when a GPU is available:

    >>> m = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True)
    >>> t = AutoTokenizer.from_pretrained("gpt2")
    >>> pp = pipeline("text-generation", model=m, tokenizer=t)
    >>> pp.device
    
    device(type='cpu')
    

    This is still the case even with the optimum[onnxruntime-gpu] package. I have validated this by testing against a normal transformers pipeline with batch_size=X (i.e. pp = pipeline("text-generation", model=m, tokenizer=t, batch_size=128)), and it seems there is no speed-up from parallel processing with optimum, whereas the normal transformers pipeline is orders of magnitude faster (most likely because optimum is not utilizing the parallelism).

    I can confirm that the model is loaded with GPU correctly:

    >>> m.device
    
    device(type='cuda', index=0)
    

    And GPU is configured correctly:

    >>> from optimum.onnxruntime.utils import _is_gpu_available
    >>> _is_gpu_available()
    
    True
    

    Is there a way to enable GPU for processing with batching in optimum?
