🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.





State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

🤗 Transformers is backed by the three most popular deep learning libraries (Jax, PyTorch and TensorFlow), with seamless integration between them. It's straightforward to train your models with one before loading them for inference with another.

Online demos

You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.

Here are a few examples:

Write With Transformer, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.

If you are looking for custom support from the Hugging Face team

HuggingFace Expert Acceleration Program

Quick tour

To immediately use a model on a given text, we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:

>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
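As a small illustration of how that result can be read, the sketch below formats a pipeline-style output as a label and a percentage. The `result` list is copied from the example above rather than produced by a live model, and `describe` is a hypothetical helper written for this illustration, not part of the library.

```python
# Output copied from the sentiment-analysis example above (not recomputed here).
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(prediction):
    """Hypothetical helper: render one prediction as 'label (percentage)'."""
    return f"{prediction['label'].lower()} ({prediction['score']:.2%})"


# The pipeline returns one dictionary per input text, so format each of them.
summaries = [describe(p) for p in result]
print(summaries[0])
```

Because the pipeline returns a list, the same helper should work unchanged when several texts are classified at once.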

Many NLP tasks have a pre-trained pipeline ready to go. For example, we can easily extract the answer to a question from a given context:

>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

In addition to the answer, the pretrained model used here returned its confidence score, along with the start and end positions of the answer, given as character offsets into the context. You can learn more about the tasks supported by the pipeline API in this tutorial.
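The `start` and `end` values can be checked against the context with plain string slicing. This sketch reuses the output printed above instead of running the model, and it assumes the offsets are character indices into `context`, which is consistent with the values shown.

```python
context = 'Pipeline has been included in the huggingface/transformers repository'

# Output copied from the question-answering example above (not recomputed here).
prediction = {'score': 0.30970096588134766, 'start': 34, 'end': 58,
              'answer': 'huggingface/transformers'}

# Slice the context with the reported offsets and compare with the answer text.
span = context[prediction['start']:prediction['end']]
print(span == prediction['answer'])
```

Slicing this way is handy when the answer has to be highlighted in the original text rather than just displayed.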

To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

And here is the equivalent code for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list of strings. It outputs a dictionary that you can use in downstream code or pass directly to your model using the ** argument unpacking operator.
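To make the `**` unpacking step concrete without downloading a model, here is a minimal sketch built on stand-ins. `toy_tokenizer` and `toy_model` are hypothetical functions invented for this illustration. They only mimic the shape of the real objects (a dictionary of named inputs passed to a callable that accepts keyword arguments) and are not the library's implementation.

```python
def toy_tokenizer(text):
    """Hypothetical stand-in: map each whitespace-separated word to a fake id."""
    words = text.split()
    return {
        "input_ids": [len(word) for word in words],  # fake ids, not a real vocabulary
        "attention_mask": [1] * len(words),          # 1 marks a real (non-padding) token
    }


def toy_model(input_ids, attention_mask):
    """Hypothetical stand-in: count the tokens that are not masked out."""
    return sum(mask for _, mask in zip(input_ids, attention_mask))


inputs = toy_tokenizer("Hello world!")

# model(**inputs) passes each dictionary key as a keyword argument of the same name.
unpacked = toy_model(**inputs)
explicit = toy_model(input_ids=inputs["input_ids"],
                     attention_mask=inputs["attention_mask"])
print(unpacked == explicit)
```

The practical consequence is that the dictionary keys have to match the parameter names the model accepts, which is why the tokenizer and model are usually loaded from the same checkpoint name.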

The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend) which you can use normally. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.

Why should I use transformers?

  1. Easy-to-use state-of-the-art models:

    • High performance on NLU and NLG tasks.
    • Low barrier to entry for educators and practitioners.
    • Few user-facing abstractions with just three classes to learn.
    • A unified API for using all our pretrained models.
  2. Lower compute costs, smaller carbon footprint:

    • Researchers can share trained models instead of always retraining.
    • Practitioners can reduce compute time and production costs.
    • Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
  3. Choose the right framework for every part of a model's lifetime:

    • Train state-of-the-art models in 3 lines of code.
    • Move a single model between TF2.0/PyTorch frameworks at will.
    • Seamlessly pick the right framework for training, evaluation and production.
  4. Easily customize a model or an example to your needs:

    • We provide examples for each architecture to reproduce the results published by its original authors.
    • Model internals are exposed as consistently as possible.
    • Model files can be used independently of the library for quick experiments.

Why shouldn't I use transformers?

  • This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
  • The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
  • While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.

Installation

With pip

This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install at least one of Flax, PyTorch or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax installation page for the specific install command for your platform.
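Before installing, it can help to see which backends are already present in the active environment. The following is a minimal sketch using only the standard library. It assumes the import names `flax`, `torch` and `tensorflow`, and `installed_backends` is a hypothetical helper, not something the library provides.

```python
import importlib.util


def installed_backends(candidates=("flax", "torch", "tensorflow")):
    """Return the candidate module names that can be found in this environment."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


found = installed_backends()
print(found if found else "No backend found: install Flax, PyTorch or TensorFlow first.")
```

The check only looks the modules up without importing them, so it stays fast even when a large backend is installed.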

When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.

With conda

Since Transformers v4.0.0, we have a conda channel: huggingface.

🤗 Transformers can be installed using conda as follows:

conda install -c huggingface transformers

Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.

Model architectures

All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub where they are uploaded directly by users and organizations.

🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  3. BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
  4. BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
  5. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
  6. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  7. BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
  8. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  9. BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  10. BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  11. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  12. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  13. BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
  14. ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
  15. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  16. CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
  17. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  18. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  19. CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
  20. CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
  21. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  22. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  23. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
  24. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  25. DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
  26. DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
  27. DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
  28. EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  29. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  30. FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
  31. FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
  32. Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
  33. GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
  34. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  35. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  36. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
  37. Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  38. I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
  39. LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
  40. LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
  41. LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
  42. LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  43. Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  44. LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
  45. LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
  46. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  47. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
  48. MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
  49. MBart-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
  50. Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  51. Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  52. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
  53. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  54. Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
  55. PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
  56. ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  57. Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
  58. RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
  59. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  60. RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
  61. SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
  62. SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
  63. SpeechToTextTransformer (from Facebook), released together with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
  64. SpeechToTextTransformer2 (from Facebook), released together with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
  65. Splinter (from Tel Aviv University), released together with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
  66. SqueezeBert (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  67. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  68. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  69. TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
  70. Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
  71. TrOCR (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
  72. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  73. VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
  74. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  75. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  76. XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  77. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  78. XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
  79. XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
  80. Want to contribute a new model? We have added a detailed guide and templates to help you through the process of adding a new model. You can find them in the templates folder of the repository. Be sure to check the contributing guidelines and contact the maintainers or open an issue to collect feedback before starting your PR.

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.

Learn more

| Section | Description |
| --- | --- |
| Documentation | Full API documentation and tutorials |
| Task summary | Tasks supported by 🤗 Transformers |
| Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
| Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the Trainer API |
| Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks |
| Model sharing and uploading | Upload and share your fine-tuned models with the community |
| Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert |

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}
Comments
  • How to use BERT for finding similar sentences or similar news?

    I have used BERT NextSentencePredictor to find similar sentences or similar news. However, it's super slow, even on a Tesla V100, which is the fastest GPU available right now. It takes around 10 seconds to score one query title against around 3,000 articles. Is there a better way to use BERT for finding similar sentences or similar news in a corpus of news articles?

  • Summarization Fine Tuning

    ❓ Questions & Help

    Details

    I tried using T5 and Bart, but abstractive summarization of scientific texts does not give the results I want, I think because both are trained on news corpora. I have scraped all of the free PMC articles and am thinking about fine-tuning a seq2seq model on the articles and their abstracts to build an abstractive summarizer for scientific texts. This Medium article (https://medium.com/huggingface/encoder-decoders-in-transformers-a-hybrid-pre-trained-architecture-for-seq2seq-af4d7bf14bb8) gives a bit of an introduction but does not go into detail, so I am wondering how to approach this.

    I'm not stuck on a specific error; I just don't know how to approach this problem.

    A link to the original question on Stack Overflow: https://stackoverflow.com/questions/61826443/train-custom-seq2seq-transformers-model

  • ONNXConfig: Add a configuration for all available models

    This issue is about the working group specially created for this task. If you are interested in helping out, take a look at this organization, or add me on Discord: ChainYo#3610

    We want to contribute to HuggingFace's ONNX implementation for all available models on HF's hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!

    Feel free to join us in this adventure! Join the org by clicking here

    Here is a non-exhaustive list of all available models:

    • [x] Albert
    • [x] BART
    • [x] BeiT
    • [x] BERT
    • [x] BigBird
    • [x] BigBirdPegasus
    • [x] Blenderbot
    • [x] BlenderbotSmall
    • [x] BLOOM
    • [x] CamemBERT
    • [ ] CANINE
    • [x] CLIP
    • [x] CodeGen
    • [x] ConvNext
    • [x] ConvBert
    • [ ] CTRL
    • [ ] CvT
    • [x] Data2VecText
    • [x] Data2VecVision
    • [x] Deberta
    • [x] DebertaV2
    • [x] DeiT
    • [ ] DecisionTransformer
    • [x] DETR
    • [x] Distilbert
    • [ ] DPR
    • [ ] DPT
    • [x] ELECTRA
    • [ ] FNet
    • [ ] FSMT
    • [x] Flaubert
    • [ ] FLAVA
    • [ ] Funnel Transformer
    • [ ] GLPN
    • [x] GPT2
    • [x] GPTJ
    • [x] GPT-Neo
    • [ ] GPT-NeoX
    • [ ] Hubert
    • [x] I-Bert
    • [ ] ImageGPT
    • [ ] LED
    • [x] LayoutLM
    • [ ] 🛠️ LayoutLMv2
    • [x] LayoutLMv3
    • [ ] LayoutXLM
    • [x] LeViT
    • [x] Longformer
    • [x] LongT5
    • [ ] 🛠️ Luke
    • [ ] Lxmert
    • [x] M2M100
    • [ ] MaskFormer
    • [x] mBart
    • [ ] MCTCT
    • [ ] MPNet
    • [x] MT5
    • [x] MarianMT
    • [ ] MegatronBert
    • [x] MobileBert
    • [x] MobileViT
    • [ ] Nyströmformer
    • [x] OpenAIGPT-2
    • [ ] 🛠️ OPT
    • [x] OWLViT
    • [x] PLBart
    • [ ] Pegasus
    • [x] Perceiver
    • [ ] PoolFormer
    • [ ] ProphetNet
    • [ ] QDQBERT
    • [ ] RAG
    • [ ] REALM
    • [ ] 🛠️ Reformer
    • [x] RemBert
    • [x] ResNet
    • [ ] RegNet
    • [ ] RetriBert
    • [x] RoFormer
    • [x] RoBERTa
    • [ ] SEW
    • [ ] SEW-D
    • [ ] SegFormer
    • [ ] Speech2Text
    • [ ] Speech2Text2
    • [ ] Splinter
    • [x] SqueezeBERT
    • [ ] Swin Transformer
    • [x] T5
    • [ ] TAPAS
    • [ ] TAPEX
    • [ ] Transformer XL
    • [x] TrOCR
    • [ ] UniSpeech
    • [ ] UniSpeech-SAT
    • [ ] VAN
    • [x] ViT
    • [ ] Vilt
    • [ ] VisualBERT
    • [ ] Wav2Vec2
    • [ ] WavLM
    • [ ] XGLM
    • [x] XLM
    • [ ] XLMProphetNet
    • [x] XLM-RoBERTa
    • [x] XLM-RoBERTa-XL
    • [ ] 🛠️ XLNet
    • [x] YOLOS
    • [ ] Yoso

    A 🛠️ next to a model means that a PR is in progress. An unchecked box with no 🛠️ means that the ONNX export does not support the model yet, so support still needs to be added.

    If you need help implementing an unsupported model, here is a guide from HuggingFace's documentation.

    If you want an example of implementation, I did one for CamemBERT months ago.

  • GPT-J-6B

    What does this PR do?

    Introduces the long-awaited GPT J model class to HuggingFace! Concurrently with this PR being merged, I will make a GPT J 6B checkpoint public on the EleutherAI HF page for people to use. The model has been evaluated as being within error tolerances of the GPT J 6B model we released in Jax two months ago.

    @patil-suraj was very helpful in assisting me to understand HF philosophy and how to make this PR most in line with the rest of the codebase. Other than that, the major design consideration was to make the configs compatible with GPT-2 rather than GPT-Neo. GPT-Neo has some usability limitations due to its configs having names unrelated to GPT-2’s (see #12183 for details). Given those problems and my hope that GPT-Neo will have its configs updated in the future, it seemed like a clear choice to align GPT J with GPT-2.
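
    To make the naming gap concrete, here is a minimal sketch (not code from this PR). The attribute names in the mapping are recalled from GPT2Config and GPTNeoConfig and should be checked against the library source; read_hidden_size is a hypothetical helper that works on plain dicts, and the values are illustrative.

```python
# Assumption: GPT-2 style names on the left, GPT-Neo style names on the right,
# as recalled from GPT2Config and GPTNeoConfig (verify against the source).
GPT2_TO_GPT_NEO = {
    "n_embd": "hidden_size",
    "n_layer": "num_layers",
    "n_head": "num_heads",
    "n_positions": "max_position_embeddings",
}


def read_hidden_size(config: dict) -> int:
    """Hypothetical helper: read the hidden size under either naming scheme."""
    for name in ("n_embd", GPT2_TO_GPT_NEO["n_embd"]):
        if name in config:
            return config[name]
    raise KeyError("no hidden size found under a known name")


gpt2_style = {"n_embd": 4096, "n_layer": 28}          # illustrative values
neo_style = {"hidden_size": 2048, "num_layers": 24}   # illustrative values
print(read_hidden_size(gpt2_style), read_hidden_size(neo_style))
```

    Code written against the GPT-2 names needs no such fallback for a model whose config follows GPT-2, which is the motivation given above.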

    Shout-outs to @finetuneanon, whose implementation this one is based on, as well as @kumuruz for assistance with optimizing and debugging.

    Supersedes #12243 #13010 #13022

    Closes #12098

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [X] Did you read the contributor guideline, Pull Request section?
    • [X] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. It was discussed in Slack with @patil-suraj
    • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [X] Did you write any new necessary tests?

    Who can review?

    • gpt2: @patrickvonplaten, @LysandreJik, @patil-suraj
  • [DeepSpeed] [success] trained t5-11b on 1x 40GB gpu

    Managed to train t5-11b on 1x 40GB gpu w/ Deepspeed (A100-SXM4-40GB)

    Thank you, @PeterAJansen for letting me use your hardware!

    Thank you, @jeffra and @samyam, for not believing that it is impossible to train t5-11b on 1x 40GB gpu w/ Deepspeed, and for the support that led me to find a few bugs in the integration.

    Sharing details for those who need them.

    If you want to try this at home please make sure you use transformers master as some bug fixes were just merged in

    Well, it's similar to the t5-3b on 24GB success reported here and here. But this time t5-11b on 1x 40GB gpu (or 4x if you wanted things faster)

    As someone asked me before: you need a huge amount of general RAM to use ZeRO-Offload for a huge model:

    • for t5-3b on 1x 24GB gpu: ~71GB RAM
    • for t5-11b on 1x 40GB gpu: ~234GB RAM

    I was using the /usr/bin/time -v program to get the peak memory measurement; it's the Maximum resident set size entry in the final report.

    Question: I don't think /usr/bin/time does the right thing for multi-process runs. I think it only measures the parent process, e.g. with 4x gpus it reported only 102GB RAM, but I clearly saw in top that it was around 240GB. If you have an easy way to measure peak memory that takes forked processes into account, I'm all ears.
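
    One possible approach, sketched here under assumptions: on Linux, poll /proc and sum the resident set size of the launcher and all of its descendants, keeping the maximum seen. The helper names are hypothetical, shared pages are counted once per process (so the total can overestimate), and spikes between polls are missed.

```python
import os
import time


def rss_bytes(pid: int) -> int:
    """Resident set size of one process from /proc/<pid>/statm (Linux only)."""
    try:
        with open(f"/proc/{pid}/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, IndexError, ValueError):
        return 0  # process exited, or /proc is not available


def parent_pid(pid: int) -> int:
    """Parent pid from /proc/<pid>/stat; fields resume after the ')' closing the name."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return int(f.read().rsplit(")", 1)[1].split()[1])
    except (OSError, IndexError, ValueError):
        return -1


def tree_rss_bytes(root_pid: int) -> int:
    """Sum RSS over root_pid and every process currently descended from it."""
    try:
        pids = [int(name) for name in os.listdir("/proc") if name.isdigit()]
    except OSError:
        return 0
    children = {}
    for pid in pids:
        children.setdefault(parent_pid(pid), []).append(pid)
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss_bytes(pid)
        stack.extend(children.get(pid, []))
    return total


def poll_peak(root_pid: int, samples: int, interval: float = 1.0) -> int:
    """Largest tree-wide RSS observed over `samples` polls."""
    peak = 0
    for _ in range(samples):
        peak = max(peak, tree_rss_bytes(root_pid))
        time.sleep(interval)
    return peak


print(f"{poll_peak(os.getpid(), samples=2, interval=0.1) / 2**20:.1f} MiB")
```

    In practice this would run in a separate process that is given the pid of the deepspeed launcher and polls until the launcher exits.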

    Batch sizes on one gpu:

    • with buffers of 5e8 I was able to run BS=2, which might be too small for training,
    • but with 2e8 I managed to squeeze in BS=10 for training, but OOMed on prediction

    I'm referring to these buffer sizes in ds_config.json:

            "allgather_bucket_size": 2e8,
            "reduce_bucket_size": 2e8,
    

    And I tested for 2x and 4x DDP as well, BS=16 OOMed, BS=8 was good so I used that - but could probably squeeze some more.
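
    For context, a minimal sketch of where the two bucket-size keys above sit. It assumes they belong to the zero_optimization section of the DeepSpeed config; with_bucket_size and the base config are hypothetical and not taken from the actual ds_config.json.

```python
import json


def with_bucket_size(ds_config: dict, size: float) -> dict:
    """Return a copy of ds_config with both ZeRO bucket sizes set to `size`."""
    updated = json.loads(json.dumps(ds_config))  # cheap deep copy for JSON-like data
    zero = updated.setdefault("zero_optimization", {})
    zero["allgather_bucket_size"] = size
    zero["reduce_bucket_size"] = size
    return updated


# Hypothetical minimal config; the real file carries more settings.
base = {"zero_optimization": {"stage": 2, "cpu_offload": True}}
small_buffers = with_bucket_size(base, 2e8)
print(json.dumps(small_buffers, indent=2))
```

    Writing the result back with json.dump would give a variant of the config file to pass to --deepspeed, leaving the original untouched.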

    edit1: later tests show that my test was too short and wasn't getting the CPU Adam optimizer to kick in, as it skips the first 20 or so steps because of the overflow. Once it kicks in it takes more GPU memory, so the practical BS is much smaller - I think around 2 on this setup. So most likely you will need to use BS=2 for real work, until things get optimized even more.

    edit2: things are getting reshuffled in the tests, so the default ds_config.json file has moved in master to a new, hopefully permanent home. It's now at examples/tests/deepspeed/ds_config.json so you will need to adjust the command line to reflect this new location or simply copy it over to where the old one used to be.

    here is the full benchmark:

    # 1 gpu: 
    # only training fits with this BS, eval needs a smaller BS
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=1 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 31.0897, 'train_samples_per_second': 0.257, 'epoch': 1.0}
    
    # 2 gpus:
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=2 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 17.9026, 'train_samples_per_second': 0.223, 'epoch': 1.0}
    
    # 4 gpus
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 10.4404, 'train_samples_per_second': 0.192, 'epoch': 1.0}
    

    Checkpointing should allow making even bigger batch sizes.

  • FP16 overflow with GPT-Neo when using sequence lengths of 2048.

    Environment info

    • transformers version: 4.5.0.dev0
    • Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.8.0+cu111
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No

    Who can help

    @stas00

    Models:

    • GPT-Neo 1.3b

    Library:

    • deepspeed: @stas00

    Information

    Model I am using (Bert, XLNet ...):

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    1. Use GPT-Neo 1.3b with The Pile dataset and the built-in trainer. Artificial data also suffices. It does not matter what the data is, as long as the attention mask spans all 2048 tokens.
    2. Enable FP16 and set max_length to 2048
    3. Observe that all losses reported are NaN

    Also reproducible using AMP or DeepSpeed. It seems like there is code to circumvent this outlined in the GPT-Neo implementation, where q, k, v are cast to fp32 in the attention block.

    When the max_length is shorter (512) this overflow does not occur.
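
    As a generic illustration of the failure mode (not a diagnosis of where GPT-Neo overflows), the sketch below emulates fp16 rounding with the standard struct module. to_fp16 and fp16_sum are hypothetical helpers, and the values are made up so that the sum over 512 terms stays in range while the sum over 2048 terms does not.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; out-of-range values become +/-inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # magnitude too large for half precision
        return math.copysign(math.inf, x)


def fp16_sum(values):
    """Accumulate in fp16, rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


values = [64.0] * 2048               # made-up terms; the exact sum is 131072
short_sum = fp16_sum(values[:512])   # 32768.0, still representable in fp16
long_sum = fp16_sum(values)          # passes the fp16 maximum (65504) and becomes inf
wide_sum = sum(values)               # wider accumulation keeps 131072.0
print(short_sum, long_sum, wide_sum, long_sum - long_sum)
```

    Once an intermediate value is inf, later operations such as inf - inf yield NaN, which is one common way NaN losses appear. Accumulating in a wider type corresponds to wide_sum here.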

    Expected behavior

    I expected no overflows.

    Aside

    I'm reaching out on behalf of EleutherAI; Lysandre told us to create an issue about this.

  • [deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO

    Environment info

    • transformers version: 4.17.0.dev0
    • Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.10
    • Python version: 3.8.0
    • PyTorch version (GPU?): 1.10.1 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: yes (deepspeed)
    • Note: I installed DeepSpeed from source

    Who can help

    Models: (I'm actually trying to use T0pp but T5 is close enough)

    • T5, BART, Marian, Pegasus, EncoderDecoder: @patrickvonplaten

    Library:

    • Deepspeed: @stas00
    • Text generation: @patrickvonplaten @narsil

    Information

    Model I am using (Bert, XLNet ...): T0pp / T0_3B

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [X] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)

    To reproduce

    I want to load T0pp across 2 24GB GPUs and only run inference. I know Deepspeed with ZeRO stage 3 is the way to go for this from reading documentation. I am following the HuggingFace example here to use Deepspeed without a Trainer object.

    The error I get is

    [2022-01-28 18:36:41,193] [INFO] [partition_parameters.py:456:__exit__] finished initializing model with 2.85B parameters
    Traceback (most recent call last):
      File "multi_gpu_T0pp.py", line 26, in <module>
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    AttributeError: module 'transformers.deepspeed' has no attribute 'initialize'
    

    My code:

    Run with CUDA_VISIBLE_DEVICES="0,1" deepspeed <script.py>

    """
    Example code to load a PyTorch model across GPUs
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import deepspeed
    import pandas as pd
    import torch
    import pdb
    import os
    
    seed = 42
    torch.manual_seed(seed)
    
    ds_config = {
        "fp16": {
            "enabled": "auto",
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1
        },
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            "sub_group_size": 1e9,
            "reduce_bucket_size": "auto",
            "stage3_prefetch_bucket_size": "auto",
            "stage3_param_persistence_threshold": "auto",
            "stage3_max_live_parameters": 1e9,
            "stage3_max_reuse_distance": 1e9,
            "stage3_gather_fp16_weights_on_model_save": True
        },
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 0,
        "steps_per_print": 2000,
        "train_batch_size": 2,
        "train_micro_batch_size_per_gpu": 1,
        "wall_clock_breakdown": False
    }
    
    if __name__ == "__main__":
        # must run before instantiating the model
        # ds_config is deepspeed config object or path to the file
        dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
    
        model_name = "bigscience/T0_3B"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    
        inputs = tokenizer.encode(
            "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
            return_tensors="pt")
        outputs = model.generate(inputs)
        print(tokenizer.decode(outputs[0]))
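
    The traceback names module 'transformers.deepspeed', which suggests that `from transformers import deepspeed` bound the name deepspeed to the Transformers submodule rather than to the DeepSpeed package. The same kind of shadowing can be reproduced with the standard library alone; abc and collections.abc are only stand-ins here.

```python
import abc as top_level_abc      # the top-level `abc` module
from collections import abc      # binds the name `abc` to the submodule collections.abc

same_module = abc is top_level_abc       # False: the name now points elsewhere
module_name = abc.__name__               # "collections.abc"
has_abcmeta = hasattr(abc, "ABCMeta")    # False, although top_level_abc has it

print(module_name, same_module, has_abcmeta, hasattr(top_level_abc, "ABCMeta"))
```

    By analogy, a plain `import deepspeed` in the script would bind the DeepSpeed package, which is where initialize is defined. Whether the rest of the script then behaves as intended is not something this sketch checks.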
    

    Expected behavior

    T0pp (or T0_3B) to load across 2 GPUs, generate an answer, and then quit.

  • How to use fine-tuned BART for prediction?

    ❓ Questions & Help

    Details

    I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).

    I have two questions on using this model for prediction:

    • How can I modify finetune.py to generate predictions for the test set, in addition to the loss scores? I see some test functions in finetune.py, but I'm not sure how to use these for generating a .txt file with the predictions.

    • How can I load the generated .ckpt files into BartForConditionalGeneration()? A config.json file was not generated along with the checkpoint files; there doesn't seem to be a TFBartForConditionalGeneration; and the convert_tf_checkpoint_to_pytorch.py script in the repo doesn't seem to support BART yet.
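
    On the second question, a possible route, sketched under assumptions: the .ckpt files are PyTorch Lightning checkpoints whose weights sit under a "state_dict" key, with each name prefixed by the attribute that holds the model in the Lightning module (assumed here to be "model."). strip_prefix is a hypothetical helper, and the dict below is a stand-in for the real checkpoint contents.

```python
def strip_prefix(state_dict: dict, prefix: str = "model.") -> dict:
    """Drop one leading `prefix` from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Stand-in for torch.load(path_to_ckpt, map_location="cpu")["state_dict"];
# real values would be tensors rather than integers.
lightning_state = {
    "model.model.shared.weight": 0,
    "model.final_logits_bias": 1,
}
hf_state = strip_prefix(lightning_state)
print(sorted(hf_state))  # ['final_logits_bias', 'model.shared.weight']
```

    If the assumptions hold, the renamed weights could then be passed to load_state_dict on a BartForConditionalGeneration instantiated from the same base checkpoint that was fine-tuned.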

    Thank you for your time!

  • Installation Error - Failed building wheel for tokenizers

    🐛 Bug

    Information

    Model I am using (Bert, XLNet ...): N/A

    Language I am using the model on (English, Chinese ...): N/A

    The problem arises when using:

    • [X] the official example scripts: (give details below)

    Problem arises in transformers installation on Microsoft Windows 10 Pro, version 10.0.17763

    After creating and activating the virtual environment, installing transformers is not possible, because the following error occurs:

    "error: Can not find Rust compiler", "ERROR: Failed building wheel for tokenizers", "Failed to build tokenizers", "ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly"

    The tasks I am working on is: [X] transformers installation

    To reproduce

    Steps to reproduce the behavior:

    1. From command line interface, create and activate a virtual environment by following the steps in this URL: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
    2. Install transformers from source, by following the example in the topic From Source on this URL: https://github.com/huggingface/transformers
    py -m pip --version
    py -m pip install --upgrade pip
    py -m pip install --user virtualenv
    py -m venv env
    .\env\Scripts\activate
    pip install transformers
    
    ERROR: Command errored out with exit status 1:
       command: 'c:\users\vbrandao\env\scripts\python.exe' 'c:\users\vbrandao\env\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\vbrandao\AppData\Local\Temp\tmpj6evjmze'
           cwd: C:\Users\vbrandao\AppData\Local\Temp\pip-install-sza2_lmj\tokenizers
      Complete output (10 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib
      creating build\lib\tokenizers
      copying tokenizers\__init__.py -> build\lib\tokenizers
      running build_ext
      running build_rust
      error: Can not find Rust compiler
      ----------------------------------------
      ERROR: Failed building wheel for tokenizers
    Failed to build tokenizers
    ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly
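
    The log stops at build_rust with "Can not find Rust compiler", which suggests that pip was building tokenizers from source. A small pre-flight check, assuming the build looks for rustc and cargo on PATH (find_rust_toolchain is a hypothetical helper):

```python
import shutil


def find_rust_toolchain() -> dict:
    """Map each tool name to its path on PATH, or None when it is not found."""
    return {tool: shutil.which(tool) for tool in ("rustc", "cargo")}


toolchain = find_rust_toolchain()
missing = [tool for tool, path in toolchain.items() if path is None]
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("Rust toolchain found:", toolchain)
```

    Running this inside the activated virtual environment shows whether the compiler is visible to the same shell that runs pip.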
    
    

    Expected behavior

    Installation of transformers should be complete.

    Environment info

    • transformers version: N/A - installation step
    • Platform: Command Line Interface / Virtual Env
    • Python version: python 3.8
    • PyTorch version (GPU?): N/A
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: N/A
    • Using distributed or parallel set-up in script?: N/A
  • Add TF ViT MAE

    This PR adds the MAE [1] model in TensorFlow. It was developed by @arig23498 and myself.

    Fun facts about this PR:

    • Probably the third pure vision model in TensorFlow in transformers.

    References:

    [1] Masked Autoencoders Are Scalable Vision Learners

    Update

    The PR is now ready for review. @gante @Rocketknight1 @sgugger

  • Add TFConvNextModel

    This PR adds the ConvNeXt [1] model in TensorFlow. It was developed by @arig23498, @gante, and myself.

    Fun facts about this PR:

    • Probably the first pure conv model in transformers.
    • Probably the second pure vision model in TensorFlow in transformers.

    References:

    [1] A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545.

    @gante @LysandreJik @Rocketknight1

  • Whisper decoding returns exception about outputs.logits shape

    System Info

    transformers version: 4.26.0.dev0

    • Platform: Linux-5.10.0-20-amd64-x86_64-with-glibc2.31
    • Python version: 3.9.2
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.1+cu117 (False)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?:
    • Using distributed or parallel set-up in script?:

    The same error occurs on CUDA servers.

    Who can help?

    No response

    Information

    • [X] The official example scripts
    • [ ] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [ ] My own task or dataset (give details below)

    Reproduction

    Run simple decoding with Whisper large:

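        import torch
        import torchaudio
        from transformers import WhisperForConditionalGeneration, WhisperProcessor

        # Assumption: "Whisper large" means the openai/whisper-large checkpoint
        processor = WhisperProcessor.from_pretrained("openai/whisper-large")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
        # fn: path to the problematic audio file (not shared in the report)
        speech_array, sampling_rate = torchaudio.load(fn)
        resampler = torchaudio.transforms.Resample(sampling_rate, 16_000)
        sound = resampler(speech_array).squeeze().numpy()
        input_features = processor(sound, return_tensors="pt", sampling_rate=16_000).input_features
    
        with torch.no_grad():
            generated_ids = model.generate(inputs=input_features, max_length=1000)
            transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]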
    

    Result is an exception:

    Traceback (most recent call last):
      File "/home/user/test_whisper_hf.py", line 37, in <module>
        generated_ids = model.generate(inputs=input_features, max_length=1000)
      File "/home/user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/user/.local/lib/python3.9/site-packages/transformers-4.26.0.dev0-py3.9.egg/transformers/generation/utils.py", line 1352, in generate
        return self.greedy_search(
      File "/home/user/.local/lib/python3.9/site-packages/transformers-4.26.0.dev0-py3.9.egg/transformers/generation/utils.py", line 2135, in greedy_search
        next_token_logits = outputs.logits[:, -1, :]
    IndexError: index -1 is out of bounds for dimension 1 with size 0
    

    The output on this problematic file is

    Seq2SeqLMOutput(loss=None, logits=tensor([], size=(1, 0, 51865)), past_key_values=((tensor([[[[ 1.3006e+00, -4.4066e-02, -2.5518e-02,  ...,  1.6218e-01,
    

    This happens only with a single file in the dataset of 10k files.

    Expected behavior

    No exception

  • I have a problem trained model with tensorflow on transformer pipeline male error

    Issue Type

    Bug

    Have you reproduced the bug with TF nightly?

    Yes

    Source

    source

    Tensorflow Version

    2.8

    Custom Code

    Yes

    OS Platform and Distribution

    No response

    Mobile device

    No response

    Python version

    No response

    Bazel version

    No response

    GCC/Compiler version

    No response

    CUDA/cuDNN version

    No response

    GPU model and memory

    No response

    Current Behaviour?

    I'm using this GitHub text summarization project and I have a problem. I have been struggling for two weeks and could not figure it out. I'm using a notebook from this GitHub repository: https://github.com/flogothetis/Abstractive-Summarization-T5-Keras

    notebook link: https://github.com/flogothetis/Abstractive-Summarization-T5-Keras/blob/main/AbstractiveSummarizationT5.ipynb

    After training the model, I want to use the Hugging Face transformers pipeline to generate a summary:

    from transformers import pipeline
    summarizer = pipeline("summarization", model=model, tokenizer="t5-small", framework="tf")
    summarizer("some text")

    but it raises an error:

    AttributeError: 'Functional' object has no attribute 'config'

    Does anyone have any idea how I can solve it?

    Full error:

    AttributeError                            Traceback (most recent call last)
    /tmp/ipykernel_20/1872405895.py in <module>
    ----> 1 summarizer = pipeline("summarization", model=model, tokenizer="t5-small", framework="tf")
          2
          3 summarizer("The US has passed the peak on new coronavirus cases, President Donald Trump said and predicted that some states would reopen")

    /opt/conda/lib/python3.7/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
        432                 break
        433
    --> 434     return task_class(model=model, tokenizer=tokenizer, modelcard=modelcard, framework=framework, task=task, **kwargs)

    /opt/conda/lib/python3.7/site-packages/transformers/pipelines/text2text_generation.py in __init__(self, *args, **kwargs)
         37
         38     def __init__(self, *args, **kwargs):
    ---> 39         super().__init__(*args, **kwargs)
         40
         41         self.check_model_type(

    /opt/conda/lib/python3.7/site-packages/transformers/pipelines/base.py in __init__(self, model, tokenizer, modelcard, framework, task, args_parser, device, binary_output)
        548
        549         # Update config with task specific parameters
    --> 550         task_specific_params = self.model.config.task_specific_params
        551         if task_specific_params is not None and task in task_specific_params:
        552             self.model.config.update(task_specific_params.get(task))

    AttributeError: 'Functional' object has no attribute 'config'

    
    
    Standalone code to reproduce the issue

    summarizer = pipeline("summarization", model=model, tokenizer="t5-small", framework="tf")
    summarizer("some text")

    Relevant log output

    No response

  • Add Spanish translation to community.mdx

    What does this PR do?

    Adds Spanish translation to community.mdx

    Fixes #15947

    Before submitting

    • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [ ] Did you read the contributor guideline, Pull Request section?
    • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
    • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [ ] Did you write any new necessary tests?

    Who can review?

    @osanseviero

  • X-CLIP and other video classification models can't be loaded into CUDA GPU for inference without crashing the kernel/process

    System Info

    Originally:

    • transformers version: 4.25.1 (also tried 4.26.0-dev directly from the GitHub main branch)
    • Platform: Linux-6.0.12-76060006-generic-x86_64-with-glibc2.35
    • Python version: 3.10.6
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.0+cu117 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Then, given this comment in the X-CLIP issues, I also tried:

    • transformers version: 4.25.1
    • Platform: Linux-6.0.12-76060006-generic-x86_64-with-glibc2.35
    • Python version: 3.8.16
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.8.0+cu111 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Who can help?

    @NielsRogge tagging you since you've added the code for X-CLIP to the library and also commented in the X-CLIP issue I've mentioned above.

    Information

    • [X] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [X] My own task or dataset (give details below)

    Reproduction

    1. I have first copied the example code provided in the library documentation, which worked.
    2. Then I've extended my notebook to process data from my (currently) private dataset, but still following exactly the example code. This is where I've noticed that the inference took a few seconds, so...
    3. I have compiled decord from source, which allowed me to run the data processing on the GPU. This worked, but it didn't provide any performance improvement, so I reverted to the PyPI version.
    4. I tried manually moving the model to the GPU, with model.to("cuda"), model.to("cuda:0"), model.to(torch.device("cuda")) and model.cuda(). All of these make the Jupyter Lab kernel crash with no error in the logs. If reloaded, the model still works, but only runs on CPU.
    5. I also tried replacing XClipModel with other video classification models, such as TimesformerForVideoClassification. Since this model is not included in the stable release yet, I uninstalled transformers v4.25.1 and installed the current main branch (v4.26.0-dev). This still only ran on CPU and refused to work on GPU.
    6. I have then found this comment about my exact problem in the microsoft/VideoX issues, saying they solved it by downgrading to PyTorch 1.8.0, which I did (from 1.13.0) after also downgrading Python (from 3.10 to 3.8 due to PyTorch compatibility). With this change, instantiating the model made the kernel crash immediately. My guess is that between PyTorch 1.8.0 and 1.13.0 a fallback to the CPU if the model couldn't be loaded into GPU was introduced.

    Other details:

    • Linux distro: Pop!_OS 22.04
    • CPU: Ryzen 5 5600X
    • GPU: NVIDIA RTX 3090
    • RAM: 16GB (even though limited, the model which I'm trying to load (microsoft/xclip-base-patch16-zero-shot) should fit with no problem)
    • NVIDIA driver 525.60.11
    • CUDA 11.2 (installed with the system76-cuda-latest metapackage) -- even though nvidia-smi reports CUDA 12.0, could this be an issue?

    Expected behavior

    The model should be loaded into the GPU automatically, like other models that currently work flawlessly for me such as BART. At least, manually moving the model to the GPU should work without segfaulting.

  • Token embedding resizing does not work for TFGPT2Model

    System Info

    • transformers version: 4.25.1
    • Platform: Linux-5.15.0-57-generic-x86_64-with-glibc2.35
    • Python version: 3.9.16
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): not installed (NA)
    • Tensorflow version (GPU?): 2.11.0 (True)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: No
    • Using distributed or parallel set-up in script?: No

    Who can help?

    @gante and @Rocketknight1

    Information

    • [ ] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [X] My own task or dataset (give details below)

    Reproduction

    After calling add_special_tokens on the tokenizer and resize_token_embeddings on TFGPT2Model, evaluating the model results in an error indicating that the embeddings were not resized as expected.

    Please see the example code and the execution output below:

    from transformers import GPT2Tokenizer, TFGPT2Model
    
    SPECIAL_TOKENS_MAPPING = {
        'bos_token': '<bos>',
        'eos_token': '<eos>',
        'pad_token': '<pad>',
        'additional_special_tokens': ['<speaker1>', '<speaker2>']
    }
    
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = TFGPT2Model.from_pretrained("gpt2")
    
    print("Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...")
    
    inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
    print(f"inputs = \n{inputs}\n")
    
    outputs = model(inputs)
    print(f"DONE!")
    
    print("Adding tokens...")
    orig_num_tokens = len(tokenizer.get_vocab())
    num_special_tokens = tokenizer.add_special_tokens(SPECIAL_TOKENS_MAPPING)
    print(f"orig_num_tokens = {orig_num_tokens}, num_special_tokens={num_special_tokens}")
    
    model.resize_token_embeddings(new_num_tokens=orig_num_tokens + num_special_tokens)
    
    print("Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...")
    
    inputs = tokenizer("<speaker1>Hello, my dog is cute<speaker2>I agree!", return_tensors="tf")
    print(f"inputs = \n{inputs}\n")
    
    outputs = model(inputs)
    print(f"DONE!")
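
    The assertion that fails in the output that follows can be restated with the numbers from this report. This is a plain-Python sketch; ids_fit is a hypothetical helper, not the library's check.

```python
orig_num_tokens = 50257      # vocabulary size reported before adding tokens
num_special_tokens = 5       # value returned by add_special_tokens in the report
input_ids = [50260, 15496, 11, 616, 3290, 318, 13779, 50261, 40, 4236, 0]


def ids_fit(ids, embedding_rows: int) -> bool:
    """True when every id indexes a valid row of the embedding matrix."""
    return all(0 <= i < embedding_rows for i in ids)


expected_rows = orig_num_tokens + num_special_tokens   # 50262
fits_original = ids_fit(input_ids, orig_num_tokens)    # False: 50260 and 50261 are out of range
fits_resized = ids_fit(input_ids, expected_rows)       # True
print(expected_rows, fits_original, fits_resized)
```

    The error message reports an embedding input dimension of 50257, i.e. the size before the resize, which matches the fits_original case.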
    
    Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...
    inputs = 
    {'input_ids': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[15496,    11,   616,  3290,   318, 13779]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1]], dtype=int32)>}
    
    DONE!
    
    Adding tokens...
    orig_num_tokens = 50257, num_special_tokens=5
    
    Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...
    inputs = 
    {'input_ids': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=
    array([[50260, 15496,    11,   616,  3290,   318, 13779, 50261,    40,
             4236,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}
    
    Traceback (most recent call last):
      File "/home/freddy/workspace/Nuhame/mlpug/examples/chatbot/tensorflow/test_tf_resize_token_size.py", line 33, in <module>
        outputs = model(inputs)
      File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
        return func(self, **unpacked_inputs)
      File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 773, in call
        outputs = self.transformer(
      File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
        return func(self, **unpacked_inputs)
      File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 447, in call
        tf.debugging.assert_less(
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer 'transformer' (type TFGPT2MainLayer).
    
    input_ids must be smaller than the embedding layer's input dimension (got 50261 >= 50257)
    Condition x < y did not hold.
    First 3 elements of x:
    [50260 15496    11]
    First 1 elements of y:
    [50257]
    
    Call arguments received by layer 'transformer' (type TFGPT2MainLayer):
      • input_ids=tf.Tensor(shape=(1, 11), dtype=int32)
      • past_key_values=None
      • attention_mask=tf.Tensor(shape=(1, 11), dtype=int32)
      • token_type_ids=None
      • position_ids=None
      • head_mask=None
      • inputs_embeds=None
      • encoder_hidden_states=None
      • encoder_attention_mask=None
      • use_cache=True
      • output_attentions=False
      • output_hidden_states=False
      • return_dict=True
      • training=False
    

    Expected behavior

    The model should have 50257 + 5 = 50262 embeddings after resizing and thus an input ID with value 50261 should not result in any errors. The above code should run without errors.
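The expectation above can be illustrated with a minimal, library-free sketch. It assumes that resizing keeps the existing embedding rows and appends freshly initialized rows for the added tokens. `resize_embeddings` is a hypothetical helper written for this illustration, not the transformers implementation of `resize_token_embeddings`, and the hidden size of 4 is a toy value.

```python
import random


def resize_embeddings(weights, new_num_tokens, seed=0):
    """Return a new embedding table with exactly `new_num_tokens` rows.

    Hypothetical helper for illustration only: existing rows are copied,
    and any extra rows are filled with small random values.
    """
    rng = random.Random(seed)
    hidden_size = len(weights[0])
    # Keep (a copy of) the rows that already exist, truncating if shrinking.
    kept = [row[:] for row in weights[:new_num_tokens]]
    # Append one new row per added token.
    added = [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(new_num_tokens - len(kept))
    ]
    return kept + added


# Row counts taken from the report: 50257 original tokens plus 5 special tokens.
old = [[0.0] * 4 for _ in range(50257)]
new = resize_embeddings(old, new_num_tokens=50257 + 5)

# 50260 and 50261 are the new token IDs that appear in the reported inputs.
input_ids = [50260, 15496, 11, 50261]
assert max(input_ids) < len(new)  # every ID has a row to look up
looked_up = [new[i] for i in input_ids]
```

With 50262 rows the largest reported ID, 50261, is a valid index, which is why the lookup should not fail after a successful resize.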

  • Fine-tune GIT on custom dataset [Expected input batch_size to match target batch_size]

    System Info

    • transformers version: 4.26.0.dev0
    • Platform: Linux-5.10.147+-x86_64-with-glibc2.27
    • Python version: 3.8.16
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.0+cu116 (True)
    • Tensorflow version (GPU?): 2.9.2 (True)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No

    Who can help?

    None

    Information

    • [X] My own modified scripts

    Tasks

    • [X] My own task or dataset (give details below)

    Reproduction

    from transformers import AutoProcessor, AutoModelForCausalLM
    import numpy as np
    from decord import VideoReader, cpu
    
    processor = AutoProcessor.from_pretrained("microsoft/git-base-vatex")
    model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-vatex")
    
    np.random.seed(45)
    
    
    def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
        converted_len = int(clip_len * frame_sample_rate)
        end_idx = np.random.randint(converted_len, seg_len)
        start_idx = end_idx - converted_len
        indices = np.linspace(start_idx, end_idx, num=clip_len)
        indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
        return indices
    
    
    def sample_frames(file_path, num_frames):
        videoreader = VideoReader(file_path, num_threads=1, ctx=cpu(0))
        videoreader.seek(0)
        indices = sample_frame_indices(clip_len=num_frames, frame_sample_rate=4, seg_len=len(videoreader))
        frames = videoreader.get_batch(indices).asnumpy()
        return list(frames)
    
    
    file_path = "path to video"
    
    video_caption = "any caption"
    
    num_frames = model.config.num_image_with_embedding
    frames = sample_frames(file_path, num_frames)
    
    batch_data = processor(images=frames, text=video_caption, return_tensors="pt", padding="max_length", truncation=True)
    pixel_values = batch_data.pixel_values
    input_ids = batch_data.input_ids
    attention_mask = batch_data.attention_mask
    
    labels = processor.tokenizer.encode(
        video_caption, max_length=512, pad_to_max_length=True, return_tensors='pt'
    )
    
    outputs = model(input_ids=input_ids, pixel_values=pixel_values, attention_mask=attention_mask, labels=labels)
    

    Error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-944c599a18af> in <module>
         43         )
         44 
    ---> 45 outputs = model(input_ids=input_ids, pixel_values=pixel_values, attention_mask=attention_mask, labels=labels)
    
    4 frames
    /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1189                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1190             return forward_call(*input, **kwargs)
       1191         # Do not call functions when jit is used
       1192         full_backward_hooks, non_full_backward_hooks = [], []
    
    /usr/local/lib/python3.8/dist-packages/transformers/models/git/modeling_git.py in forward(self, input_ids, attention_mask, position_ids, pixel_values, head_mask, inputs_embeds, labels, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
       1494             labels = labels[:, 1:].contiguous()
       1495             loss_fct = CrossEntropyLoss()
    -> 1496             lm_loss = loss_fct(shifted_logits.view(-1, self.config.vocab_size), labels.view(-1))
       1497 
       1498         if not return_dict:
    
    /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1189                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1190             return forward_call(*input, **kwargs)
       1191         # Do not call functions when jit is used
       1192         full_backward_hooks, non_full_backward_hooks = [], []
    
    /usr/local/lib/python3.8/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
       1172 
       1173     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    -> 1174         return F.cross_entropy(input, target, weight=self.weight,
       1175                                ignore_index=self.ignore_index, reduction=self.reduction,
       1176                                label_smoothing=self.label_smoothing)
    
    /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
       3024     if size_average is not None or reduce is not None:
       3025         reduction = _Reduction.legacy_get_string(size_average, reduce)
    -> 3026     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
       3027 
       3028 
    
    ValueError: Expected input batch_size (1693) to match target batch_size (511).
    

    Expected behavior

    What should I pass as labels when fine-tuning GIT? I get input_ids, pixel_values and attention_mask from the processor. What value should the labels variable take so that I get a loss? Passing the caption token IDs produced by the tokenizer, as in the code above, results in the error shown. What am I doing wrong?
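The numbers in the error message can be reproduced with a small sketch of the shape check alone. It assumes that the logits, like the labels at line 1494 of the traceback, lose one position before being flattened, so 1693 corresponds to a logits sequence length of 1694, while 511 follows from the labels being padded to `max_length=512`. `flattened_rows` and `check_shift_shapes` are hypothetical helpers for this illustration. They are not part of transformers or PyTorch, and the sketch does not say what the correct labels for GIT are.

```python
def flattened_rows(batch_size, seq_len):
    """Rows left after dropping one position per sequence and flattening."""
    return batch_size * (seq_len - 1)


def check_shift_shapes(logits_seq_len, labels_seq_len, batch_size=1):
    """Mimic the row-count check that cross-entropy applies to its inputs.

    Hypothetical helper: it only compares counts and computes no loss.
    """
    n_logits = flattened_rows(batch_size, logits_seq_len)
    n_labels = flattened_rows(batch_size, labels_seq_len)
    if n_logits != n_labels:
        raise ValueError(
            f"Expected input batch_size ({n_logits}) to match "
            f"target batch_size ({n_labels})."
        )
    return n_logits


# Assumed lengths: 1694 logit positions (inferred) and 512 label positions.
try:
    check_shift_shapes(logits_seq_len=1694, labels_seq_len=512)
    message = None
except ValueError as err:
    message = str(err)

# Equal sequence lengths pass the check.
matched = check_shift_shapes(logits_seq_len=512, labels_seq_len=512)
```

Under these assumptions the mismatch comes from the logits and labels covering different numbers of positions, not from the batch dimension itself.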

CLOOB training (JAX) and inference (JAX and PyTorch)

cloob-training Pretrained models There are two pretrained CLOOB models in this repo at the moment, a 16 epoch and a 32 epoch ViT-B/16 checkpoint train

Nov 27, 2022
A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing This project provides a SOTA level lightweight YOLO called "Cross-Stage Lightwe

Dec 21, 2022
GAN JAX - A toy project to generate images from GANs with JAX

GAN JAX - A toy project to generate images from GANs with JAX This project aims to bring the power of JAX, a Python framework developed by Google and

Nov 29, 2022
Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

mini-hmc-jax This is a simple implementation of Hamiltonian Monte Carlo in JAX t

Mar 3, 2022
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

Jan 8, 2023
deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Oct 17, 2022
LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and build their own methods.

Jan 4, 2023
State-of-the-art data augmentation search algorithms in PyTorch

MuarAugment Description MuarAugment is a package providing the easiest way to a state-of-the-art data augmentation pipeline. How to use You can instal

Dec 12, 2022
😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

------ Update September 2018 ------ It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such t

Dec 24, 2022
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

NÜWA - Pytorch (wip) Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be popul

Dec 28, 2022
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

🦩 Flamingo - Pytorch Implementation of Flamingo, state-of-the-art few-shot visual question answering attention net, in Pytorch. It will include the p

Dec 28, 2022
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

TorchMultimodal (Alpha Release) Introduction TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Jan 6, 2023
Implementation of ETSformer, state of the art time-series Transformer, in Pytorch

ETSformer - Pytorch Implementation of ETSformer, state of the art time-series Transformer, in Pytorch Install $ pip install etsformer-pytorch Usage im

Dec 30, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Jan 3, 2023
Deep learning operations reinvented (for pytorch, tensorflow, jax and others)

einops Flexible and powerful tensor operations for readable and reliable code. Supports numpy, pytorch, tensorflow, and

Jan 1, 2023
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

Nov 21, 2022
Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt) Task Training huge unsupervised deep neural networks yields to strong progress in

Aug 5, 2022