🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.





🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, and text generation in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture can be used as a standalone and modified to enable quick research experiments.

🤗 Transformers is backed by the two most popular deep learning libraries, PyTorch and TensorFlow, with a seamless integration between them, allowing you to train your models with one, then load them for inference with the other.

Online demos

You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, & an inference API to use those models.

Here are a few examples:

Write With Transformer, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.

Quick tour

To immediately use a model on a given text, we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:

>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]

The second line of code downloads and caches the pretrained model used by the pipeline, while the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.

Here is another example of a pipeline, this one extracting the answer to a question from some context:

>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}

On top of the answer, the pretrained model used here returned its confidence score, along with the start and end positions of the answer in the tokenized sentence. You can learn more about the tasks supported by the pipeline API in this tutorial.
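
You can also ask the pipeline for a specific checkpoint from the model hub instead of the task default. A small sketch (using the hub identifier of the default English sentiment model; any compatible checkpoint should work the same way):

>>> from transformers import pipeline

>>> # Explicitly pick a checkpoint instead of relying on the task default
>>> classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
>>> classifier('This new release is fantastic!')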

To download and use any of the pretrained models on your given task, you just need these three lines of code (PyTorch version):

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

or for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single text (or a list of texts), as we can see on the fourth line of both code examples. It will output a dictionary you can directly pass to your model (which is done on the fifth line).
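
For example, with the BERT checkpoint above, the returned dictionary contains the token ids and masks the model expects (a quick illustration; the exact keys can differ between models):

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> list(inputs.keys())
['input_ids', 'token_type_ids', 'attention_mask']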

The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend) which you can use normally. For instance, this tutorial explains how to integrate such a model in a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune it on a new dataset.
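
As a rough sketch of the Trainer route (here train_dataset stands in for your own dataset of already tokenized examples, for instance built with the 🤗 Datasets library):

>>> from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
>>> training_args = TrainingArguments(output_dir="my_model", num_train_epochs=1)
>>> # train_dataset is a placeholder for your own tokenized dataset
>>> trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
>>> trainer.train()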

Why should I use transformers?

  1. Easy-to-use state-of-the-art models:

    • High performance on NLU and NLG tasks.
    • Low barrier to entry for educators and practitioners.
    • Few user-facing abstractions with just three classes to learn.
    • A unified API for using all our pretrained models.
  2. Lower compute costs, smaller carbon footprint:

    • Researchers can share trained models instead of always retraining.
    • Practitioners can reduce compute time and production costs.
    • Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
  3. Choose the right framework for every part of a model's lifetime:

    • Train state-of-the-art models in 3 lines of code.
    • Move a single model between TF2.0/PyTorch frameworks at will (see the short sketch after this list).
    • Seamlessly pick the right framework for training, evaluation, production.
  4. Easily customize a model or an example to your needs:

    • Examples for each architecture to reproduce the results by the official authors of said architecture.
    • Expose the models' internals as consistently as possible.
    • Model files can be used independently of the library for quick experiments.
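
For point 3, moving between frameworks is typically a one-line change when loading a checkpoint. A minimal sketch, assuming "my_model_dir" is a hypothetical directory holding a model previously saved from PyTorch with save_pretrained:

>>> from transformers import TFAutoModel

>>> # from_pt=True converts the PyTorch weights on the fly (use from_tf=True in AutoModel for the reverse direction)
>>> tf_model = TFAutoModel.from_pretrained("my_model_dir", from_pt=True)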

Why shouldn't I use transformers?

  • This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
  • The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
  • While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.

Installation

With pip

This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for examples) and TensorFlow 2.0.

You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax. Please refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax installation page for the specific install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.

With conda

Since Transformers version v4.0.0, we now have a conda channel: huggingface.

🤗 Transformers can be installed using conda as follows:

conda install -c huggingface transformers

Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.

Model architectures

All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub where they are uploaded directly by users and organizations.


🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  3. BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
  4. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  5. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  6. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  7. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  8. BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
  9. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  10. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  11. CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
  12. DeBERTa (from Microsoft Research) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  13. DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
  14. DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
  15. DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
  16. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  17. FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
  18. Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
  19. GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
  20. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  21. LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
  22. LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  23. Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  24. LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
  25. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
  26. MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
  27. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
  28. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  29. Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
  30. ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  31. Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
  32. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  33. SqueezeBert released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  34. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  35. TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
  36. Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
  37. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  38. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  39. XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  40. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  41. XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
  42. Want to contribute a new model? We have added a detailed guide and templates to guide you in the process of adding a new model. You can find them in the templates folder of the repository. Be sure to check the contributing guidelines and contact the maintainers or open an issue to collect feedback before starting your PR.

To check if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.

Learn more

  • Documentation: Full API documentation and tutorials
  • Task summary: Tasks supported by 🤗 Transformers
  • Preprocessing tutorial: Using the Tokenizer class to prepare data for the models
  • Training and fine-tuning: Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the Trainer API
  • Quick tour: Fine-tuning/usage scripts: Example scripts for fine-tuning models on a wide range of tasks
  • Model sharing and uploading: Upload and share your fine-tuned models with the community
  • Migration: Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}
Comments
  • How to use BERT for finding similar sentences or similar news?

    How to use BERT for finding similar sentences or similar news?

    I have used BERT NextSentencePredictor to find similar sentences or similar news; however, it's super slow, even on a Tesla V100, which is currently the fastest GPU. It takes around 10 seconds for a query title against around 3,000 articles. Is there a better way to use BERT for finding similar sentences or similar news, given a corpus of news articles?

  • Summarization Fine Tuning

    Summarization Fine Tuning

    ❓ Questions & Help

    Details

    I tried using T5 and Bart, but abstractive summarization of scientific texts does not seem to give the results I want, since I think they are both trained on news corpora. I have scraped all of the free PMC articles and I am thinking about fine-tuning a seq2seq model between the articles and their abstracts to make an abstractive summarizer for scientific texts. This Medium article (https://medium.com/huggingface/encoder-decoders-in-transformers-a-hybrid-pre-trained-architecture-for-seq2seq-af4d7bf14bb8) provides a bit of an introduction to how to approach this, but does not quite go into detail, so I am wondering how to approach this.

    I'm not really asking for help being stuck but I just don't really know how to approach this problem.

    A link to original question on Stack Overflow: https://stackoverflow.com/questions/61826443/train-custom-seq2seq-transformers-model

  • ONNXConfig: Add a configuration for all available models

    ONNXConfig: Add a configuration for all available models

    This issue is about the working group specially created for this task. If you are interested in helping out, take a look at this organization, or add me on Discord: ChainYo#3610

    We want to contribute to HuggingFace's ONNX implementation for all available models on HF's hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!

    Feel free to join us in this adventure! Join the org by clicking here

    Here is a non-exhaustive list of all the available models:

    • [x] Albert
    • [x] BART
    • [x] BeiT
    • [x] BERT
    • [x] BigBird
    • [x] BigBirdPegasus
    • [x] Blenderbot
    • [x] BlenderbotSmall
    • [x] BLOOM
    • [x] CamemBERT
    • [ ] CANINE
    • [x] CLIP
    • [x] CodeGen
    • [x] ConvNext
    • [x] ConvBert
    • [ ] CTRL
    • [ ] CvT
    • [x] Data2VecText
    • [x] Data2VecVision
    • [x] Deberta
    • [x] DebertaV2
    • [x] DeiT
    • [ ] DecisionTransformer
    • [x] DETR
    • [x] Distilbert
    • [ ] DPR
    • [ ] DPT
    • [x] ELECTRA
    • [ ] FNet
    • [ ] FSMT
    • [x] Flaubert
    • [ ] FLAVA
    • [ ] Funnel Transformer
    • [ ] GLPN
    • [x] GPT2
    • [x] GPTJ
    • [x] GPT-Neo
    • [ ] GPT-NeoX
    • [ ] Hubert
    • [x] I-Bert
    • [ ] ImageGPT
    • [ ] LED
    • [x] LayoutLM
    • [ ] 🛠️ LayoutLMv2
    • [x] LayoutLMv3
    • [ ] LayoutXLM
    • [ ] LED
    • [x] LeViT
    • [x] Longformer
    • [x] LongT5
    • [ ] 🛠️ Luke
    • [ ] Lxmert
    • [x] M2M100
    • [ ] MaskFormer
    • [x] mBart
    • [ ] MCTCT
    • [ ] MPNet
    • [x] MT5
    • [x] MarianMT
    • [ ] MegatronBert
    • [x] MobileBert
    • [x] MobileViT
    • [ ] Nyströmformer
    • [x] OpenAIGPT-2
    • [ ] 🛠️ OPT
    • [x] OWLViT
    • [x] PLBart
    • [ ] Pegasus
    • [x] Perceiver
    • [ ] PoolFormer
    • [ ] ProphetNet
    • [ ] QDQBERT
    • [ ] RAG
    • [ ] REALM
    • [ ] 🛠️ Reformer
    • [x] RemBert
    • [x] ResNet
    • [ ] RegNet
    • [ ] RetriBert
    • [x] RoFormer
    • [x] RoBERTa
    • [ ] SEW
    • [ ] SEW-D
    • [ ] SegFormer
    • [ ] Speech2Text
    • [ ] Speech2Text2
    • [ ] Splinter
    • [x] SqueezeBERT
    • [ ] Swin Transformer
    • [x] T5
    • [ ] TAPAS
    • [ ] TAPEX
    • [ ] Transformer XL
    • [x] TrOCR
    • [ ] UniSpeech
    • [ ] UniSpeech-SAT
    • [ ] VAN
    • [x] ViT
    • [ ] Vilt
    • [ ] VisualBERT
    • [ ] Wav2Vec2
    • [ ] WavLM
    • [ ] XGLM
    • [x] XLM
    • [ ] XLMProphetNet
    • [x] XLM-RoBERTa
    • [x] XLM-RoBERTa-XL
    • [ ] 🛠️ XLNet
    • [x] YOLOS
    • [ ] Yoso

    🛠️ next to a model suggests that the PR is in progress. If there is nothing next to a model, it means that ONNX does not yet support the model, and thus we need to add support for it.

    If you need help implementing an unsupported model, here is a guide from HuggingFace's documentation.

    If you want an example of implementation, I did one for CamemBERT months ago.

  • GPT-J-6B

    GPT-J-6B

    What does this PR do?

    Introduces the long-awaited GPT J model class to HuggingFace! Concurrently with this PR being merged, I will make a GPT J 6B checkpoint public on the EleutherAI HF page for people to use. The model has been evaluated as being within error tolerances of the GPT J 6B model we released in Jax two months ago.

    @patil-suraj was very helpful in assisting me to understand HF philosophy and how to make this PR most in line with the rest of the codebase. Other than that, the major design consideration was to make the configs compatible with GPT-2 rather than GPT-Neo. GPT-Neo has some usability limitations due to its configs having names unrelated to GPT-2's (see #12183 for details). Given those problems and my hope that GPT-Neo will have its configs updated in the future, it seemed like a clear choice to align GPT J with GPT-2.

    Shout-outs to @finetuneanon, whose implementation this one is based on, as well as to @kumuruz for assistance with optimizing and debugging.

    Supersedes #12243 #13010 #13022

    Closes #12098

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [X] Did you read the contributor guideline, Pull Request section?
    • [X] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. It was discussed in Slack with @patil-suraj
    • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [X] Did you write any new necessary tests?

    Who can review?

    • gpt2: @patrickvonplaten, @LysandreJik, @patil-suraj
  • [DeepSpeed] [success] trained t5-11b on 1x 40GB gpu

    [DeepSpeed] [success] trained t5-11b on 1x 40GB gpu

    Managed to train t5-11b on 1x 40GB gpu w/ Deepspeed (A100-SXM4-40GB)

    Thank you, @PeterAJansen for letting me use your hardware!

    Thank you, @jeffra and @samyam, for believing that it is possible to train t5-11b on 1x 40GB gpu w/ DeepSpeed, and for the support that led me to find a few bugs in the integration.

    Sharing details for those who need.

    If you want to try this at home, please make sure you use transformers master, as some bug fixes were just merged in.

    Well, it's similar to the t5-3b on 24GB success reported here and here. But this time t5-11b on 1x 40GB gpu (or 4x if you wanted things faster)

    As someone asked me before, here is how much general RAM you need to use ZeRO-Offload for a huge model:

    • for t5-3b on 1x 24GB gpu: ~71GB RAM
    • for t5-11b on 1x 40GB gpu: ~234GB RAM

    I was using the /usr/bin/time -v program to get the peak memory measurement; it's the Maximum resident set size entry in the final report.

    Question: I don't think /usr/bin/time does the right thing for multi-process - I think it only measures the parent process. e.g. with 4x gpus it reported only 102GB RAM, but I clearly saw in top that it was around 240GB. If you have an easy way to measure peak memory that takes forked processes into account, I'm all ears.
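
    (One rough alternative is Python's standard resource module, sketched below; note that RUSAGE_CHILDREN only accounts for children that have already been waited on, so this is an approximation rather than a true concurrent peak.)

    import resource

    def peak_rss_gb():
        # ru_maxrss is reported in KiB on Linux; RUSAGE_CHILDREN only covers
        # children that have already terminated, so the sum is approximate.
        self_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        children_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        return (self_kb + children_kb) / 1024**2

    print(f"approximate peak RSS: {peak_rss_gb():.1f} GB")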

    Batch sizes on one gpu:

    • with buffers of 5e8 I was able to run BS=2, which might be too small for training,
    • but with 2e8 I managed to squeeze in BS=10 for training, but OOMed on prediction

    I'm referring to these batch sizes in ds_config.json:

            "allgather_bucket_size": 2e8,
            "reduce_bucket_size": 2e8,
    

    I tested 2x and 4x DDP as well: BS=16 OOMed, BS=8 was good, so I used that, but one could probably squeeze in some more.

    edit1: later tests show that my test was too short and didn't let the CPU Adam optimizer kick in, as the first 20 or so steps are skipped because of the overflow. Once it kicks in, it takes more GPU memory, so the practical BS is much smaller; I think around 2 on this setup. Most likely you will need to use BS=2 for real work, until things get optimized even more.

    edit2: things are getting re-shuffled in the tests, so the default ds_config.json file has moved in master to a new, hopefully permanent home. It's now at examples/tests/deepspeed/ds_config.json, so you will need to adjust the command line to reflect this new location or simply copy it over to where the old one used to be.

    here is the full benchmark:

    # 1 gpu: 
    # only training fits with this BS, eval needs a smaller BS
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=1 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 31.0897, 'train_samples_per_second': 0.257, 'epoch': 1.0}
    
    # 2 gpus:
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=2 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 17.9026, 'train_samples_per_second': 0.223, 'epoch': 1.0}
    
    # 4 gpus
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 10.4404, 'train_samples_per_second': 0.192, 'epoch': 1.0}
    

    Checkpointing should allow making even bigger batch sizes.

  • FP16 overflow with GPT-Neo when using sequence lengths of 2048.

    FP16 overflow with GPT-Neo when using sequence lengths of 2048.

    Environment info

    • transformers version: 4.5.0.dev0
    • Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.8.0+cu111
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No

    Who can help

    @stas00

    Models:

    • GPT-Neo 1.3b

    Library:

    • deepspeed: @stas00

    Information

    Model I am using (Bert, XLNet ...):

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    1. Use GPT-Neo 1.3b with The Pile dataset and the built-in trainer. Artificial data also suffices; it does not matter what the data is, as long as the attention mask spans all 2048 tokens.
    2. Enable FP16 and set max_length to 2048
    3. Observe that all losses reported are NaN

    Also reproducible using AMP or DeepSpeed. It seems like there is code outlined in the GPT-Neo implementation to circumvent this, where q, k, v are cast to fp32 in the attention block.
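
    (For illustration only, a rough sketch of that kind of workaround, i.e. computing the attention scores in fp32 and casting back; this is not the actual GPT-Neo code:)

    import torch

    def attn_probs_fp32(query, key):
        # Upcast to fp32 for the matmul/softmax to avoid fp16 overflow on long
        # sequences, then cast the result back to the original dtype.
        orig_dtype = query.dtype
        scores = torch.matmul(query.float(), key.float().transpose(-1, -2))
        probs = torch.softmax(scores, dim=-1)
        return probs.to(orig_dtype)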

    When the max_length is shorter (512) this overflow does not occur.

    Expected behavior

    I expected no overflows.

    Aside

    I'm reaching out on behalf of EleutherAI; Lysandre told us to create an issue about this.

  • [deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO

    [deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO

    Environment info

    • transformers version: 4.17.0.dev0
    • Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.10
    • Python version: 3.8.0
    • PyTorch version (GPU?): 1.10.1 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: yes (deepspeed)
    • Note: I installed DeepSpeed from source

    Who can help

    Models: (I'm actually trying to use T0pp but T5 is close enough)

    • T5, BART, Marian, Pegasus, EncoderDecoder: @patrickvonplaten

    Library:

    • Deepspeed: @stas00
    • Text generation: @patrickvonplaten @narsil

    Information

    Model I am using (Bert, XLNet ...): T0pp / T0_3B

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [X] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)

    To reproduce

    I want to load T0pp across 2 24GB GPUs and only run inference. From reading the documentation, I know DeepSpeed with ZeRO stage 3 is the way to go for this. I am following the HuggingFace example here to use DeepSpeed without a Trainer object.

    The error I get is

    [2022-01-28 18:36:41,193] [INFO] [partition_parameters.py:456:__exit__] finished initializing model with 2.85B parameters
    Traceback (most recent call last):
      File "multi_gpu_T0pp.py", line 26, in <module>
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    AttributeError: module 'transformers.deepspeed' has no attribute 'initialize'
    

    My code:

    Run with CUDA_VISIBLE_DEVICES="0,1" deepspeed <script.py>

    """
    Example code to load a PyTorch model across GPUs
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import deepspeed
    import pandas as pd
    import torch
    import pdb
    import os
    
    seed = 42
    torch.manual_seed(seed)
    
    ds_config = {
        "fp16": {
            "enabled": "auto",
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1
        },
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": true,
            "contiguous_gradients": true,
            "sub_group_size": 1e9,
            "reduce_bucket_size": "auto",
            "stage3_prefetch_bucket_size": "auto",
            "stage3_param_persistence_threshold": "auto",
            "stage3_max_live_parameters": 1e9,
            "stage3_max_reuse_distance": 1e9,
            "stage3_gather_fp16_weights_on_model_save": true
        },
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 0,
        "steps_per_print": 2000,
        "train_batch_size": 2,
        "train_micro_batch_size_per_gpu": 1,
        "wall_clock_breakdown": false
    }
    
    if __name__ == "__main__":
        # must run before instantiating the model
        # ds_config is deepspeed config object or path to the file
        dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
    
        model_name = "bigscience/T0_3B"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    
        inputs = tokenizer.encode(
            "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
            return_tensors="pt")
        outputs = model.generate(inputs)
        print(tokenizer.decode(outputs[0]))
    

    Expected behavior

    T0pp (or T0_3B) to load across 2 GPUs, generate an answer, and then quit.

  • How to use fine-tuned BART for prediction?

    How to use fine-tuned BART for prediction?

    ❓ Questions & Help

    Details

    I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).

    I have two questions on using this model for prediction:

    • How can I modify finetune.py to generate predictions for the test set, in addition to the loss scores? I see some test functions in finetune.py, but I'm not sure how to use these for generating a .txt file with the predictions.

    • How can I load the generated .ckpt files into BartForConditionalGeneration()? A config.json file was not generated along with the checkpoint files; there doesn't seem to be a TFBartForConditionalGeneration; and the convert_tf_checkpoint_to_pytorch.py script in the repo doesn't seem to support BART yet.

    Thank you for your time!

  • Installation Error - Failed building wheel for tokenizers

    Installation Error - Failed building wheel for tokenizers

    🐛 Bug

    Information

    Model I am using (Bert, XLNet ...): N/A

    Language I am using the model on (English, Chinese ...): N/A

    The problem arises when using:

    • [X] the official example scripts: (give details below)

    Problem arises in transformers installation on Microsoft Windows 10 Pro, version 10.0.17763

    After creating and activating the virtual environment, installing transformers is not possible, because the following error occurs:

    "error: can not find Rust Compiler" "ERROR: Failed building wheel for tokenizers" Failed to build tokenizers ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed d

    The tasks I am working on is: [X] transformers installation

    To reproduce

    Steps to reproduce the behavior:

    1. From command line interface, create and activate a virtual environment by following the steps in this URL: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
    2. Install transformers from source, by following the example in the topic From Source on this URL: https://github.com/huggingface/transformers
    py -m pip --version
    py -m pip install --upgrade pip
    py -m pip install --user virtualenv
    py -m venv env
    .\env\Scripts\activate
    pip install transformers
    
    ERROR: Command errored out with exit status 1:
       command: 'c:\users\vbrandao\env\scripts\python.exe' 'c:\users\vbrandao\env\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\vbrandao\AppData\Local\Temp\tmpj6evjmze'
           cwd: C:\Users\vbrandao\AppData\Local\Temp\pip-install-sza2_lmj\tokenizers
      Complete output (10 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib
      creating build\lib\tokenizers
      copying tokenizers\__init__.py -> build\lib\tokenizers
      running build_ext
      running build_rust
      error: Can not find Rust compiler
      ----------------------------------------
      ERROR: Failed building wheel for tokenizers
    Failed to build tokenizers
    ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly
    
    

    Expected behavior

    Installation of transformers should be complete.

    Environment info

    • transformers version: N/A - installation step
    • Platform: Command Line Interface / Virtual Env
    • Python version: python 3.8
    • PyTorch version (GPU?): N/A
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: N/A
    • Using distributed or parallel set-up in script?: N/A
  • Add TF ViT MAE

    Add TF ViT MAE

    This PR adds the MAE [1] model in TensorFlow. It was developed by @arig23498 and myself.

    Fun facts about this PR:

    • Probably the third pure vision model in TensorFlow in transformers.

    References:

    [1] Masked Autoencoders Are Scalable Vision Learners

    Update

    The PR is now ready for review. @gante @Rocketknight1 @sgugger

  • Add TFConvNextModel

    Add TFConvNextModel

    This PR adds the ConvNeXt [1] model in TensorFlow. It was developed by @arig23498, @gante, and myself.

    Fun facts about this PR:

    • Probably the first pure conv model in transformers.
    • Probably the second pure vision model in TensorFlow in transformers.

    References:

    [1] A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545.

    @gante @LysandreJik @Rocketknight1

  • Add `min_new_tokens` argument in generate() (implementation based on `MinNewTokensLengthLogitsProcessor`)

    Add `min_new_tokens` argument in generate() (implementation based on `MinNewTokensLengthLogitsProcessor`)

    What does this PR do?

    Fixes #20756 #20814 #20614 (cc @gonced8 @kotikkonstantin)

    As many have said, it is better to add a min_new_tokens argument to the .generate() method to limit the length of the newly generated tokens. The current min_length parameter limits the length of prompt + newly generated tokens, not the length of the newly generated tokens alone.

    I closed my old PR #20819 and implemented this feature based on MinNewTokensLengthLogitsProcessor (see #20892), as suggested by @gante.
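
    A minimal usage sketch of the new argument (assuming a causal LM checkpoint such as gpt2; the generated text itself will vary):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Hello, my dog is", return_tensors="pt")
    # min_new_tokens constrains only the newly generated tokens,
    # whereas min_length counts the prompt as well.
    outputs = model.generate(**inputs, min_new_tokens=20, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))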

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [x] Did you read the contributor guideline, Pull Request section?
    • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
    • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [ ] Did you write any new necessary tests?

    Who can review?

    Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

    • @gante
  • ConvNeXT V2

    ConvNeXT V2

    Model description

    Short description: just released, ConvNeXt with a new internal layer.

    In this paper, the authors propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition.
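
    (A rough PyTorch paraphrase of the GRN idea as described in the paper, normalizing each channel's global response against the mean over channels; see the official repository below for the real implementation.)

    import torch
    import torch.nn as nn

    class GRN(nn.Module):
        """Global Response Normalization over channels-last feature maps (N, H, W, C)."""
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
            self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
            self.eps = eps

        def forward(self, x):
            gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # per-channel global response
            nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)    # divisive normalization over channels
            return self.gamma * (x * nx) + self.beta + x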

    Contribution

    I would like to work on this!

    Open source status

    • [X] The model implementation is available
    • [X] The model weights are available

    Provide useful links for the implementation

    Paper: https://arxiv.org/abs/2301.00808

    Official implementation: https://github.com/facebookresearch/ConvNeXt-V2

    @NielsRogge

  • pytorch-pretrained-bert and transformers give different results

    pytorch-pretrained-bert and transformers give different results

    System Info

    - `transformers` version: 4.25.1
    - Platform: Linux-5.4.0-122-generic-x86_64-with-glibc2.31
    - Python version: 3.9.5
    - Huggingface_hub version: 0.11.1
    - PyTorch version (GPU?): 1.13.1+cu117 (True)
    - Tensorflow version (GPU?): not installed (NA)
    - Flax version (CPU?/GPU?/TPU?): not installed (NA)
    - Jax version: not installed
    - JaxLib version: not installed
    - Using GPU in script?: <fill in>
    - Using distributed or parallel set-up in script?: <fill in>
    

    Information

    • [ ] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [X] My own task or dataset (give details below)

    Reproduction

    from pytorch_pretrained_bert import BertTokenizer, BertModel
    import torch

    torch.manual_seed(0)
    device = 'cuda'  # assumed; the original snippet uses `device` without defining it
    tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)
    model = BertModel.from_pretrained('bert-large-cased')
    model.eval()
    model.to(device)

    text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
    tokenized_text = tokenizer.tokenize(text)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    segments_ids = [1 for x in tokenized_text]
    tokens_tensor = torch.tensor([indexed_tokens]).to('cuda')
    segments_tensors = torch.tensor([segments_ids]).to('cuda')
    with torch.no_grad():
        outputs = model(tokens_tensor, segments_tensors)

    #%%
    import torch
    from transformers import BertTokenizer, BertModel, BertForMaskedLM

    torch.manual_seed(0)
    tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)
    model = BertModel.from_pretrained('bert-large-cased', output_hidden_states=True)
    model.eval()
    model.to(device)

    text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
    tokenized_text = tokenizer.tokenize(text)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    segments_ids = [1 for x in tokenized_text]
    tokens_tensor = torch.tensor([indexed_tokens]).to('cuda')
    segments_tensors = torch.tensor([segments_ids]).to('cuda')
    with torch.no_grad():
        outputs = model(tokens_tensor, segments_tensors)

    Expected behavior

    The outputs from the 24 encoding layers should be identical for transformers and pytorch_pretrained_bert.
    


  • low_cpu_mem_usage raises KeyError with modified GPT2 model

    low_cpu_mem_usage raises KeyError with modified GPT2 model

    System Info

    - `transformers` version: 4.25.1
    - Platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.10
    - Python version: 3.8.13
    - Huggingface_hub version: 0.11.1
    - PyTorch version (GPU?): 1.13.0+cu117 (True)
    - Tensorflow version (GPU?): not installed (NA)
    - Flax version (CPU?/GPU?/TPU?): not installed (NA)
    - Jax version: not installed
    - JaxLib version: not installed
    - Using GPU in script?: Not yet
    - Using distributed or parallel set-up in script?: Not yet
    

    Who can help?

    No response

    Information

    • [ ] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [X] My own task or dataset (give details below)

    Reproduction

    I'm trying to test GPT2 models with different layer numbers, head numbers, and head sizes. The following code runs with no errors, and the model is loaded successfully onto the CPU with random weights, which is expected.

    import torch
    from transformers import AutoModelForCausalLM, AutoConfig
    
    if __name__ == "__main__":
        model_id = "gpt2"
        model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_id)
    
        model_config.n_layer = 48
        model_config.n_head = 25
        model_config.n_embd = 1600
        model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                     config=model_config,
                                                     ignore_mismatched_sizes=True,
                                                     torch_dtype=torch.float16)
    

    However, when I set the flag low_cpu_mem_usage=True in from_pretrained() like this:

    import torch
    from transformers import AutoModelForCausalLM, AutoConfig
    
    if __name__ == "__main__":
        model_id = "gpt2"
        model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_id)
    
        model_config.n_layer = 48
        model_config.n_head = 25
        model_config.n_embd = 1600
        model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                     config=model_config,
                                                     ignore_mismatched_sizes=True,
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=True)
    

    I get the errors below:

    /opt/conda/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5)
      warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
    Traceback (most recent call last):
      File "tmp.py", line 11, in <module>
        model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
      File "/home/wenhant/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
        return model_class.from_pretrained(
      File "/home/wenhant/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2379, in from_pretrained
        ) = cls._load_pretrained_model(
      File "/home/wenhant/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2512, in _load_pretrained_model
        param = model_state_dict[key]
    KeyError: 'h.45.attn.c_proj.bias'
    

    Expected behavior

    I expect my code to run with no errors regardless of whether I set low_cpu_mem_usage to True or False.

  • [WIP] add: tensorflow example for image classification task guide

    [WIP] add: tensorflow example for image classification task guide

    This PR addresses https://github.com/huggingface/transformers/issues/21037. It adds a TensorFlow example to the existing task guide on image classification.

    State of the PR: The example illustrates preprocessing in TF, training, and pushing to Hub. The code samples have been tested and they work/reproduce.

    Todo:
    • Add data augmentation part.
    • Find out if there is a better way to push a Keras model to Hub with all the config files, weights, etc.

  • Add tensorflow example for the image classification task guide

    Add tensorflow example for the image classification task guide

    For the same dataset and steps as in https://huggingface.co/docs/transformers/tasks/image_classification, add sample code for the TensorFlow part.

    This example can supplement the existing guide and can be helpful to those who choose TensorFlow over PyTorch and would like to use Transformers for image classification. Related PR: https://github.com/huggingface/transformers/pull/21038
