Guide to using pre-trained large language models of source code

Large Models of Source Code

I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe how to use these.

  1. Setup
  2. Models (incl. PolyCoder)
  3. Datasets
  4. Evaluation
  5. How to cite

Getting Started

All current models were trained using the GPT-NeoX toolkit. First, download a pretrained checkpoint as described below, then use it either with the Docker image or by building our fork of the toolkit from source to generate code or replicate our evaluation.

Retrieving Checkpoints

Checkpoint files for training PolyCoder are hosted on this public Zenodo repository. See this section for details on currently available models. Model checkpoints range up to 6GB, which is also the amount of GPU memory they require to run (running on CPU is neither tested nor recommended). Download and untar a checkpoint file (in this case for a 2.7B parameter model trained for 150K steps) to a directory called checkpoints/, using:

mkdir checkpoints
cd checkpoints
wget https://zenodo.org/record/6363556/files/2-7B-150K.tar
tar -xvf 2-7B-150K.tar

From Source

We maintain a public fork of the NeoX repository here, which includes the (minor) changes we made to the codebase to allow for tabs & newlines in the tokenization, and also includes instructions for running the perplexity and HumanEval tasks. Note that this repository uses a forked version of the LM Evaluation Harness with the code benchmark from our work.

Building this repository should match the process for building GPT-NeoX almost exactly. You may also use the Docker image mentioned next, mounting a checkout of the latest version of this fork over the /gpt-neox directory inside the container. Once set up, use the generate.py entrypoint (described below) for free-form code generation, or use one of the commands here to calculate perplexity and HumanEval results as in the paper.

Via Docker

A base Docker image containing a slightly modified version of the gpt-neox repository is available via DockerHub:

docker pull vhellendoorn/code-lms-neox:base

This image can be used together with a checkpoint file hosted on this public Zenodo repository. The base Docker image size is 5.4GB. Once a checkpoint has been retrieved, start the container with the following commands (substituting another GPU device index if needed):

nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD/checkpoints,dst=/gpt-neox/checkpoints vhellendoorn/code-lms-neox:base

Code Generation

The following command can be used to generate code from a prompt:

sudo ./deepy.py generate.py configs/text_generation.yml checkpoints/configs/local_setup.yml checkpoints/configs/2-7B.yml

Note: if not using the 2.7B parameter model, replace the final config file with the appropriate model size (e.g., small = 160M parameters, medium = 405M).

Once the checkpoint has been loaded, you can feed it an example such as def return1():\n """Returns 1."""\n (note the whitespace tokens) and watch it predict return 1 (and then probably a bunch of other returnX methods, depending on the sample).

The modifications to gpt-neox mentioned above center on allowing tabs and newlines in the prompt input. In interactive mode, these can be entered using their escaped versions (\t, \n); when using file-based input, the project will read the entire file instead of treating each line as a prompt. By default, the command above will create an interactive prompt and return relatively short outputs (256 tokens) with a sampling temperature of 0.5; this behavior can be changed in /gpt-neox/checkpoints/configs/text_generation.yml.

A lower temperature (e.g., 0.2) will produce more consistent and plausible (to the model) predictions; a higher temperature such as the default may be useful for generating and evaluating many candidates (see our paper for recommendations). For the latter setting, consider switching to the input-file mode and providing an entire snippet (without escaping whitespace) in the corresponding file.
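
For file-based prompting it can be convenient to write the prompt with real whitespace rather than escape sequences. Below is a minimal sketch; the file name is a placeholder, so point the generation config's input-file option at whatever path you actually use:

# Write a prompt containing actual tabs/newlines for the file-based input mode,
# which reads the entire file as a single prompt (see above).
prompt = 'def return1():\n    """Returns 1."""\n'
with open("prompt.txt", "w") as f:
    f.write(prompt)  # real whitespace; no \t / \n escaping needed here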

Multi-lingual Models

Several models have been trained on a large corpus of code spanning 12 programming languages. This includes a 2.7B parameter model (nicknamed PolyCoder, trained for 100K and 150K steps), a 405M parameter model (100K & 150K steps) and a 160M parameter model (150K steps).

Available Models

All models are available at a public Zenodo repository, in the form of .tar files with fairly self-explanatory names (e.g., 2-7B-100K => a 2.7B parameter model trained for 100K steps). Currently available models include:

  • GPT2 - 2.7B: A 32 layer, 2,560 dimensional Transformer model, trained with a batch size of 128 sequences (256K tokens; see the quick arithmetic check after this list). Models are available at both 100K and 150K steps.
    • Note that GPT-NeoX's default config for this model was modified to reduce the number of training steps (and learning rate decay steps accordingly) to 160K, down from 320K, to better match the available training resources. Hence, this model may not have reached its peak performance.
  • GPT2 - 0.4B: A 24 layer, 1,024 dimensional Transformer model based on the medium config, trained with 256K tokens per batch.
  • GPT2 - 160M: A 12 layer, 768 dimensional Transformer model based on the small config, trained with 256K tokens per batch.
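
As a quick check on the 256K-tokens-per-batch figures above, here is a minimal arithmetic sketch, assuming the 2,048-token sequence length used by these GPT-NeoX configs (an assumption, not stated above):

# 128 sequences per batch x 2,048 tokens per sequence (assumed context length)
sequences_per_batch = 128
tokens_per_sequence = 2048
print(sequences_per_batch * tokens_per_sequence)  # 262,144 tokens, i.e. ~256K per batch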

Training Process

Training was done on 4 to 8 NVIDIA RTX 8000 GPUs, largely following the standard config values, except for enabling "scaled-upper-triang-masked-softmax-fusion" and "bias-gelu-fusion" for performance, and slightly changing the batch size (see model details), data split (changed to 98.9%/0.1%/1%), initial loss scale (2^16), and print/eval intervals.

[Figure: validation loss curves of the various models during training.]

Caveats

The trained models come with a few minor known limitations:

  • This model was not trained to solve programming problems and may not perform well on a benchmark such as HumanEval. Models like Codex (powering Copilot) are pretrained on natural language, which may boost their ability to interpret NL prompts; this model only learned language from comments in code.
  • The model appears to start generating a random new file once it reaches the (predicted) end of the current one. It is possible that the end-of-document token was not properly added to the training data.
  • Whitespace is very important to the model, since no preprocessing was done on the input files. For instance, the following snippet will yield poor predictions, because in Java we would never expect an instance method at the top level, as indicated by the single level of indentation (\t) of the two lines within this method:
public int getTotalWeight(List<Integer> weights) {\n\t// Sum weights in parallel.\n\treturn 

Adjusting the indentation makes it predict more reasonable continuations:

public int getTotalWeight(List<Integer> weights) {\n\t\t// Sum weights in parallel.\n\t\treturn 

The Codex paper discusses controlling for this to increase usability; this may be worth doing in a future version of the model.

Datasets

249GB Multi-Lingual Corpus

This is the corpus used to train PolyCoder.

The datasets were cloned overnight on October 9-10, 2021. To mine a similar training set, see Data.

The list of file paths can be downloaded from: https://zenodo.org/record/6363556/files/index.zip. Each row in the file is the file path along with its SHA-256 hash, to ease deduplication. That is, the hashes allow checking if files from any future test set were already contained in the training set.
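
As an example of how these hashes can be used, here is a minimal sketch (file names are placeholders, and it assumes the hashes were computed over the raw file contents) that loads the index and checks whether a candidate file already appeared in the training set:

import hashlib

# Collect the SHA-256 hashes listed in the index (one "path<TAB>hash" row per file).
train_hashes = set()
with open("index.txt", encoding="utf-8", errors="replace") as f:
    for line in f:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:  # a handful of rows are malformed; skip them
            train_hashes.add(parts[1])

# Hash a candidate test file and check for overlap with the training data.
with open("candidate.py", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("already in training set" if digest in train_hashes else "unseen")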

The data collection and filtering process is described in detail in the paper and below. The final, filtered dataset statistics are:

Language     Repositories   Size (GB)       Files
C                  10,749          55   3,037,112
C#                  9,511          21   2,514,494
C++                13,726          52   4,289,506
Go                 12,371          15   1,416,789
Java               15,044          41   5,120,129
JavaScript         25,144          22   1,774,174
PHP                 9,960          13   1,714,058
Python             25,446          16   1,550,208
Ruby                5,826         4.1     674,343
Rust                4,991         3.5     304,842
Scala               1,497         1.8     245,100
TypeScript         12,830         9.2   1,441,926

Data Collection & Filtering

I cloned the most popular repositories for 12 popular programming languages with at least 50 stars (stopping at ~25K per language) from GitHub in October 2021. For each project, each file belonging to the majority language of that project was extracted, yielding the training set above (after cleaning). This initial, unfiltered dataset spanned 631GB and 38.9M files.

Next, similar to Codex and CodeParrot, very large (>1MB) and very short (<100 tokens) files were filtered out, reducing the dataset to 424GB. Files were then deduplicated based on a hash of their content, which reduced the number of files by another 30% or so, leaving 249GB of data and 24.1M files. No tokenization filters were applied; the model processes entire files including all comments. A code-specific vocabulary was constructed on a random 5% subset of the files above.
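
The size and deduplication steps are straightforward to reproduce on a local set of files. The sketch below illustrates the logic; the whitespace-based token count is only a stand-in for the actual tokenizer, so the thresholds are approximate:

import hashlib, os

def keep_file(path, max_bytes=1_000_000, min_tokens=100):
    """Size filters described above; token count approximated by splitting on whitespace."""
    if os.path.getsize(path) > max_bytes:  # drop very large files (>1MB)
        return False
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    return len(text.split()) >= min_tokens  # drop very short files (<100 tokens)

def deduplicate(paths):
    """Keep one file per distinct content hash, mirroring the deduplication step."""
    seen, kept = set(), []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept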

Evaluation

Please find detailed instructions for replicating our perplexity and HumanEval results on our public fork of the NeoX repository. This in turn leverages our extension of the LM Evaluation Harness.

Evaluating Codex

To download the test sets that we used in the paper (12 programming languages), use:

wget https://zenodo.org/record/6363556/files/unseen_test_sets.tar.gz
tar -xvzf unseen_test_sets.tar.gz

To get perplexity results on these samples using Codex' API, use:

export OPENAI_API_KEY=<YOUR OPEN AI API KEY>
python3 -u Evaluation/eval_codex_all.py --dirs Code-sampled100

Where <YOUR OPEN AI API KEY> is a private string that can be obtained by signing up for OpenAI's beta.

As of March 2022, getting an API Key is free for 3 months, and afterwards a credit card needs to be entered. However, even after entering a credit card, using our evaluation script does not lead to any costs.
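
A script like eval_codex_all.py needs per-token log-probabilities from the API. Here is a minimal sketch of one way to obtain them, assuming the 2022-era openai Python client; the engine name is a placeholder, and the repository's script is the authoritative version:

import math
import openai  # pip install openai (2022-era client)

openai.api_key = "<YOUR OPEN AI API KEY>"

def codex_perplexity(code, engine="code-davinci-001"):  # placeholder engine name
    # max_tokens=0 with echo=True scores the prompt itself instead of generating.
    response = openai.Completion.create(
        engine=engine, prompt=code, max_tokens=0, echo=True, logprobs=0
    )
    token_logprobs = response["choices"][0]["logprobs"]["token_logprobs"][1:]  # first token has no logprob
    return math.exp(-sum(token_logprobs) / len(token_logprobs))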

Results - HumanEval

These are PolyCoder's results on the HumanEval benchmark:

Model              Pass@1   Pass@10  Pass@100
PolyCoder (160M)    2.13%     3.35%     4.88%
PolyCoder (400M)    2.96%     5.29%    11.59%
PolyCoder (2.7B)    5.59%     9.87%    17.68%
CodeParrot (110M)   3.80%     6.57%    12.78%
CodeParrot (1.5B)   3.58%     8.03%    14.96%
GPT-Neo (125M)      0.75%     1.88%     2.97%
GPT-Neo (1.3B)      4.79%     7.47%    16.30%
GPT-Neo (2.7B)      6.41%    11.27%    21.37%
GPT-J (6B)         11.62%    15.74%    27.74%
Codex (300M)       13.17%    20.37%    36.27%
Codex (2.5B)       21.36%    35.42%    59.50%
Codex (12B)        28.81%    46.81%    72.31%
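
Pass@k is conventionally computed with the unbiased estimator introduced alongside HumanEval (n samples per problem, c of which pass the unit tests). A minimal sketch:

from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:  # every size-k subset contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generated samples for a problem, 13 of which pass.
print(pass_at_k(200, 13, 1), pass_at_k(200, 13, 10), pass_at_k(200, 13, 100))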

Results - Multilingual Language Modeling

These are the perplexity results of PolyCoder on the multilingual test sets:

Language Perplexity
C 2.3464
C# 2.5832
C++ 2.9189
Go 2.567
Java 2.9194
JavaScript 3.0611
PHP 3.6954
Python 3.1767
Ruby 3.9742
Rust 3.2449
Scala 3.8735
TypeScript 3.6143
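
Since perplexity is the exponential of the average per-token negative log-likelihood, the table can equivalently be read in nats or bits per token; a quick sketch:

import math

ppl = 2.3464                     # e.g., the C entry above
nats_per_token = math.log(ppl)   # ~0.85 nats per token
bits_per_token = math.log2(ppl)  # ~1.23 bits per token
print(nats_per_token, bits_per_token)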

A comparison with the other models is available in Figure 6 of the paper.

Citation

A Systematic Evaluation of Large Language Models of Code

@article{xu2022systematic,
  title={A Systematic Evaluation of Large Language Models of Code},
  author={Xu, Frank F and Alon, Uri and Neubig, Graham and Hellendoorn, Vincent J},
  journal={arXiv preprint arXiv:2202.13169},
  year={2022}
}
Owner

Vincent Hellendoorn, AI4SE Researcher, Assistant Prof. at CMU

Comments
  • Docker Image code generation fails

    Hello!

    While trying to run polycoder in a dockerized setup, we bumped into the error: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

    Could you help us to get over this problem?

    This is odd, because I guess the docker solution should have been error free.

    We're trying to run the 160M version on a Ubuntu 22.04 machine with a GTX 1060 6GB by using the provided command to start the container.

    Here is the full log: https://pastebin.com/JzayrXUr

  • How to evaluate the perplexity of PolyCoder/CodeParrot?

    Hello,

    Thank you for the awesome PolyCoder project; I really think it helps a lot for evaluating all the code PTMs. However, I only found a script to evaluate the perplexity of Codex, so it's hard to reproduce the perplexity benchmark results when comparing Codex, PolyCoder and other PTMs. Is it possible that you could add a PolyCoder/CodeParrot perplexity evaluation script?

    Thanks again, Sen

  • Dataset index.txt file contains some corrupted entries

    Hi,

    Thanks for open sourcing the dataset and PolyCoder.

    I'm looking at the dataset: https://zenodo.org/record/6363556/files/index.zip. From the README, it seems that each line in index.txt should be in the form of {language}__{organization}__{project}__{full__file__path}\tSHA; however, after parsing, there are a few lines that seem to be malformed.

    I've attached the malformed entries below

    Line 2818009 : Upside Down Numbers
    Line 2818010 : Upside Down Numbers
    Line 2818011 : Upside Down Numbers
    Line 2818012 : Upside Down Numbers
    Line 2818013 : Upside Down Numbers__main.cpp    0070bf300d9f1bf6ec6533142fbbaa4de8ff65374da8d29e6a85cba5d0ad38df
    Line 2818158 : Phone Number Combinations__main.cpp      39ae7219b6377846a3792efcf6db5a9cc1b949652a3cc76edbe5c368f37b90a1
    Line 11100410 : .java   3f49f41560cd7a5ea7c2d31120d98dfd2f56da204b6286e099af69d629ba3041
    
  • cuda error

    Hi, I run the 2-7B model. This is my command: sudo ./deepy.py generate.py configs/text_generation.yml checkpoints/configs/local_setup.yml checkpoints/configs/2-7B.yml, and I get this error message: [2022-03-28 14:07:59,567] [INFO] [module.py:576:load_state_dir] RANK=0 Loaded layer=0 file=checkpoints/global_step150000/layer_00-model_00-model_states.pt

    Would you help me solve this problem? Thanks.

  • Problem running the model

    Hello, could you help me look at this error?

    No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
    NeoXArgs.from_ymls() ['configs/text_generation.yml', 'checkpoints/configs/local_setup.yml', 'checkpoints/configs/2-7B.yml']
    INFO:root:NeoXArgs.calculate_derived() Total number of GPUs determined to be: 0
    Traceback (most recent call last):
      File "./deepy.py", line 29, in <module>
        neox_args = NeoXArgs.consume_deepy_args()
      File "/gpt-neox/megatron/neox_arguments/arguments.py", line 304, in consume_deepy_args
        neox_args = cls.from_ymls(paths_to_yml_files=conf_files, overwrite_values=overwrite_values)
      File "/gpt-neox/megatron/neox_arguments/arguments.py", line 199, in from_ymls
        return cls(**config)
      File "<string>", line 184, in __init__
      File "/gpt-neox/megatron/neox_arguments/arguments.py", line 106, in __post_init__
        self.calculate_derived()
      File "/gpt-neox/megatron/neox_arguments/arguments.py", line 656, in calculate_derived
        self.check_batch_parameters(
      File "/gpt-neox/megatron/neox_arguments/arguments.py", line 570, in check_batch_parameters
        assert (
    AssertionError: Train batch size: 0 has to be greater than 0

    Thanks!

  • Code Completion Support

    Hi, thanks for your nice trained model! I would like to add it to a code completion (not code generation) backend, but I cannot find an API to do this, such as using the GPT cache to decode one step at a time and go through whole code files to calculate top-1 accuracy.

  • Plan on Releasing Generated Sample C Code

    Hi there, nice job on this work! I'm wondering whether you are planning on releasing some generated sample C code? I'm just curious about what it looks like. For judging the functional correctness of the generated Python code, I know you used HumanEval; did you conduct a similar functionality check on C code as well? Thanks much!

  • Dataset availability

    First of all, thanks so much for making your model openly available! This is a great resource. We are hoping to use it in some evaluations of vulnerability repair, and one thing we would like to evaluate is whether the fixes generated by the language model are already present in the training data.

    Would it be possible to make the training dataset available somehow? Either the actual data or (if that's too big) the github repositories + commit hashes + filenames would be fine. I can even stop by CMU with a hard drive if uploading it is too inconvenient :)

  • Downloading issue

    Sorry, the download of https://zenodo.org/record/6363556/files/2-7B-150K.tar seems to fail. I tried many times, and most attempts failed because of a 503 error. Even if it starts downloading, it only maintains a stream of about 10KB/s and after a while it times out. Can you please provide another link so we can download the checkpoint more easily? I would really appreciate it.

  • Missing LICENSE

    There is not a LICENSE or COPYING file in the root directory so GitHub cannot parse what the license is. Without a license, the default is copyright. I see a wide variety of licenses included in subdirectories including GPL-3.0-or-later, AGPL-3.0-or-later, LGPL-2.1-or-later, Apache-2.0, MIT, and others.

    I am not a lawyer, but I believe that would make the repository AGPL-3.0-or-later.

  • PlaidML support for (MacOS)

    I am currently on a Mac Pro with an AMD Radeon Pro 580X, but the code generation doesn't complete due to this error.

    NeoXArgs.from_ymls() ['configs/text_generation.yml', 'configs/local_setup.yml', 'configs/2-7B.yml']
    INFO:root:NeoXArgs.calculate_derived() Total number of GPUs determined to be: 0
    Traceback (most recent call last):
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/./gpt-neox-main/deepy.py", line 28, in <module>
        neox_args = NeoXArgs.consume_deepy_args()
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/gpt-neox-main/megatron/neox_arguments/arguments.py", line 321, in consume_deepy_args
        neox_args = cls.from_ymls(
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/gpt-neox-main/megatron/neox_arguments/arguments.py", line 201, in from_ymls
        return cls(**config)
      File "<string>", line 186, in __init__
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/gpt-neox-main/megatron/neox_arguments/arguments.py", line 106, in __post_init__
        self.calculate_derived()
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/gpt-neox-main/megatron/neox_arguments/arguments.py", line 675, in calculate_derived
        self.check_batch_parameters(
      File "/Users/xoxrumblelorexox/Desktop/Workbash/polyai/checkpoints/checkpoints/gpt-neox-main/megatron/neox_arguments/arguments.py", line 589, in check_batch_parameters
        assert (
    AssertionError: Train batch size: 0 has to be greater than 0
    

    After a while I realised that TensorFlow doesn't recognise non-NVIDIA GPUs. There should be a disclaimer about this on the page. The only way around it I found was to use PlaidML, but I am still getting the same error (probably because PlaidML only supports Keras).

    Sorry if this is already known.

  • CUDA out of memory error on training

    I was trying to train PolyCoder using the preconfigured dataset, from the checkpoint checkpoints-2-7B. I used the following command as per the instructions in the repo (only changing the configs as appropriate):

    sudo python ./deepy.py train.py -d configs 2-7B.yml local_setup.yml

    which gave the following error:

    RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 23.70 GiB total capacity; 20.49 GiB already allocated; 1.74 GiB free; 20.50 GiB reserved in total by PyTorch)

    Interestingly, the full 25 Gigs of our GPU is free, as per nvidia-smi.

    I tried updating the batch size, and the only location I found to update the batch size in the config files was train_micro_batch_size_per_gpu: 8, in 2-7B.yml.

    It was 8, I changed it to 4, and then also to 1, but in both cases got the same error.

    I am running all this in docker, as per the containerized setup instructions.

    Appreciate any help!
