NLP made easy

GluonNLP Logo

GluonNLP: Your Choice of Deep Learning for NLP

GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that help you load text data, process text data, and train models.

See our documentation at https://nlp.gluon.ai/master/index.html.

Features

  • Easy-to-use Text Processing Tools and Modular APIs
  • Pretrained Model Zoo (see the example below)
  • Write Models with Numpy-like API
  • Fast Inference via Apache TVM (incubating) (Experimental)
  • AWS Integration via SageMaker
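
For example, loading a pretrained backbone and its tokenizer from the model zoo takes only a few lines. The snippet below is a minimal sketch based on the get_backbone API described in the master documentation; the backbone name 'google_en_cased_bert_base' is one of the models listed there:

# Load a pretrained BERT backbone and its tokenizer (sketch; assumes the
# gluonnlp.models.get_backbone API from the master docs)
import mxnet as mx
from gluonnlp.models import get_backbone

mx.npx.set_np()
model_cls, cfg, tokenizer, backbone_param_path, _ = get_backbone(
    'google_en_cased_bert_base')
model = model_cls.from_cfg(cfg)
model.load_parameters(backbone_param_path)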

Installation

First of all, install the latest MXNet. You may use the following commands:

# Install the version with CUDA 10.1
python3 -m pip install -U --pre "mxnet-cu101>=2.0.0b20210121" -f https://dist.mxnet.io/python

# Install the version with CUDA 10.2
python3 -m pip install -U --pre "mxnet-cu102>=2.0.0b20210121" -f https://dist.mxnet.io/python

# Install the version with CUDA 11
python3 -m pip install -U --pre "mxnet-cu110>=2.0.0b20210121" -f https://dist.mxnet.io/python

# Install the cpu-only version
python3 -m pip install -U --pre "mxnet>=2.0.0b20210121" -f https://dist.mxnet.io/python

To install GluonNLP, use

python3 -m pip install -U -e .

# Also, you may install all the extra requirements via
python3 -m pip install -U -e ."[extras]"

If you do not have write permission, you can also install into the user folder:

python3 -m pip install -U -e . --user
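
To verify the installation, you can import both packages and print their versions (a quick sanity check; it assumes the standard __version__ attributes):

# Confirm that MXNet and GluonNLP import correctly
import mxnet
import gluonnlp
print(mxnet.__version__, gluonnlp.__version__)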

For Windows users, we recommend using the Windows Subsystem for Linux.

Access the Command-line Toolkits

To facilitate both engineers and researchers, we provide command-line toolkits for downloading and processing NLP datasets. For more details, refer to GluonNLP Datasets and GluonNLP Data Processing Tools.

# CLI for downloading / preparing the dataset
nlp_data help

# CLI for accessing some common data processing scripts
nlp_process help

# Also, you can use `python -m` to access the toolkits
python3 -m gluonnlp.cli.data help
python3 -m gluonnlp.cli.process help

Run Unittests

Refer to the tests directory to see how to run the unit tests.

Use Docker

You can use Docker to launch a JupyterLab development environment with GluonNLP installed.

# GPU Instance
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:gpu-latest

# CPU Instance
docker pull gluonai/gluon-nlp:cpu-latest
docker run --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:cpu-latest

For more details, you can refer to the guidance in tools/docker.

Owner

Distributed (Deep) Machine Learning Community: A Community of Awesome Machine Learning Projects

Comments
  • [AMP] Add AMP support to Machine Translation

    [AMP] Add AMP support to Machine Translation

    Description

    • Fix Horovod support and add AMP support to machine translation (a sketch of typical AMP usage follows this list).
    • Support TN in training and inference
    • Update training results of SQuAD, transformer-base, transformer-large, transformer-t2t-big.
    • Add Deep Encoder, Shallow Decoder
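
    For context, this is how AMP is typically enabled for a Gluon trainer. The sketch below assumes the MXNet 1.x entry point mxnet.contrib.amp; the module location may differ on MXNet 2.0:

    # Sketch: enable automatic mixed precision for a Gluon trainer
    import mxnet as mx
    from mxnet import autograd, gluon
    from mxnet.contrib import amp

    amp.init()  # patch float32 ops to run in float16 where it is safe

    net = gluon.nn.Dense(10)
    net.initialize()
    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': 3e-4})
    amp.init_trainer(trainer)  # enable dynamic loss scaling

    x, y = mx.nd.ones((8, 4)), mx.nd.ones((8, 10))
    with autograd.record():
        loss = ((net(x) - y) ** 2).mean()
        with amp.scale_loss(loss, trainer) as scaled_loss:
            autograd.backward(scaled_loss)
    trainer.step(8)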

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [x] Code is well-documented

    Changes

    • [x] Add AMP, tests

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here

    cc @dmlc/gluon-nlp-team

  • [Fix][Docker] Fix the docker image + Fix pretrain_corpus document.

    [Fix][Docker] Fix the docker image + Fix pretrain_corpus document.

    Description

    Since Horovod support has been fixed, this improves our docker image. The CI docker image now depends on the base docker image, which supports:

    • horovod training
    • TVM

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    cc @dmlc/gluon-nlp-team

  • [TVM] Add TVM Support

    [TVM] Add TVM Support

    Description

    Add TVM test case + profiling after https://github.com/apache/incubator-tvm/pull/6699 is merged.

    • [x] Test case
    • [x] Profile
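
    As background, compiling an MXNet Gluon model with TVM goes through the Relay frontend. The sketch below is illustrative only; the shapes and target are assumptions, not the test case added in this PR:

    # Sketch: compile an MXNet Gluon block with TVM via the Relay frontend
    import mxnet as mx
    import tvm
    from tvm import relay

    net = mx.gluon.nn.Dense(10)
    net.initialize()
    net.hybridize()
    net(mx.nd.ones((1, 16)))  # run once so parameter shapes are known

    mod, params = relay.frontend.from_mxnet(net, shape={'data': (1, 16)})
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target='llvm', params=params)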

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage
    • [x] Code is well-documented

    cc @dmlc/gluon-nlp-team

  • [FEATURE] Add transformer inference code

    [FEATURE] Add transformer inference code

    Description

    Add transformer inference code to make inference easy, and to make it convenient to analyze the performance of transformer inference. @TaoLv @juliusshufan @pengzhao-intel

    You can use the command below to run inference:

    python inference_transformer.py --dataset WMT2014BPE --src_lang en --tgt_lang de --batch_size 2700 --scaled --average_start 5 --num_buckets 20 --bucket_scheme exp --bleu 13a --model_parameter PATH/TO/valid_best.params

    You will get output like the following:

    2019-08-19 22:03:57,600 - root - batch id=10, batch_bleu=26.0366
    2019-08-19 22:04:45,904 - root - batch id=20, batch_bleu=30.8409
    2019-08-19 22:05:26,991 - root - batch id=30, batch_bleu=25.3955
    2019-08-19 22:06:11,089 - root - batch id=40, batch_bleu=21.9322
    2019-08-19 22:06:58,313 - root - batch id=50, batch_bleu=29.7584
    2019-08-19 22:07:49,634 - root - batch id=60, batch_bleu=26.5373
    2019-08-19 22:08:33,846 - root - batch id=70, batch_bleu=23.2735
    2019-08-19 22:09:24,003 - root - batch id=80, batch_bleu=22.8065
    2019-08-19 22:10:03,324 - root - batch id=90, batch_bleu=26.0000
    2019-08-19 22:10:41,997 - root - batch id=100, batch_bleu=27.7887
    2019-08-19 22:11:26,346 - root - batch id=110, batch_bleu=22.6277
    2019-08-19 22:12:10,353 - root - batch id=120, batch_bleu=25.9580
    2019-08-19 22:12:47,614 - root - batch id=130, batch_bleu=22.6479
    2019-08-19 22:13:20,316 - root - batch id=140, batch_bleu=26.6224
    2019-08-19 22:13:54,895 - root - batch id=150, batch_bleu=30.2036
    2019-08-19 22:14:32,938 - root - batch id=160, batch_bleu=22.4694
    2019-08-19 22:15:09,624 - root - batch id=170, batch_bleu=26.4245
    2019-08-19 22:15:39,387 - root - batch id=180, batch_bleu=28.8940
    2019-08-19 22:16:11,217 - root - batch id=190, batch_bleu=26.2148
    2019-08-19 22:16:47,089 - root - batch id=200, batch_bleu=24.3723
    2019-08-19 22:17:22,472 - root - batch id=210, batch_bleu=27.1375
    2019-08-19 22:18:00,030 - root - batch id=220, batch_bleu=25.5695
    2019-08-19 22:18:32,847 - root - batch id=230, batch_bleu=25.9404
    2019-08-19 22:19:01,637 - root - batch id=240, batch_bleu=25.6699
    2019-08-19 22:19:29,690 - root - batch id=250, batch_bleu=22.1795
    2019-08-19 22:19:58,859 - root - batch id=260, batch_bleu=21.1670
    2019-08-19 22:20:28,113 - root - batch id=270, batch_bleu=24.0742
    2019-08-19 22:20:53,027 - root - batch id=280, batch_bleu=27.6126
    2019-08-19 22:21:20,014 - root - batch id=290, batch_bleu=25.6340
    2019-08-19 22:21:50,416 - root - batch id=300, batch_bleu=22.7178
    2019-08-19 22:22:14,171 - root - batch id=310, batch_bleu=30.1331
    2019-08-19 22:22:37,462 - root - batch id=320, batch_bleu=23.2388
    2019-08-19 22:23:01,075 - root - batch id=330, batch_bleu=27.9605
    2019-08-19 22:23:22,236 - root - batch id=340, batch_bleu=23.9418
    2019-08-19 22:23:40,851 - root - batch id=350, batch_bleu=22.2135
    2019-08-19 22:24:01,679 - root - batch id=360, batch_bleu=23.6225
    2019-08-19 22:24:15,178 - root - Inference at test dataset. inference bleu=26.0137, throughput=0.1236K wps

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage
    • [x] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here
  • [SCRIPT] XLNet finetuning scripts for glue

    [SCRIPT] XLNet finetuning scripts for glue

    Description

    XLNet finetuning scripts for glue

    Checklist

    Essentials

    • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [ ] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here

    cc @dmlc/gluon-nlp-team

  • [SCRIPT]QA Fine-tuning Example for BERT

    [SCRIPT]QA Fine-tuning Example for BERT

    Description

    Add a QA fine-tuning example for BERT (#476), using the BERT tokenizer from #464.

    On SQuAD 1.1 with the bert_base uncased model, the dev dataset reaches an F1 of 88.52% and an EM of 80.98% (based on mxnet-cu90-1.5.0b20190216). With the bert_large uncased model, the dev dataset reaches an F1 of 90.97% and an EM of 84.04% (based on mxnet-cu90-1.5.0b20190216). On mxnet-cu90-1.5.0b20190112, the bert_base uncased model reaches an F1 of 88.45% and an EM of 81.21% on the dev dataset.
    We use mxnet-cu90-1.5.0b20190216 because its dropout uses the cuDNN implementation, which shortens training by one hour (base model, epochs=2). Logs are in https://github.com/dmlc/web-data/pull/161

    On SQuAD 2.0 with the bert_large uncased model and null_score_diff_threshold=-2.0, the results on the dev dataset are as follows (based on mxnet-cu90-1.5.0b20190216):

    {
      "exact": 77.958392992504,
      "f1": 81.02012658815627,
      "total": 11873,
      "HasAns_exact": 73.3974358974359,
      "HasAns_f1": 79.52968336389662,
      "HasAns_total": 5928,
      "NoAns_exact": 82.50630782169891,
      "NoAns_f1": 82.50630782169891,
      "NoAns_total": 5945
    }
    

    Logs are in https://github.com/dmlc/web-data/pull/164

    The optimizer is adam with lr=3e-5, beta1=0.9, beta2=0.999, epsilon=1e-08 (the original repo uses adamw with lr=3e-5, wd=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-6).
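
    For reference, a Gluon trainer with exactly these hyperparameters would be constructed as below; the Dense network is a stand-in for the actual BERT model:

    # adam trainer with the hyperparameters quoted above
    from mxnet import gluon

    net = gluon.nn.Dense(2)  # placeholder for the fine-tuned BERT model
    net.initialize()
    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': 3e-5, 'beta1': 0.9,
                             'beta2': 0.999, 'epsilon': 1e-8})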

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [x] Code is well-documented

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here
  • Numerous doc updates

    Numerous doc updates

    Description

    Numerous doc updates

    Checklist

    Essentials

    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage
    • [x] Code is well-documented

    Changes

    • [x] add the --upgrade flag in the installation instructions for mxnet (official answer from the pypa people: https://pypi.org/help/#tls-deprecation); forcing an upgrade might not be advisable
    • [x] maybe reduce the doc nesting level on gluon-nlp.mxnet.io (see the pytorch doc http://pytorch.org/docs/stable/index.html)
    • [x] get rid of unused namespaces in the doc (e.g. data for submodules that are already import *)
    • [x] use the api package namespace directly in the API doc; the API doc is for reference purposes
    • [x] the link at http://gluon-nlp.mxnet.io/api/data.html#gluonnlp.data.transforms.NLTKMosesTokenizer uses markdown format so it is not displaying properly
    • [x] separate public datasets from the data API in the API doc
    • [x] for scripts, we should link to a compressed archive of the whole folder; "view source" of a script should link to the folder (szha@)
    • [x] examples should have download links for the ipynb files
  • Add nt-asgd for language model

    Add nt-asgd for language model

    Description

    1. Add NT-ASGD for the language model (a sketch of the trigger follows this list)
    2. Online update of NT-ASGD
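
    For background, NT-ASGD (non-monotonically triggered ASGD, as used in the AWD-LSTM language-model work) runs plain SGD and switches to averaged SGD once the validation metric stops improving. A minimal sketch of the trigger logic, with hypothetical names:

    # Non-monotonic trigger: start averaging when the latest validation loss
    # is worse than the best loss recorded at least n checks ago
    def should_start_averaging(val_losses, n=5):
        t = len(val_losses) - 1
        return t > n and val_losses[t] > min(val_losses[:t - n])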

    Checklist

    Essentials

    • [ ] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here
  • [FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering

    [FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering

    Description

    Quantization solution for BERT SC and QA with Intel DLBoost.

    Main Code Changes:

    • [x] change the input order in the BERT SC dataloader to align it with the input order in the symbolic model (data0=input_ids, data1=segment_ids, data2=valid_length)
    • [x] implement BertLayerCollector to support output clipping during calibration; by default we now clip the max_range of the GeLU output to 10 and the min_range of the layer_norm output to -50 (see the sketch after this list)
    • [x] add calibration pass and symbolblock inference pass in finetune_classification.py.
    • [x] add calibration pass and symbolblock inference pass in finetune_squad.py.
    • [x] Quantization Readme
    • [x] Document
    • [ ] accuracy still to be remeasured
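
    The clipping behavior described above amounts to bounding the collected min/max ranges before quantization parameters are computed. A minimal illustration; the names are hypothetical, not the actual BertLayerCollector code:

    # Bound calibration ranges as described above (hypothetical names)
    def clip_calib_range(layer_name, min_range, max_range):
        if 'gelu' in layer_name:
            max_range = min(max_range, 10.0)    # clip GeLU max_range to 10
        if 'layernorm' in layer_name:
            min_range = max(min_range, -50.0)   # clip layer_norm min_range to -50
        return min_range, max_range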

    Dependency:

    https://github.com/apache/incubator-mxnet/pull/17161
    https://github.com/apache/incubator-mxnet/pull/17187
    https://github.com/dmlc/gluon-nlp/pull/1091
    https://github.com/dmlc/gluon-nlp/issues/1127
    https://github.com/dmlc/gluon-nlp/pull/1124
    ...

    FP32 and INT8 Accuracy:

    We will remeasure on c5 instances when the pending PRs are ready.

    | Task  | maxLength | FP32 Accuracy | INT8 Accuracy | FP32 F1 | INT8 F1 |
    |-------|-----------|---------------|---------------|---------|---------|
    | SQUAD | 128       | 77.32         | 76.61         | 84.84   | 84.26   |
    | SQUAD | 384       | 80.86         | 80.56         | 88.31   | 88.14   |
    | MRPC  | 128       | 87.75         | 87.25         | 70.50   | 70.56   |

    @pengzhao-intel @TaoLv @eric-haibin-lin @szha

    Checklist

    Essentials

    • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [ ] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here
  • [SCRIPT] Reproducing GLUE score on 8 tasks

    [SCRIPT] Reproducing GLUE score on 8 tasks

    Description

    [BERT] Reproducing GLUE score on 8 tasks

    • Add scripts for the RTE, QQP, QNLI, STS-B, CoLA, WNLI, and SST tasks, with the task-specific metric (MCC, accuracy, F1, Pearson correlation) for each task.
    • Modify the example tutorial.
    • Split the trainer into bias and weight parameter groups for simplicity.

    Checklist

    Essentials

    • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage
    • [x] Code is well-documented

    Changes

    • [x] Feature1, tests, (and when applicable, API doc)
    • [x] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here
  • Make BERT-GPU deploy compatible with MXNet 1.8

    Make BERT-GPU deploy compatible with MXNet 1.8

    Description

    Change the custom graph pass implementation to make it compatible with MXNet 1.8, solving issue https://github.com/dmlc/gluon-nlp/issues/1388.

    Checklist

    Essentials

    • [x] Changes are complete (i.e. I finished coding on this PR)
    • [x] All changes have test coverage
    • [x] Code is well-documented

    Changes

    • [x] Change custom graph pass to support both MXNet 1.7 & MXNet 1.8
    • [x] Change setup and deploy scripts accordingly
    • [x] Activate CUDA Graphs for MXNet 1.8 (> 30% speedup with small batch sizes)

    cc @dmlc/gluon-nlp-team, @samskalicky, @Kh4L

  • Add sorting of chunks to evaluation

    Add sorting of chunks to evaluation

    Description

    This change sorts the chunks before running evaluation, reducing padding to a minimum and thereby improving performance.

    How the change works

    As every input feature has a unique qas_id, it can be used for sorting. With sorting, the evaluation function proceeds like this:

    1. sort input features by qas_id
    2. chunk data
    3. sort chunks by their length (to reduce padding to minimum)
    4. perform inference
    5. sort chunks and results by qas_id
    6. evaluate data

    Step 1 is performed so that the chunks and their inference results can easily be put back into the proper order in step 5 for evaluation in step 6. A schematic of the whole procedure follows.
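
    The sketch below mirrors the six steps with hypothetical names; the real implementation lives in the SQuAD evaluation code:

    # Schematic of length-sorted evaluation (all names hypothetical)
    def evaluate_sorted(features, chunk, infer, evaluate):
        features = sorted(features, key=lambda f: f.qas_id)       # step 1
        chunks = chunk(features)                                  # step 2
        order = sorted(range(len(chunks)),
                       key=lambda i: chunks[i].valid_length)      # step 3
        results = [infer(chunks[i]) for i in order]               # step 4
        restored = [None] * len(chunks)                           # step 5:
        for rank, i in enumerate(order):                          # undo the
            restored[i] = results[rank]                           # length sort
        return evaluate(features, restored)                       # step 6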

    Performance

    Results for max_seq_length=128, doc_stride=32 (before/after screenshots omitted): performance did not improve much, because most of the chunks are already of the same length 128 due to the relatively small values of max_seq_length and doc_stride.

    Results for max_seq_length=512, doc_stride=128, the default values in the run_squad.py script (before/after screenshots omitted): performance improved significantly (~20%) without any loss of accuracy.

    cc @dmlc/gluon-nlp-team

  • Add assert for doc_stride, max_seq_length and max_query_length

    Add assert for doc_stride, max_seq_length and max_query_length

    Description

    This change adds an assert on the relation between doc_stride, max_seq_length and max_query_length (args.doc_stride <= args.max_seq_length - args.max_query_length - 3), as setting them incautiously can cause data loss when chunking input features and ultimately significantly lower accuracy.
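
    A minimal sketch of the check; the parameter names follow the description above, while the message text is illustrative:

    # Guard against data loss during chunking (relation from the description)
    def check_chunking_args(doc_stride, max_seq_length, max_query_length):
        assert doc_stride <= max_seq_length - max_query_length - 3, (
            'doc_stride is too large for the given max_seq_length and '
            'max_query_length; context tokens would be lost during chunking.')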

    Example

    Without the assert, when one sets max_seq_length to e.g. 128 and keeps the default value of 128 for doc_stride, the following happens for the input feature with qas_id == "572fe53104bcaa1900d76e6b" when running bash ~/gluon-nlp/scripts/question_answering/commands/run_squad2_uncased_bert_base.sh (screenshot omitted):

    Some of the context_tokens_ids (in the red rectangle of the screenshot) are lost, as they are not included in any of the ChunkFeatures because doc_stride is too high in comparison to max_seq_length, and the user is not notified even with a simple warning. This can lead to a significant accuracy drop, since this kind of data loss happens for every input feature that does not fit entirely into a single chunk.

    This change introduces an assert that fires when there is possible data loss, forcing the user to set proper, safe values for doc_stride, max_seq_length and max_query_length.

    Error message

    (screenshot omitted)

    Chunk from the example above with doc_stride reduced to 32

    (screenshot omitted)

    As you can see, when the values of doc_stride, max_seq_length and max_query_length satisfy the abovementioned relation, no data is lost during chunking and we avoid the accuracy loss.

    cc @dmlc/gluon-nlp-team

  • Wrong ETA for max_seq_length != 512

    Wrong ETA for max_seq_length != 512

    Description

    When you change the max_seq_length value from the default of 512, the ETA in the eval_validation function does not end at 0.

    Error Message

    (screenshot omitted)

    To Reproduce

    Change max_seq_length to e.g. 128 and run, e.g., ~/gluon-nlp/scripts/question_answering/commands/run_squad2_uncased_bert_base.sh.
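
    The underlying arithmetic of a correct estimate is simple: scale elapsed time by the number of remaining batches so that the ETA reaches 0 on the last batch. A generic sketch, not the script's actual code:

    # Generic ETA that ends at 0 on the final batch
    def eta_seconds(elapsed, batches_done, batches_total):
        if batches_done == 0:
            return float('inf')
        return elapsed / batches_done * (batches_total - batches_done)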

  • Fix ETA for eval_validation

    Fix ETA for eval_validation

    Description

    Fixes ETA for eval_validation when max_seq_length has been changed.

    Fixes: https://github.com/dmlc/gluon-nlp/issues/1586

    (screenshots without and with the fix omitted)

    cc @dmlc/gluon-nlp-team

  • Upgrade to use MXNet 2.0.0.beta1

    Upgrade to use MXNet 2.0.0.beta1

    Description

    (Brief description on what this PR is about)

    Checklist

    Essentials

    • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [ ] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here

    cc @dmlc/gluon-nlp-team

  • [Decoding] Update incremental decoding implementation

    [Decoding] Update incremental decoding implementation

    Description

    Try to fix #1582: update the incremental decoding caching mechanism.
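
    For context, incremental decoding avoids recomputing attention over previously generated tokens by caching each layer's key/value states and appending one step at a time. A generic numpy sketch of such a cache update, not the repository's actual implementation:

    # Append one decoding step's key/value states to the cache
    import numpy as np

    def update_kv_cache(cache, new_key, new_value):
        # cache is None on the first step; arrays have shape (batch, t, dim)
        if cache is None:
            return {'keys': new_key, 'values': new_value}
        return {'keys': np.concatenate([cache['keys'], new_key], axis=1),
                'values': np.concatenate([cache['values'], new_value], axis=1)}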

    Checklist

    Essentials

    • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
    • [ ] Changes are complete (i.e. I finished coding on this PR)
    • [ ] All changes have test coverage
    • [ ] Code is well-documented

    Changes

    • [ ] Feature1, tests, (and when applicable, API doc)
    • [ ] Feature2, tests, (and when applicable, API doc)

    Comments

    • If this change is a backward incompatible change, why must this change be made.
    • Interesting edge cases to note here

    cc @dmlc/gluon-nlp-team
