pip install pyrelational
Organisation of repository
pyrelationalcontains the source code for the pyrelational package. It contains the main sub-packages for active learning strategies, various informativeness measures, and methods for estimating posterior uncertainties.
examplescontains various example scripts and notebooks detailing how the package can be used
testsunit tests for pyrelational package
docsdocs and assets for docs
# Active Learning package import pyrelational as pal from pyrelational.data.data_manager import GenericDataManager from pyrelational.strategies.generic_al_strategy import GenericActiveLearningStrategy from pyrelational.models.generic_model import GenericModel # Instantiate data-loaders, models, trainers the usual Pytorch/PytorchLightning way # In most cases, no change is needed to current workflow to incorporate # active learning data_manager = GenericDataManager(dataset, train_mask, validation_mask, test_mask) # Create a model class that will handle model instantiation model = GenericModel(ModelConstructor, model_config, trainer_config, **kwargs) # Use the various implemented active learning strategies or define your own al_manager = GenericActiveLearningStrategy(data_manager=data_manager, model=model) al_manager.theoretical_performance(test_loader=test_loader) al_manager.full_active_learning_run(num_annotate=100, test_loader=test_loader)
The pyrelational package offers a flexible workflow to enable active learning with as little change to the models and datasets as possible. It is partially inspired by Robert (Munro) Monarch's book: "Human-In-The-Loop Machine Learning" and shares some vocabulary from there. It is principally designed with PyTorch in mind, but can be easily extended to work with other libraries.
For a primer on active learning, we refer the reader to Burr Settles's survey [reference]. In his own words
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain.
pyrelational package decomposes the active learning workflow into four main components: 1) a data manager, 2) a model, 3) an acquisition strategy built around informativeness scorer, and 4) an oracle (see Figure above). Note that the oracle is external to the package.
The data manager (defined in
pyrelational.data.data_manager.GenericDataManager) wraps around a PyTorch Dataset and handles dataloader instantiation as well as tracking and updating of labelled and unlabelled sample pools.
The model (subclassed from
pyrelational.models.generic_model.GenericModel) wraps a user defined ML model (e.g. PyTorch Module, Pytorch Lightning Module, or scikit-learn estimator) and handles instantiation, training, testing, as well as uncertainty quantification (e.g. ensembling, MC-dropout). It also enables using ML models that directly estimate their uncertainties such as Gaussian Processes (see
The active learning strategy (which subclass
pyrelational.strategies.generic_al_strategy.GenericActiveLearningStrategy) revolves around an informativeness score that serve as the basis for the selection of the query sent to the oracle for labelling. We define various strategies for classification, regression, and task-agnostic scenarios based on informativeness scorer defined in
Prerequisites and setup
For those just using the package, installation only requires standard ML packages and PyTorch. Starting with a new virtual environment (miniconda environment recommended), install standard learning packages and numerical tools.
pip install -r requirements.txt
If you wish to contribute to the code, run
pre-commit install after the above step.
Building the docs
Make sure you have
sphinx-rtd-theme packages installed (
pip install sphinx sphinx_rtd_theme will install this).
To generate the docs,
cd into the
docs/ directory and run
make html. This will generate the docs at
Quickstart & examples
examples/ folder contains multiple scripts and notebooks demonstrating how to use pyrelational effectively.
The diverse examples scripts and notebooks aim to showcase how to use pyrelational in various scenario. Specifically,
examples with regression
examples with classification tasks
examples with task-agnostic acquisition
examples showcasing different uncertainty estimator
examples custom acquisition strategy
examples custom model
- Ensemble of models (a.k.a. commitee)
- DropConnect (coming soon)
- SWAG (coming soon)
- MultiSWAG (coming soon)
Informativeness scorer included in the library
Regression (N.B. pyrelational currently only supports single scalar regression tasks)
- Least confidence
- Expected improvement
- Thompson Sampling
- Upper confidence bound (UCB)
- BatchBALD (coming soon)
Classification (N.B. pyrelational does not support multi-label classification at the moment)
- Least confidence
- Margin confidence
- Entropy based confidence
- Ratio based confidence
- Thompson Sampling (coming soon)
- BatchBALD (coming soon)
Model agnostic and diversity sampling based approaches
- Representative sampling
- Diversity sampling
- Random acquisition