PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library that features modularized GCL components from published papers, standardized evaluation, and experiment management.
## What is Graph Contrastive Learning?
Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. A typical GCL algorithm first constructs multiple graph views via stochastic augmentation of the input and then learns representations by contrasting positive samples against negative ones.
## Install

### Prerequisites
PyGCL needs the following packages to be installed beforehand:
- Python 3.8+
- PyTorch 1.9+
- PyTorch-Geometric 1.7
- DGL 0.7+
- Scikit-learn 0.24+
- NumPy
- tqdm
- NetworkX
### Installation via PyPI

To install PyGCL with `pip`, simply run:

```bash
pip install PyGCL
```
Then, you can `import GCL` from your current environment.
### A note regarding DGL

Currently the DGL team maintains two versions, `dgl` for CPU support and `dgl-cu***` for CUDA support. Since `pip` treats them as different packages, it is hard for PyGCL to check for the version requirement of `dgl`. We have removed such dependency checks for `dgl` in our setup configuration and require users to install a proper version themselves.
## Package Overview
PyGCL implements four main components of graph contrastive learning algorithms:

- Graph augmentation: transforms input graphs into congruent graph views.
- Contrasting architectures and modes: generate positive and negative pairs according to node and graph embeddings.
- Contrastive objectives: compute the likelihood score for positive and negative pairs.
- Negative mining strategies: improve the negative sample set by considering the relative similarity (the hardness) of negative samples.
We also implement utilities for training models, evaluating model performance, and managing experiments.
## Implementations and Examples

For a quick start, please check out the `examples` folder. We currently implement the following methods:
- DGI (P. Veličković et al., Deep Graph Infomax, ICLR, 2019) [Example1, Example2]
- InfoGraph (F.-Y. Sun et al., InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization, ICLR, 2020) [Example]
- MVGRL (K. Hassani et al., Contrastive Multi-View Representation Learning on Graphs, ICML, 2020) [Example1, Example2]
- GRACE (Y. Zhu et al., Deep Graph Contrastive Representation Learning, GRL+@ICML, 2020) [Example]
- GraphCL (Y. You et al., Graph Contrastive Learning with Augmentations, NeurIPS, 2020) [Example]
- SupCon (P. Khosla et al., Supervised Contrastive Learning, NeurIPS, 2020) [Example]
- HardMixing (Y. Kalantidis et al., Hard Negative Mixing for Contrastive Learning, NeurIPS, 2020)
- DCL (C.-Y. Chuang et al., Debiased Contrastive Learning, NeurIPS, 2020)
- HCL (J. Robinson et al., Contrastive Learning with Hard Negative Samples, ICLR, 2021)
- Ring (M. Wu et al., Conditional Negative Sampling for Contrastive Learning of Visual Representations, ICLR, 2021)
- Exemplar (N. Zhao et al., What Makes Instance Discrimination Good for Transfer Learning?, ICLR, 2021)
- BGRL (S. Thakoor et al., Bootstrapped Representation Learning on Graphs, arXiv, 2021) [Example1, Example2]
- GBT (P. Bielak et al., Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs, arXiv, 2021) [Example]
- VICReg (A. Bardes et al., VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, arXiv, 2021)
## Building Your Own GCL Algorithms

Besides trying the above examples for node and graph classification tasks, you can also build your own graph contrastive learning algorithms straightforwardly.
### Graph Augmentation

In `GCL.augmentors`, PyGCL provides the `Augmentor` base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:
| Augmentation | Class name |
|---|---|
| Edge Adding (EA) | `EdgeAdding` |
| Edge Removing (ER) | `EdgeRemoving` |
| Feature Masking (FM) | `FeatureMasking` |
| Feature Dropout (FD) | `FeatureDropout` |
| Edge Attribute Masking (EAR) | `EdgeAttrMasking` |
| Personalized PageRank (PPR) | `PPRDiffusion` |
| Markov Diffusion Kernel (MDK) | `MarkovDiffusion` |
| Node Dropping (ND) | `NodeDropping` |
| Node Shuffling (NS) | `NodeShuffling` |
| Subgraphs induced by Random Walks (RWS) | `RWSampling` |
| Ego-net Sampling (ES) | `Identity` |
Calling these augmentation functions with a `Graph` in a tuple form of node features, edge index, and edge features `(x, edge_index, edge_attrs)` will produce the corresponding augmented graphs.
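For instance, the following minimal sketch (toy tensors; the exact call signature may vary slightly across PyGCL versions) applies a single augmentor to a small graph:

```python
import torch
import GCL.augmentors as A

# A toy graph: 4 nodes with 8-dimensional features and two undirected edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])

aug = A.FeatureMasking(pf=0.3)
# The call returns the augmented (x, edge_index, edge_attrs) tuple;
# edge_attrs is None here because none were supplied.
x_aug, edge_index_aug, edge_attrs_aug = aug(x, edge_index)
```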
#### Composite Augmentations

PyGCL supports composing arbitrary numbers of augmentations together. To compose a list of augmentation instances `augmentors`, you need to use the `Compose` class:
```python
import GCL.augmentors as A

aug = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])
```
You can also use the `RandomChoice` class to randomly draw a few augmentations each time:
```python
import GCL.augmentors as A

aug = A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10),
                      A.NodeDropping(pn=0.1),
                      A.FeatureMasking(pf=0.1),
                      A.EdgeRemoving(pe=0.1)],
                     num_choices=1)
```
#### Customizing Your Own Augmentation

You can write your own augmentation functions by inheriting the base `Augmentor` class and defining the `augment` function.
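As a sketch, a custom augmentor that perturbs node features with Gaussian noise might look like the following (the `FeatureNoise` name and `std` parameter are illustrative, and we assume `Augmentor` and its `Graph` tuple are importable as shown):

```python
import torch
from GCL.augmentors.augmentor import Augmentor, Graph

class FeatureNoise(Augmentor):
    """Illustrative augmentor: adds Gaussian noise to node features."""
    def __init__(self, std: float = 0.1):
        super().__init__()
        self.std = std

    def augment(self, g: Graph) -> Graph:
        # Unpack the (x, edge_index, edge_weights) tuple, perturb x, and repack.
        x, edge_index, edge_weights = g.unfold()
        x = x + self.std * torch.randn_like(x)
        return Graph(x=x, edge_index=edge_index, edge_weights=edge_weights)
```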
### Contrasting Architectures and Modes

Existing GCL architectures can be grouped into two lines: negative-sample-based methods and negative-sample-free ones.

- Negative-sample-based approaches can have either one single branch or two branches. In single-branch contrasting, we only need to construct one graph view and perform contrastive learning within this view. In dual-branch models, we generate two graph views and perform contrastive learning within and across views.
- Negative-sample-free approaches eschew the need for explicit negative samples. Currently, PyGCL supports bootstrap-style contrastive learning as well as contrastive learning within embeddings (such as Barlow Twins and VICReg).
| Contrastive architectures | Supported contrastive modes | Need negative samples | Class name | Examples |
|---|---|---|---|---|
| Single-branch contrasting | G2L only | ✓ | `SingleBranchContrast` | DGI, InfoGraph |
| Dual-branch contrasting | L2L, G2G, and G2L | ✓ | `DualBranchContrast` | GRACE |
| Bootstrapped contrasting | L2L, G2G, and G2L | ✗ | `BootstrapContrast` | BGRL |
| Within-embedding contrasting | L2L and G2G | ✗ | `WithinEmbedContrast` | GBT |
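For example, the contrast models can be instantiated roughly as follows (a sketch based on the bundled DGI and GRACE examples; constructor arguments may differ across versions):

```python
import GCL.losses as L
from GCL.models import SingleBranchContrast, DualBranchContrast

# DGI-style single-branch, cross-scale (G2L) contrasting with the JSD objective.
dgi_contrast = SingleBranchContrast(loss=L.JSD(), mode='G2L')

# GRACE-style dual-branch, same-scale (L2L) contrasting with InfoNCE.
grace_contrast = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='L2L')
```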
Moreover, you can use `add_extra_mask` if you want to add positives or remove negatives. This function performs a bitwise OR with the extra positive mask specified by `extra_pos_mask` and a bitwise AND with the extra negative mask specified by `extra_neg_mask`. It is helpful, for example, when you have supervision signals from labels and want to train the model in a semi-supervised manner.
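A minimal sketch of such a semi-supervised setup, assuming the contrast model's forward pass accepts these masks (as in the SupCon example) and glossing over details such as diagonal handling:

```python
import torch
import GCL.losses as L
from GCL.models import DualBranchContrast

contrast_model = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='L2L')

# Toy stand-ins: 6 nodes, 16-dimensional embeddings from two views, and labels.
h1, h2 = torch.randn(6, 16), torch.randn(6, 16)
y = torch.tensor([0, 0, 1, 1, 2, 2])

# Nodes sharing a label become extra positives; all other pairs stay negative.
extra_pos_mask = torch.eq(y.unsqueeze(0), y.unsqueeze(1))
loss = contrast_model(h1=h1, h2=h2,
                      extra_pos_mask=extra_pos_mask,
                      extra_neg_mask=~extra_pos_mask)
```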
Internally, PyGCL calls `Sampler` classes in `GCL.models` that receive embeddings and produce positive/negative masks. PyGCL implements three contrasting modes: (a) Local-Local (L2L), (b) Global-Global (G2G), and (c) Global-Local (G2L). The L2L and G2G modes contrast embeddings at the same scale, while G2L performs cross-scale contrasting. To implement your own GCL model, you may also use these provided sampler classes:
| Contrastive modes | Class name |
|---|---|
| Same-scale contrasting (L2L and G2G) | `SameScaleSampler` |
| Cross-scale contrasting (G2L) | `CrossScaleSampler` |
- For L2L and G2G, embedding pairs of the same node/graph in different views constitute positive pairs. You can refer to GRACE and GraphCL for examples.
- For G2L, node-graph embedding pairs form positives. Note that for single-graph datasets, the G2L mode requires explicit negative sampling (otherwise there would be no negatives for contrasting). You can refer to DGI for an example.
- Some models (e.g., GRACE) add extra intra-view negative samples. You may manually call `sampler.add_intraview_negs` to enlarge the negative sample set; a GRACE-style training step is sketched below.
- Note that the bootstrapped latent model involves some special model design (asymmetric online/offline encoders and momentum weight updates). You may refer to BGRL for details.
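Putting the pieces together, a GRACE-style training step might look like the following sketch (`encoder` stands for any GNN mapping `(x, edge_index)` to node embeddings; names and hyperparameters are illustrative):

```python
import GCL.losses as L
import GCL.augmentors as A
from GCL.models import DualBranchContrast

aug1 = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])
aug2 = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])

# intraview_negs=True enlarges the negative set with same-view pairs (as in GRACE).
contrast_model = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='L2L',
                                    intraview_negs=True)

def train_step(encoder, x, edge_index, optimizer):
    # Two stochastic views, a shared encoder, and the InfoNCE objective.
    optimizer.zero_grad()
    x1, edge_index1, _ = aug1(x, edge_index)
    x2, edge_index2, _ = aug2(x, edge_index)
    h1 = encoder(x1, edge_index1)
    h2 = encoder(x2, edge_index2)
    loss = contrast_model(h1=h1, h2=h2)
    loss.backward()
    optimizer.step()
    return loss.item()
```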
### Contrastive Objectives

In `GCL.losses`, PyGCL implements the following contrastive objectives:
| Contrastive objectives | Class name |
|---|---|
| InfoNCE loss | `InfoNCE` |
| Jensen-Shannon Divergence (JSD) loss | `JSD` |
| Triplet Margin (TM) loss | `Triplet` |
| Bootstrapping Latent (BL) loss | `BootstrapLatent` |
| Barlow Twins (BT) loss | `BarlowTwins` |
| VICReg loss | `VICReg` |
All these objectives can contrast arbitrary positive and negative pairs, except for the Barlow Twins and VICReg losses, which perform contrastive learning within embeddings. Moreover, for the InfoNCE and Triplet losses, we further provide `SP` variants that compute the contrastive objective given only one positive pair per sample, to speed up computation and avoid excessive memory consumption.
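Since objectives plug into the contrast models, switching losses is typically a one-line change, e.g. (a sketch):

```python
import GCL.losses as L
from GCL.models import DualBranchContrast, WithinEmbedContrast

# JSD objective for graph-level (G2G) dual-branch contrasting.
jsd_model = DualBranchContrast(loss=L.JSD(), mode='G2G')

# Barlow Twins operates within embeddings and thus pairs with
# WithinEmbedContrast (as in GBT).
bt_model = WithinEmbedContrast(loss=L.BarlowTwins())
```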
### Negative Sampling Strategies
PyGCL further implements several negative sampling strategies:
| Negative sampling strategies | Class name |
|---|---|
| Subsampling | `GCL.models.SubSampler` |
| Hard negative mixing | `GCL.models.HardMixing` |
| Conditional negative sampling | `GCL.models.Ring` |
| Debiased contrastive objective | `GCL.losses.DebiasedInfoNCE`, `GCL.losses.DebiasedJSD` |
| Hardness-biased negative sampling | `GCL.losses.HardnessInfoNCE`, `GCL.losses.HardnessJSD` |
The former three models serve as an additional sampling step, similar to the existing `Sampler` classes, and can be used in conjunction with any objective. The last two are objectives themselves and are available only for the InfoNCE and JSD losses.
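For instance, a debiased objective can be dropped in where a plain InfoNCE loss would go (a sketch; the `tau_plus` parameter name, denoting the assumed positive-class prior, is a guess):

```python
import GCL.losses as L
from GCL.models import DualBranchContrast

# Debiased InfoNCE corrects the loss for false negatives among sampled pairs.
contrast_model = DualBranchContrast(
    loss=L.DebiasedInfoNCE(tau=0.2, tau_plus=0.1), mode='L2L')
```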
## Utilities

PyGCL provides a variety of evaluator functions to assess the quality of learned embeddings:
| Evaluator | Class name |
|---|---|
| Logistic regression | `LREvaluator` |
| Support vector machine | `SVMEvaluator` |
| Random forest | `RFEvaluator` |
To use these evaluators, you first need to generate dataset splits by `get_split` (random split) or by `from_predefined_split` (according to preset splits).
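For example, node embeddings `z` with labels `y` can be evaluated roughly as follows (a sketch with toy tensors; split ratios are illustrative):

```python
import torch
from GCL.eval import get_split, LREvaluator

# Toy stand-ins for learned node embeddings and ground-truth labels.
z = torch.randn(100, 32)
y = torch.randint(0, 4, (100,))

# 10% train / 80% test; the remainder serves as the validation set.
split = get_split(num_samples=z.size(0), train_ratio=0.1, test_ratio=0.8)
result = LREvaluator()(z, y, split)  # e.g., micro-/macro-F1 on the test split
```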
## Contribution
Feel free to open an issue should you find anything unexpected, or create pull requests to add your own work! We are motivated to continuously make PyGCL even better.
## Citation
Please cite our paper if you use this code in your own work:
```bibtex
@article{Zhu:2021tu,
  author      = {Zhu, Yanqiao and Xu, Yichen and Liu, Qiang and Wu, Shu},
  title       = {{An Empirical Study of Graph Contrastive Learning}},
  journal     = {arXiv.org},
  year        = {2021},
  eprint      = {2109.01116v1},
  eprinttype  = {arxiv},
  eprintclass = {cs.LG},
  month       = sep,
}
```