A Python Library for Graph Outlier Detection (Anomaly Detection)

PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detecting suspicious activities in social networks [1] and security systems [2].

PyGOD includes more than 10 recent graph-based detection algorithms, such as DOMINANT (SDM'19) and GUIDE (BigData'21). For consistency and accessibility, PyGOD is developed on top of PyTorch Geometric (PyG) and PyTorch, and follows the API design of PyOD. See the example below for detecting outliers with PyGOD in 5 lines!

PyGOD features:

  • Unified APIs, detailed documentation, and interactive examples across various graph-based algorithms.
  • Comprehensive coverage of more than 10 recent graph outlier detectors.
  • Support for detection at multiple levels: node-level today, with edge-level and graph-level tasks in progress (WIP).
  • Scalable design for processing large graphs via mini-batch and sampling.
  • Streamlined data processing with PyG: fully compatible with PyG data objects.

Outlier Detection Using PyGOD with 5 Lines of Code:

# train a DOMINANT detector
from pygod.models import DOMINANT

model = DOMINANT(num_layers=4, epoch=20)  # hyperparameters can be set here
model.fit(data)  # data is a PyTorch Geometric data object

# get raw outlier scores on the training data
outlier_scores = model.decision_scores_

# predict raw outlier scores on new data in the inductive setting
outlier_scores = model.decision_function(test_data)

Citing PyGOD:

The PyGOD paper is available on arXiv. If you use PyGOD in a scientific publication, we would appreciate citations to the following paper:

@article{pygod2022,
  author  = {Liu, Kay and Dou, Yingtong and Zhao, Yue and Ding, Xueying and Hu, Xiyang and Zhang, Ruitong and Ding, Kaize and Chen, Canyu and Peng, Hao and Shu, Kai and Chen, George H. and Jia, Zhihao and Yu, Philip S.},
  title   = {PyGOD: A Python Library for Graph Outlier Detection},
  journal = {arXiv preprint arXiv:2204.12095},
  year    = {2022},
}

or:

Liu, K., Dou, Y., Zhao, Y., Ding, X., Hu, X., Zhang, R., Ding, K., Chen, C., Peng, H., Shu, K., Chen, G.H., Jia, Z., and Yu, P.S. 2022. PyGOD: A Python Library for Graph Outlier Detection. arXiv preprint arXiv:2204.12095.

Installation

It is recommended to use pip or conda (WIP) for installation. Please make sure the latest version is installed, as PyGOD is updated frequently:

pip install pygod            # normal install
pip install --upgrade pygod  # or update if needed

Alternatively, you can clone the repository and install it from source:

git clone https://github.com/pygod-team/pygod.git
cd pygod
pip install .

Required Dependencies:

  • Python 3.6+
  • numpy>=1.19.4
  • scikit-learn>=0.22.1
  • scipy>=1.5.2
  • setuptools>=50.3.1.post20201107

Note on PyG and PyTorch Installation: PyGOD depends on PyTorch Geometric (PyG), PyTorch, and networkx. To streamline the installation, PyGOD does NOT install these libraries for you. Please install them yourself, following each project's own installation instructions, before running PyGOD:

  • torch>=1.10
  • pytorch_geometric>=2.0.3
  • networkx>=2.6.3
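
To confirm that the manually installed libraries are present before running PyGOD, a small check like the following can help. It is a minimal sketch using only the standard library (Python 3.8+). The distribution names passed to importlib.metadata and the helper name installed_versions are assumptions for illustration, and the script only reports versions; it does not compare them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as assumed to be published on PyPI; adjust them if your
# environment installs these libraries under different names.
MANUAL_DEPENDENCIES = ("torch", "torch-geometric", "networkx")


def installed_versions(packages=MANUAL_DEPENDENCIES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any entry reported as NOT INSTALLED needs to be installed before importing pygod.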

API Cheatsheet & Reference

Full API Reference: (https://docs.pygod.org). API cheatsheet for all detectors:

  • fit(G): Fit the detector on PyG data G.
  • decision_function(G): Predict raw anomaly scores of PyG data G using the fitted detector.

Key Attributes of a fitted model:

  • decision_scores_: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
  • labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
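
To illustrate how binary labels can relate to raw scores, here is a minimal sketch that assumes the PyOD-style convention: a contamination rate sets a threshold at the 100 * (1 - contamination) percentile of the training scores, and scores above the threshold are labeled 1. The helper names and the toy scores are hypothetical, and PyGOD's internal implementation may differ. Each position in the score and label lists corresponds to one node.

```python
def percentile(values, q):
    """Linear-interpolation percentile of `values` for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def labels_from_scores(scores, contamination=0.1):
    """Return (threshold, labels); label 1 marks a score above the threshold."""
    threshold = percentile(scores, 100 * (1 - contamination))
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


# Toy scores for a 10-node graph (illustrative values, not real model output).
scores = [0.10, 0.20, 0.15, 0.90, 0.12, 0.18, 0.11, 0.14, 0.13, 0.16]
threshold, labels = labels_from_scores(scores, contamination=0.1)

# Indices of the nodes flagged as outliers.
outlier_nodes = [i for i, label in enumerate(labels) if label == 1]
print(outlier_nodes)
```

With these toy values, only the node with score 0.90 lies above the threshold, so it is the only one flagged.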

For the inductive setting:

  • predict(G): Predict whether each node in PyG data G is an outlier using the fitted detector.
  • predict_proba(G): Predict the probability of each node in PyG data G being an outlier using the fitted detector.
  • predict_confidence(G): Predict the model's node-wise confidence (available in predict and predict_proba) [3].
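
One way raw scores can be turned into outlier probabilities is linear min-max scaling against the training scores, which is what PyOD's linear conversion does. The sketch below shows that idea in plain Python. Whether PyGOD uses exactly this conversion is an assumption, and the function names are hypothetical.

```python
def fit_min_max(train_scores):
    """Remember the range of the training scores."""
    return min(train_scores), max(train_scores)


def scores_to_proba(scores, lo, hi):
    """Scale scores into [0, 1] using the training range, clipping outside it."""
    span = hi - lo
    if span == 0:
        # All training scores were identical, so there is no scale to map onto.
        return [0.0 for _ in scores]
    return [min(1.0, max(0.0, (s - lo) / span)) for s in scores]


# Illustrative values: training scores define the range, test scores are scaled.
train_scores = [1.0, 2.0, 3.0, 5.0]
lo, hi = fit_min_max(train_scores)
proba = scores_to_proba([0.5, 3.0, 7.0], lo, hi)
print(proba)
```

Test scores below the training minimum map to 0 and scores above the training maximum map to 1, so unseen extremes stay within the valid probability range.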

Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.

Implemented Algorithms

PyGOD toolkit consists of two major functional groups:

(i) Node-level detection:

Type              Backbone  Abbr            Year  Sampling     Ref
Unsupervised      MLP       MLPAE           2014  Yes          [4]
Unsupervised      GNN       GCNAE           2016  Yes          [5]
Unsupervised      MF        ONE             2019  No           [6]
Unsupervised      GNN       DOMINANT        2019  Yes          [7]
Unsupervised      GNN       DONE            2020  Yes          [8]
Unsupervised      GNN       AdONE           2020  Yes          [8]
Unsupervised      GNN       AnomalyDAE      2020  Yes          [9]
Unsupervised      GAN       GAAN            2020  Yes          [10]
Unsupervised      GNN       OCGNN           2021  Yes          [11]
Unsupervised/SSL  GNN       CoLA (beta)     2021  In progress  [12]
Unsupervised/SSL  GNN       ANEMONE (beta)  2021  In progress  [13]
Unsupervised      GNN       GUIDE           2021  Yes          [14]
Unsupervised/SSL  GNN       CONAD           2022  Yes          [15]

(ii) Utility functions:

Type    Name                    Function                        Documentation
Metric  eval_precision_at_k     Calculating Precision@k         eval_precision_at_k
Metric  eval_recall_at_k        Calculating Recall@k            eval_recall_at_k
Metric  eval_roc_auc            Calculating ROC-AUC Score       eval_roc_auc
Metric  eval_average_precision  Calculating average precision   eval_average_precision
Data    gen_structure_outliers  Generating structural outliers  gen_structure_outliers
Data    gen_attribute_outliers  Generating attribute outliers   gen_attribute_outliers
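
For readers unfamiliar with the ranking metrics named above, the following plain-Python sketch shows what Precision@k and Recall@k measure. It is not PyGOD's implementation: the function names, signatures, and toy data are hypothetical and chosen only for illustration.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(labels, scores, k):
    """Fraction of the k top-scored nodes that are true outliers."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(labels[i] for i in top_k_indices(scores, k))
    return hits / k


def recall_at_k(labels, scores, k):
    """Fraction of all true outliers found among the k top-scored nodes."""
    total = sum(labels)
    if total == 0:
        raise ValueError("labels contain no outliers")
    hits = sum(labels[i] for i in top_k_indices(scores, k))
    return hits / total


# Toy ground truth (1 = outlier) and detector scores for six nodes.
labels = [0, 1, 0, 1, 0, 1]
scores = [0.1, 0.9, 0.8, 0.7, 0.2, 0.3]

print(precision_at_k(labels, scores, 3))
print(recall_at_k(labels, scores, 3))
```

Here the three top-scored nodes contain two of the three true outliers, so both metrics equal 2/3 at k = 3.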

Quick Start for Outlier Detection with PyGOD

"A Blitz Introduction" demonstrates the basic API of PyGOD using the DOMINANT detector. The API is consistent across all other algorithms.


How to Contribute

You are welcome to contribute to this exciting project. See the contribution guide for more information.


PyGOD Team

PyGOD is a great team effort by researchers from UIC, IIT, BUAA, ASU, and CMU. Our core team members include:

Kay Liu (UIC), Yingtong Dou (UIC), Yue Zhao (CMU), Xueying Ding (CMU), Xiyang Hu (CMU), Ruitong Zhang (BUAA), Kaize Ding (ASU), Canyu Chen (IIT).

Reach out to us by submitting an issue report or sending an email to [email protected].


Reference

[1] Dou, Y., Liu, Z., Sun, L., Deng, Y., Peng, H. and Yu, P.S., 2020, October. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM).
[2] Cai, L., Chen, Z., Luo, C., Gui, J., Ni, J., Li, D. and Chen, H., 2021, October. Structural temporal graph neural networks for anomaly detection in dynamic graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM).
[3] Perini, L., Vercruyssen, V., Davis, J. Quantifying the confidence of anomaly detectors in their example-wise predictions. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020.
[4] Sakurada, M. and Yairi, T., 2014, December. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis.
[5] Kipf, T.N. and Welling, M., 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
[6] Bandyopadhyay, S., Lokesh, N. and Murty, M.N., 2019, July. Outlier aware network embedding for attributed networks. In Proceedings of the AAAI conference on artificial intelligence (AAAI).
[7] Ding, K., Li, J., Bhanushali, R. and Liu, H., 2019, May. Deep anomaly detection on attributed networks. In Proceedings of the SIAM International Conference on Data Mining (SDM).
[8] Bandyopadhyay, S., Vivek, S.V. and Murty, M.N., 2020, January. Outlier resistant unsupervised deep architectures for attributed network embedding. In Proceedings of the International Conference on Web Search and Data Mining (WSDM).
[9] Fan, H., Zhang, F. and Li, Z., 2020, May. AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Chen, Z., Liu, B., Wang, M., Dai, P., Lv, J. and Bo, L., 2020, October. Generative adversarial attributed network anomaly detection. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM).
[11] Wang, X., Jin, B., Du, Y., Cui, P., Tan, Y. and Yang, Y., 2021. One-class graph neural networks for anomaly detection in attributed networks. Neural computing and applications.
[12] Liu, Y., Li, Z., Pan, S., Gong, C., Zhou, C. and Karypis, G., 2021. Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE transactions on neural networks and learning systems (TNNLS).
[13] Jin, M., Liu, Y., Zheng, Y., Chi, L., Li, Y. and Pan, S., 2021. ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM).
[14] Yuan, X., Zhou, N., Yu, S., Huang, H., Chen, Z. and Xia, F., 2021, December. Higher-order Structure Based Anomaly Detection on Attributed Networks. In 2021 IEEE International Conference on Big Data (Big Data).
[15] Xu, Z., Huang, X., Zhao, Y., Dong, Y., and Li, J., 2022. Contrastive Attributed Network Anomaly Detection with Data Augmentation. In Proceedings of the 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD).
Comments
  • Query on Anomaly Prediction and Outlier labels

    Hi,

    Given a graph object in the prediction API, what do the outlier labels, mentioned here as outlier_labels (numpy array of shape (n_samples,)), indicate from a graph perspective?

    Do the 1 and 0 values in the numpy array indicate which nodes in the graph are anomalous or normal? For example, Labels: [0 0 0 ... 0 0 0]. Does each 0 value pertain to a node in the graph?

    So, how should this prediction output be interpreted from a graph perspective? Thanks in advance.

  • remove external (non-core-python) library `argparse` as a dependency

    Describe the bug

    The current dependencies include an external library: argparse.

    Note that argparse is part of the Python standard library, so there is no need to install it like other external libraries.

    • docs for core-library, argparse: https://docs.python.org/3/library/argparse.html

    EDIT: The argparse library that you are installing from PyPI is no longer maintained, as it is now part of the Python 3 standard library. See my comment here.

    See further details:

    • https://gitter.im/conda-forge/conda-forge.github.io?at=62598b5b0466b352a46afd25

    https://github.com/pygod-team/pygod/blob/d037b67bd3001f4d45be5093b3717700aa79d953/requirements.txt#L1-L5

  • adone get unexpected keyword argument

    Running examples\adone.py for replication

    C:\Users\yuezh\Anaconda3\envs\torch19\python.exe C:/Users/yuezh/PycharmProjects/pygod/examples/adone.py
    training...
    Traceback (most recent call last):
      File "C:/Users/yuezh/PycharmProjects/pygod/examples/adone.py", line 35, in <module>
        model.fit(data)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\adone.py", line 158, in fit
        act=self.act).to(self.device)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\adone.py", line 331, in __init__
        act=act)
    TypeError: __init__() got an unexpected keyword argument 'in_channels'

  • ONE does not accept negative value

    It appears that ONE throws an error if the input x contains negative values. If this is expected, we should probably mention it somewhere.

    Check pygod/test/test_one.py.

  • Enabling different hidden dimension for attribute autoencoder and structure autoencoder

    Is your feature request related to a problem? Please describe.

    For now, some detectors (e.g., GUIDE) have two separate autoencoders for attribute and structure, but the two autoencoders share the same hidden layer dimension. In many cases, there is a significant difference between the dimension of the node attributes and the dimension of the structure information (e.g., adjacency matrix). Using the same hidden dimension may hamper the performance of the detectors.

    Describe the solution you'd like

    Enable different hidden dimensions for the attribute autoencoder and the structure autoencoder.

  • Add a tutorial for loading data from other formats

    We can add a tutorial to the documentation about loading data from numpy, scipy, matlab, networkx, and other common data formats. Some of the data loaders can be found in https://pytorch-geometric.readthedocs.io/en/latest/modules/utils.html

  • GUIDE Bug on Cora dataset

    File "/hdisk2/pygod_benchmark/pygod/models/guide.py", line 158, in fit
        x_, s_ = self.model(x, s, edge_index)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/hdisk2/pygod_benchmark/pygod/models/guide.py", line 369, in forward
        s_ = self.struct_ae(s, edge_index)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/hdisk2/pygod_benchmark/pygod/models/guide.py", line 394, in forward
        s = layer(s, edge_index)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/hdisk2/pygod_benchmark/pygod/models/guide.py", line 411, in forward
        out = self.propagate(edge_index, s=self.w2(s))
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
        return F.linear(input, self.weight, self.bias)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
        return torch._C._nn.linear(input, weight, bias)
    RuntimeError: expected scalar type Float but found Long
    
  • OCGNN code issue

    • The model will run 3* default_epochs during training.
    • In fit() function, the epoch and loss value print should move into the verbose condition.
    • The current performance looks weird on wiki and Cora datasets (see the shared document).
    • Unused code at lines 236, 260-263.
    • Correct the torch.Tensor type hint in docstrings.
  • MLPAE bug when setting contamination=0.03 during model initialization

    File "/hdisk2/pygod_benchmark/pygod/models/mlpae.py", line 137, in fit
        self._process_decision_scores()
      File "/hdisk2/pygod_benchmark/pygod/models/base.py", line 278, in _process_decision_scores
        100 * (1 - self.contamination))
      File "<__array_function__ internals>", line 6, in percentile
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 3733, in percentile
        a, q, axis, out, overwrite_input, interpolation, keepdims)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 3853, in _quantile_unchecked
        interpolation=interpolation)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 3404, in _ureduce
        a = np.asanyarray(a)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 136, in asanyarray
        return array(a, dtype, copy=False, order=order, subok=True)
      File "/hdisk2/anaconda3/lib/python3.7/site-packages/torch/_tensor.py", line 678, in __array__
        return self.numpy()
    RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
    
  • data shape issue in anomalydae

    Replicate by running examples/anomalydae.py

    Please make sure the example could run :)

    predicting for probability
    Traceback (most recent call last):
      File "C:/Users/yuezh/PycharmProjects/pygod/examples/anomalydae.py", line 39, in <module>
        prob = model.predict_proba(data)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\base.py", line 176, in predict_proba
        test_scores = self.decision_function(G)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\anomalydae.py", line 360, in decision_function
        A_hat, X_hat = self.model(attrs, adj)
      File "C:\Users\yuezh\Anaconda3\envs\torch19\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\anomalydae.py", line 169, in forward
        A_hat, embed_x = self.structure_AE(x, edge_index)
      File "C:\Users\yuezh\Anaconda3\envs\torch19\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\yuezh\PycharmProjects\pygod\pygod\models\anomalydae.py", line 70, in forward
        embed_x = self.attention_layer(x, edge_index)
      File "C:\Users\yuezh\Anaconda3\envs\torch19\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\yuezh\Anaconda3\envs\torch19\lib\site-packages\torch_geometric\nn\conv\gat_conv.py", line 230, in forward
        num_nodes=num_nodes)
      File "C:\Users\yuezh\Anaconda3\envs\torch19\lib\site-packages\torch_geometric\utils\loop.py", line 144, in add_self_loops
        edge_index = torch.cat([edge_index, loop_index], dim=1)
    RuntimeError: Sizes of tensors must match except in dimension 0. Got 2 and 2708 (The offending index is 0)

  • Dominant model data loading and training problems

    @kayzliu When I write the Dominant example, I find the following issues. Please fix/answer them accordingly.

    1. The current process_graph function is dedicated to the BlogCatalog dataset. We need to write a general dataloader that can handle any PyG data object. The preprocessing code for BlogCatalog can be put into dominant.py under /example.
    2. When I run model.fit(), train_loss became NaN after 5-6 epochs.
    3. How is the outlier label of BlogCatalog generated?
    4. Should we train the model on clean data and evaluate it on data with outliers?
  • Add tutorials for hyperparameter tuning

    Hi, the wide collection of unsupervised algorithms is amazing. But if there aren't sufficient examples of tuning them, other developers may never use them.

    I am planning to use these algorithms on publicly available graphs and write tutorials on them.

    I have substantial experience in deep learning but not in graph neural networks. I can pull this off with a sufficient amount of help on the underlying algorithms.
