cuGraph - RAPIDS Graph Analytics Library

 cuGraph - GPU Graph Analytics

The RAPIDS cuGraph library is a collection of GPU-accelerated graph algorithms that process data found in GPU DataFrames. The vision of cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, thereby allowing for seamless passing of data between ETL tasks in cuDF and machine learning tasks in cuML. Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal of allowing existing code to be ported into RAPIDS with minimal effort. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer.

For more project details, see rapids.ai.

NOTE: For the latest stable README.md ensure you are on the latest branch.

As an example, the following Python snippet loads graph data and computes PageRank:

import cudf
import cugraph

# read data into a cuDF DataFrame using read_csv
gdf = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"])

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
df_page = cugraph.pagerank(G)

# Let's look at the PageRank Score (only do this on small graphs)
for i in range(len(df_page)):
    print("vertex " + str(df_page['vertex'].iloc[i]) +
          " PageRank is " + str(df_page['pagerank'].iloc[i]))

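To make the meaning of the scores concrete, here is a minimal pure-Python sketch of PageRank by power iteration. It is an illustration only and is not cuGraph's implementation: the function name, the damping factor of 0.85, the tolerance, and the tiny edge list are all assumptions chosen for the example.

```python
def pagerank_sketch(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank over a list of (src, dst) pairs (illustrative only)."""
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    # Start from a uniform distribution over the vertices.
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices with no out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        # Every other vertex splits its damped rank equally among its out-neighbors.
        for src in vertices:
            targets = out_neighbors[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical edge pairs, in the same src/dst form as the CSV above.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
scores = pagerank_sketch(edges)
for vertex, score in sorted(scores.items()):
    print(f"vertex {vertex} PageRank is {score:.4f}")
```

The scores sum to 1, and a vertex with no incoming edges (vertex 3 here) receives only the baseline share of (1 - damping) / n.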
Getting cuGraph

There are three ways to get cuGraph:

  1. Quick start with Docker Repo
  2. Conda Installation
  3. Build from Source


Currently Supported Features

As of Release 0.18 - including 0.18 nightly

Supported Algorithms

| Category | Algorithm | Scale | Notes |
| --- | --- | --- | --- |
| Centrality | | | |
| | Katz | Multi-GPU | |
| | Betweenness Centrality | Single-GPU | |
| | Edge Betweenness Centrality | Single-GPU | |
| Community | | | |
| | EgoNet | Single-GPU | |
| | Leiden | Single-GPU | |
| | Louvain | Multi-GPU | |
| | Ensemble Clustering for Graphs | Single-GPU | |
| | Spectral-Clustering - Balanced Cut | Single-GPU | |
| | Spectral-Clustering - Modularity | Single-GPU | |
| | Subgraph Extraction | Single-GPU | |
| | Triangle Counting | Single-GPU | |
| | K-Truss | Single-GPU | |
| Components | | | |
| | Weakly Connected Components | Single-GPU | |
| | Strongly Connected Components | Single-GPU | |
| Core | | | |
| | K-Core | Single-GPU | |
| | Core Number | Single-GPU | |
| Layout | | | |
| | Force Atlas 2 | Single-GPU | |
| Linear Assignment | | | |
| | Hungarian | Single-GPU | README |
| Link Analysis | | | |
| | PageRank | Multi-GPU | |
| | Personal PageRank | Multi-GPU | |
| | HITS | Single-GPU | leverages Gunrock |
| Link Prediction | | | |
| | Jaccard Similarity | Single-GPU | |
| | Weighted Jaccard Similarity | Single-GPU | |
| | Overlap Similarity | Single-GPU | |
| Traversal | | | |
| | Breadth First Search (BFS) | Multi-GPU | with cutoff support |
| | Single Source Shortest Path (SSSP) | Multi-GPU | |
| | Traveling Salesperson Problem (TSP) | Single-GPU | |
| Structure | | | |
| | Renumbering | Single-GPU | multiple columns, any data type |
| | Symmetrize | Multi-GPU | |
| Other | | | |
| | Minimum Spanning Tree | Single-GPU | |
| | Maximum Spanning Tree | Single-GPU | |



Supported Graph

| Type | Description |
| --- | --- |
| Graph | An undirected graph |
| DiGraph | A directed graph |
| Multigraph | A graph with multiple edges between a vertex pair |
| MultiDigraph | A directed graph with multiple edges between a vertex pair |
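The difference between the four types can be illustrated with a small pure-Python sketch that interprets one edge list four ways. This shows general graph semantics only; it says nothing about how cuGraph stores edges internally, and the helper `interpret_edges` is hypothetical.

```python
from collections import Counter


def interpret_edges(edges, directed, multi):
    """Return {edge: multiplicity} for an edge list under the given graph type."""
    # In an undirected graph, (a, b) and (b, a) name the same edge.
    keyed = [(s, d) if directed else tuple(sorted((s, d))) for s, d in edges]
    counts = Counter(keyed)
    if multi:
        return dict(counts)            # keep parallel edges
    return {edge: 1 for edge in counts}  # collapse parallel edges


edges = [(0, 1), (1, 0), (0, 1), (1, 2)]

graph = interpret_edges(edges, directed=False, multi=False)
digraph = interpret_edges(edges, directed=True, multi=False)
multigraph = interpret_edges(edges, directed=False, multi=True)
multidigraph = interpret_edges(edges, directed=True, multi=True)

print("Graph:       ", graph)
print("DiGraph:     ", digraph)
print("Multigraph:  ", multigraph)
print("MultiDigraph:", multidigraph)
```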



Supported Data Types

cuGraph supports graph creation with Source and Destination being expressed as:

  • cuDF DataFrame
  • Pandas DataFrame

cuGraph supports execution of graph algorithms from different graph objects:

  • cuGraph Graph classes
  • NetworkX graph classes
  • CuPy sparse matrix
  • SciPy sparse matrix

cuGraph tries to match the return type to the input type, so a NetworkX input returns the same data type that NetworkX would have returned.

cuGraph Notice

The current version of cuGraph has some limitations:

  • Vertex IDs are expected to be contiguous integers starting from 0.

cuGraph provides the renumber function to mitigate this problem; by default it is called automatically when data is added to a graph. Input vertex IDs for the renumber function can be any type, can be non-contiguous, can be multiple columns, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to 32-bit contiguous integers starting from 0. cuGraph still requires the renumbered vertex IDs to be representable in 32-bit integers. These limitations are being addressed and will be fixed soon.

Additionally, when using the auto-renumbering feature, vertices are automatically un-renumbered in results.
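The idea behind renumbering and un-renumbering can be sketched in plain Python. This is a conceptual illustration, not cuGraph's renumber function: the names `renumber_sketch` and `unrenumber_sketch` are hypothetical, and multi-column IDs are modeled as tuples.

```python
INT32_MAX = 2**31 - 1  # renumbered IDs must fit in a 32-bit integer


def renumber_sketch(sources, destinations):
    """Map arbitrary hashable vertex IDs to contiguous integers starting from 0."""
    internal_id = {}

    def lookup(vertex):
        # Assign the next free integer the first time a vertex is seen.
        if vertex not in internal_id:
            if len(internal_id) > INT32_MAX:
                raise OverflowError("too many vertices for 32-bit IDs")
            internal_id[vertex] = len(internal_id)
        return internal_id[vertex]

    new_src, new_dst = [], []
    for s, d in zip(sources, destinations):
        new_src.append(lookup(s))
        new_dst.append(lookup(d))

    # Position i holds the original ID of internal vertex i (dicts keep insertion order).
    original_of = list(internal_id)
    return new_src, new_dst, original_of


def unrenumber_sketch(internal_ids, original_of):
    """Translate internal integer IDs in a result back to the original IDs."""
    return [original_of[i] for i in internal_ids]


# Vertex IDs made of two columns (a string and a non-contiguous integer).
sources = [("a", 10), ("b", 20), ("a", 10)]
destinations = [("b", 20), ("c", 30), ("c", 30)]

new_src, new_dst, original_of = renumber_sketch(sources, destinations)
print(new_src, new_dst)                          # [0, 1, 0] [1, 2, 2]
print(unrenumber_sketch([2, 0], original_of))    # [('c', 30), ('a', 10)]
```

Algorithms run on the contiguous integers, and the stored mapping is applied in reverse to the results, which is why callers see their original vertex IDs.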

cuGraph is constantly being updated and improved. Please see the Transition Guide if errors are encountered with newer versions.

Graph Sizes and GPU Memory Size

The amount of memory required depends on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the data size. That leaves headroom for the CSV reader and other transform functions. There are ways around the rule, such as using smaller data chunks.

| Size | Recommended GPU Memory |
| --- | --- |
| 500 million edges | 32 GB |
| 250 million edges | 16 GB |
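The rule of thumb and the table can be turned into a quick back-of-the-envelope calculation. The sketch below rests on two assumptions: that GB means decimal gigabytes, and that the table can be extrapolated linearly to other edge counts. Actual requirements depend on the graph structure and the analytics being run, so treat the output as a rough guide only.

```python
GB = 10**9  # decimal gigabytes (assumption; the table does not say GB vs GiB)


def recommended_gpu_memory(data_size_bytes, overhead_factor=2):
    """Rule of thumb from the text: GPU memory of about twice the data size."""
    return data_size_bytes * overhead_factor


# Both table rows give the same ratio: 32 GB / 500 million edges = 64 bytes per edge.
BYTES_PER_EDGE = (32 * GB) // 500_000_000


def estimate_from_edge_count(num_edges):
    """Linear extrapolation from the table (an assumption, not a guarantee)."""
    return num_edges * BYTES_PER_EDGE


print(recommended_gpu_memory(8 * GB) / GB)          # 16.0
print(estimate_from_edge_count(250_000_000) / GB)   # 16.0
print(estimate_from_edge_count(100_000_000) / GB)   # 6.4
```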

Managed memory can also be used to oversubscribe the GPU and exceed the above memory limits. See the blog post Tackling Large Graphs with RAPIDS cuGraph and CUDA Unified Memory on GPUs: https://medium.com/rapids-ai/tackling-large-graphs-with-rapids-cugraph-and-unified-virtual-memory-b5b69a065d4




Quick Start

Please see the Docker Repository, choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can use all of the RAPIDS libraries: cuDF, cuML, and cuGraph.

Conda

It is easy to install cuGraph using conda. You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

Install and update cuGraph using the conda command:

# CUDA 10.1
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=10.1

# CUDA 10.2
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=10.2

# CUDA 11.0
conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cugraph cudatoolkit=11.0

Note: This conda installation only applies to Linux and Python versions 3.7/3.8.

Build from Source and Contributing

Please see our guide for building cuGraph from source.

Please see our guide for contributing to cuGraph.

Documentation

Python API documentation can be generated from the docs directory.


Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Apache Arrow on GPU

The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported.

Comments
  • Update to changed `rmm::device_scalar` API

    rapidsai/rmm/#789 is a breaking API change for rmm::device_scalar. This PR fixes a couple of uses of rmm::device_scalar so that cuGraph builds again, and it should be merged immediately after rapidsai/rmm/#789.

    Also fixes an unrelated narrowing conversion warning.

  • [FEA] Remove FAISS dependency, inherit other common dependencies from raft

    Originally this PR was about building FAISS shared libs via CPM, but @rlratzel mentioned cuGraph doesn't need FAISS anymore, and it'd be better if we remove it.

    If we need FAISS again in the future, we can add faiss::faiss back to target_link_libraries without needing extra CPM configuration as it will be available via raft.

    Edit: Removed more dependencies that we inherit from raft and/or rmm. Depends on https://github.com/rapidsai/raft/pull/345.

    Edit 2: I think the CUDA 11.0 thrust issue will be solved by https://github.com/rapidsai/rapids-cmake/pull/98

  • cuGraph Readme pages and Documentation API structure refactoring

    Refactoring the API and adding new landing pages for each cuGraph component

    Please just go to: https://github.com/acostadon/cugraph/tree/README_issue_2663 to visualize the changes

    closes #2663

  • Design an approach for vertex and edge masking

    Starts work for EPIC #2104

    We need a design for how we are going to handle vertex and edge masking. The design should sketch out:

    • Our implementation strategy
    • Define the API for masking
    • Define how the primitives will be adapted to support this feature
    • Define a roadmap for implementation (we
  • [ENH] Refactor spectral clustering and transfer the backend to RAFT

    Motivation: Spectral clustering is used by cuML and should be in RAFT to avoid a circular dependency. Many of the backend building blocks can be used independently (such as the Lanczos solver and k-means). However, this is legacy code from nvgraph that can't be transferred as is.

    The following items should be taken care of:

    • [ ] Decouple from nvgraph structures, errors etc. These should not be transferred to RAFT.
    • [ ] Refactor to remove deprecated cusparse calls (next CUDA version support)
    • [ ] Use RAFT's cusparse wrapper
    • [ ] Drop the sparse matrix class.
    • [ ] Transfer code to RAFT
    • [ ] Make it deterministic.
    • [ ] adjust cugraph code to get Spectral Clustering building blocks from RAFT (cython and C++ API remain in cuGraph).
    • [ ] adjust cuML code to get Spectral Clustering from RAFT in UMAP.
  • [REVIEW] Pattern accelerator based implementation of PageRank, Katz Centrality, BFS, & SSSP

    OK, I will try to merge this and plan to address multi-GPU extensions & performance tuning in separate PRs.

    This PR is already very large and also there are multiple works dependent on this, so I think this works better (and this code is not linked to any python user code yet, so there isn't much risk in premature merging).

    This API aims to achieve

    1. thrust-like API for graph algorithms
    2. Abstract out implementation issues in different target systems (Single GPU, multi-GPU, ...) inside the pattern accelerator API, Graph, and Handle; Same analytics code will be used for different target systems.
    3. Minimize redundancy in cuGraph codebase and better enforce consistency.
  • [REVIEW] Refactored Graph Class with RAII

    ~In this proposal I am adding a typed wrapper around rmm device buffer which is movable and returns a pointer that can be used internally by algorithms.~

    ~The other addition is a returnable graph object that owns data abstracted by this typed wrapper.~

    A returnable graph object that wraps rmm device_buffer objects. The pointers to these buffers are returned by the objects. They also have a release function so that the internal contents can be moved by the python layer.

  • [BUG] ECG 0.12 CUDA error invalid value

    Describe the bug: cannot execute the ECG example.

    To Reproduce: the ECG documentation at https://docs.rapids.ai/api/cugraph/stable/ has the following example:

    import cudf
    import cugraph
    
    M = cudf.read_csv('path/to/karate.csv',
                          delimiter = ' ',
                          dtype=['int32', 'int32', 'float32'],
                          header=None)
    sources = cudf.Series(M['0'])
    destinations = cudf.Series(M['1'])
    values = cudf.Series(M['2'])
    G = cugraph.Graph()
    G.add_edge_list(sources, destinations, values)
    parts = cugraph.ecg(G)
    

    fails with:

    /home/at/heilerg/.conda/envs/phd2-graph-econ/lib/python3.7/site-packages/cugraph/structure/graph.py:191: UserWarning: add_edge_list will be deprecated in next release. Use from_cudf_edgelist instead
      Use from_cudf_edgelist instead')
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-2-4b59f30e695a> in <module>
         12 G = cugraph.Graph()
         13 G.add_edge_list(sources, destinations, values)
    ---> 14 parts = cugraph.ecg(G)
    
    site-packages/cugraph/community/ecg.py in ecg(input_graph, min_weight, ensemble_size)
         61     """
         62 
    ---> 63     parts = ecg_wrapper.ecg(input_graph, min_weight, ensemble_size)
         64 
         65     return parts
    
    cugraph/community/ecg_wrapper.pyx in cugraph.community.ecg_wrapper.ecg()
    
    RuntimeError: CUDA error encountered at: /conda/conda-bld/libcugraph_1580841414814/work/cpp/src/converters/COOtoCSR.cuh:163: 1 cudaErrorInvalidValue invalid argument
    
    
    

    Expected behavior: ECG to be computed successfully.

    Desktop:

    • OS: CentOS 7
    • Browser: Firefox Developer Edition
    • Version: cuGraph 0.12 (latest stable)
  • Add `is_multigraph` to PG and change `has_duplicate_edges` to use types

    Closes #2591

    Also, change default graph type to MultiGraph. allow_multi_edges keyword was renamed to do_expensive_check, which only occurs when the output graph is not a MultiGraph. Should do_expensive_check default to True or False?

  • [FEA] neighbor sampling in COO/CSR format

    This pull request adds neighborhood sampling, as needed by GNN frameworks (DGL, PyTorch-Geometric).

    Since I did not hear back on most of the other issues that need to be addressed before this, I am continuing with my plan of first opening a PR with just the API. Once we agree on the final API, and once a minimal version of cugraph-ops is integrated, we can add the implementation of this API.

    In particular, for now I am suggesting that the sampling type is exposed in the public API (it does not exist yet in cugraph-ops since that has not been integrated yet). This must be decided ahead of sampling for best performance (either by the end user or some automatic heuristic on the original graph), which is why it makes sense to have as a separate parameter for this API.

    EDIT: link to issue https://github.com/rapidsai/cugraph/issues/1978

  • initial creation of libcugraph_etl.so

    The new renumbering implementation will require C++ integration directly with cudf. In order to facilitate that, but also support our customers that won't need cudf, this PR will create a separate library (libcugraph_etl.so) which will ultimately link with libcudf.so and contain the ETL portions of cugraph that require cudf features.

    This way our other libcugraph customers that don't need to reference the new library will not need to install all of the cudf dependencies.

    To seed this, the PR also includes a proposed API for the new renumbering capability.

  • Build CUDA 11.8 and Python 3.10 Packages

    This PR updates cugraph to build against branch cuda-118 of the shared-action-workflow repository.

    That branch contains updates for CUDA 11.8 and Python 3.10 packages.

    It also includes some minor file renames.

    Depends on https://github.com/rapidsai/raft/pull/1120

  • Add pagerank to cugraph-service

    The current pagerank API is just a placeholder to indicate that additional APIs beyond just those for sampling are intended.

    This issue tracks the progress of adding a cugraph-service pagerank API.

  • Debugging test failure in CI

    There's a failure that I can't reproduce locally. Python tests are randomly failing at the end of the run during tear down.

    Trying to isolate what might be going on.

  • [FEA]: MG PropertyGraph backed by Dask DataFrames

    Is this a new feature, an improvement, or a change to existing functionality?

    New Feature

    How would you describe the priority of this feature request

    Medium

    Please provide a clear description of problem this feature solves

    Allow MG PropertyGraph to store data with dask.dataframe instead of with dask_cudf. This allows data to be stored in host memory and mirrors functionality in PropertyGraph.

    Describe your ideal solution

    Ideally this can be composed nicely with #2424 where some types can be Dask DataFrames and some can be cudf DataFrames. But, I think this issue can be done before #2424.

    Describe any alternatives you have considered

    No response

    Additional context

    No response

    Code of Conduct

    • [X] I agree to follow cuGraph's Code of Conduct
    • [X] I have searched the open feature requests and have found no duplicates for this feature request
  • [BUG]: edge betweenness centrality results contain incorrect values

    Version

    early 2022

    Which installation method(s) does this occur on?

    No response

    Describe the bug.

    ~70K of ~170K edges are getting NA values for undirected edge betweenness centrality. These are edges that we expect to be among those with the highest centrality.

    Minimum reproducible example

    No response

    Relevant log output

    No response

    Environment details

    No response

    Other/Misc.

    cc @lmeyerov

    Code of Conduct

    • [X] I agree to follow cuGraph's Code of Conduct
    • [X] I have searched the open bugs and have found no duplicates for this bug report
  • [QST]: Saving cugraph object

    What is your question?

    Hi,

    I was wondering if anyone knows how to save a Graph object after it's been created? I'm creating a large DiGraph (1.7 billion edges), so I would preferably like to save the output once it has been created. Is this possible? I looked into pickle and joblib but got errors "cannot pickle 'socket' object" and "TypeError: cannot pickle 'TaskStepMethWrapper' object", respectively

    Code of Conduct

    • [X] I agree to follow cuGraph's Code of Conduct
    • [X] I have searched the open issues and have found no duplicates for this question