Merge remote-tracking branch 'upstream/branch-0.15' into opg_csr_pagerank

afender committed Jun 12, 2020
2 parents 9d95922 + 5b0b911 commit 8629727
Showing 87 changed files with 3,350 additions and 1,892 deletions.
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -1,15 +1,23 @@
# cuGraph 0.15.0 (Date TBD)

## New Features
- PR #937 Add wrapper for gunrock HITS algorithm

## Improvements
- PR #898 Add Edge Betweenness Centrality, and endpoints to BC
- PR #913 Eliminate `rmm.device_array` usage
- PR #903 Add short commit hash to conda package
- PR #920 modify bfs test, update graph number_of_edges, update storage of transposedAdjList in Graph
- PR #933 Update opg_degree to use raft, add python tests
- PR #930 rename test_utils.h to utilities/test_utils.hpp and remove thrust dependency
- PR #934 Update conda dev environment.yml dependencies to 0.15
- PR #941 Regression python/cudf fix

## Bug Fixes
- PR #936 Update Force Atlas 2 doc and wrapper
- PR #938 Quote conda installs to avoid bash interpretation

# cuGraph 0.14.0 (Date TBD)
# cuGraph 0.14.0 (03 Jun 2020)

## New Features
- PR #756 Add Force Atlas 2 layout
@@ -52,6 +60,7 @@
- PR #874 Update setup.py to use custom clean command
- PR #876 Add BFS C++ tests
- PR #878 Updated build script
- PR #887 Updates test to common datasets
- PR #879 Add docs build script to repository
- PR #880 Remove remaining gdf_column references
- PR #882 Add Force Atlas 2 to benchmarks
@@ -62,6 +71,7 @@
- PR #906 Update Louvain notebook

## Bug Fixes
- PR #927 Update scikit learn dependency
- PR #916 Fix CI error on Force Atlas 2 test
- PR #763 Update RAPIDS conda dependencies to v0.14
- PR #795 Fix some documentation
@@ -79,6 +89,8 @@
- PR #907 Fix bfs directed missing vertices
- PR #911 Env and changelog update
- PR #923 Updated pagerank with @afender 's temp fix for double-free crash
- PR #928 Fix scikit learn test install to work with libgcc-ng 7.3
- PR #935 Merge

# cuGraph 0.13.0 (31 Mar 2020)

39 changes: 20 additions & 19 deletions README.md
@@ -2,42 +2,43 @@

[![Build Status](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/badge/icon)](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/)

The [RAPIDS](https://rapids.ai) cuGraph library is a collection of GPU accelerated graph algorithms that process data found in [GPU DataFrames](https://github.com/rapidsai/cudf). The vision of cuGraph is _to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks_. To realize that vision, cuGraph operators, at the Python layer, on GPU DataFrames, allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientist familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, user familiar with NetworkX will quickly reconnize the NetworkX-like API provided in cuGraph, with the goal being to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer.
The [RAPIDS](https://rapids.ai) cuGraph library is a collection of GPU accelerated graph algorithms that process data found in [GPU DataFrames](https://github.com/rapidsai/cudf). The vision of cuGraph is _to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks_. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer.

For more project details, see [rapids.ai](https://rapids.ai/).

**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cugraph/blob/master/README.md), ensure you are on the latest branch.



```markdown
```python
import cugraph

# read data into a cuDF DataFrame using read_csv
gdf = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"] )
cu_M = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"])

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs the GDF
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')
G.from_cudf_edgelist(cu_M, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
gdf_page = cugraph.pagerank(G)
df_page = cugraph.pagerank(G)

# Let's look at the PageRank Score (only do this on small graphs)
for i in range(len(gdf_page)):
print("vertex " + str(gdf_page['vertex'][i]) +
" PageRank is " + str(gdf_page['pagerank'][i]))
for i in range(len(df_page)):
print("vertex " + str(df_page['vertex'].iloc[i]) +
" PageRank is " + str(df_page['pagerank'].iloc[i]))
```
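For larger graphs, the per-row loop above gets slow; since cuDF mirrors the pandas API, the top-scoring vertices can instead be pulled out with `sort_values`. A minimal sketch (shown with pandas as a drop-in stand-in for cuDF; the column names follow the example above):

```python
import pandas as pd  # cudf.DataFrame exposes the same sort_values/head methods

# stand-in for the df_page result returned by cugraph.pagerank above
df_page = pd.DataFrame({"vertex": [0, 1, 2, 3],
                        "pagerank": [0.1, 0.4, 0.3, 0.2]})

# sort by score and keep only the top entries instead of printing every row
top = df_page.sort_values("pagerank", ascending=False).head(2)
print(top["vertex"].tolist())  # → [1, 2]
```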


## Supported Algorithms

| Category | Algorithm | Sacle | Notes
| Category | Algorithm | Scale | Notes
| ------------ | -------------------------------------- | ------------ | ------------------- |
| Centrality | | | |
| | Katz | Single-GPU | |
| | Betweenness Centrality | Single-GPU | |
| | Edge Betweenness Centrality | Single-GPU | |
| Community | | | |
| | Louvain | Single-GPU | |
| | Ensemble Clustering for Graphs | Single-GPU | |
@@ -55,7 +56,7 @@ for i in range(len(gdf_page)):
| Layout | | | |
| | Force Atlas 2 | Single-GPU | |
| Link Analysis| | | |
| | Pagerank | Single-GPU | Multi-GPU on DGX avaible |
| | Pagerank | Single-GPU | |
| | Personal Pagerank | Single-GPU | |
| Link Prediction | | | |
| | Jaccard Similarity | Single-GPU | |
@@ -84,18 +85,18 @@ The current version of cuGraph has some limitations:

cuGraph provides the renumber function to mitigate this problem. Input vertex IDs for the renumber function can be any type, can be non-contiguous, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to 32-bit contiguous integers starting from 0. cuGraph still requires the renumbered vertex IDs to be representable in 32-bit integers. These limitations are being addressed and will be fixed soon.

cuGraph provides an auto-renumbering feature, enabled by default, during Graph creating. Renumbered vertices are automaticaly un-renumbered.
cuGraph provides an auto-renumbering feature, enabled by default, during Graph creation. Renumbered vertices are automatically un-renumbered.
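The effect of renumbering can be illustrated with a small sketch (plain Python for illustration only, not cuGraph's implementation):

```python
# Arbitrary, non-contiguous vertex IDs from an edge list
src = [10_000_000, 5, 10_000_000, 42]
dst = [5, 42, 42, 10_000_000]

# Map every distinct ID to a contiguous integer starting from 0
ids = sorted(set(src) | set(dst))
new_id = {v: i for i, v in enumerate(ids)}
renumbered_src = [new_id[v] for v in src]  # → [2, 0, 2, 1]
renumbered_dst = [new_id[v] for v in dst]  # → [0, 1, 1, 2]

# Un-renumbering simply reverses the mapping
old_id = {i: v for v, i in new_id.items()}
```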

cuGraph is constantly being updatred and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if errors are encountered with newer versions
cuGraph is constantly being updated and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if errors are encountered with newer versions.

## Graph Sizes and GPU Memory Size
The amount of memory required is dependent on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data size. That gives overhead for the CSV reader and other transform functions. There are ways around the rule but using smaller data chunks.
The amount of memory required is dependent on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data. That gives overhead for the CSV reader and other transform functions. There are ways around the rule, such as using smaller data chunks.


| Size | Recomended GPU Memory |
|-------------------|-----------------------|
| 500 million edges | 32GB |
| 250 million edges | 16 GB |
| Size | Recommended GPU Memory |
|-------------------|------------------------|
| 500 million edges | 32GB |
| 250 million edges | 16 GB |
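The table values follow from the 2x rule of thumb; a back-of-the-envelope sketch (assuming roughly 32 bytes per edge for an int32 src/dst pair plus reader overhead — illustrative only):

```python
def recommended_gpu_memory_gb(num_edges, bytes_per_edge=32):
    """Double the raw edge-list size, per the rule of thumb above."""
    data_gb = num_edges * bytes_per_edge / 1e9
    return 2 * data_gb

print(recommended_gpu_memory_gb(500_000_000))  # → 32.0
print(recommended_gpu_memory_gb(250_000_000))  # → 16.0
```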



@@ -153,7 +154,7 @@ Python API documentation can be generated from [docs](docs) directory.

## <div align="left"><img src="img/rapids_logo.png" width="265px"/></div> Open GPU Data Science

The RAPIDS suite of open source software libraries aim to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

<p align="center"><img src="img/rapids_arrow.png" width="80%"/></p>

6 changes: 6 additions & 0 deletions benchmarks/bench_algos.py
@@ -233,3 +233,9 @@ def bench_graph_degrees(gpubenchmark, anyGraphWithAdjListComputed):
def bench_betweenness_centrality(gpubenchmark, anyGraphWithAdjListComputed):
gpubenchmark(cugraph.betweenness_centrality,
anyGraphWithAdjListComputed, k=10, seed=123)


def bench_edge_betweenness_centrality(gpubenchmark,
anyGraphWithAdjListComputed):
gpubenchmark(cugraph.edge_betweenness_centrality,
anyGraphWithAdjListComputed, k=10, seed=123)
24 changes: 12 additions & 12 deletions ci/gpu/build.sh
@@ -57,20 +57,20 @@ source activate gdf

logger "conda install required packages"
conda install -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge -c defaults \
cudf=${MINOR_VERSION} \
rmm=${MINOR_VERSION} \
networkx>=2.3 \
"cudf=${MINOR_VERSION}" \
"rmm=${MINOR_VERSION}" \
"networkx>=2.3" \
python-louvain \
cudatoolkit=$CUDA_REL \
dask>=2.12.0 \
distributed>=2.12.0 \
dask-cudf=${MINOR_VERSION} \
dask-cuda=${MINOR_VERSION} \
scikit-learn>=0.21 \
nccl>=2.5 \
ucx-py=${MINOR_VERSION} \
"cudatoolkit=$CUDA_REL" \
"dask>=2.12.0" \
"distributed>=2.12.0" \
"dask-cudf=${MINOR_VERSION}" \
"dask-cuda=${MINOR_VERSION}" \
"scikit-learn>=0.21" \
"nccl>=2.5" \
"ucx-py=${MINOR_VERSION}" \
libcypher-parser \
ipython=7.3* \
"ipython=7.3*" \
jupyterlab
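The quoting added above matters because an unquoted `>` is output redirection in bash, so `networkx>=2.3` never reaches conda as a version constraint. A quick demonstration of the hazard (using `echo` so it runs without conda):

```shell
rm -f '=2.3'                  # clean slate for the demo
echo networkx>=2.3            # unquoted: '>' redirects, creating a file named '=2.3'
cat '=2.3'                    # prints: networkx — the constraint became a filename
echo "networkx>=2.3"          # quoted: the full spec reaches the command intact
```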

# Install the master version of dask and distributed
9 changes: 9 additions & 0 deletions ci/release/update-version.sh
@@ -17,6 +17,7 @@ CURRENT_TAG=`git tag | grep -xE 'v[0-9\.]+' | sort --version-sort | tail -n 1 |
CURRENT_MAJOR=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[1]}'`
CURRENT_MINOR=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[2]}'`
CURRENT_PATCH=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[3]}'`
CURRENT_SHORT_TAG=${CURRENT_MAJOR}.${CURRENT_MINOR}
NEXT_MAJOR=$((CURRENT_MAJOR + 1))
NEXT_MINOR=$((CURRENT_MINOR + 1))
NEXT_PATCH=$((CURRENT_PATCH + 1))
@@ -51,3 +52,11 @@ sed_runner 's/'"CUGRAPH VERSION .* LANGUAGES C CXX CUDA)"'/'"CUGRAPH VERSION ${N
# RTD update
sed_runner 's/version = .*/version = '"'${NEXT_SHORT_TAG}'"'/g' docs/source/conf.py
sed_runner 's/release = .*/release = '"'${NEXT_FULL_TAG}'"'/g' docs/source/conf.py

for FILE in conda/environments/*.yml; do
sed_runner "s/cudf=${CURRENT_SHORT_TAG}/cudf=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/rmm=${CURRENT_SHORT_TAG}/rmm=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/dask-cuda=${CURRENT_SHORT_TAG}/dask-cuda=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/dask-cudf=${CURRENT_SHORT_TAG}/dask-cudf=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/ucx-py=${CURRENT_SHORT_TAG}/ucx-py=${NEXT_SHORT_TAG}/g" ${FILE};
done
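The loop above rewrites pinned package versions in each environment file; the substitution can be sketched with plain `sed` (`sed_runner` is assumed to wrap `sed -i`, and the tag values here are stand-ins for the script's variables):

```shell
# hypothetical values standing in for CURRENT_SHORT_TAG / NEXT_SHORT_TAG
CURRENT_SHORT_TAG=0.14
NEXT_SHORT_TAG=0.15

# a one-line stand-in for a conda environment file
echo "  - cudf=${CURRENT_SHORT_TAG}.*" > /tmp/demo_env.yml

# the same substitution the loop applies to every *.yml file
sed -i "s/cudf=${CURRENT_SHORT_TAG}/cudf=${NEXT_SHORT_TAG}/g" /tmp/demo_env.yml
cat /tmp/demo_env.yml   # prints:   - cudf=0.15.*
```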
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.0.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.1.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.2.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
8 changes: 4 additions & 4 deletions cpp/CMakeLists.txt
@@ -278,14 +278,14 @@ else(DEFINED ENV{RAFT_PATH})

ExternalProject_Add(raft
GIT_REPOSITORY https://github.com/rapidsai/raft.git
GIT_TAG e003de27fc4e4a096337f184dddbd7942a68bb5c
GIT_TAG 2487eb0c12f374729043baa5448c0d309c921e60
PREFIX ${RAFT_DIR}
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND "")

# Redefining RAFT_DIR so it coincides with the one inferred by env variable.
set(RAFT_DIR ${RAFT_DIR}/src/raft/ CACHE STRING "Path to RAFT repo")
set(RAFT_DIR "${RAFT_DIR}/src/raft/")
endif(DEFINED ENV{RAFT_PATH})


@@ -301,14 +301,14 @@ link_directories(
"${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES}")

add_library(cugraph SHARED
src/comms/mpi/comms_mpi.cpp
src/db/db_object.cu
src/db/db_parser_integration_test.cu
src/db/db_operators.cu
src/utilities/cusparse_helper.cu
src/utilities/spmv_1D.cu
src/structure/graph.cu
src/link_analysis/pagerank.cu
src/link_analysis/gunrock_hits.cpp
src/traversal/bfs.cu
src/traversal/sssp.cu
src/link_prediction/jaccard.cu
@@ -378,7 +378,7 @@ target_include_directories(cugraph
# - link libraries --------------------------------------------------------------------------------

target_link_libraries(cugraph PRIVATE
${RMM_LIBRARY} gunrock ${NVSTRINGS_LIBRARY} cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY} ${MPI_CXX_LIBRARIES} ${NCCL_LIBRARIES})
${RMM_LIBRARY} gunrock cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY} ${MPI_CXX_LIBRARIES} ${NCCL_LIBRARIES})

if(OpenMP_CXX_FOUND)
target_link_libraries(cugraph PRIVATE
