Merge remote-tracking branch 'upstream/branch-0.15' into opg_csr_pagerank

afender committed Jun 12, 2020
2 parents 9d95922 + 5b0b911 commit 8629727
Showing 87 changed files with 3,350 additions and 1,892 deletions.
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -1,15 +1,23 @@
# cuGraph 0.15.0 (Date TBD)

## New Features
- PR #937 Add wrapper for gunrock HITS algorithm

## Improvements
- PR #898 Add Edge Betweenness Centrality, and endpoints to BC
- PR #913 Eliminate `rmm.device_array` usage
- PR #903 Add short commit hash to conda package
- PR #920 modify bfs test, update graph number_of_edges, update storage of transposedAdjList in Graph
- PR #933 Update opg_degree to use raft, add python tests
- PR #930 rename test_utils.h to utilities/test_utils.hpp and remove thrust dependency
- PR #934 Update conda dev environment.yml dependencies to 0.15
- PR #941 Regression python/cudf fix

## Bug Fixes
- PR #936 Update Force Atlas 2 doc and wrapper
- PR #938 Quote conda installs to avoid bash interpretation

# cuGraph 0.14.0 (Date TBD)
# cuGraph 0.14.0 (03 Jun 2020)

## New Features
- PR #756 Add Force Atlas 2 layout
@@ -52,6 +60,7 @@
- PR #874 Update setup.py to use custom clean command
- PR #876 Add BFS C++ tests
- PR #878 Updated build script
- PR #887 Updates test to common datasets
- PR #879 Add docs build script to repository
- PR #880 Remove remaining gdf_column references
- PR #882 Add Force Atlas 2 to benchmarks
@@ -62,6 +71,7 @@
- PR #906 Update Louvain notebook

## Bug Fixes
- PR #927 Update scikit learn dependency
- PR #916 Fix CI error on Force Atlas 2 test
- PR #763 Update RAPIDS conda dependencies to v0.14
- PR #795 Fix some documentation
@@ -79,6 +89,8 @@
- PR #907 Fix bfs directed missing vertices
- PR #911 Env and changelog update
- PR #923 Updated pagerank with @afender 's temp fix for double-free crash
- PR #928 Fix scikit learn test install to work with libgcc-ng 7.3
- PR #935 Merge

# cuGraph 0.13.0 (31 Mar 2020)

39 changes: 20 additions & 19 deletions README.md
@@ -2,42 +2,43 @@

[![Build Status](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/badge/icon)](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/)

The [RAPIDS](https://rapids.ai) cuGraph library is a collection of GPU accelerated graph algorithms that process data found in [GPU DataFrames](https://github.com/rapidsai/cudf). The vision of cuGraph is _to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks_. To realize that vision, cuGraph operators, at the Python layer, on GPU DataFrames, allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientist familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, user familiar with NetworkX will quickly reconnize the NetworkX-like API provided in cuGraph, with the goal being to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer.
The [RAPIDS](https://rapids.ai) cuGraph library is a collection of GPU accelerated graph algorithms that process data found in [GPU DataFrames](https://github.com/rapidsai/cudf). The vision of cuGraph is _to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks_. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer.

For more project details, see [rapids.ai](https://rapids.ai/).

**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cugraph/blob/master/README.md), ensure you are on the latest branch.



```markdown
```python
import cugraph

# read data into a cuDF DataFrame using read_csv
gdf = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"] )
cu_M = cudf.read_csv("graph_data.csv", names=["src", "dst"], dtype=["int32", "int32"])

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs the GDF
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')
G.from_cudf_edgelist(cu_M, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
gdf_page = cugraph.pagerank(G)
df_page = cugraph.pagerank(G)

# Let's look at the PageRank Score (only do this on small graphs)
for i in range(len(gdf_page)):
print("vertex " + str(gdf_page['vertex'][i]) +
" PageRank is " + str(gdf_page['pagerank'][i]))
for i in range(len(df_page)):
print("vertex " + str(df_page['vertex'].iloc[i]) +
" PageRank is " + str(df_page['pagerank'].iloc[i]))
```
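For larger graphs, the per-row loop above gets slow; since cuDF mirrors the pandas API, the top-scoring vertices can instead be pulled out with `sort_values`. A minimal sketch (shown with pandas as a drop-in stand-in for cuDF; the column names follow the example above):

```python
import pandas as pd  # cudf.DataFrame exposes the same sort_values/head methods

# stand-in for the df_page result returned by cugraph.pagerank above
df_page = pd.DataFrame({"vertex": [0, 1, 2, 3],
                        "pagerank": [0.1, 0.4, 0.3, 0.2]})

# sort by score and keep only the top entries instead of printing every row
top = df_page.sort_values("pagerank", ascending=False).head(2)
print(top["vertex"].tolist())  # → [1, 2]
```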


## Supported Algorithms

| Category | Algorithm | Sacle | Notes
| Category | Algorithm | Scale | Notes
| ------------ | -------------------------------------- | ------------ | ------------------- |
| Centrality | | | |
| | Katz | Single-GPU | |
| | Betweenness Centrality | Single-GPU | |
| | Edge Betweenness Centrality | Single-GPU | |
| Community | | | |
| | Louvain | Single-GPU | |
| | Ensemble Clustering for Graphs | Single-GPU | |
@@ -55,7 +56,7 @@ for i in range(len(gdf_page)):
| Layout | | | |
| | Force Atlas 2 | Single-GPU | |
| Link Analysis| | | |
| | Pagerank | Single-GPU | Multi-GPU on DGX avaible |
| | Pagerank | Single-GPU | |
| | Personal Pagerank | Single-GPU | |
| Link Prediction | | | |
| | Jaccard Similarity | Single-GPU | |
@@ -84,18 +85,18 @@ The current version of cuGraph has some limitations:

cuGraph provides the renumber function to mitigate this problem. Input vertex IDs for the renumber function can be any type, can be non-contiguous, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to 32-bit contiguous integers starting from 0. cuGraph still requires the renumbered vertex IDs to be representable in 32-bit integers. These limitations are being addressed and will be fixed soon.

cuGraph provides an auto-renumbering feature, enabled by default, during Graph creating. Renumbered vertices are automaticaly un-renumbered.
cuGraph provides an auto-renumbering feature, enabled by default, during Graph creation. Renumbered vertices are automatically un-renumbered.
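The effect of renumbering can be illustrated with a small sketch (plain Python for illustration only, not cuGraph's implementation):

```python
# Arbitrary, non-contiguous vertex IDs from an edge list
src = [10_000_000, 5, 10_000_000, 42]
dst = [5, 42, 42, 10_000_000]

# Map every distinct ID to a contiguous integer starting from 0
ids = sorted(set(src) | set(dst))
new_id = {v: i for i, v in enumerate(ids)}
renumbered_src = [new_id[v] for v in src]  # → [2, 0, 2, 1]
renumbered_dst = [new_id[v] for v in dst]  # → [0, 1, 1, 2]

# Un-renumbering simply reverses the mapping
old_id = {i: v for v, i in new_id.items()}
```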

cuGraph is constantly being updatred and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if errors are encountered with newer versions
cuGraph is constantly being updated and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if errors are encountered with newer versions.

## Graph Sizes and GPU Memory Size
The amount of memory required is dependent on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data size. That gives overhead for the CSV reader and other transform functions. There are ways around the rule but using smaller data chunks.
The amount of memory required is dependent on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data. That gives overhead for the CSV reader and other transform functions. There are ways around the rule, such as using smaller data chunks.


| Size | Recomended GPU Memory |
|-------------------|-----------------------|
| 500 million edges | 32GB |
| 250 million edges | 16 GB |
| Size | Recommended GPU Memory |
|-------------------|------------------------|
| 500 million edges | 32GB |
| 250 million edges | 16 GB |
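The table values follow from the 2x rule of thumb; a back-of-the-envelope sketch (assuming roughly 32 bytes per edge for an int32 src/dst pair plus reader overhead — illustrative only):

```python
def recommended_gpu_memory_gb(num_edges, bytes_per_edge=32):
    """Double the raw edge-list size, per the rule of thumb above."""
    data_gb = num_edges * bytes_per_edge / 1e9
    return 2 * data_gb

print(recommended_gpu_memory_gb(500_000_000))  # → 32.0
print(recommended_gpu_memory_gb(250_000_000))  # → 16.0
```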



@@ -153,7 +154,7 @@ Python API documentation can be generated from [docs](docs) directory.

## <div align="left"><img src="img/rapids_logo.png" width="265px"/></div> Open GPU Data Science

The RAPIDS suite of open source software libraries aim to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

<p align="center"><img src="img/rapids_arrow.png" width="80%"/></p>

6 changes: 6 additions & 0 deletions benchmarks/bench_algos.py
@@ -233,3 +233,9 @@ def bench_graph_degrees(gpubenchmark, anyGraphWithAdjListComputed):
def bench_betweenness_centrality(gpubenchmark, anyGraphWithAdjListComputed):
gpubenchmark(cugraph.betweenness_centrality,
anyGraphWithAdjListComputed, k=10, seed=123)


def bench_edge_betweenness_centrality(gpubenchmark,
anyGraphWithAdjListComputed):
gpubenchmark(cugraph.edge_betweenness_centrality,
anyGraphWithAdjListComputed, k=10, seed=123)
24 changes: 12 additions & 12 deletions ci/gpu/build.sh
@@ -57,20 +57,20 @@ source activate gdf

logger "conda install required packages"
conda install -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge -c defaults \
cudf=${MINOR_VERSION} \
rmm=${MINOR_VERSION} \
networkx>=2.3 \
"cudf=${MINOR_VERSION}" \
"rmm=${MINOR_VERSION}" \
"networkx>=2.3" \
python-louvain \
cudatoolkit=$CUDA_REL \
dask>=2.12.0 \
distributed>=2.12.0 \
dask-cudf=${MINOR_VERSION} \
dask-cuda=${MINOR_VERSION} \
scikit-learn>=0.21 \
nccl>=2.5 \
ucx-py=${MINOR_VERSION} \
"cudatoolkit=$CUDA_REL" \
"dask>=2.12.0" \
"distributed>=2.12.0" \
"dask-cudf=${MINOR_VERSION}" \
"dask-cuda=${MINOR_VERSION}" \
"scikit-learn>=0.21" \
"nccl>=2.5" \
"ucx-py=${MINOR_VERSION}" \
libcypher-parser \
ipython=7.3* \
"ipython=7.3*" \
jupyterlab
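The quoting added above matters because an unquoted `>` is output redirection in bash, so `networkx>=2.3` never reaches conda as a version constraint. A quick demonstration of the hazard (using `echo` so it runs without conda):

```shell
rm -f '=2.3'                  # clean slate for the demo
echo networkx>=2.3            # unquoted: '>' redirects, creating a file named '=2.3'
cat '=2.3'                    # prints: networkx — the constraint became a filename
echo "networkx>=2.3"          # quoted: the full spec reaches the command intact
```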

# Install the master version of dask and distributed
9 changes: 9 additions & 0 deletions ci/release/update-version.sh
@@ -17,6 +17,7 @@ CURRENT_TAG=`git tag | grep -xE 'v[0-9\.]+' | sort --version-sort | tail -n 1 |
CURRENT_MAJOR=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[1]}'`
CURRENT_MINOR=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[2]}'`
CURRENT_PATCH=`echo $CURRENT_TAG | awk '{split($0, a, "."); print a[3]}'`
CURRENT_SHORT_TAG=${CURRENT_MAJOR}.${CURRENT_MINOR}
NEXT_MAJOR=$((CURRENT_MAJOR + 1))
NEXT_MINOR=$((CURRENT_MINOR + 1))
NEXT_PATCH=$((CURRENT_PATCH + 1))
@@ -51,3 +52,11 @@ sed_runner 's/'"CUGRAPH VERSION .* LANGUAGES C CXX CUDA)"'/'"CUGRAPH VERSION ${N
# RTD update
sed_runner 's/version = .*/version = '"'${NEXT_SHORT_TAG}'"'/g' docs/source/conf.py
sed_runner 's/release = .*/release = '"'${NEXT_FULL_TAG}'"'/g' docs/source/conf.py

for FILE in conda/environments/*.yml; do
sed_runner "s/cudf=${CURRENT_SHORT_TAG}/cudf=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/rmm=${CURRENT_SHORT_TAG}/rmm=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/dask-cuda=${CURRENT_SHORT_TAG}/dask-cuda=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/dask-cudf=${CURRENT_SHORT_TAG}/dask-cudf=${NEXT_SHORT_TAG}/g" ${FILE};
sed_runner "s/ucx-py=${CURRENT_SHORT_TAG}/ucx-py=${NEXT_SHORT_TAG}/g" ${FILE};
done
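The loop above rewrites pinned package versions in each environment file; the substitution can be sketched with plain `sed` (`sed_runner` is assumed to wrap `sed -i`, and the tag values here are stand-ins for the script's variables):

```shell
# hypothetical values standing in for CURRENT_SHORT_TAG / NEXT_SHORT_TAG
CURRENT_SHORT_TAG=0.14
NEXT_SHORT_TAG=0.15

# a one-line stand-in for a conda environment file
echo "  - cudf=${CURRENT_SHORT_TAG}.*" > /tmp/demo_env.yml

# the same substitution the loop applies to every *.yml file
sed -i "s/cudf=${CURRENT_SHORT_TAG}/cudf=${NEXT_SHORT_TAG}/g" /tmp/demo_env.yml
cat /tmp/demo_env.yml   # prints:   - cudf=0.15.*
```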
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.0.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.1.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
13 changes: 6 additions & 7 deletions conda/environments/cugraph_dev_cuda10.2.yml
@@ -5,15 +5,14 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
- cudf=0.14.*
- nvstrings=0.14.*
- rmm=0.14.*
- cudf=0.15.*
- rmm=0.15.*
- dask>=2.12.0
- distributed>=2.12.0
- dask-cuda=0.14*
- dask-cudf=0.14*
- dask-cuda=0.15*
- dask-cudf=0.15*
- nccl>=2.5
- ucx-py=0.14*
- ucx-py=0.15*
- scipy
- networkx
- python-louvain
@@ -24,7 +23,7 @@ dependencies:
- boost
- cython>=0.29,<0.30
- pytest
- scikit-learn>=0.21
- scikit-learn>=0.23.1
- sphinx
- sphinx_rtd_theme
- sphinxcontrib-websupport
8 changes: 4 additions & 4 deletions cpp/CMakeLists.txt
@@ -278,14 +278,14 @@ else(DEFINED ENV{RAFT_PATH})

ExternalProject_Add(raft
GIT_REPOSITORY https://github.com/rapidsai/raft.git
GIT_TAG e003de27fc4e4a096337f184dddbd7942a68bb5c
GIT_TAG 2487eb0c12f374729043baa5448c0d309c921e60
PREFIX ${RAFT_DIR}
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND "")

# Redefining RAFT_DIR so it coincides with the one inferred by env variable.
set(RAFT_DIR ${RAFT_DIR}/src/raft/ CACHE STRING "Path to RAFT repo")
set(RAFT_DIR "${RAFT_DIR}/src/raft/")
endif(DEFINED ENV{RAFT_PATH})


@@ -301,14 +301,14 @@ link_directories(
"${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES}")

add_library(cugraph SHARED
src/comms/mpi/comms_mpi.cpp
src/db/db_object.cu
src/db/db_parser_integration_test.cu
src/db/db_operators.cu
src/utilities/cusparse_helper.cu
src/utilities/spmv_1D.cu
src/structure/graph.cu
src/link_analysis/pagerank.cu
src/link_analysis/gunrock_hits.cpp
src/traversal/bfs.cu
src/traversal/sssp.cu
src/link_prediction/jaccard.cu
@@ -378,7 +378,7 @@ target_include_directories(cugraph
# - link libraries --------------------------------------------------------------------------------

target_link_libraries(cugraph PRIVATE
${RMM_LIBRARY} gunrock ${NVSTRINGS_LIBRARY} cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY} ${MPI_CXX_LIBRARIES} ${NCCL_LIBRARIES})
${RMM_LIBRARY} gunrock cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY} ${MPI_CXX_LIBRARIES} ${NCCL_LIBRARIES})

if(OpenMP_CXX_FOUND)
target_link_libraries(cugraph PRIVATE
