Example to build custom application and link to libcudf #7671

Merged
merged 33 commits into from
Jun 18, 2021
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
c128957 Migrating from link_to_libcudf repo (isVoid, Mar 22, 2021)
fa688f1 Doc, dockerfile revise (isVoid, Mar 24, 2021)
7ef5fe0 Multiple library usage, code style fixes. (isVoid, Mar 24, 2021)
abf20ab several code style updates after merge (isVoid, Mar 24, 2021)
43ebd65 further trim down codes (isVoid, Mar 24, 2021)
eed062d Just using basic rmm mr (isVoid, Mar 24, 2021)
22d8bcb remove unused imports (isVoid, Mar 24, 2021)
a9a3bb5 simpler cmakefile (isVoid, Mar 24, 2021)
d3b789c removed ccmake dependency (isVoid, Mar 24, 2021)
d956daa Reorder includes and stale line remove (isVoid, Mar 24, 2021)
9957a7f Remove cuda mr init (isVoid, Mar 24, 2021)
6f0b9ce simpler dockerfile, readme update (isVoid, Mar 26, 2021)
8f6a98e Organized into basic example (isVoid, Mar 26, 2021)
b2fdb37 creating an agg now a func (isVoid, Apr 2, 2021)
5ebf755 Update build description to include pre-built case (isVoid, Apr 2, 2021)
736e440 Moving buildargs below to avoid rerun apt-installs (isVoid, Apr 5, 2021)
9494537 cuda 11.2 and main branch (isVoid, Apr 22, 2021)
78e1d7c removing dockerfile (isVoid, Apr 22, 2021)
efd897d readme (isVoid, Apr 22, 2021)
8a49e83 Root readme, add to CI? (isVoid, Apr 22, 2021)
906154c Updated with new build script (isVoid, Apr 22, 2021)
b80a7de example build script bugs (isVoid, Apr 22, 2021)
de82e6e Merge branch 'branch-0.20' of https://github.com/rapidsai/cudf into b… (isVoid, May 5, 2021)
079b88e Merge branch 'branch-21.06' of https://github.com/rapidsai/cudf into … (isVoid, May 24, 2021)
feb77cc revert changes made to system level build scripts (isVoid, May 24, 2021)
6418342 Fixing bugs in build.sh (isVoid, May 24, 2021)
7d5f8c2 add examples into gpuci (isVoid, May 24, 2021)
d799c01 Comments (isVoid, May 24, 2021)
972f1a1 Return table_with_metadata instead (isVoid, May 24, 2021)
a87475e readme (isVoid, May 24, 2021)
66dd82c Abs path in build.sh (isVoid, Jun 8, 2021)
145a96e Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (isVoid, Jun 17, 2021)
41a6e3f Use latest branch and configure auto update (isVoid, Jun 17, 2021)
21 changes: 21 additions & 0 deletions cpp/example/4stock_5day.csv
@@ -0,0 +1,21 @@
Company,Date,Open,High,Low,Close,Volume
MSFT,2021-03-03,232.16000366210938,233.5800018310547,227.25999450683594,227.55999755859375,33950400.0
MSFT,2021-03-04,226.74000549316406,232.49000549316406,224.25999450683594,226.72999572753906,44584200.0
MSFT,2021-03-05,229.52000427246094,233.27000427246094,226.4600067138672,231.60000610351562,41842100.0
MSFT,2021-03-08,231.3699951171875,233.3699951171875,227.1300048828125,227.38999938964844,35245900.0
MSFT,2021-03-09,232.8800048828125,235.3800048828125,231.6699981689453,233.77999877929688,33034000.0
GOOG,2021-03-03,2067.2099609375,2088.51806640625,2010.0,2026.7099609375,1483100.0
GOOG,2021-03-04,2023.3699951171875,2089.239990234375,2020.27001953125,2049.090087890625,2116100.0
GOOG,2021-03-05,2073.1201171875,2118.110107421875,2046.4150390625,2108.5400390625,2193800.0
GOOG,2021-03-08,2101.1298828125,2128.81005859375,2021.6099853515625,2024.1700439453125,1646000.0
GOOG,2021-03-09,2070.0,2078.0400390625,2047.8299560546875,2052.699951171875,1696400.0
AMZN,2021-03-03,3081.179931640625,3107.780029296875,2995.0,3005.0,3967200.0
AMZN,2021-03-04,3012.0,3058.1298828125,2945.429931640625,2977.570068359375,5458700.0
AMZN,2021-03-05,3005.0,3009.0,2881.0,3000.4599609375,5383400.0
AMZN,2021-03-08,3015.0,3064.590087890625,2951.31005859375,2951.949951171875,4178500.0
AMZN,2021-03-09,3017.989990234375,3090.9599609375,3005.14990234375,3062.85009765625,4023500.0
AAPL,2021-03-03,124.80999755859375,125.70999908447266,121.83999633789062,122.05999755859375,112430400.0
AAPL,2021-03-04,121.75,123.5999984741211,118.62000274658203,120.12999725341797,177275300.0
AAPL,2021-03-05,120.9800033569336,121.94000244140625,117.56999969482422,121.41999816894531,153590400.0
AAPL,2021-03-08,120.93000030517578,121.0,116.20999908447266,116.36000061035156,153918600.0
AAPL,2021-03-09,119.02999877929688,122.05999755859375,118.79000091552734,121.08999633789062,129159600.0
34 changes: 34 additions & 0 deletions cpp/example/CMakeLists.txt
@@ -0,0 +1,34 @@
cmake_minimum_required(VERSION 3.18)

project(libcudf_example VERSION 0.0.1 LANGUAGES C CXX CUDA)

set(CMAKE_CXX_STANDARD 14)
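# An empty value tells the libcudf build to target only the GPU architecture detected on the build machine
set(CMAKE_CUDA_ARCHITECTURES "")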
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

set(CPM_DOWNLOAD_VERSION 0.27.2)
set(CPM_DOWNLOAD_LOCATION "${CMAKE_BINARY_DIR}/cmake/CPM_${CPM_DOWNLOAD_VERSION}.cmake")

if(NOT (EXISTS ${CPM_DOWNLOAD_LOCATION}))
  message(STATUS "Downloading CPM.cmake")
  file(DOWNLOAD https://github.com/TheLartians/CPM.cmake/releases/download/v${CPM_DOWNLOAD_VERSION}/CPM.cmake ${CPM_DOWNLOAD_LOCATION})
endif()

include(${CPM_DOWNLOAD_LOCATION})

CPMAddPackage(NAME cudf
              GIT_REPOSITORY https://github.com/rapidsai/cudf
              GIT_TAG branch-0.19
              GIT_SHALLOW TRUE
              SOURCE_SUBDIR cpp
              OPTIONS "BUILD_TESTS OFF"
                      "BUILD_BENCHMARKS OFF"
                      "ARROW_STATIC_LIB ON"
                      "JITIFY_USE_CACHE ON"
                      "CUDA_STATIC_RUNTIME ON"
                      "DISABLE_DEPRECATION_WARNING ON"
)

# Configure your project here
add_executable(${PROJECT_NAME} "src/process_csv.cpp")
target_link_libraries(${PROJECT_NAME} cudf::cudf)
49 changes: 49 additions & 0 deletions cpp/example/Dockerfile
@@ -0,0 +1,49 @@
FROM nvidia/cuda:11.1.1-devel-ubuntu18.04
ARG PARALLEL_LEVEL=8

# Install basic cudf dependencies
RUN GCC_VERSION=9 \
    && apt update -y \
    && apt install -y software-properties-common \
    && add-apt-repository -y ppa:git-core/ppa \
    && add-apt-repository -y ppa:ubuntu-toolchain-r/test \
    && apt install -y \
      gcc-${GCC_VERSION} g++-${GCC_VERSION} \
      git nano sudo wget ninja-build bash-completion \
      # CMake dependencies
      curl libssl-dev libcurl4-openssl-dev zlib1g-dev \
      # cuDF dependencies
      libboost-filesystem-dev \
    && apt autoremove -y \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
    # Remove any existing gcc and g++ alternatives; the parentheses keep
    # "|| true" from masking a failure earlier in the && chain
    && (update-alternatives --remove-all cc >/dev/null 2>&1 || true) \
    && (update-alternatives --remove-all c++ >/dev/null 2>&1 || true) \
    && (update-alternatives --remove-all gcc >/dev/null 2>&1 || true) \
    && (update-alternatives --remove-all g++ >/dev/null 2>&1 || true) \
    && (update-alternatives --remove-all gcov >/dev/null 2>&1 || true) \
    && update-alternatives \
      --install /usr/bin/gcc gcc /usr/bin/gcc-${GCC_VERSION} 100 \
      --slave /usr/bin/cc cc /usr/bin/gcc-${GCC_VERSION} \
      --slave /usr/bin/g++ g++ /usr/bin/g++-${GCC_VERSION} \
      --slave /usr/bin/c++ c++ /usr/bin/g++-${GCC_VERSION} \
      --slave /usr/bin/gcov gcov /usr/bin/gcov-${GCC_VERSION} \
    # Set gcc-${GCC_VERSION} as the default gcc
    && update-alternatives --set gcc /usr/bin/gcc-${GCC_VERSION}

ARG CMAKE_VERSION=3.18.5

# Install CMake
RUN cd /tmp \
    && curl -fsSL --compressed -o /tmp/cmake-$CMAKE_VERSION.tar.gz \
      "https://github.com/Kitware/CMake/releases/download/v$CMAKE_VERSION/cmake-$CMAKE_VERSION.tar.gz" \
    && tar -xvzf /tmp/cmake-$CMAKE_VERSION.tar.gz && cd /tmp/cmake-$CMAKE_VERSION \
    && /tmp/cmake-$CMAKE_VERSION/bootstrap \
      --system-curl \
      --parallel=$PARALLEL_LEVEL \
    && make install -j$PARALLEL_LEVEL \
    && cd /tmp && rm -rf /tmp/cmake-$CMAKE_VERSION*

ENV CUDA_HOME="/usr/local/cuda"

RUN mkdir -p /workspace
WORKDIR /workspace
44 changes: 44 additions & 0 deletions cpp/example/README.md
@@ -0,0 +1,44 @@
# Basic Standalone libcudf C++ application

This simple C++ example demonstrates a basic libcudf use case and provides a
minimal example of building your own application based on libcudf using CMake.

The example source code loads a CSV file containing the stock prices of 4
companies over 5 days, computes each company's average closing price, and
writes the result to a CSV file.
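To sanity-check the GPU result, the same computation can be sketched on the host
with only the C++ standard library. This is a minimal sketch, not part of the
example: the `Row` struct and the `mean_close_by_company` function are
hypothetical names, and CSV parsing is assumed to have happened already.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pre-parsed row holding only the two columns the example uses.
struct Row {
  std::string company;  // "Company" column: the groupby key
  double close;         // "Close" column: the value being averaged
};

// Host-side equivalent of grouping by Company and applying a MEAN aggregation
// to Close. std::map returns the keys sorted; libcudf's groupby output order
// is not guaranteed to match.
std::map<std::string, double> mean_close_by_company(std::vector<Row> const& rows)
{
  // Per-company running (sum, count)
  std::map<std::string, std::pair<double, int>> sum_and_count;
  for (auto const& row : rows) {
    auto& acc = sum_and_count[row.company];
    acc.first += row.close;
    acc.second += 1;
  }

  std::map<std::string, double> means;
  for (auto const& entry : sum_and_count) {
    means[entry.first] = entry.second.first / entry.second.second;
  }
  return means;
}
```

Fed the 20 rows of `4stock_5day.csv`, this should agree with the example's
output up to floating-point rounding, since the GPU may sum values in a
different order.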

## How to compile and execute

The included Dockerfile sets up a build environment with the dependencies needed to compile the example.

Prerequisites:
- docker (API >= 1.40 to support --gpus)
- nvidia driver >= 450.80.02 (to support cudatoolkit 11.1)

### Step 1: Build the environment image
```bash
docker build . -t rapidsenv
```

### Step 2: Start the container
```bash
docker run -t -d -v $PWD:/workspace --gpus all --name rapidsenvrt rapidsenv
```

### Step 3: Configure the project (with the container running)
```bash
docker exec rapidsenvrt sh -c "cmake -S . -B build/"
```

### Step 4: Build the project (with the container running)
```bash
docker exec rapidsenvrt sh -c "cmake --build build/ --parallel $PARALLEL_LEVEL"
```
The first build takes a long time because it also builds libcudf from source
inside the container. `$PARALLEL_LEVEL` is expanded by your host shell before
`docker exec` runs, so set it there to control the number of parallel build
jobs; if it is unset, CMake uses the native build tool's default.

### Step 5: Run the binary (with the container running)
```bash
docker exec rapidsenvrt sh -c "build/libcudf_example"
```
The binary reads `4stock_5day.csv` from the working directory (`/workspace`,
the mounted project directory) and writes the averages to
`4stock_5day_avg_close.csv` in the same directory.
69 changes: 69 additions & 0 deletions cpp/example/src/process_csv.cpp
@@ -0,0 +1,69 @@
#include <cudf/aggregation.hpp>
#include <cudf/groupby.hpp>
#include <cudf/io/csv.hpp>
#include <cudf/table/table.hpp>

#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <memory>
#include <string>
#include <utility>
#include <vector>

// Read a CSV file into a table; the metadata (e.g. column names) is discarded
std::unique_ptr<cudf::table> read_csv(std::string const& file_path)
{
  auto source_info    = cudf::io::source_info(file_path);
  auto builder        = cudf::io::csv_reader_options::builder(source_info);
  auto options        = builder.build();
  auto data_with_meta = cudf::io::read_csv(options);
  return std::move(data_with_meta.tbl);
}

// Write a table to a CSV file at file_path
void write_csv(cudf::table_view const& tbl_view, std::string const& file_path)
{
  auto sink_info = cudf::io::sink_info(file_path);
  auto builder   = cudf::io::csv_writer_options::builder(sink_info, tbl_view);
  auto options   = builder.build();
  cudf::io::write_csv(options);
}

std::unique_ptr<cudf::table> average_closing_price(cudf::table_view stock_info_table)
{
  // Schema: | Company | Date | Open | High | Low | Close | Volume |
  auto keys = cudf::table_view{{stock_info_table.column(0)}};  // Company
  auto val  = stock_info_table.column(5);                      // Close

  // Compute the mean closing price for each company
  cudf::groupby::groupby grpby_obj(keys);
  cudf::groupby::aggregation_request agg_request{val, {}};
  agg_request.aggregations.push_back(cudf::make_mean_aggregation());
  std::vector<cudf::groupby::aggregation_request> requests;
  requests.push_back(std::move(agg_request));
Review comment (Member):

Suggested change. Replace:

cudf::groupby::aggregation_request agg_request{val, {}};
agg_request.aggregations.push_back(cudf::make_mean_aggregation());
std::vector<cudf::groupby::aggregation_request> requests;
requests.push_back(std::move(agg_request));

with:

cudf::groupby::aggregation_request agg_request{val, {cudf::make_mean_aggregation()}};
std::vector<cudf::groupby::aggregation_request> requests{std::move(agg_request)};

Reply (Contributor Author):

I tried this, but I get a compile error:

In file included from /usr/include/c++/9/memory:65,
                 from /workspace/build/_deps/cudf-src/cpp/include/cudf/aggregation.hpp:22,
                 from /workspace/src/process_csv.cpp:1:
/usr/include/c++/9/bits/stl_uninitialized.h: In instantiation of '_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const std::unique_ptr<cudf::aggregation>*; _ForwardIterator = std::unique_ptr<cudf::aggregation>*]':
/usr/include/c++/9/bits/stl_uninitialized.h:307:37:   required from '_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>&) [with _InputIterator = const std::unique_ptr<cudf::aggregation>*; _ForwardIterator = std::unique_ptr<cudf::aggregation>*; _Tp = std::unique_ptr<cudf::aggregation>]'
/usr/include/c++/9/bits/stl_vector.h:1582:33:   required from 'void std::vector<_Tp, _Alloc>::_M_range_initialize(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const std::unique_ptr<cudf::aggregation>*; _Tp = std::unique_ptr<cudf::aggregation>; _Alloc = std::allocator<std::unique_ptr<cudf::aggregation> >]'
/usr/include/c++/9/bits/stl_vector.h:626:2:   required from 'std::vector<_Tp, _Alloc>::vector(std::initializer_list<_Tp>, const allocator_type&) [with _Tp = std::unique_ptr<cudf::aggregation>; _Alloc = std::allocator<std::unique_ptr<cudf::aggregation> >; std::vector<_Tp, _Alloc>::allocator_type = std::allocator<std::unique_ptr<cudf::aggregation> >]'
/workspace/src/process_csv.cpp:41:86:   required from here
/usr/include/c++/9/bits/stl_uninitialized.h:127:72: error: static assertion failed: result type must be constructible from value type of input range
  127 |       static_assert(is_constructible<_ValueType2, decltype(*__first)>::value,
      |                                                                        ^~~~~
CMakeFiles/libcudf_example.dir/build.make:84: recipe for target 'CMakeFiles/libcudf_example.dir/src/process_csv.cpp.o' failed
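The static assertion above comes from a general C++ rule rather than from
libcudf: the elements of a `std::initializer_list` are `const`, so building a
`std::vector` from a braced list copies each element, and `std::unique_ptr` is
move-only. A minimal standard-library sketch of the failure and the usual
workaround (moving each element in with `push_back`); `int` stands in for
`cudf::aggregation` and `make_owned_values` is a hypothetical name:

```cpp
#include <memory>
#include <vector>

// Does not compile: the initializer_list elements are const, so the vector
// would have to copy each std::unique_ptr, which is move-only.
//   std::vector<std::unique_ptr<int>> bad{std::make_unique<int>(1)};

// Compiles: each temporary unique_ptr is moved into the vector.
std::vector<std::unique_ptr<int>> make_owned_values()
{
  std::vector<std::unique_ptr<int>> values;
  values.push_back(std::make_unique<int>(1));
  values.push_back(std::make_unique<int>(2));
  return values;
}
```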

Reply (harrism, Member, Mar 30, 2021):
std::vector<groupby::aggregation_request> requests;
requests.emplace_back(groupby::aggregation_request());

Perhaps like this then:

std::vector<cudf::groupby::aggregation_request> requests;
requests.emplace_back(val, {cudf::make_mean_aggregation()});

Reply (isVoid, Contributor Author, Apr 2, 2021):
I think there are two extra lines below the test example that are also needed to create an aggregation. Because of this complexity, I created a separate function, make_single_aggregation_request, to wrap the logic.

An offline discussion noted that this complexity may be alleviated once the vectors are replaced with host_span, and that APIs like aggregation_request shouldn't own an aggregation.
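A sketch of the helper-function approach described in this thread, using
hypothetical stand-in types so it compiles with the standard library alone.
As the example's own code shows, a request is aggregate-initialized from a
values column and an `aggregations` vector of owned aggregations; `Aggregation`,
`Request`, `make_single_request` and `make_requests` below are illustrative
names, not libcudf API. The sketch also suggests why
`requests.emplace_back(val, {...})` does not compile in C++14: a braced list
cannot be deduced as an `emplace_back` template argument, and an aggregate
cannot be initialized with the parentheses `emplace_back` uses internally.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::aggregation.
struct Aggregation {
  std::string kind;
};

// Hypothetical stand-in for cudf::groupby::aggregation_request: an aggregate
// holding the values to aggregate and a vector of owned aggregations.
struct Request {
  int values;  // stands in for a cudf::column_view
  std::vector<std::unique_ptr<Aggregation>> aggregations;
};

// Wraps the multi-step construction so call sites stay one line.
Request make_single_request(int values, std::unique_ptr<Aggregation> agg)
{
  Request request{values, {}};                     // start with an empty vector: nothing to copy
  request.aggregations.push_back(std::move(agg));  // move the owned aggregation in
  return request;                                  // returned by move, never copied
}

// Builds the request list by moving each request in with push_back.
std::vector<Request> make_requests(int values)
{
  std::vector<Request> requests;
  requests.push_back(
    make_single_request(values, std::make_unique<Aggregation>(Aggregation{"mean"})));
  return requests;
}
```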


  // first: the unique keys; second[i].results[j]: result of aggregation j in request i
  auto agg_results = grpby_obj.aggregate(requests);

  // Assemble the result (the table_view constructor copies both columns into a new table)
  auto result_key = std::move(agg_results.first);
  auto result_val = std::move(agg_results.second[0].results[0]);
  std::vector<cudf::column_view> columns{result_key->get_column(0), *result_val};
  return std::make_unique<cudf::table>(cudf::table_view(columns));
}

int main(int argc, char** argv)
{
  // Use a cudaMalloc/cudaFree-backed memory resource for device allocations on the current device
  rmm::mr::cuda_memory_resource cuda_mr;
  rmm::mr::set_current_device_resource(&cuda_mr);

  // Read data
  auto stock_info_table = read_csv("4stock_5day.csv");

  // Process
  auto result = average_closing_price(*stock_info_table);

  // Write out result
  write_csv(*result, "4stock_5day_avg_close.csv");

  return 0;
}