From 5a02ea4f26887ff8e9d0f01ccab2fe539adac425 Mon Sep 17 00:00:00 2001 From: "Mads R. B. Kristensen" Date: Fri, 25 Aug 2023 14:22:48 +0200 Subject: [PATCH] zarr doc --- README.md | 180 +++--------------------------------------- docs/source/index.rst | 1 + docs/source/zarr.rst | 11 +++ 3 files changed, 22 insertions(+), 170 deletions(-) create mode 100644 docs/source/zarr.rst diff --git a/README.md b/README.md index 0422028dd9..5a60216bda 100644 --- a/README.md +++ b/README.md @@ -1,181 +1,21 @@ -# KvikIO: C++ and Python bindings to cuFile +# KvikIO: High Performance File IO ## Summary -This provides C++ and Python bindings to cuFile, which enables GPUDirect Storage (GDS). -KvikIO also works efficiently when GDS isn't available and can read/write both host and -device data seamlessly. +KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python +bindings to `cuFile `_, +which enables `GPUDirect Storage `_ (GDS). +KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly. ### Features -* Object Oriented API. -* Exception handling. +* Object oriented API of `cuFile `_ with C++/Python exception handling. +* A Python Zarr backend for reading and writing GPU data to file seamlessly. * Concurrent reads and writes using an internal thread pool. * Non-blocking API. -* Python Zarr reader. * Handle both host and device IO seamlessly. * Provides Python bindings to [nvCOMP](https://github.com/NVIDIA/nvcomp). -## Requirements - -To install users should have a working Linux machine with CUDA Toolkit -installed (v11.4+) and a working compiler toolchain (C++17 and cmake). - -### C++ - -The C++ bindings are header-only and depends on the CUDA Driver API. -In order to build and run the example code, CMake and the CUDA Runtime -API is required. 
- -### Python - -The Python package depends on the following packages: - -* cython -* pip -* setuptools -* scikit-build - -For nvCOMP, benchmarks, examples, and tests: - -* pytest -* numpy -* cupy - -## Install - -### Conda - -Install the stable release from the `rapidsai` channel like: - -``` -conda create -n kvikio_env -c rapidsai -c conda-forge kvikio -``` - -Install the `kvikio` conda package from the `rapidsai-nightly` channel like: - -``` -conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio -``` - -If the nightly install doesn't work, set `channel_priority: flexible` in your `.condarc`. - -In order to setup a development environment run: -``` -conda env create --name kvikio-dev --file conda/environments/all_cuda-118_arch-x86_64.yaml -``` - -### C++ (build from source) - -To build the C++ example run: - -``` -./build.sh libkvikio -``` - -Then run the example: - -``` -./examples/basic_io -``` - -### Python (build from source) - -To build and install the extension run: - -``` -./build.sh kvikio -``` - -One might have to define `CUDA_HOME` to the path to the CUDA installation. 
- -In order to test the installation, run the following: - -``` -pytest tests/ -``` - -And to test performance, run the following: - -``` -python benchmarks/single-node-io.py -``` - -## Examples - - -### Notebooks - - [How to read and write GPU memory directly to/from Zarr files](notebooks/zarr.ipynb) - - -### C++ - -```c++ -#include -#include -#include -using namespace std; - -int main() -{ - // Create two arrays `a` and `b` - constexpr std::size_t size = 100; - void *a = nullptr; - void *b = nullptr; - cudaMalloc(&a, size); - cudaMalloc(&b, size); - - // Write `a` to file - kvikio::FileHandle fw("test-file", "w"); - size_t written = fw.write(a, size); - fw.close(); - - // Read file into `b` - kvikio::FileHandle fr("test-file", "r"); - size_t read = fr.read(b, size); - fr.close(); - - // Read file into `b` in parallel using 16 threads - kvikio::default_thread_pool::reset(16); - { - kvikio::FileHandle f("test-file", "r"); - future future = f.pread(b_dev, sizeof(a), 0); // Non-blocking - size_t read = future.get(); // Blocking - // Notice, `f` closes automatically on destruction. 
- } -} -``` - -### Python - -```python -import cupy -import kvikio - -a = cupy.arange(100) -f = kvikio.CuFile("test-file", "w") -# Write whole array to file -f.write(a) -f.close() - -b = cupy.empty_like(a) -f = kvikio.CuFile("test-file", "r") -# Read whole array from file -f.read(b) -assert all(a == b) - -# Use contexmanager -c = cupy.empty_like(a) -with kvikio.CuFile(path, "r") as f: - f.read(c) -assert all(a == c) - -# Non-blocking read -d = cupy.empty_like(a) -with kvikio.CuFile(path, "r") as f: - future1 = f.pread(d[:50]) - future2 = f.pread(d[50:], file_offset=d[:50].nbytes) - future1.get() # Wait for first read - future2.get() # Wait for second read -assert all(a == d) -``` +### Documentation + * Python: + * C++: diff --git a/docs/source/index.rst b/docs/source/index.rst index 09ee569180..f69e6aef0e 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -22,5 +22,6 @@ Contents install quickstart + zarr api genindex diff --git a/docs/source/zarr.rst b/docs/source/zarr.rst new file mode 100644 index 0000000000..84ed133089 --- /dev/null +++ b/docs/source/zarr.rst @@ -0,0 +1,11 @@ +Zarr +==== + +KvikIO implements a Zarr-Python backend for reading and writing GPU data to file seamlessly. + +The following is an example of how to use the convenience function :py:meth:`kvikio.zarr.open_cupy_array` +to create a new Zarr array and how to open an existing Zarr array. + + +.. literalinclude:: ../../python/examples/zarr_cupy_nvcomp.py + :language: python