From 5a02ea4f26887ff8e9d0f01ccab2fe539adac425 Mon Sep 17 00:00:00 2001
From: "Mads R. B. Kristensen" <madsbk@gmail.com>
Date: Fri, 25 Aug 2023 14:22:48 +0200
Subject: [PATCH] zarr doc

---
 README.md             | 180 +++---------------------------------------
 docs/source/index.rst |   1 +
 docs/source/zarr.rst  |  11 +++
 3 files changed, 22 insertions(+), 170 deletions(-)
 create mode 100644 docs/source/zarr.rst
diff --git a/README.md b/README.md
index 0422028dd9..5a60216bda 100644
--- a/README.md
+++ b/README.md
@@ -1,181 +1,21 @@
-# KvikIO: C++ and Python bindings to cuFile
+# KvikIO: High Performance File IO
 
 ## Summary
 
-This provides C++ and Python bindings to cuFile, which enables GPUDirect Storage (GDS).
-KvikIO also works efficiently when GDS isn't available and can read/write both host and
-device data seamlessly.
+KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python
+bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html),
+which enables [GPUDirect Storage](https://developer.nvidia.com/blog/gpudirect-storage/) (GDS).
+KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly.
 
 ### Features
 
-* Object Oriented API.
-* Exception handling.
+* Object-oriented API for [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html) with C++/Python exception handling.
+* A Python Zarr backend for reading and writing GPU data to file seamlessly.
 * Concurrent reads and writes using an internal thread pool.
 * Non-blocking API.
-* Python Zarr reader.
 * Handle both host and device IO seamlessly.
 * Provides Python bindings to [nvCOMP](https://github.com/NVIDIA/nvcomp).
 
-## Requirements
-
-To install users should have a working Linux machine with CUDA Toolkit
-installed (v11.4+) and a working compiler toolchain (C++17 and cmake).
-
-### C++
-
-The C++ bindings are header-only and depends on the CUDA Driver API.
-In order to build and run the example code, CMake and the CUDA Runtime
-API is required.
-
-### Python
-
-The Python package depends on the following packages:
-
-* cython
-* pip
-* setuptools
-* scikit-build
-
-For nvCOMP, benchmarks, examples, and tests:
-
-* pytest
-* numpy
-* cupy
-
-## Install
-
-### Conda
-
-Install the stable release from the `rapidsai` channel like:
-
-```
-conda create -n kvikio_env -c rapidsai -c conda-forge kvikio
-```
-
-Install the `kvikio` conda package from the `rapidsai-nightly` channel like:
-
-```
-conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio
-```
-
-If the nightly install doesn't work, set `channel_priority: flexible` in your `.condarc`.
-
-In order to setup a development environment run:
-```
-conda env create --name kvikio-dev --file conda/environments/all_cuda-118_arch-x86_64.yaml
-```
-
-### C++ (build from source)
-
-To build the C++ example run:
-
-```
-./build.sh libkvikio
-```
-
-Then run the example:
-
-```
-./examples/basic_io
-```
-
-### Python (build from source)
-
-To build and install the extension run:
-
-```
-./build.sh kvikio
-```
-
-One might have to define `CUDA_HOME` to the path to the CUDA installation.
-
-In order to test the installation, run the following:
-
-```
-pytest tests/
-```
-
-And to test performance, run the following:
-
-```
-python benchmarks/single-node-io.py
-```
-
-## Examples
-
-
-### Notebooks
- - [How to read and write GPU memory directly to/from Zarr files](notebooks/zarr.ipynb)
-
-
-### C++
-
-```c++
-#include <cstddef>
-#include <cuda_runtime.h>
-#include <kvikio/file_handle.hpp>
-using namespace std;
-
-int main()
-{
-  // Create two arrays `a` and `b`
-  constexpr std::size_t size = 100;
-  void *a = nullptr;
-  void *b = nullptr;
-  cudaMalloc(&a, size);
-  cudaMalloc(&b, size);
-
-  // Write `a` to file
-  kvikio::FileHandle fw("test-file", "w");
-  size_t written = fw.write(a, size);
-  fw.close();
-
-  // Read file into `b`
-  kvikio::FileHandle fr("test-file", "r");
-  size_t read = fr.read(b, size);
-  fr.close();
-
-  // Read file into `b` in parallel using 16 threads
-  kvikio::default_thread_pool::reset(16);
-  {
-    kvikio::FileHandle f("test-file", "r");
-    future<size_t> future = f.pread(b_dev, sizeof(a), 0);  // Non-blocking
-    size_t read = future.get(); // Blocking
-    // Notice, `f` closes automatically on destruction.
-  }
-}
-```
-
-### Python
-
-```python
-import cupy
-import kvikio
-
-a = cupy.arange(100)
-f = kvikio.CuFile("test-file", "w")
-# Write whole array to file
-f.write(a)
-f.close()
-
-b = cupy.empty_like(a)
-f = kvikio.CuFile("test-file", "r")
-# Read whole array from file
-f.read(b)
-assert all(a == b)
-
-# Use contexmanager
-c = cupy.empty_like(a)
-with kvikio.CuFile(path, "r") as f:
-    f.read(c)
-assert all(a == c)
-
-# Non-blocking read
-d = cupy.empty_like(a)
-with kvikio.CuFile(path, "r") as f:
-    future1 = f.pread(d[:50])
-    future2 = f.pread(d[50:], file_offset=d[:50].nbytes)
-    future1.get()  # Wait for first read
-    future2.get()  # Wait for second read
-assert all(a == d)
-```
+### Documentation
+ * Python: <https://docs.rapids.ai/api/kvikio/nightly/>
+ * C++: <https://docs.rapids.ai/api/libkvikio/nightly/>
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 09ee569180..f69e6aef0e 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -22,5 +22,6 @@ Contents
 
    install
    quickstart
+   zarr
    api
    genindex
diff --git a/docs/source/zarr.rst b/docs/source/zarr.rst
new file mode 100644
index 0000000000..84ed133089
--- /dev/null
+++ b/docs/source/zarr.rst
@@ -0,0 +1,11 @@
+Zarr
+====
+
+KvikIO implements a Zarr-Python backend for reading and writing GPU data to file seamlessly.
+
+The following is an example of how to use the convenience function :py:meth:`kvikio.zarr.open_cupy_array`
+to create a new Zarr array and how to open an existing Zarr array.
+
+
+.. literalinclude:: ../../python/examples/zarr_cupy_nvcomp.py
+    :language: python