From 2b1e1838138929097ac51ceb3e38f8b946bc3aa5 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Sat, 28 Sep 2024 10:26:24 -0600
Subject: [PATCH 1/5] Update DataFusion introduction to show that DataFusion offers packaged versions for end users

---
 README.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index bb8526c24e2c..7fb8f8ca99c4 100644
--- a/README.md
+++ b/README.md
@@ -42,14 +42,25 @@
 DataFusion is an extensible query engine written in [Rust] that
-uses [Apache Arrow] as its in-memory format. DataFusion's target users are
+uses [Apache Arrow] as its in-memory format.
+
+The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. The
+following subprojects offer packaged versions of DataFusion.
+
+- [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
+  queries.
+- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
+  DataFusion.
+- [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
+  out on Ray clusters.
+
+The target audience for the DataFusion crates in this repository are
 developers building fast and feature rich database and analytic systems,
 customized to particular workloads. See [use cases] for examples.

-"Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
+DataFusion offers [SQL] and [`Dataframe`] APIs,
 excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
 extensive customization, and a great community.
-[Python Bindings] are also available.

 DataFusion features a full query planner, a columnar, streaming, multi-threaded,
 vectorized execution engine, and partitioned data sources. You can

From f8a668f035e16936bc14a59f5140dbef034a3222 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Sat, 28 Sep 2024 10:36:15 -0600
Subject: [PATCH 2/5] change order

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7fb8f8ca99c4..6d94c1db382b 100644
--- a/README.md
+++ b/README.md
@@ -44,15 +44,15 @@
 DataFusion is an extensible query engine written in [Rust] that
 uses [Apache Arrow] as its in-memory format.

-The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. The
-following subprojects offer packaged versions of DataFusion.
+The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. However,
+the following subprojects offer packaged versions of DataFusion.

 - [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
   queries.
-- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
-  DataFusion.
 - [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
   out on Ray clusters.
+- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
+  DataFusion.

 The target audience for the DataFusion crates in this repository are
 developers building fast and feature rich database and analytic systems,

From 8731834582657d7fa4158c8368beba25290bb5ce Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Tue, 1 Oct 2024 06:51:18 -0600
Subject: [PATCH 3/5] Update README.md

Co-authored-by: Andrew Lamb
---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 6d94c1db382b..3415e78157e5 100644
--- a/README.md
+++ b/README.md
@@ -44,8 +44,7 @@
 DataFusion is an extensible query engine written in [Rust] that
 uses [Apache Arrow] as its in-memory format.

-The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. However,
-the following subprojects offer packaged versions of DataFusion.
+The DataFusion libraries in this repository are used to build data centric system software. Some subprojects offer packaged versions of DataFusion for end users.

 - [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
   queries.

From b0610127e254feab8cc261e0f6a671965762eb24 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Tue, 1 Oct 2024 07:03:45 -0600
Subject: [PATCH 4/5] refine wording and update user guide for consistency

---
 README.md             |  3 ++-
 docs/source/index.rst | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3415e78157e5..14f355767fed 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,8 @@
 DataFusion is an extensible query engine written in [Rust] that
 uses [Apache Arrow] as its in-memory format.

-The DataFusion libraries in this repository are used to build data centric system software. Some subprojects offer packaged versions of DataFusion for end users.
+The DataFusion libraries in this repository are used to build data-centric system software. DataFusion also provides the
+following subprojects, which are packaged versions of DataFusion intended for end users.

 - [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
   queries.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 32a5dce323f2..959b964026be 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -34,7 +34,20 @@ Apache DataFusion
 DataFusion is an extensible query engine written in `Rust `_ that
-uses `Apache Arrow `_ as its in-memory format. DataFusion's target users are
+uses `Apache Arrow `_ as its in-memory format.
+
+This documentation is for the core DataFusion project, which contains
+libraries that are used to build data-centric system software. DataFusion also offers the following subprojects, which
+provide packaged versions of DataFusion intended for end users, and these have separate documentation.
+
+- DataFusion Python offers a Python interface for SQL and DataFrame
+  queries.
+- DataFusion Ray provides a distributed version of DataFusion
+  that scales out on Ray clusters.
+- DataFusion Comet is an accelerator for Apache Spark based on
+  DataFusion.
+
+DataFusion's target users are
 developers building fast and feature rich database and analytic systems,
 customized to particular workloads. See `use cases `_ for examples.

From f7cf5f06332a83f5dd290dafb3b0e4309f3894da Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 3 Oct 2024 10:45:00 -0400
Subject: [PATCH 5/5] prettier

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 14f355767fed..5d0b096c1de1 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@
 DataFusion is an extensible query engine written in [Rust] that
 uses [Apache Arrow] as its in-memory format.

-The DataFusion libraries in this repository are used to build data-centric system software. DataFusion also provides the
+The DataFusion libraries in this repository are used to build data-centric system software. DataFusion also provides the
 following subprojects, which are packaged versions of DataFusion intended for end users.

 - [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame