docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine (#12666)

* Update DataFusion introduction to show that DataFusion offers packaged versions for end users
* change order
* Update README.md

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refine wording and update user guide for consistency
* prettier

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
apache · Oct 3, 2024 · 42ef58e
1 parent 1f2f02f
commit 42ef58e
Showing 2 changed files with 28 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -42,14 +42,25 @@
 </a>
 
 DataFusion is an extensible query engine written in [Rust] that
-uses [Apache Arrow] as its in-memory format. DataFusion's target users are
+uses [Apache Arrow] as its in-memory format.
+
+The DataFusion libraries in this repository are used to build data-centric system software. DataFusion also provides the
+following subprojects, which are packaged versions of DataFusion intended for end users.
+
+- [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
+  queries.
+- [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
+  out on Ray clusters.
+- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
+  DataFusion.
+
+The target audience for the DataFusion crates in this repository are
 developers building fast and feature rich database and analytic systems,
 customized to particular workloads. See [use cases] for examples.
 
-"Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
+DataFusion offers [SQL] and [`Dataframe`] APIs,
 excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
 extensive customization, and a great community.
-[Python Bindings] are also available.
 
 DataFusion features a full query planner, a columnar, streaming, multi-threaded,
 vectorized execution engine, and partitioned data sources. You can

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -34,7 +34,20 @@ Apache DataFusion
 
 
 DataFusion is an extensible query engine written in `Rust <http://rustlang.org>`_ that
-uses `Apache Arrow <https://arrow.apache.org>`_ as its in-memory format. DataFusion's target users are
+uses `Apache Arrow <https://arrow.apache.org>`_ as its in-memory format.
+
+This documentation is for the <a href="https://github.com/apache/datafusion">core DataFusion project</a>, which contains
+libraries that are used to build data-centric system software. DataFusion also offers the following subprojects, which
+provide packaged versions of DataFusion intended for end users, and these have separate documentation.
+
+- <a href="https://datafusion.apache.org/python/">DataFusion Python</a> offers a Python interface for SQL and DataFrame
+  queries.
+- <a href="https://github.com/apache/datafusion-ray/">DataFusion Ray</a> provides a distributed version of DataFusion
+  that scales out on <a href="https://www.ray.io">Ray</a> clusters.
+- <a href="https://datafusion.apache.org/comet/">DataFusion Comet</a> is an accelerator for Apache Spark based on
+  DataFusion.
+
+DataFusion's target users are
 developers building fast and feature rich database and analytic systems,
 customized to particular workloads. See `use cases <https://datafusion.apache.org/user-guide/introduction.html#use-cases>`_ for examples.