docs: enhance the docs of Ballista client (#985)
* docs: enhance the guide

Update README.md

* Update README.md
haoxins committed Feb 25, 2024
1 parent 1d753fb commit abcbf83
Showing 1 changed file with 15 additions and 12 deletions.
27 changes: 15 additions & 12 deletions ballista/client/README.md
@@ -59,7 +59,7 @@ With these crates installed, it is now possible to start a scheduler process.
RUST_LOG=info ballista-scheduler
```

-The scheduler will bind to port 50050 by default.
+The scheduler will bind to port `50050` by default.

Next, start an executor process in a new terminal session with the specified concurrency
level.
@@ -68,7 +68,7 @@ level.
RUST_LOG=info ballista-executor -c 4
```

-The executor will bind to port 50051 by default. Additional executors can be started by
+The executor will bind to port `50051` by default. Additional executors can be started by
manually specifying a bind port. For example:

```bash
RUST_LOG=info ballista-executor --bind-port 50052 -c 4
```

@@ -80,13 +80,10 @@ RUST_LOG=info ballista-executor --bind-port 50052 -c 4
Ballista provides a `BallistaContext` as a starting point for creating queries. DataFrames can be created
by invoking the `read_csv`, `read_parquet`, and `sql` methods.

-To build a simple ballista example, add the following dependencies to your `Cargo.toml` file:
+To build a simple ballista example, run the following command to add the dependencies to your `Cargo.toml` file:

-```toml
-[dependencies]
-ballista = "0.11"
-datafusion = "28.0.0"
-tokio = "1.0"
+```bash
+cargo add ballista datafusion tokio
```

The following example runs a simple aggregate SQL query against a Parquet file (`yellow_tripdata_2022-01.parquet`) from the
@@ -96,8 +93,6 @@ data set. Download the file and add it to the `testdata` folder before running t
```rust,no_run
use ballista::prelude::*;
use datafusion::prelude::{col, min, max, avg, sum, ParquetReadOptions};
-use datafusion::arrow::util::pretty;
-use datafusion::prelude::CsvReadOptions;

#[tokio::main]
async fn main() -> Result<()> {
@@ -116,8 +111,16 @@ async fn main() -> Result<()> {
.read_parquet(filename, ParquetReadOptions::default())
.await?
.select_columns(&["passenger_count", "fare_amount"])?
-    .aggregate(vec![col("passenger_count")], vec![min(col("fare_amount")), max(col("fare_amount")), avg(col("fare_amount")), sum(col("fare_amount"))])?
-    .sort(vec![col("passenger_count").sort(true,true)])?;
+    .aggregate(
+        vec![col("passenger_count")],
+        vec![
+            min(col("fare_amount")),
+            max(col("fare_amount")),
+            avg(col("fare_amount")),
+            sum(col("fare_amount")),
+        ],
+    )?
+    .sort(vec![col("passenger_count").sort(true, true)])?;
// this is equivalent to the following SQL
// SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), AVG(fare_amount), SUM(fare_amount)
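For reference, a minimal end-to-end sketch of the example this commit documents might look like the following once the diff is applied. The connection step (`BallistaContext::remote` against `localhost:50050`, the scheduler's default port from the steps above) and the closing `df.show()` call are assumptions based on the surrounding README text; they are not lines visible in this diff.

```rust
use ballista::prelude::*;
use datafusion::prelude::{avg, col, max, min, sum, ParquetReadOptions};

#[tokio::main]
async fn main() -> Result<()> {
    // Assumed: connect to the scheduler started earlier on its default port.
    let config = BallistaConfig::builder().build()?;
    let ctx = BallistaContext::remote("localhost", 50050, &config).await?;

    // The Parquet file downloaded into the `testdata` folder per the README.
    let filename = "testdata/yellow_tripdata_2022-01.parquet";

    // Aggregate fare statistics per passenger count, sorted ascending,
    // mirroring the aggregate/sort chain shown in the diff above.
    let df = ctx
        .read_parquet(filename, ParquetReadOptions::default())
        .await?
        .select_columns(&["passenger_count", "fare_amount"])?
        .aggregate(
            vec![col("passenger_count")],
            vec![
                min(col("fare_amount")),
                max(col("fare_amount")),
                avg(col("fare_amount")),
                sum(col("fare_amount")),
            ],
        )?
        .sort(vec![col("passenger_count").sort(true, true)])?;

    // Assumed: print the result set to stdout.
    df.show().await?;

    Ok(())
}
```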
