Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documentation on how to build ScalarValue::Struct and add ScalarStructBuilder #9229

Merged
merged 5 commits into from
Feb 15, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 14, 2024

Which issue does this PR close?

Closes #9227

Rationale for this change

Described on #9227

Basically I found it hard to build ScalarValue::Struct after #7893

What changes are included in this PR?

  1. Add documentation examples
  2. Add ScalarStructBuilder to help ease the conversion when trying to add ScalarValues or arrays as fields
  3. REmove some From.. impls for ScalarValue that were added in Change ScalarValue::Struct to ArrayRef #7893 but are of limited utility

Are these changes tested?

Yes, by doc tests

Are there any user-facing changes?

@@ -99,6 +102,66 @@ use arrow_buffer::NullBuffer;
/// # }
/// ```
///
/// # Nested Types
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have some similar examples for creating Struct::List

/// # use arrow::datatypes::{DataType, Field};
/// # use datafusion_common::{ScalarValue, scalar::ScalarStructBuilder};
/// // Build a struct like: {a: 1, b: "foo"}
/// let field_a = Field::new("a", DataType::Int32, false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case for the ScalarStructBuilder is pretty nicely illustrated by this example compared to the one in

## Example: Creating [`ScalarValue::Struct`] directly

@@ -2710,27 +2774,6 @@ impl From<String> for ScalarValue {
}
}

// TODO: Remove this after changing to Scalar<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were added in #7893 and not yet released so this is not a breaking API change

let struct_arr = sv.to_array().unwrap();
let actual = as_struct_array(&struct_arr).unwrap();
assert_eq!(actual, &expected);
}

#[test]
#[should_panic(expected = "assertion `left == right` failed")]
#[should_panic(
expected = "Error building SclarValue::Struct. Expected array with exactly one element, found array with 4 elements"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builder makes better messages I think

@@ -932,17 +932,17 @@ fn round_trip_scalar_values() {
ScalarValue::Binary(None),
ScalarValue::LargeBinary(Some(b"bar".to_vec())),
ScalarValue::LargeBinary(None),
ScalarValue::from((
vec![
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this would have panic'd previously if the input was invalid. Using the ScalarStructBuilder makes it clear that this can happen

@alamb alamb marked this pull request as ready for review February 14, 2024 18:08
@@ -15,7 +15,9 @@
// specific language governing permissions and limitations
// under the License.

//! This module provides ScalarValue, an enum that can be used for storage of single elements
//! [`ScalarValue`]: stores single constant values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant may be misleading.

Maybe like

In databases, a scalar value is simply a single value, as opposed to a set of values or a combination of values. It's like having just one piece of information

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scalar value is a single value of different datatypes: bool, ints, floats, arrays, structs, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- reworded

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @alamb
couple of minors you may want to fix

Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @jayzhan211 for your comments

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb alamb merged commit 3f0963d into apache:main Feb 15, 2024
23 checks passed
@alamb alamb deleted the alamb/better_struct_array branch February 15, 2024 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Easier way to create ScalarValue::Struct
3 participants