You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#[test]fnroundtrip_builtin_startswith_function() -> Result<()>{let field_a = Field::new("a",DataType::Utf8,false);let field_b = Field::new("b",DataType::Utf8,false);let schema = Arc::new(Schema::new(vec![field_a, field_b]));let input = Arc::new(EmptyExec::new(false, schema.clone()));let execution_props = ExecutionProps::new();let fun_expr = functions::create_physical_fun(&BuiltinScalarFunction::StartsWith,&execution_props,)?;let expr = ScalarFunctionExpr::new(&format!("{}", BuiltinScalarFunction::StartsWith),// Note that this is how ballista creates this name in https://github.com/apache/arrow-datafusion/blob/master/datafusion/physical-expr/src/functions.rs#L183
fun_expr,vec![col("a", &schema)?, col("b", &schema)?],&DataType::Boolean,);let project =
ProjectionExec::try_new(vec![(Arc::new(expr), "a".to_string())], input)?;roundtrip_test(Arc::new(project))}
Expected behavior
Above test should pass
Additional context
This is pretty much happening because UDFs and builtin functions are represented with the same ScalarFunctionExpr and ballista when serializing uses function name (which is startswith) and BuiltinScalarFunction::from_str (which matches only on starts_with) to differentiate between builtin and user defined functions.
While simplest fix probably would be in datafusion BuiltinScalarFunction::from_str to handle startswith as function name as well I believe that it would be better to fix it in some other way where builtin and udf functions are thrown in the same bag. Current approach means that in some given project code could unexpectedly break on datafusion updates if datafusion introduces new builtin function and project already had user defined function under the same name.
I wonder if it would be reasonable to separate builtin and user defined functions already in datafusion physical plan.
The text was updated successfully, but these errors were encountered:
Describe the bug
When physical plan includes datafusion builtin function
starts_with
then it is serialized as UDF.To Reproduce
Following test (can be added in somewhere in https://github.com/apache/arrow-ballista/blob/master/ballista/core/src/serde/physical_plan/mod.rs) should pass but it fails with
DataFusionError(Plan("There is no UDF named \"startswith\" in the registry"))
Expected behavior
Above test should pass
Additional context
This is pretty much happening because UDFs and builtin functions are represented with the same
ScalarFunctionExpr
and ballista when serializing uses function name (which isstartswith
) andBuiltinScalarFunction::from_str
(which matches only onstarts_with
) to differentiate between builtin and user defined functions.While simplest fix probably would be in datafusion
BuiltinScalarFunction::from_str
to handlestartswith
as function name as well I believe that it would be better to fix it in some other way where builtin and udf functions are thrown in the same bag. Current approach means that in some given project code could unexpectedly break on datafusion updates if datafusion introduces new builtin function and project already had user defined function under the same name.I wonder if it would be reasonable to separate builtin and user defined functions already in datafusion physical plan.
The text was updated successfully, but these errors were encountered: