Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more temporary table construct #9354

Open
Lordworms opened this issue Feb 27, 2024 · 3 comments
Open

Support more temporary table construct #9354

Lordworms opened this issue Feb 27, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Lordworms
Copy link
Contributor

Is your feature request related to a problem or challenge?

When I was doing #9323 , I found that duckDB supports this syntax
image
which does not implemented in datafusion
image
I think we should support such structures

Describe the solution you'd like

I have not really thought about it. A rough solution for this should be adding a new table schema in the default schema and when we emit the subquery, we add a new entry to the default schema and then do the following steps? I think we should deduce whether an expression could be evaluated as a temp table or not(In SQLparser or not?) and then add schema to the default schema book. This might include changing SQLparser and lots of existing logics

Describe alternatives you've considered

No response

Additional context

No response

@Lordworms Lordworms added the enhancement New feature or request label Feb 27, 2024
@jayzhan211
Copy link
Contributor

Duckdb does not support table for all the function too, like array_to_string. I think we can do it for range

@alamb
Copy link
Contributor

alamb commented Feb 28, 2024

I think this type of construct is called a TableFunction and we can define them in DataFUsion

This is how we implement the parquet_metadata function in datafusion-cli: https://arrow.apache.org/datafusion/user-guide/cli.html

SELECT path_in_schema, row_group_id, row_group_num_rows, stats_min, stats_max, total_compressed_size
FROM parquet_metadata('hits.parquet')
WHERE path_in_schema = '"WatchID"'
LIMIT 3;

+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| path_in_schema | row_group_id | row_group_num_rows | stats_min           | stats_max           | total_compressed_size |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| "WatchID"      | 0            | 450560             | 4611687214012840539 | 9223369186199968220 | 3883759               |
| "WatchID"      | 1            | 612174             | 4611689135232456464 | 9223371478009085789 | 5176803               |
| "WatchID"      | 2            | 344064             | 4611692774829951781 | 9223363791697310021 | 3031680               |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
3 rows in set. Query took 0.053 seconds.

Thus I think for this particular issue, we could implement a Table function rangeand register it via https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.register_udtf

@Lordworms
Copy link
Contributor Author

I think this type of construct is called a TableFunction and we can define them in DataFUsion

This is how we implement the parquet_metadata function in datafusion-cli: https://arrow.apache.org/datafusion/user-guide/cli.html

SELECT path_in_schema, row_group_id, row_group_num_rows, stats_min, stats_max, total_compressed_size
FROM parquet_metadata('hits.parquet')
WHERE path_in_schema = '"WatchID"'
LIMIT 3;

+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| path_in_schema | row_group_id | row_group_num_rows | stats_min           | stats_max           | total_compressed_size |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| "WatchID"      | 0            | 450560             | 4611687214012840539 | 9223369186199968220 | 3883759               |
| "WatchID"      | 1            | 612174             | 4611689135232456464 | 9223371478009085789 | 5176803               |
| "WatchID"      | 2            | 344064             | 4611692774829951781 | 9223363791697310021 | 3031680               |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
3 rows in set. Query took 0.053 seconds.

Thus I think for this particular issue, we could implement a Table function rangeand register it via https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.register_udtf

Got it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants