Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support executing query stages in execution engines other than DataFusion #684

Closed
andygrove opened this issue Feb 25, 2023 · 5 comments
Closed
Labels
enhancement New feature or request

Comments

@andygrove
Copy link
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When the executor receives a task, it deserializes the physical plan, wraps it in a ShuffleWriterExec, and executes it with DataFusion.

I want the ability to override this behavior to execute the plan in execution engines other than DataFusion.

Describe the solution you'd like
In the executor, we call new_shuffle_writer to create the ShuffleWriterExec that wraps the plan to be executed. I am thinking about moving that method into a new ExecuionEngine trait and creating a DataFusionExecutor implementation of the trait that is used by default.

We can then add a field to ExecutorProcessConfig as follows:

execution_engine: Option<Arc<dyn ExecutionEngine>>

This will allow me to register custom execution engines from PyBallista, and execute distributed queries in Polars, Pandas, and cuDF.

Describe alternatives you've considered
None

Additional context
None

@andygrove andygrove added the enhancement New feature or request label Feb 25, 2023
@andygrove
Copy link
Member Author

@Dandandan @thinkharderdev @yahoNanJing @avantgardnerio @jdye64 fyi - let me know if you have any opinions on this approach. I am going to build a prototype of this over the next week. I am sure the design will evolve as I try and implement this.

@jdye64
Copy link
Contributor

jdye64 commented Feb 25, 2023

I have been thinking about this a lot today. I have had numerous ideas and all seem to have fell flat as I tried to fully implement them. I like the general idea however and curious to see how it looks fully materialized. Interesting stuff for sure!

@andygrove
Copy link
Member Author

There needs to be a scheduler element to this as well so we can do the plan translation once rather than per task.

@mingmwang
Copy link
Contributor

@andygrove
I like this idea. For the other execution engines, do you have any proposal ?

@andygrove
Copy link
Member Author

I am closing this for now because I think it is too ambitious given the current level of development in the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants