-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support executing query stages in execution engines other than DataFusion #684
Comments
@Dandandan @thinkharderdev @yahoNanJing @avantgardnerio @jdye64 fyi - let me know if you have any opinions on this approach. I am going to build a prototype of this over the next week. I am sure the design will evolve as I try and implement this. |
I have been thinking about this a lot today. I have had numerous ideas and all seem to have fell flat as I tried to fully implement them. I like the general idea however and curious to see how it looks fully materialized. Interesting stuff for sure! |
There needs to be a scheduler element to this as well so we can do the plan translation once rather than per task. |
@andygrove |
I am closing this for now because I think it is too ambitious given the current level of development in the project. |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When the executor receives a task, it deserializes the physical plan, wraps it in a
ShuffleWriterExec
, and executes it with DataFusion.I want the ability to override this behavior to execute the plan in execution engines other than DataFusion.
Describe the solution you'd like
In the executor, we call
new_shuffle_writer
to create theShuffleWriterExec
that wraps the plan to be executed. I am thinking about moving that method into a newExecuionEngine
trait and creating aDataFusionExecutor
implementation of the trait that is used by default.We can then add a field to
ExecutorProcessConfig
as follows:This will allow me to register custom execution engines from PyBallista, and execute distributed queries in Polars, Pandas, and cuDF.
Describe alternatives you've considered
None
Additional context
None
The text was updated successfully, but these errors were encountered: