
[dask] Support asynchronous workflows #3929

Closed
jmoralez opened this issue Feb 9, 2021 · 4 comments

Comments

@jmoralez
Collaborator

jmoralez commented Feb 9, 2021

Summary

Implement coroutines for training and prediction that work with an asynchronous Dask client.

Motivation

With an asynchronous interface, LightGBM's distributed training with Dask could be used in concurrent applications. One use case is serving a trained LightGBM model in a web application and getting predictions from it without blocking. Another is an API that accepts some configuration in a POST request, starts a remote cluster, and trains a LightGBM model on it; with an asynchronous interface, several models could be trained concurrently.

Description

This could be achieved by implementing coroutines for the train and predict functions and then wrapping them with `client.sync` to obtain the synchronous variants.
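A minimal sketch of that pattern follows, using only the standard library. The names `_train_async`, `_predict_async`, `train`, `predict` and `train_many_concurrently` are hypothetical and are not part of LightGBM's API; the "model" is a toy (the mean of the data) standing in for real distributed training. The issue proposes `client.sync` to derive the synchronous variants; this sketch swaps in `asyncio.run` so it runs without a Dask cluster.

```python
import asyncio
from typing import Any, Dict, List


# Hypothetical stand-in for a distributed training coroutine. In a real Dask
# integration this would submit work through an asynchronous client and
# await the resulting futures.
async def _train_async(params: Dict[str, Any], data: List[float]) -> Dict[str, Any]:
    await asyncio.sleep(0)  # yield control, as awaiting a remote future would
    # Toy "model": the mean of the data, tagged with the params used.
    return {"params": params, "mean": sum(data) / len(data)}


# Hypothetical prediction coroutine: predicts the stored mean for every row.
async def _predict_async(model: Dict[str, Any], data: List[float]) -> List[float]:
    await asyncio.sleep(0)
    return [model["mean"] for _ in data]


# Synchronous variants obtained by driving the coroutines to completion.
# asyncio.run stands in for client.sync here; note that asyncio.run cannot be
# called from inside an already running event loop.
def train(params: Dict[str, Any], data: List[float]) -> Dict[str, Any]:
    return asyncio.run(_train_async(params, data))


def predict(model: Dict[str, Any], data: List[float]) -> List[float]:
    return asyncio.run(_predict_async(model, data))


# Asynchronous callers (e.g. a web handler) can await several trainings at once.
async def train_many_concurrently() -> List[Dict[str, Any]]:
    return await asyncio.gather(
        _train_async({"num_leaves": 31}, [1.0, 2.0, 3.0]),
        _train_async({"num_leaves": 63}, [10.0, 20.0]),
    )
```

For example, `train({"num_leaves": 31}, [1.0, 2.0, 3.0])` returns a model whose `"mean"` is `2.0`, and `predict` on that model for two rows returns `[2.0, 2.0]`. Keeping the logic in the coroutines and deriving the blocking functions from them means there is only one implementation to maintain for both calling styles.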

References

@jameslamb
Collaborator

Thanks for opening this! Could you please provide a more specific XGBoost link, pointing to the parts of their code that allow async access?

Could you also add details on why someone would want to use LightGBM with Dask asynchronously?

@jmoralez
Collaborator Author

jmoralez commented Feb 9, 2021

Hi James. I've updated my comment with some use cases I can think of. I realize this probably needs to come after some other building blocks, but I'd like to work towards it. What do you think would be needed first?

@jameslamb
Collaborator

Thanks very much for that. I added a link to XGBoost's docs on this topic as well: https://xgboost.readthedocs.io/en/latest/tutorials/dask.html#working-with-asyncio.

> I realize this probably needs to come after some other building blocks but I'd like to work towards this

If you're interested in contributing further, we'd be very grateful! But to be honest, I think that a lot of fundamental pieces need to be added before we consider supporting asynchronous training / prediction in the Dask interface.

The Dask interface is still very new and is missing big features like init_score (#3807), the other boosting types (#3896), and the other distributed training modes (#3834), to name a few.

Adding support for init_score (#3807) or predict(raw_score=True) (#3793) would be good next steps.

@jameslamb
Collaborator

I've added this to #2302, where we keep all feature requests.
