This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

Tips & tricks to speed up inference #179

Closed
danielpatrickhug opened this issue Mar 29, 2021 · 3 comments

Comments

@danielpatrickhug

Hi Everyone,

Using the code from the example notebook for GPT-Neo 2.7B, I was wondering if anyone had any tips to speed up the model's inference. I was also wondering how I could change the code to stop the decoder after a set number of characters. Does anyone have any advice, or can you point me in the right direction? Thank you.

@sdtblck
Collaborator

sdtblck commented Mar 29, 2021

Hi, the main problem is that our library uses tf.estimator, which isn't really designed for inference: it has to reload the graph every time it's called.

If you want fast inference, I'd recommend using the HuggingFace port when it's ready. Another option would be to do something like https://github.com/marcsto/rl/blob/master/src/fast_predict2.py
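
For reference, the trick in that link is to keep a single `estimator.predict()` call alive by feeding it from a generator, so the graph is built only once. A minimal sketch of the pattern, not gpt-neo's actual input pipeline (the `FastPredict` name, the `input_fn_builder` helper, and the `tokens` feature are all illustrative):

```python
import tensorflow as tf

class FastPredict:
    """Wraps estimator.predict() so the graph is built once and reused;
    subsequent calls just push new features through the live pipeline."""

    def __init__(self, estimator, input_fn_builder):
        self.estimator = estimator
        self.input_fn_builder = input_fn_builder
        self.first_run = True
        self.closed = False

    def _generator(self):
        # Keep yielding the most recently submitted features until closed;
        # this stops predict() from finishing and tearing down the graph.
        while not self.closed:
            yield self.next_features

    def predict(self, features):
        self.next_features = features
        if self.first_run:
            self.predictions = self.estimator.predict(
                input_fn=self.input_fn_builder(self._generator))
            self.first_run = False
        return next(self.predictions)

    def close(self):
        self.closed = True
        try:
            next(self.predictions)  # let the generator exit cleanly
        except StopIteration:
            pass


def input_fn_builder(generator):
    # Illustrative: adapt output_types/output_shapes to the model's
    # real input features.
    def input_fn():
        ds = tf.data.Dataset.from_generator(
            generator,
            output_types={"tokens": tf.int64},
            output_shapes={"tokens": tf.TensorShape([None])})
        return ds.batch(1)
    return input_fn
```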

Re: your second point: I'm working on a PR to make sampling a bit more user-friendly. This will include things like stopping generation after a set number of characters, or at a specific token.

In the meantime, you can change the values that get passed into this function https://github.com/EleutherAI/gpt-neo/blob/master/model_fns.py#L99
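
Until that PR lands, the simplest stopgap is to over-generate and trim the decoded text yourself. A rough sketch of that post-processing (the stop token and character limit below are placeholders; use whatever your tokenizer actually emits):

```python
def trim_generation(text, max_chars=280, stop_token="<|endoftext|>"):
    """Post-hoc truncation: cut at the first stop token if present,
    then enforce a hard character budget."""
    stop_idx = text.find(stop_token)
    if stop_idx != -1:
        text = text[:stop_idx]
    return text[:max_chars]

# e.g. trimmed = trim_generation(decoded_output, max_chars=200)
```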

@danielpatrickhug
Author

Thank you for responding; I appreciate the help. Is there anything I can do to help with the port over to Hugging Face, or with this repository?

@sdtblck
Collaborator

sdtblck commented Mar 29, 2021

> Thank you for responding; I appreciate the help. Is there anything I can do to help with the port over to Hugging Face, or with this repository?

You'd have to ask HuggingFace about that, but it seems like they mostly have everything under control.

This repository is open to PRs. If you could figure out a way to monkey-patch tf.estimator, like in the link I posted above, so the graph doesn't have to be reloaded every time it's called, that's definitely something we'd be interested in using.
