Hi everyone, using the code written in the example notebook for GPT 3 2.7B, I was wondering if anyone had any tips to speed up inference of the model. I was also wondering how I could change the code to stop the decoder after a set number of characters. Does anyone have any advice, or can you point me in the right direction? Thank you.
Hi, the main problem is that our library uses tf.estimator, which isn't really designed for inference: it has to rebuild and reload the graph every time it's called.
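One common workaround (not something this repo ships, just a sketch) is to keep a single `estimator.predict()` generator alive and feed it prompts through a queue, so the graph is only built once. Here `estimator` is assumed to be the `tf.estimator.Estimator` created in the example notebook, and prompts are assumed to be already tokenized to int32 ids:

```python
import queue
import tensorflow as tf

# Hypothetical pattern: feed prompts into one long-lived predict() generator
# so tf.estimator builds the graph once instead of on every call.
input_queue = queue.Queue()

def input_fn():
    def gen():
        while True:
            # Blocks until generate() enqueues the next prompt's token ids.
            yield input_queue.get()
    return tf.data.Dataset.from_generator(
        gen, output_types=tf.int32, output_shapes=tf.TensorShape([None])
    ).batch(1)

# `estimator` is assumed to come from the example notebook's setup.
predictions = estimator.predict(input_fn)  # lazy: graph is built on first next()

def generate(prompt_ids):
    input_queue.put(prompt_ids)
    return next(predictions)  # reuses the already-built graph and session
```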
Re: your second point, I'm working on a PR to make sampling a bit more user-friendly. This will include things like stopping generation after a set number of characters, or at a specific token.
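Until that PR lands, a simple post-processing sketch (my own illustration, not the repo's API) that cuts the decoded text at a character limit or at the first stop sequence; the names `truncate_output`, `max_chars`, and `stop_sequence` are hypothetical:

```python
def truncate_output(text, max_chars=None, stop_sequence=None):
    """Cut generated text at a stop sequence and/or a character budget."""
    if stop_sequence is not None:
        idx = text.find(stop_sequence)
        if idx != -1:
            text = text[:idx]
    if max_chars is not None:
        text = text[:max_chars]
    return text

# Example: keep only the text before the marker.
print(truncate_output("Once upon a time.<|end|> extra text", stop_sequence="<|end|>"))
# -> "Once upon a time."
```

Note this only trims the output after the fact; the model still samples the full length. Actually stopping the sampling loop early is what the PR would add.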
Thank you for responding; I appreciate the help. Is there anything I can do to help with the port over to Hugging Face, or on this repository?
You'd have to ask HuggingFace about that, but it seems like they mostly have everything under control.
This repository is open to PRs. If you could figure out a way to monkey-patch tf.estimator, like in the link I posted above, so the graph doesn't have to be reloaded every time it's called, that's definitely something we'd be interested in using.