Fix issues for Whisper export with beam search #15619
Conversation
Hi @kunal-vaishnavi, how can one load the exported ONNX models into HuggingFace optimum? I ran the statement mentioned here: https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/whisper#exporting-whisper-with-beam-search
which led to these files:
I am running into an error while running:
The error is:
I added
Thank you for specifying the steps you followed.
Optimum exports Whisper as three models: encoder, decoder, and decoder with past. This custom export instead creates one combined model.
Optimum expects the separate encoder, decoder, and decoder-with-past models, as well as several JSON files. You can export Whisper using Optimum and save the generated models.
Then you can optimize and run Whisper with Optimum + ONNX Runtime by following the example in the PR linked above. |
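The Optimum export flow described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands from the PR: the checkpoint name (`openai/whisper-tiny`) and the output directory are illustrative assumptions.

```shell
# Install Optimum with ONNX Runtime support (assumed prerequisite).
pip install "optimum[onnxruntime]"

# Export Whisper to ONNX. Optimum produces the separate encoder, decoder,
# and decoder-with-past models plus the JSON config files it expects,
# all saved into the given output directory.
optimum-cli export onnx --model openai/whisper-tiny whisper_onnx/
```

The resulting directory can then be loaded back through Optimum's ONNX Runtime model classes and optimized as in the linked PR.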
Thanks @kunal-vaishnavi, giving it a try. Can you please provide an example of how one can use the
I am trying this and it works, but I have some confusion regarding the
We are planning to remove the
In your above example, you can use Hugging Face's
Thanks a lot @kunal-vaishnavi for your help.
Description
This PR fixes an issue with calling the ORT transformer optimizer script on the custom export of Whisper with beam search. It also fixes a GPU out-of-memory issue.
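For context, an invocation of the ORT transformer optimizer on the exported model might look like the sketch below. The file names, `--model_type`, and the head/hidden-size values are assumptions for illustration; check the script's `--help` for the flags supported by your ONNX Runtime version.

```shell
# Hypothetical optimizer run on a beam-search Whisper export.
# Input/output file names are placeholders.
python -m onnxruntime.transformers.optimizer \
    --input whisper_beamsearch.onnx \
    --output whisper_beamsearch_opt.onnx \
    --model_type bart \
    --num_heads 8 \
    --hidden_size 512
```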
Motivation and Context
With this PR fix, the optimizer runs as described in the Whisper model optimization PR.