ggml-qnn: refine ggml backend subsystem (#216)
Showing 7 changed files with 341 additions and 187 deletions.
bee4a4b
#216 in this project
ggerganov/llama.cpp#7641 in upstream
Whisper, LLM, and MiniCPM-V inference using the QNN backend works as expected on a Xiaomi 14.
A new ggml backend can follow this style to enable mixed inference between CPU & GPU or CPU & NPU very easily, and then focus on fine-tuning bottleneck performance for edge AI inference on Android phones.
There are three known bugs in this commit: some UT cases fail in the JNI layer, and there is a resource cleanup issue in LLM inference.