Add local Llama2 support from llama2-wrapper backend #400
Conversation
Thanks for making this! A few comments:
```python
streaming=True,
# openai_api_base=url,
# temporarily use fixed url
openai_api_base="http://localhost:8001/v1",
```
can make this an env var
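A minimal sketch of what that could look like (the variable name `LOCAL_LLM_URL` matches what this PR later uses; the fallback default is an assumption):

```python
import os

# Prefer an environment variable over a hard-coded endpoint; fall back to
# the fixed URL currently in the PR if the variable is unset (assumption).
openai_api_base = os.getenv("LOCAL_LLM_URL", "http://localhost:8001/v1")
```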
```python
# need to figure out how to set up llama2wrapper in the frontend
from realtime_ai_character.llm.llama2wrapper_llm import Llama2wrapperLlm

return Llama2wrapperLlm(url=model)
```
let's keep the branching logic for the formal PR
also might need a convention to route to local, e.g. maybe just call it `local` for now
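A rough sketch of how that convention could look in the LLM factory (`get_llm` and the `OpenaiLlm` fallback are written from the names mentioned in this thread, not copied from the repo):

```python
import os

from realtime_ai_character.llm.llama2wrapper_llm import Llama2wrapperLlm
from realtime_ai_character.llm.openai_llm import OpenaiLlm  # assumed module path


def get_llm(model: str):
    # Proposed convention: the literal model name "local" routes to the
    # llama2-wrapper backend; everything else keeps the existing behavior.
    if model == "local":
        return Llama2wrapperLlm(url=os.getenv("LOCAL_LLM_URL"))
    return OpenaiLlm(model=model)
```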
Thanks! I found that `OPENAI_API_KEY` in `.env` is always required; if it is missing or invalid, this error is raised:

```
openai.error.AuthenticationError: Incorrect API key provided: YOUR_API_KEY. You can find your API key at https://platform.openai.com/account/api-keys.
```
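For what it's worth, that error means some request is still reaching api.openai.com with the placeholder key; when `openai_api_base` points at a local server, the key's value is typically ignored and only its presence is checked. A sketch of the distinction (assuming a local OpenAI-compatible server on port 8001 that ignores key values):

```python
from langchain.chat_models import ChatOpenAI

# Talks to api.openai.com, so the key must be real; a placeholder key
# raises openai.error.AuthenticationError when the model is invoked.
remote_llm = ChatOpenAI(openai_api_key="YOUR_API_KEY")

# Talks to the local OpenAI-compatible server, which typically ignores
# the key's value, so any non-empty placeholder is enough.
local_llm = ChatOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8001/v1",
)
```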
> let's keep the branching logic for the formal PR

If I keep the branching logic here, the arg `model` will always be "gpt-3.5-turbo-16k", which then initializes an `OpenaiLlm`. I think the reason is that there is no `local` button on the frontend, and my choice of GPT-3.5 always sets the arg `model` to `gpt-3.5-turbo-16k`. And `LLM_MODEL_USE` from `.env` is overwritten by the frontend choice.
(Presumably this is unusable on a 3090 / too slow, right?) @liltom-eth, do you have an A100, or 2x 4090s?
I understand your current code makes showcasing the demo easier, but for us to merge it into the code base we should still aim to integrate with the existing logic. I suggest we first make the backend part ready.
For the frontend selection, we can add an environment variable or an advanced UI option to enable local Llama inference. When this is toggled, the model string passed to the backend can be your choice here in the backend. The frontend part can be a separate PR if you would like. For testing only, you can change the model string of the "Llama-2-70b" button.
> (Presumably this is unusable on a 3090 / too slow, right?) @liltom-eth, do you have an A100, or 2x 4090s?

I believe it is usable on a 3090 (a GPTQ model runs at 18.85 tokens/sec on a 2080 Ti).
But right now, running on Windows WSL2 to demo on the 2080 Ti, I got some errors in Realchar.
> I understand your current code makes showcasing the demo easier, but for us to merge it into the code base we should still aim to integrate with the existing logic. I suggest we first make the backend part ready.
> For the frontend selection, we can add an environment variable or an advanced UI option to enable local Llama inference. When this is toggled, the model string passed to the backend can be your choice here in the backend. The frontend part can be a separate PR if you would like. For testing only, you can change the model string of the "Llama-2-70b" button.

Thank you! I will test it using the "Llama-2-70b" button in this PR. Another PR for the frontend would be helpful.
@pycui When I tried the frontend button "Llama-2-70b", it always threw an error.
Is that error happening because of the Anyscale key check?
It seems that using a non-3.5 model directs you to Firebase auth, but you probably don't have a working Firebase app. You can probably edit `client/web/src/App.jsx` around line 218 so that (in your test) your model name doesn't require a sign-in.
Before this gets merged, there are some caveats to be mindful of with these local LLMs: everyone is using the quantized / smaller 4- or 5-bit models to get anything usable. There's also contention over which models get merged, and this becomes a tech spike. This one seems great, but then it's their problem to update the models.
```python
# need to figure out how to set up model=url in the frontend
# if the "Llama-2-70b" button is selected from the frontend,
# model here will be "meta-llama/Llama-2-70b-chat-hf"
model = os.getenv('LOCAL_LLM_URL')
```
@pycui Thank you! I have made some updates based on your suggestions.
If I select the "Llama-2-70b" button from the frontend, `model` here will be "meta-llama/Llama-2-70b-chat-hf". Thus I load `model` temporarily from `.env` here.
Thanks, I made some changes to still use the `model` param. For testing, you can modify the frontend to pass `localhost` as the model name.
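Presumably the revised routing looks something like this (a sketch inferred from the comment above, not the exact diff):

```python
from realtime_ai_character.llm.llama2wrapper_llm import Llama2wrapperLlm
from realtime_ai_character.llm.openai_llm import OpenaiLlm  # assumed module path


def get_llm(model: str):
    # Sketch: a model string containing "localhost" is treated as the URL
    # of a local OpenAI-compatible endpoint; other names keep old behavior.
    if "localhost" in model:
        return Llama2wrapperLlm(url=model)
    return OpenaiLlm(model=model)
```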
.env.example (Outdated)
```diff
@@ -24,6 +24,9 @@ OPENAI_API_KEY=YOUR_API_KEY
 ANTHROPIC_API_KEY=YOUR_API_KEY
 # Anyscale Endpoint API Key
 ANYSCALE_ENDPOINT_API_KEY=
+# Local LLM with OpenAI Compatible API
+# LOCAL_LLM_URL="http://localhost:8001/v1"
```
Suggested change:
```diff
-# LOCAL_LLM_URL="http://localhost:8001/v1"
+# Example value: "http://localhost:8001/v1"
```
OK.
```python
temperature=0.5,
streaming=True,
openai_api_base=url,
# openai_api_base="http://localhost:8001/v1",
```
remove this?
Thanks! Made an update to clean this up.
Thanks! That is a good idea. A model catalog can be helpful for users and developers.
Merged commits:
* add llama2-wrapper as local backend
* update local llm backend
* update local llm backend
* update
* Update __init__.py
* Update __init__.py

Co-authored-by: Piaoyang Cui <bcstyle@gmail.com>
Hi @Shaunwei @pycui,
I am working on the project llama2-wrapper to make it easy to call Llama2 models locally as an LLM backend.
Following up on the Twitter discussion, I made this PR as a showcase running Realchar and Llama2 locally on an M2 MacBook Air.
Here is the demo:
How to run on Mac:
1. Run an OpenAI Compatible API on Llama2 models (see the check below).
2. Start Realchar.
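For step 1, llama2-wrapper is meant to expose an OpenAI Compatible API (e.g. something like `python -m llama2_wrapper.server`; check its README for the exact command and port options). A quick sanity check that the endpoint is up before starting Realchar (assuming port 8001, as used in this PR):

```python
import requests

# The OpenAI API spec includes GET /v1/models; an OpenAI-compatible local
# server should list the locally served Llama2 model in the response.
resp = requests.get("http://localhost:8001/v1/models")
print(resp.json())
```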
Implementation
I found it hard to load a local LLM object directly as the backend, since Realchar uses `langchain.chat_models` as the LLM. Thus I chose to run a local LLM behind an OpenAI Compatible API, then call `langchain.chat_models.ChatOpenAI` to run the LLM from the local URL.
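Concretely, the wrapper presumably boils down to something like this (a sketch with illustrative values; `make_local_chat_llm` is a hypothetical helper, not the PR's exact code):

```python
from langchain.chat_models import ChatOpenAI


def make_local_chat_llm(url: str) -> ChatOpenAI:
    # The local server speaks the OpenAI chat API, so langchain's existing
    # ChatOpenAI integration works once the base URL is swapped out.
    return ChatOpenAI(
        temperature=0.5,          # matches the value in this PR's diff
        streaming=True,
        openai_api_base=url,      # e.g. "http://localhost:8001/v1"
        openai_api_key="EMPTY",   # placeholder; ignored by the local server
    )
```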
Issues
The PR still has issues automatically passing a custom URL from `.env` as the model URL to the LLM. I haven't figured out how to add a new LLM option in the new Realchar Web UI, so for now it is hard-coded to make Realchar run on llama2-wrapper.
Showcase
This showcase runs Realchar and Llama2 on a Mac (13.70 tokens/sec through llama.cpp).
Another interesting showcase might be running Realchar and Llama2 on a free Colab T4 GPU (18.19 tokens/sec through GPTQ).