4096 context length #26

Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:

Could the increased context length be supported?
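(For reference, the 4096-token limit is declared in the Llama 2 model config itself; here is a quick way to check it, assuming access to a Llama 2 checkpoint on Hugging Face — the specific model name below is illustrative:)

```python
from transformers import AutoConfig

# Assumption: any Llama 2 checkpoint works here; this one is illustrative,
# and the meta-llama repos require accepting the license on Hugging Face.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
print(config.max_position_embeddings)  # prints 4096 for Llama 2 models
```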
Comments

Hi @vadi2, just made it 8192! Let us know if you have any other issues.
Hm, damn, hitting some hardware limits:

I was able to use 4k context on Llama 13B, but that was on a single local video card with 16 GB of VRAM.
@vadi2 Thanks for letting us know! The context size is now large enough, but we don't allocate enough GPU memory to process all of it at once. We're fixing that by splitting large queries into chunks during processing: bigscience-workshop/petals#403. Once we merge it, you'll be able to use your code without changes.
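(To illustrate the idea behind that PR — this is a sketch, not the actual Petals internals; the chunk size and the `forward_fn` helper are hypothetical — a long prefix can be fed through the model in fixed-size slices so no single forward pass exceeds the allocated GPU memory:)

```python
import torch

MAX_CHUNK_TOKENS = 1024  # assumption: the real limit depends on server-side GPU memory

def forward_prefix_in_chunks(forward_fn, input_ids: torch.Tensor) -> torch.Tensor:
    """Run a long prefix through `forward_fn` one fixed-size chunk at a time.

    `forward_fn` is a hypothetical stand-in for a single forward pass over a
    slice of tokens; the server is assumed to keep its attention caches
    between calls, so splitting does not change the result.
    """
    outputs = []
    for start in range(0, input_ids.shape[1], MAX_CHUNK_TOKENS):
        chunk = input_ids[:, start:start + MAX_CHUNK_TOKENS]
        outputs.append(forward_fn(chunk))
    # Concatenate the per-chunk outputs back into one sequence
    return torch.cat(outputs, dim=1)
```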
This update became part of Petals 2.0.1. I've verified that a prefix of 8000 tokens didn't work before the update but works after it. Let us know if you run into any other issues!
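(A quick way to reproduce that check with the Petals client — a sketch, assuming Petals 2.0.1+ and access to a Llama 2 checkpoint; the prompt and token counts below are illustrative:)

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"  # assumption: any Petals-served Llama 2 works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Build an ~8000-token prefix to exercise the chunked processing path.
long_prefix = "petals " * 8000
input_ids = tokenizer(
    long_prefix, return_tensors="pt", truncation=True, max_length=8000
)["input_ids"]

output_ids = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(output_ids[0, input_ids.shape[1]:]))
```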