
4096 context length #26

Closed
vadi2 opened this issue Jul 21, 2023 · 4 comments

Comments

vadi2 (Contributor) commented Jul 21, 2023

Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:

```
ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds pre-allocated maximum 2048
```

Could the increased context length be supported?
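For reference, here is a minimal sketch of the kind of client call that hits this limit. It is illustrative, not taken from the report: the issue itself is about the chat.petals web UI, while this uses the Petals Python client, and the model name and prompt are placeholders.

```python
# Minimal sketch: a long-context request through the Petals client.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL = "meta-llama/Llama-2-70b-chat-hf"  # any Llama 2 model served by the swarm

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL)

# Build a prompt close to Llama 2's 4096-token window.
prompt_ids = tokenizer("hello " * 4000, return_tensors="pt")["input_ids"]

# With a 2048-token pre-allocated maximum on the servers, this raises:
# ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds ...
outputs = model.generate(prompt_ids, max_new_tokens=5)
```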

borzunov (Member) commented:

Hi @vadi2,

Just made it 8192!

Let us know if you have any other issues.

vadi2 (Contributor, Author) commented Jul 21, 2023

Hm, damn, hitting some hardware limits:

```
hivemind.p2p.p2p_daemon_bindings.utils.P2PHandlerError: Failed to call handler TransformerConnectionHandler.rpc_inference at 12D3KooWNpRZJBknqGUQv4AFSqVBZaDn6aBTvavnqF9VitdU8vhr: CUDA out of memory. Tried to allocate 2.01 GiB (GPU 0; 23.69 GiB total capacity; 20.18 GiB already allocated; 1.88 GiB free; 20.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I was able to use a 4k context on Llama 13B, but that was on a single local video card with 16 GB of VRAM.
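(Aside: the traceback itself suggests one generic PyTorch mitigation for fragmentation on the server side. A minimal sketch of applying it is below; the value 128 is an arbitrary example, it only helps when reserved memory far exceeds allocated memory, and it doesn't lift the underlying allocation limit fixed in the next comment.)

```python
import os

# Must be set before torch initializes CUDA; 128 MB is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (import deliberately placed after the env var)
```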

borzunov reopened this Jul 22, 2023
borzunov (Member) commented:

@vadi2 Thanks for letting us know! The context size is now large enough, but the servers don't allocate enough GPU memory to process all of that context in a single pass. We're fixing this by splitting large queries into chunks during processing: bigscience-workshop/petals#403

Once we merge it, you'll be able to use your code without changes.
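Conceptually, the fix amounts to prefilling a long prefix in fixed-size chunks instead of one big forward pass, so peak activation memory is bounded by the chunk size rather than the prefix length. A rough sketch of the idea follows; this is not the actual code from bigscience-workshop/petals#403, and `MAX_CHUNK` and `run_forward` are hypothetical names.

```python
# Sketch of server-side chunked prefill (illustrative, not the PR's code).
MAX_CHUNK = 1024  # tokens processed per forward pass; value is an assumption

def prefill_in_chunks(run_forward, prefix_tokens):
    """Feed a long prefix through the model in bounded chunks.

    run_forward(chunk) is assumed to run one forward pass and append the
    chunk's keys/values to the server's attention cache as a side effect.
    """
    for start in range(0, len(prefix_tokens), MAX_CHUNK):
        chunk = prefix_tokens[start : start + MAX_CHUNK]
        run_forward(chunk)  # peak memory scales with MAX_CHUNK, not len(prefix_tokens)
```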

borzunov (Member) commented Jul 26, 2023

This update is part of Petals 2.0.1. I've verified that a prefix of 8000 tokens failed before the update and works after it.

Let us know if you run into any other issues!
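A quick way to reproduce that check, reusing the `tokenizer` and `model` from the first sketch above (a sketch under the same assumptions; the filler prompt is a placeholder):

```python
# After upgrading: pip install --upgrade "petals>=2.0.1"
prompt_ids = tokenizer("hello " * 8000, return_tensors="pt")["input_ids"][:, :8000]
outputs = model.generate(prompt_ids, max_new_tokens=5)  # succeeds on Petals 2.0.1+
```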
