
4096 context length #26

Closed
vadi2 opened this issue Jul 21, 2023 · 4 comments

Comments

vadi2 (Contributor) commented Jul 21, 2023

Llama 2 supports a context length of 4096, but it looks like chat.petals is limited to 2048:

```
ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds pre-allocated maximum 2048
```

Could the increased context length be supported?
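For reference, here is a minimal sketch of the kind of client call that hits this limit. It is illustrative, not taken from the report: the issue itself is about the chat.petals web UI, while this uses the Petals Python client, and the model name and prompt are placeholders.

```python
# Minimal sketch: a long-context request through the Petals client.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL = "meta-llama/Llama-2-70b-chat-hf"  # any Llama 2 model served by the swarm

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL)

# Build a prompt close to Llama 2's 4096-token window.
prompt_ids = tokenizer("hello " * 4000, return_tensors="pt")["input_ids"]

# With a 2048-token pre-allocated maximum on the servers, this raises:
# ValueError: Maximum length exceeded: prefix 0 + current 4094 exceeds ...
outputs = model.generate(prompt_ids, max_new_tokens=5)
```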

borzunov (Member) commented:

Hi @vadi2,

Just made it 8192!

Let us know if you have any other issues.

vadi2 (Contributor, Author) commented Jul 21, 2023

Hm, damn, hitting some hardware limits:

```
hivemind.p2p.p2p_daemon_bindings.utils.P2PHandlerError: Failed to call handler TransformerConnectionHandler.rpc_inference at 12D3KooWNpRZJBknqGUQv4AFSqVBZaDn6aBTvavnqF9VitdU8vhr: CUDA out of memory. Tried to allocate 2.01 GiB (GPU 0; 23.69 GiB total capacity; 20.18 GiB already allocated; 1.88 GiB free; 20.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I was able to use a 4k context on Llama 13B, but that was on a single local video card with 16 GB of VRAM.
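(Aside: the traceback itself suggests one generic PyTorch mitigation for fragmentation on the server side. A minimal sketch of applying it is below; the value 128 is an arbitrary example, it only helps when reserved memory far exceeds allocated memory, and it doesn't lift the underlying allocation limit fixed in the next comment.)

```python
import os

# Must be set before torch initializes CUDA; 128 MB is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (import deliberately placed after the env var)
```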

borzunov reopened this Jul 22, 2023
borzunov (Member) commented:

@vadi2 Thanks for letting us know! The context size is now large enough, but the servers don't allocate enough GPU memory to process all of that context in a single pass. We're fixing this by splitting large queries into chunks during processing: bigscience-workshop/petals#403

Once we merge it, you'll be able to use your code without changes.
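Conceptually, the fix amounts to prefilling a long prefix in fixed-size chunks instead of one big forward pass, so peak activation memory is bounded by the chunk size rather than the prefix length. A rough sketch of the idea follows; this is not the actual code from bigscience-workshop/petals#403, and `MAX_CHUNK` and `run_forward` are hypothetical names.

```python
# Sketch of server-side chunked prefill (illustrative, not the PR's code).
MAX_CHUNK = 1024  # tokens processed per forward pass; value is an assumption

def prefill_in_chunks(run_forward, prefix_tokens):
    """Feed a long prefix through the model in bounded chunks.

    run_forward(chunk) is assumed to run one forward pass and append the
    chunk's keys/values to the server's attention cache as a side effect.
    """
    for start in range(0, len(prefix_tokens), MAX_CHUNK):
        chunk = prefix_tokens[start : start + MAX_CHUNK]
        run_forward(chunk)  # peak memory scales with MAX_CHUNK, not len(prefix_tokens)
```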

borzunov (Member) commented Jul 26, 2023

This update is part of Petals 2.0.1. I've verified that a prefix of 8000 tokens failed before the update and works after it.

Let us know if you run into any other issues!
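A quick way to reproduce that check, reusing the `tokenizer` and `model` from the first sketch above (a sketch under the same assumptions; the filler prompt is a placeholder):

```python
# After upgrading: pip install --upgrade "petals>=2.0.1"
prompt_ids = tokenizer("hello " * 8000, return_tensors="pt")["input_ids"][:, :8000]
outputs = model.generate(prompt_ids, max_new_tokens=5)  # succeeds on Petals 2.0.1+
```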
