[Bug]: CPU Inference vllm_ops not defined #4275
Comments
@bsu3338:
@zhouyuan

```yaml
services:
  vllm-cpu-env:
    image: vllm-cpu-env
    command: ["python3", "-m", "vllm.entrypoints.openai.api_server", "--model", "meta-llama/Meta-Llama-3-70B-Instruct", "--api-key", "token-1234", "--trust-remote-code", "--dtype", "auto"]
    ports:
      - 8000:8000
    volumes:
      - /srv/huggingface:/root/.cache/huggingface
      # Mounting an empty dir over /workspace/vllm masks the source checkout,
      # so Python imports the installed vLLM package (with its compiled ops)
      # instead of the uninstalled source tree.
      - /srv/empty:/workspace/vllm/
    environment:
      - VLLM_TARGET_DEVICE=cpu
      - HUGGING_FACE_HUB_TOKEN=hf_RANDOM
      - VLLM_CPU_KVCACHE_SPACE=40
```
The above docker compose worked for me as well.
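For anyone reproducing this, a minimal way to bring the stack up and confirm the server answers, assuming the compose file above is saved as compose.yaml and the service is reachable on localhost:8000:

```shell
# Start the service in the background
docker compose up -d

# Watch the logs until the OpenAI-compatible server reports it is listening
docker compose logs -f vllm-cpu-env

# Liveness check: list the served models
# (token-1234 is the --api-key from the compose file)
curl -H "Authorization: Bearer token-1234" http://localhost:8000/v1/models
```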
The problem also occurs in a Kubernetes container (e.g. on Red Hat OpenShift). I noticed that for some reason the library (the compiled .egg) does not get properly installed inside the container, so Python can't find it. The workaround I found was to declare the … Below is the Containerfile I used for the build:

Also, as a side note, the compiler installed inside the image was gcc 11.4.1 (the default one for RHEL 9 based images).
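For illustration, a minimal sketch of such a build, assuming a UBI 9 Python base image and vLLM's documented CPU install flow; the base image tag, package set, and paths here are assumptions, not the original Containerfile:

```dockerfile
# Hypothetical sketch -- base image, versions, and paths are illustrative.
FROM registry.access.redhat.com/ubi9/python-311

USER root
# gcc/g++ are needed to compile vLLM's CPU kernels
# (UBI 9 images ship gcc 11.4.x by default, matching the comment above).
RUN dnf install -y gcc gcc-c++ git && dnf clean all

WORKDIR /opt
RUN git clone https://github.com/vllm-project/vllm.git
WORKDIR /opt/vllm

# Install vLLM for CPU into site-packages rather than running it from the
# source checkout -- importing from the source tree is what leaves the
# compiled ops (vllm_ops) unresolvable.
RUN pip install -v -r requirements-cpu.txt && \
    VLLM_TARGET_DEVICE=cpu python3 setup.py install

# Serve from a directory that does not contain the vllm/ source tree,
# so Python resolves the installed package.
WORKDIR /opt
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```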
Fixed by #5009.
Your current environment
🐛 Describe the bug
Cloned the existing repository and built a Docker image according to the CPU instructions:

```shell
docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
```
Made a sample curl request for a chat completion from PowerShell:
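A representative request against this setup; the model name, port, and API key come from the compose file above, and the prompt itself is illustrative:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-1234" \
  -d '{
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

Note that in Windows PowerShell the `curl` alias maps to Invoke-WebRequest, so invoke `curl.exe` explicitly and put the body on a single line.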
Used the following docker compose:
Got the following error: