
Bump up transformers version & Remove MistralConfig #1254

Merged · 4 commits · Oct 13, 2023
Conversation

WoosukKwon (Collaborator)

Now that MistralConfig is officially supported by the stable release of HF transformers, we can remove our MistralConfig.
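For context, a minimal sketch (not part of the PR diff) of what the stable release enables, assuming transformers >= 4.34.0, the first stable release that ships MistralConfig:

    from transformers import AutoConfig, MistralConfig

    # With transformers >= 4.34.0, AutoConfig resolves Mistral checkpoints to
    # transformers' own MistralConfig, so vLLM no longer needs a vendored copy.
    config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
    assert isinstance(config, MistralConfig)
    print(config.model_type)  # "mistral"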

zhuohan123 (Member) left a comment

The PR LGTM! However, I think we need to make sure that the tokenizer performance is good before we merge this PR.
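(A minimal editorial sketch of one way to sanity-check tokenizer throughput after the bump; the model name is taken from this thread, and the text and loop length are arbitrary:)

    import time
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
    # A slow (pure-Python) tokenizer is a common throughput culprit.
    print("fast tokenizer:", tok.is_fast)

    text = "what is the capital of Germany? " * 50
    n = 1000
    start = time.perf_counter()
    for _ in range(n):
        tok(text)
    print(f"{n / (time.perf_counter() - start):.1f} encodes/sec")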

TensorTemplar commented Oct 4, 2023

Hey, I built this from source and ran into issues with the OpenAI server:

1. Please consider pinning the pydantic version to avoid:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/src/vllm/vllm/entrypoints/openai/api_server.py", line 21, in <module>
    from vllm.entrypoints.openai.protocol import (
  File "/usr/src/vllm/vllm/entrypoints/openai/protocol.py", line 111, in <module>
    class CompletionResponseChoice(BaseModel):
  File "pydantic/main.py", line 198, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/fields.py", line 506, in pydantic.fields.ModelField.infer
  File "pydantic/fields.py", line 436, in pydantic.fields.ModelField.__init__
  File "pydantic/fields.py", line 552, in pydantic.fields.ModelField.prepare
  File "pydantic/fields.py", line 661, in pydantic.fields.ModelField._type_analysis
  File "pydantic/fields.py", line 668, in pydantic.fields.ModelField._type_analysis
  File "/usr/lib/python3.8/typing.py", line 774, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

Pinning pydantic in requirements.txt worked for me: pydantic >= 1.10.12, < 2
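(Equivalent one-off install command, as a sketch; the version bounds are the ones suggested above:)

    pip install 'pydantic>=1.10.12,<2'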

2. The next issue is a missing fschat dependency, which shows up when calling the chat endpoint:
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mistralai/Mistral-7B-Instruct-v0.1",
  "messages": [{"role": "user", "content": "what is the capital of Germany?"}],
  "temperature": 0.7,
  "top_p": 1,
  "n": 1,
  "max_tokens": 20
}'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 292, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 91, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 146, in simple_response
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 273, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 190, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/src/vllm/vllm/entrypoints/openai/api_server.py", line 206, in create_chat_completion
    prompt = await get_gen_prompt(request)
  File "/usr/src/vllm/vllm/entrypoints/openai/api_server.py", line 74, in get_gen_prompt
    raise ModuleNotFoundError(
ModuleNotFoundError: fastchat is not installed. Please install fastchat to use the chat completion and conversation APIs: `$ pip install fschat`

This can be fixed by uninstalling transformer-engine and adding fschat[model_worker]==0.2.24 to the dependencies.
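(A sketch of the equivalent commands; the package names and pinned version are as described above:)

    pip uninstall -y transformer-engine
    pip install 'fschat[model_worker]==0.2.24'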
After this I get a correct response:

{
  "id": "cmpl-798b021c1adc46c987dd0f01160937b8",
  "object": "chat.completion",
  "created": 211402,
  "model": "mistralai/Mistral-7B-Instruct-v0.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " The capital city of Germany is Berlin. It is a vibrant and multicultural city with a"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 543,
    "total_tokens": 563,
    "completion_tokens": 20
  }
}

WoosukKwon (Collaborator, Author)

> The PR LGTM! However, I think we need to make sure that the tokenizer performance is good before we merge this PR.

@zhuohan123 I think the performance issue is orthogonal to the PR, since new users will install the newest version of transformers and experience the performance issue anyway.

WoosukKwon merged commit e7c8555 into main on Oct 13, 2023 (2 checks passed).
WoosukKwon deleted the remove-mistral branch on October 13, 2023 at 17:05.
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024