[Mistral] Mistral-7B-v0.1 support #1196

Merged (7 commits) on Sep 28, 2023

Conversation

@Bam4d (Contributor) commented Sep 27, 2023

No description provided.


import torch
from torch import nn
from transformers import MistralConfig

Review comment from a Collaborator on the diff hunk above:

This does not work because MistralConfig is not yet included in HF transformers (as of v4.33.3). Could you define this config class locally, just like this?
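
For reference, a locally defined config along these lines could stand in until MistralConfig ships upstream. This is only a minimal sketch assuming the published Mistral-7B hyperparameters; the field names and defaults below are illustrative and are not the code from this PR.

from transformers import PretrainedConfig


class MistralConfig(PretrainedConfig):
    model_type = "mistral"

    def __init__(
        self,
        vocab_size=32000,
        hidden_size=4096,
        intermediate_size=14336,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=8,
        max_position_embeddings=32768,
        rms_norm_eps=1e-5,
        rope_theta=10000.0,
        sliding_window=4096,
        tie_word_embeddings=False,
        **kwargs,
    ):
        # Standard Mistral-7B shape parameters (illustrative defaults).
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        self.max_position_embeddings = max_position_embeddings
        self.rms_norm_eps = rms_norm_eps
        self.rope_theta = rope_theta
        # Sliding-window attention span used by Mistral-7B-v0.1.
        self.sliding_window = sliding_window
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)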

Collaborator:

@timlacroix Besides this, it seems everything works fine!

Contributor:

OK, addressed. Will we need to change this back after the next release?

Collaborator:

@timlacroix Yes. Once a new version of HF transformers is released, we will remove it.
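
When that removal happens, a common way to handle the transition is to prefer the upstream class and fall back to the local copy. This is just a sketch of that pattern, not the code from this PR, and the fallback module path is hypothetical.

try:
    # Available once a transformers release includes Mistral support.
    from transformers import MistralConfig
except ImportError:
    # Hypothetical local fallback used until then.
    from vllm.transformers_utils.configs.mistral import MistralConfig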

@WoosukKwon mentioned this pull request on Sep 27, 2023.
@casper-hansen (Contributor) commented Sep 27, 2023

The Mistral model is almost equivalent to Llama in terms of quantization, so it would be super easy to extend support; I have already added Mistral to AutoAWQ. If you modify the part below, you will enable AWQ-quantized models:

_MODEL_CLASSES_SUPPORT_QUANTIZATION = [
    LlamaForCausalLM,
    MistralForCausalLM,
]

After that, you should be able to run inference with the quantized model that is already available: https://huggingface.co/casperhansen/mistral-7b-instruct-v0.1-awq

from vllm import LLM, SamplingParams

prompts = [
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="casperhansen/mistral-7b-instruct-v0.1-awq", quantization="awq", dtype="half")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@WoosukKwon linked an issue on Sep 27, 2023 that may be closed by this pull request.
@WoosukKwon (Collaborator) left a comment:
LGTM. As this PR is not modifiable, I will fix some miscellaneous issues right after merging this PR.

@WoosukKwon mentioned this pull request on Sep 28, 2023.
@WoosukKwon merged commit bb1ba58 into vllm-project:main on Sep 28, 2023 (2 checks passed).
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024 (Co-authored-by: timlacroix <t@mistral.ai>).
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request on May 7, 2024 (Co-authored-by: timlacroix <t@mistral.ai>).
Successfully merging this pull request may close these issues: Support for Mistral 7B.