13 Sep 13:09

4304e4f

Latest

Pixtral

Mistral models can now 👀 !

pip install --upgrade mistral_inference   # >= 1.4.0

Download:

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Pixtral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Pixtral-12B-2409", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
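
Before launching the CLI, it can help to confirm that the three files requested in allow_patterns actually landed in the target folder. The following is a minimal sketch using only the standard library; the helper name missing_model_files is made up for illustration and is not part of mistral_inference or huggingface_hub.

```python
from pathlib import Path

# File names taken from the allow_patterns list in the download snippet.
REQUIRED_FILES = ("params.json", "consolidated.safetensors", "tekken.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Same target folder as in the download snippet.
    target = Path.home().joinpath("mistral_models", "Pixtral")
    missing = missing_model_files(target)
    print("all files present" if not missing else f"missing: {missing}")
```

An empty list means the folder is ready to be passed to mistral-chat or Transformer.from_folder.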

CLI example:

mistral-chat $HOME/mistral_models/Pixtral --instruct --max_tokens 256 --temperature 0.35

For example, try something like:

Text prompt: What can you see on the following picture?
[You can input zero, one or more images now.]
Image path or url [Leave empty and press enter to finish image input]: https://picsum.photos/id/237/200/300
Image path or url [Leave empty and press enter to finish image input]:
I see a black dog lying on a wooden surface. The dog appears to be looking up, and its eyes are clearly visible.

Python:

Load the model

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

Run:

url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
prompt = "Describe the image."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=[ImageURLChunk(image_url=url), TextChunk(text=prompt)])])

encoded = tokenizer.encode_chat_completion(completion_request)

images = encoded.images
tokens = encoded.tokens

out_tokens, _ = generate([tokens], model, images=[images], max_tokens=256, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
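
The run above passes an https URL to ImageURLChunk. For an image on local disk, one option is to embed the file as a base64 data URL. The sketch below builds such a URL with the standard library only. Whether ImageURLChunk accepts data URLs in your installed mistral_common version is an assumption to verify, and image_file_to_data_url is a hypothetical helper name.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path):
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    path = Path(path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path.name}")
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

Under that assumption, the returned string would take the place of url in the ChatCompletionRequest shown above.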

18 Jul 14:01

patrickvonplaten

v1.3.0

21790d6

v1.3.0 Mistral-Nemo

Welcome Mistral-Nemo from Mistral 🤝 NVIDIA

Read more about Mistral-Nemo here.

Install

pip install "mistral-inference>=1.3.0"

Download

export NEMO_MODEL=$HOME/12B_NEMO_MODEL
wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
mkdir -p $NEMO_MODEL
tar -xf mistral-nemo-instruct-2407.tar -C $NEMO_MODEL

Chat

mistral-chat $HOME/12B_NEMO_MODEL --instruct --max_tokens 1024

or directly in Python:

import os
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_model("mistral-nemo")
model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)

Function calling:

import os
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_model("mistral-nemo")
model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
        ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
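
For a tool-enabled request, the decoded result is expected to carry a JSON list of calls rather than a prose answer. The exact text depends on the tokenizer version and on how special tokens are decoded, so the sample string below is hypothetical, as are the helper names extract_tool_calls and dispatch and the stub weather function. This is a minimal sketch of one way to turn such output into a local function call, not the library's own parsing.

```python
import json


def extract_tool_calls(text):
    """Return the first JSON list of tool calls found in text, or [] if none parses."""
    start = text.find("[{")
    if start == -1:
        return []
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        calls, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return []
    return calls if isinstance(calls, list) else []


def get_current_weather(location, format):
    # Hypothetical stand-in; a real implementation would query a weather service.
    return f"(stub) weather for {location} in {format}"


HANDLERS = {"get_current_weather": get_current_weather}


def dispatch(calls):
    """Run each recognised call with its arguments and collect the results."""
    return [HANDLERS[c["name"]](**c["arguments"]) for c in calls if c.get("name") in HANDLERS]


# Hypothetical model output, for illustration only.
sample = '[TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]'
print(dispatch(extract_tool_calls(sample)))
```

Calls whose name is not in HANDLERS are skipped, so unexpected output cannot trigger arbitrary functions.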

Summary

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

For more details about this model please refer to our release blog post.

Key features

Released under the Apache 2 License
Pre-trained and instructed versions
Trained with a 128k context window
Trained on a large proportion of multilingual and code data
Drop-in replacement of Mistral 7B

Model Architecture

Mistral Nemo is a transformer model, with the following architecture choices:

Layers: 40
Dim: 5,120
Head dim: 128
Hidden dim: 14,336
Activation Function: SwiGLU
Number of heads: 32
Number of kv-heads: 8 (GQA)
Vocabulary size: 2**17 ~= 128k
Rotary embeddings (theta = 1M)
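
To give a feel for what the GQA setting means in practice, the numbers in this list can be turned into a rough key/value cache estimate. The sketch below assumes 2 bytes per cached value (bf16 or fp16), a cache that holds every token of the sequence, and a 128k window read as 2**17 tokens; none of these assumptions come from the release notes, and kv_cache_bytes is a name chosen for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence of seq_len tokens."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Values from the architecture list.
N_LAYERS, N_HEADS, N_KV_HEADS, HEAD_DIM = 40, 32, 8, 128

per_token = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, seq_len=1)
full_context = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, seq_len=2**17)

print(f"query heads per kv-head: {N_HEADS // N_KV_HEADS}")          # 4
print(f"KV cache per token: {per_token / 1024:.0f} KiB")             # 160 KiB
print(f"KV cache at 2**17 tokens: {full_context / 2**30:.0f} GiB")   # 20 GiB
```

With 32 kv-heads instead of 8, the same formula gives four times these figures, which is the saving GQA buys under these assumptions.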

Metrics

Main Benchmarks

Benchmark	Score
HellaSwag (0-shot)	83.5%
Winogrande (0-shot)	76.8%
OpenBookQA (0-shot)	60.6%
CommonSenseQA (0-shot)	70.4%
TruthfulQA (0-shot)	50.3%
MMLU (5-shot)	68.0%
TriviaQA (5-shot)	73.8%
NaturalQuestions (5-shot)	31.2%

Multilingual Benchmarks (MMLU)

Language	Score
French	62.3%
German	62.7%
Spanish	64.6%
Italian	61.3%
Portuguese	63.3%
Russian	59.2%
Chinese	59.0%
Japanese	59.0%

What's Changed

Tekken by @patrickvonplaten in #193

Full Changelog: v1.2.0...v1.3.0

Contributors

patrickvonplaten

16 Jul 12:11

patrickvonplaten

v1.2.0

2f8b5b2

v1.2.0 Add Mamba

Welcome 🐍 Codestral-Mamba and 🔢 Mathstral

pip install "mistral-inference>=1.2.0"

Codestral-Mamba

pip install packaging mamba-ssm causal-conv1d transformers

Download

export MAMBA_CODE=$HOME/7B_MAMBA_CODE
wget https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar
mkdir -p $MAMBA_CODE
tar -xf codestral-mamba-7B-v0.1.tar -C $MAMBA_CODE

Chat

mistral-chat $HOME/7B_MAMBA_CODE --instruct --max_tokens 256

Mathstral

Download

export MATHSTRAL=$HOME/7B_MATH
wget https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar
mkdir -p $MATHSTRAL
tar -xf mathstral-7B-v0.1.tar -C $MATHSTRAL

Chat

mistral-chat $HOME/7B_MATH --instruct --max_tokens 256

Blogs:
Blog Codestral Mamba 7B: https://mistral.ai/news/codestral-mamba/
Blog Mathstral 7B: https://mistral.ai/news/mathstral/

What's Changed

add a note about GPU requirement by @sophiamyang in #158
Add codestral by @patrickvonplaten in #164
Update README.md by @patrickvonplaten in #165
fixing type in README.md by @didier-durand in #175
Fix: typo in ModelArgs: "infered" to "inferred" by @CharlesCNorton in #174
fix: typo in LoRALoaderMixin: correct "multipe" to "multiple" by @CharlesCNorton in #173
fix: Correct typo in classifier.ipynb from "alborithm" to "algorithm" by @CharlesCNorton in #167
Fix: typo in error message for state_dict validation by @CharlesCNorton in #172
fix: Correct misspelling in ModelArgs docstring by @CharlesCNorton in #171
Update README.md by @patrickvonplaten in #168
fix: typo in HF_TOKEN environment variable check message by @CharlesCNorton in #179
Adding Issue/Bug template. by @pandora-s-git in #178
typo in ModelArgs class docstring. by @CharlesCNorton in #183
Update README.md by @Simontwice in #184
Add mamba by @patrickvonplaten in #187

New Contributors

@didier-durand made their first contribution in #175
@CharlesCNorton made their first contribution in #174
@pandora-s-git made their first contribution in #178
@Simontwice made their first contribution in #184

Full Changelog: v1.1.0...v1.2.0

Contributors

didier-durand, patrickvonplaten, and 4 other contributors

24 May 18:31

patrickvonplaten

v1.1.0

cd06b0d

v1.1.0 Add LoRA

mistral-inference==1.1.0 supports running LoRA models that were trained with: https://github.com/mistralai/mistral-finetune

Having trained a 7B base LoRA, you can run mistral-inference as follows:

from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


MODEL_PATH = "path/to/downloaded/7B_base_dir"

tokenizer = MistralTokenizer.from_file(f"{MODEL_PATH}/tokenizer.model.v3")  # change to extracted tokenizer file
model = Transformer.from_folder(MODEL_PATH)  # change to extracted model dir
model.load_lora("/path/to/run_lora_dir/checkpoints/checkpoint_000300/consolidated/lora.safetensors")

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
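
As background on what a LoRA checkpoint contributes, the general technique adds a scaled low-rank product to a frozen weight matrix: W' = W + scaling * (B @ A). The toy example below works through that arithmetic in plain Python on a 2 x 2 matrix with rank 1. It illustrates the general formula only and makes no claim about how load_lora is implemented; the matrices and the scaling value are invented for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scaling):
    """Return weight + scaling * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[0.5, -1.0]]              # rank-1 down-projection (1 x 2)
B = [[1.0], [2.0]]             # rank-1 up-projection (2 x 1)

merged = merge_lora(W, A, B, scaling=2.0)
print(merged)  # [[2.0, -2.0], [2.0, -3.0]]
```

The adapter stores only A and B, which is why a LoRA file is far smaller than the base model it modifies.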

22 May 16:30

patrickvonplaten

v1.0.4

629631c

v1.0.4 - Mistral-inference

Mistral-inference is the official inference library for all Mistral models: 7B, 8x7B, 8x22B.

Install with:

pip install mistral-inference

Run with:

from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool

tokenizer = MistralTokenizer.from_file("/path/to/tokenizer/file")  # change to extracted tokenizer file
model = Transformer.from_folder("./path/to/model/folder")  # change to extracted model dir

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
        ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)

Releases: mistralai/mistral-inference

v1.4.0: Pixtral 👀