Open-weight models
We open-source both pre-trained and fine-tuned models. These models are not tuned for safety, as we want to empower users to test and refine moderation for their own use cases. For safer models, follow our guardrailing tutorial.
Model | Open-weight | API | Description | Max Tokens | Endpoint |
---|---|---|---|---|---|
Mistral 7B | ✔️ Apache2 | ✔️ | The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of its release, it matched the capabilities of models up to 30B parameters. Learn more in our blog post | 32k | open-mistral-7b (aka mistral-tiny-2312) |
Mixtral 8x7B | ✔️ Apache2 | ✔️ | A sparse mixture-of-experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more VRAM. Learn more in the dedicated blog post | 32k | open-mixtral-8x7b (aka mistral-small-2312) |
Mixtral 8x22B | ✔️ Apache2 | ✔️ | A bigger sparse mixture-of-experts model with a larger context window. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more VRAM. Learn more in the dedicated blog post | 64k | open-mixtral-8x22b |
Codestral | ✔️ MNPL | ✔️ | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion | 32k | codestral-latest |
Codestral-Mamba | ✔️ Apache2 | ✔️ | A Mamba 2 language model specialized in code generation. Learn more in our blog post | 256k | codestral-mamba-latest |
Mathstral | ✔️ Apache2 | ❌ | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more in our blog post | 32k | NA |
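For the rows with an API checkmark, the endpoint name in the last column is what you pass as the `model` field of a chat completion request. Below is a minimal sketch using Python's `requests` against the hosted chat completions endpoint; the `MISTRAL_API_KEY` environment variable and the prompt are assumptions of this example, not part of the table above.

```python
import os
import requests

# Call the hosted chat completions endpoint with one of the
# endpoint names from the table above, e.g. open-mistral-7b.
response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "open-mistral-7b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```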
License
- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral-Mamba, and Mathstral are under Apache 2 License, which permits their use without any constraints.
- Codestral is under the Mistral AI Non-Production License (MNPL).
Downloading
Model | Download links | Features |
---|---|---|
Mistral-7B-v0.1 | Hugging Face raw_weights (md5sum: 37dab53973db2d56b2da0a033a15307f) | - 32k vocabulary size - Rope Theta = 1e4 - With sliding window |
Mistral-7B-Instruct-v0.2 | Hugging Face raw_weights (md5sum: fbae55bc038f12f010b4251326e73d39) | - 32k vocabulary size - Rope Theta = 1e6 - No sliding window |
Mistral-7B-v0.3 | Hugging Face raw_weights (md5sum: 0663b293810d7571dad25dae2f2a5806) | - Extended vocabulary to 32768 |
Mistral-7B-Instruct-v0.3 | Hugging Face raw_weights (md5sum: 80b71fcb6416085bcb4efad86dfb4d52) | - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling |
Mixtral-8x7B-v0.1 | Hugging Face | - 32k vocabulary size - Rope Theta = 1e6 |
Mixtral-8x7B-Instruct-v0.1 | Hugging Face raw_weights (md5sum: 8e2d3930145dc43d3084396f49d38a3f) | - 32k vocabulary size - Rope Theta = 1e6 |
Mixtral-8x7B-v0.3 | Updated model coming soon! | - Extended vocabulary to 32768 - Supports v3 Tokenizer |
Mixtral-8x7B-Instruct-v0.3 | Updated model coming soon! | - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling |
Mixtral-8x22B-v0.1 | Hugging Face raw_weights (md5sum: 0535902c85ddbb04d4bebbf4371c6341) | - 32k vocabulary size |
Mixtral-8x22B-Instruct-v0.1 / Mixtral-8x22B-Instruct-v0.3 | Hugging Face raw_weights (md5sum: 471a02a6902706a2f1e44a693813855b) | - 32768 vocabulary size |
Mixtral-8x22B-v0.3 | raw_weights (md5sum: a2fa75117174f87d1197e3a4eb50371a) | - 32768 vocabulary size - Supports v3 Tokenizer |
Codestral-22B-v0.1 | Hugging Face raw_weights (md5sum: 1ea95d474a1d374b1d1b20a8e0159de3) | - 32768 vocabulary size - Supports v3 Tokenizer |
Codestral-Mamba-7B-v0.1 | Hugging Face raw_weights (md5sum: d3993e4024d1395910c55db0d11db163) | - 32768 vocabulary size - Supports v3 Tokenizer |
Mathstral-7B-v0.1 | Hugging Face raw_weights (md5sum: 5f05443e94489c261462794b1016f10b) | - 32768 vocabulary size - Supports v3 Tokenizer |
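Before using downloaded raw weights, it is worth checking the archive against the md5sum listed above. Here is a minimal sketch in Python; the file name and the expected hash it is compared against are placeholders taken from the Mistral-7B-v0.3 row.

```python
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives don't need to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder file name; expected hash is the Mistral-7B-v0.3 md5sum above.
expected = "0663b293810d7571dad25dae2f2a5806"
assert md5_of("mistral-7B-v0.3.tar") == expected, "checksum mismatch"
```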
Sizes
Name | Number of parameters | Number of active parameters | Min. GPU RAM for inference (GB) |
---|---|---|---|
Mistral-7B-v0.3 | 7.3B | 7.3B | 16 |
Mixtral-8x7B-v0.1 | 46.7B | 12.9B | 100 |
Mixtral-8x22B-v0.3 | 140.6B | 39.1B | 300 |
Codestral-22B-v0.1 | 22.2B | 22.2B | 60 |
Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 |
Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 |
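The minimum GPU RAM figures above include headroom beyond the raw weights (activations, KV cache). As a sketch of the underlying arithmetic, half-precision (16-bit) weights take roughly 2 bytes per parameter; the helper below is illustrative only, not part of any Mistral library.

```python
# Rough half-precision (2 bytes/parameter) weight footprint in GiB,
# ignoring activations and KV cache; compare with the table above.
def fp16_weight_gb(n_params_billion: float) -> float:
    return n_params_billion * 1e9 * 2 / 1024**3

for name, params in [("Mistral-7B-v0.3", 7.3),
                     ("Codestral-22B-v0.1", 22.2),
                     ("Mixtral-8x22B-v0.3", 140.6)]:
    print(f"{name}: ~{fp16_weight_gb(params):.0f} GB for weights alone")
```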
How to run?
Check out mistral-inference, a Python package for running our models. You can install mistral-inference with:

```bash
pip install mistral-inference
```
To learn more about how to use mistral-inference, take a look at the README and dive into the colab notebook to get started.
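As a sketch of what a chat completion looks like with mistral-inference: the import paths and helper names below follow the package's README at the time of writing, and the model folder path is a placeholder that should point at a downloaded Instruct model with its bundled v3 tokenizer.

```python
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

# Placeholder paths: point them at a downloaded Instruct model folder
# and the tokenizer file shipped alongside it.
tokenizer = MistralTokenizer.from_file("/path/to/model/tokenizer.model.v3")
model = Transformer.from_folder("/path/to/model")

# Build a chat request, encode it, and generate a short completion.
request = ChatCompletionRequest(messages=[UserMessage(content="Hello!")])
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens], model, max_tokens=64, temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```

The package also ships a mistral-chat command-line entry point for interactive use; see the README for the exact invocation.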