Fine-tuning
For detailed end-to-end fine-tuning examples and FAQ, check out our fine-tuning guide.
Every fine-tuning job comes with a minimum fee of $4, and there's a monthly storage fee of $2 for each model. For more detailed pricing information, please visit our pricing page.
Fine-tuning basics
Fine-tuning vs. prompting
When deciding whether to use prompt engineering or fine-tuning for an AI model, it can be difficult to determine which method is best. It's generally recommended to start with prompt engineering, as it's faster and less resource-intensive. To help you choose the right approach, here are the key benefits of prompting and fine-tuning:
Benefits of Prompting
- A generic model can work out of the box (the task can be described in a zero shot fashion)
- Does not require any fine-tuning data or training to work
- Can easily be updated for new workflows and prototyping
Check out our prompting guide to explore various capabilities of Mistral models.
Benefits of Fine-tuning
- Often works significantly better than prompting on the target task
- Typically works better than a larger model (faster and cheaper because it doesn't require a very long prompt)
- Provides a better alignment with the task of interest because it has been specifically trained on these tasks
- Can be used to teach new facts and information to the model (such as advanced tools or complicated workflows)
Common use cases
Fine-tuning has a wide range of use cases, some of which include:
- Customizing the model to generate responses in a specific format and tone
- Specializing the model for a specific topic or domain to improve its performance on domain-specific tasks
- Improving the model through distillation from a stronger and more powerful model by training it to mimic the behavior of the larger model
- Enhancing the model’s performance by mimicking the behavior of a model with a complex prompt, but without the need for the actual prompt, thereby saving tokens and reducing associated costs
- Reducing cost and latency by using a small yet efficient fine-tuned model
Dataset Format
Data must be stored in JSON Lines (.jsonl) files, which allow storing multiple JSON objects, each on a new line.
Datasets should follow an instruction-following format representing a user-assistant conversation. Each JSON data sample should either consist of only user and assistant messages ("Default Instruct") or include function-calling logic ("Function-calling Instruct").
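As a minimal sketch of the JSON Lines layout described above, the following Python writes one JSON object per line and reads the file back. The sample conversations, the helper names `write_jsonl` and `read_jsonl`, and the output file name are illustrative assumptions, not part of the Mistral tooling.

```python
import json
import os
import tempfile

# Two hypothetical one-turn samples in the "Default Instruct" shape.
samples = [
    {"messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
    ]},
    {"messages": [
        {"role": "user", "content": "Name a primary color."},
        {"role": "assistant", "content": "Red is a primary color."},
    ]},
]


def write_jsonl(path, records):
    """Write each record as one compact JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of Python objects."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical file name, written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "example_training_file.jsonl")
write_jsonl(path, samples)
print(len(read_jsonl(path)), "samples written to", path)
```

Note that the examples in this section are pretty-printed across several lines for readability, whereas each sample in the actual file occupies a single line.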
1. Default Instruct
Conversational data between user and assistant, which can be one-turn or multi-turn. Example:
{
  "messages": [
    {
      "role": "user",
      "content": "User interaction n°1 contained in document n°2"
    },
    {
      "role": "assistant",
      "content": "Bot interaction n°1 contained in document n°2"
    },
    {
      "role": "user",
      "content": "User interaction n°2 contained in document n°1"
    },
    {
      "role": "assistant",
      "content": "Bot interaction n°2 contained in document n°1"
    }
  ]
}
- Conversational data must be stored under the "messages" key as a list.
- Each list item is a dictionary containing the "content" and "role" keys. "role" is a string: "user", "assistant", or "system".
- Loss computation is performed only on tokens corresponding to assistant messages ("role" == "assistant").
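The rules above can be checked locally before uploading. The sketch below is an illustrative helper, not the validation that the Mistral API performs; the function name `validate_default_instruct` is hypothetical, and the final check (flagging samples with no assistant message) is an extra assumption derived from the loss rule.

```python
ALLOWED_ROLES = {"user", "assistant", "system"}


def validate_default_instruct(sample):
    """Return a list of problems found in one sample (empty if none)."""
    messages = sample.get("messages") if isinstance(sample, dict) else None
    if not isinstance(messages, list):
        return ['"messages" must be a list']

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a dictionary")
            continue
        for key in ("role", "content"):
            if key not in message:
                problems.append(f'message {i} is missing "{key}"')
        role = message.get("role")
        if "role" in message and (not isinstance(role, str) or role not in ALLOWED_ROLES):
            problems.append(f"message {i} has unexpected role {role!r}")

    # Assumption: since loss is computed only on assistant tokens, a sample
    # without any assistant message contributes nothing to training.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message, so no tokens contribute to the loss")
    return problems


good = {"messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]}
print(validate_default_instruct(good))
```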
2. Function-calling Instruct
Conversational data with tool usage. Example:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to the following functions to help the user. You can use the functions if needed."
    },
    {
      "role": "user",
      "content": "Can you help me generate an anagram of the word 'listen'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "TX92Jm8Zi",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"listen\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "{\"anagram\": \"silent\"}",
      "tool_call_id": "TX92Jm8Zi"
    },
    {
      "role": "assistant",
      "content": "The anagram of the word 'listen' is 'silent'."
    },
    {
      "role": "user",
      "content": "That's amazing! Can you generate an anagram for the word 'race'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "3XhQnxLsT",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"race\"}"
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "generate_anagram",
        "description": "Generate an anagram of a given word",
        "parameters": {
          "type": "object",
          "properties": {
            "word": {
              "type": "string",
              "description": "The word to generate an anagram of"
            }
          },
          "required": ["word"]
        }
      }
    }
  ]
}
- Conversational data must be stored under the "messages" key as a list.
- Each message is a dictionary containing the "role" and "content" or "tool_calls" keys. "role" should be one of "user", "assistant", "system", or "tool".
- Only messages of type "assistant" can have a "tool_calls" key, representing the assistant performing a call to an available tool.
- An assistant message with a "tool_calls" key cannot have a "content" key and must be followed by a "tool" message, which in turn must be followed by another assistant message.
- The "tool_call_id" of tool messages must match the "id" of at least one of the previous assistant messages.
- Both "id" and "tool_call_id" are randomly generated strings of exactly 9 characters. We recommend generating these automatically in a data preparation script.
- The "tools" key must include definitions of all tools used in the conversation.
- Loss computation is performed only on tokens corresponding to assistant messages ("role" == "assistant").
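As a rough sketch of the ID rules above, the following Python generates 9-character IDs and checks that every tool message refers to an earlier assistant tool call. The alphanumeric alphabet is an assumption based on the IDs in the example (the text only requires 9 random characters), and both function names are hypothetical.

```python
import random
import string

# Assumption: IDs are drawn from ASCII letters and digits, as in the example above.
ID_ALPHABET = string.ascii_letters + string.digits


def generate_tool_call_id(rng=random):
    """Return a random string of exactly 9 characters."""
    return "".join(rng.choices(ID_ALPHABET, k=9))


def check_tool_call_ids(messages):
    """Return True if each tool message matches the id of an earlier assistant tool call."""
    seen_ids = set()
    for message in messages:
        role = message.get("role")
        if role == "assistant":
            for call in message.get("tool_calls", []):
                seen_ids.add(call["id"])
        elif role == "tool" and message.get("tool_call_id") not in seen_ids:
            return False
    return True


call_id = generate_tool_call_id()
conversation = [
    {"role": "user", "content": "Can you generate an anagram of 'listen'?"},
    {"role": "assistant", "tool_calls": [{
        "id": call_id,
        "type": "function",
        "function": {"name": "generate_anagram", "arguments": "{\"word\": \"listen\"}"},
    }]},
    {"role": "tool", "content": "{\"anagram\": \"silent\"}", "tool_call_id": call_id},
    {"role": "assistant", "content": "The anagram of 'listen' is 'silent'."},
]
print(check_tool_call_ids(conversation))
```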
Upload a file
Once your data file is in the right format, you can upload it using the Mistral client, making it available for use in fine-tuning jobs.
- python
import os
from mistralai.client import MistralClient

api_key = os.environ.get("MISTRAL_API_KEY")
client = MistralClient(api_key=api_key)

# Upload the training file
with open("training_file.jsonl", "rb") as f:
    training_data = client.files.create(file=("training_file.jsonl", f))

# Upload the validation file (its ID is used when creating the job)
with open("validation_file.jsonl", "rb") as f:
    validation_data = client.files.create(file=("validation_file.jsonl", f))
- javascript
import fs from 'fs';
import MistralClient from '@mistralai/mistralai';

const apiKey = process.env.MISTRAL_API_KEY;
const client = new MistralClient(apiKey);

// Upload the training file
const trainingFile = fs.readFileSync('training_file.jsonl');
const training_data = await client.files.create({ file: trainingFile });

// Upload the validation file
const validationFile = fs.readFileSync('validation_file.jsonl');
const validation_data = await client.files.create({ file: validationFile });
- curl
curl https://api.mistral.ai/v1/files \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F purpose="fine-tune" \
  -F file="@training_file.jsonl"

curl https://api.mistral.ai/v1/files \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F purpose="fine-tune" \
  -F file="@validation_file.jsonl"
Create a fine-tuning job
The next step is to create a fine-tuning job with the following parameters:
- model: the specific model you would like to fine-tune. The choices are open-mistral-7b (v0.3) and mistral-small-latest (mistral-small-2402).
- training_files: a collection of training file IDs, which can consist of a single file or multiple files
- validation_files: a collection of validation file IDs, which can consist of a single file or multiple files
- hyperparameters: two adjustable hyperparameters, "training_steps" and "learning_rate", that users can modify.
- python
from mistralai.models.jobs import TrainingParameters

created_jobs = client.jobs.create(
    model="open-mistral-7b",
    training_files=[training_data.id],
    validation_files=[validation_data.id],
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
    )
)
created_jobs
- javascript
const createdJob = await client.jobs.create({
  model: 'open-mistral-7b',
  trainingFiles: [training_data.id],
  validationFiles: [validation_data.id],
  hyperparameters: {
    trainingSteps: 10,
    learningRate: 0.0001,
  },
});
- curl
curl https://api.mistral.ai/v1/fine_tuning/jobs \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --data '{
    "model": "open-mistral-7b",
    "training_files": [
      "<uuid>"
    ],
    "validation_files": [
      "<uuid>"
    ],
    "hyperparameters": {
      "training_steps": 10,
      "learning_rate": 0.0001
    }
  }'
List/retrieve/cancel jobs
You can also list jobs, retrieve a job, or cancel a job.
You can filter and view a list of jobs using various parameters such as page, page_size, model, created_after, created_by_me, status, wandb_project, wandb_name, and suffix. Check out our API specs for details.
- python
# List jobs
jobs = client.jobs.list()
print(jobs)

# Retrieve a job
retrieved_jobs = client.jobs.retrieve(created_jobs.id)
print(retrieved_jobs)

# Cancel a job
canceled_jobs = client.jobs.cancel(created_jobs.id)
print(canceled_jobs)
- javascript
// List jobs
const jobs = await client.jobs.list();

// Retrieve a job
const retrievedJob = await client.jobs.retrieve({ jobId: createdJob.id });

// Cancel a job
const canceledJob = await client.jobs.cancel({ jobId: createdJob.id });
- curl
# List jobs
curl https://api.mistral.ai/v1/fine_tuning/jobs \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --header 'Content-Type: application/json'

# Retrieve a job
curl https://api.mistral.ai/v1/fine_tuning/jobs/<jobid> \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --header 'Content-Type: application/json'

# Cancel a job
curl -X POST https://api.mistral.ai/v1/fine_tuning/jobs/<jobid>/cancel \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --header 'Content-Type: application/json'
Use a fine-tuned model
When a fine-tuning job is finished, you will be able to see the fine-tuned model name via retrieved_jobs.fine_tuned_model. Then you can use our chat endpoint to chat with the fine-tuned model:
- python
from mistralai.models.chat_completion import ChatMessage

chat_response = client.chat(
    model=retrieved_jobs.fine_tuned_model,
    messages=[ChatMessage(role='user', content='What is the best French cheese?')]
)
- javascript
const chatResponse = await client.chat({
  model: retrievedJob.fine_tuned_model,
  messages: [{role: 'user', content: 'What is the best French cheese?'}],
});
- curl
curl "https://api.mistral.ai/v1/chat/completions" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer $MISTRAL_API_KEY" \
  --data '{
    "model": "ft:open-mistral-7b:daf5e488:20240430:c1bed559",
    "messages": [{"role": "user", "content": "Who is the most renowned French painter?"}]
  }'
Delete a fine-tuned model
- python
client.delete_model(retrieved_jobs.fine_tuned_model)
- curl
curl --location --request DELETE 'https://api.mistral.ai/v1/models/ft:open-mistral-7b:XXX:20240531:XXX' \
  --header 'Accept: application/json' \
  --header "Authorization: Bearer $MISTRAL_API_KEY"
FAQ
How to validate data format?
- Mistral API: We currently validate each file when you upload the dataset.
- mistral-finetune: You can run the data validation script to validate the data and run the reformat data script to reformat the data to the right format:
# download the reformat script
wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py
# download the validation script
wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/validate_data.py
# reformat data
python reformat_data.py data.jsonl
# validate data
python validate_data.py data.jsonl
However, it's important to note that these scripts might not detect all problematic cases. Therefore, you may need to manually validate and correct any unique edge cases in your data.
What's the size limit of the training data?
While the size limit for an individual training data file is 512MB, there's no limitation on the number of files you can upload. You can upload multiple files and reference them when creating the job.
What's the size limit of the validation data?
The size limit for the validation data is 1MB. As a rule of thumb:
validation_set_max_size = min(1MB, 5% of training data)
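The rule of thumb above can be worked through with a small Python sketch. It assumes 1MB means 1024 * 1024 bytes (the text does not say which convention applies) and uses integer arithmetic for the 5% figure; the function name simply mirrors the formula.

```python
MB = 1024 * 1024  # assumption: 1MB is treated as 1024 * 1024 bytes


def validation_set_max_size(training_bytes):
    """Return min(1MB, 5% of training data), in bytes."""
    five_percent = training_bytes * 5 // 100
    return min(1 * MB, five_percent)


# A 10MB training set allows about 0.5MB of validation data,
# while a 100MB training set is capped at the 1MB limit.
for size_mb in (10, 100):
    limit = validation_set_max_size(size_mb * MB)
    print(f"{size_mb}MB training data -> up to {limit / MB:.2f}MB validation data")
```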
How many epochs are in the training process?
A general rule of thumb is: Num epochs = max_steps / file_of_training_jsonls_in_MB. For instance, if your training file is 100MB and you set max_steps=1000, the training process will roughly perform 10 epochs.
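The same rule of thumb can be written as a short Python sketch, including the inverse calculation for choosing a step count. Both function names are illustrative, and the result is only as approximate as the rule itself.

```python
def estimate_epochs(max_steps, training_size_mb):
    """Approximate number of epochs: max_steps / training data size in MB."""
    if training_size_mb <= 0:
        raise ValueError("training_size_mb must be positive")
    return max_steps / training_size_mb


def steps_for_epochs(target_epochs, training_size_mb):
    """Invert the rule of thumb to pick a step count for a target number of epochs."""
    return round(target_epochs * training_size_mb)


# The example from the text: a 100MB file with max_steps=1000.
print(estimate_epochs(1000, 100))
# Steps needed for roughly 3 epochs over the same file.
print(steps_for_epochs(3, 100))
```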
Where can I find information on ETA / number of tokens / number of passes over each file?
Mistral API: Use the dry_run=True argument.
dry_run_job = client.jobs.create(
    model="open-mistral-7b",
    training_files=[training_data.id],
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
    ),
    dry_run=True,
)
print(dry_run_job)
mistral-finetune: You can use the following script to find out: https://github.com/mistralai/mistral-finetune/blob/main/utils/validate_data.py. This script accepts a .yaml training file as input and returns the number of tokens the model is being trained on.
How to estimate cost of a fine-tuning job?
For Mistral API, you can use the dry_run=True argument as mentioned in the previous question.
What is the recommended learning rate?
For LoRA fine-tuning, we recommend 1e-4 (default) or 1e-5.
Note that the learning rate we define is the peak learning rate, instead of a flat learning rate. The learning rate follows a linear warmup and cosine decay schedule. During the warmup phase, the learning rate is linearly increased from a small initial value to a larger value over a certain number of training steps. After the warmup phase, the learning rate is decayed using a cosine function.
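To illustrate the shape of such a schedule, here is a minimal Python sketch of linear warmup followed by cosine decay. The number of warmup steps, the starting value of 0, and decaying all the way to 0 are assumptions made for illustration; the text does not state the values used by the fine-tuning service.

```python
import math


def learning_rate_at(step, total_steps, peak_lr=1e-4, warmup_steps=10):
    """Learning rate at a given step under linear warmup and cosine decay."""
    if step < warmup_steps:
        # Linear warmup from 0 (assumed initial value) up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 (assumed final value).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few points of a hypothetical 100-step run.
for step in (0, 5, 10, 55, 100):
    print(step, f"{learning_rate_at(step, total_steps=100):.2e}")
```

The configured learning rate is only reached at the end of warmup, which is why it is described as the peak rather than a flat value.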
Is the fine-tuning API compatible with OpenAI data format?
Yes, we support OpenAI format.
What if my file size is larger than 500MB and I get the error message 413 Request Entity Too Large?
You can split your data file into chunks. Here is an example:
import json
from datasets import load_dataset

# get data from hugging face
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_gen")

# save data into .jsonl. This file is about 1.3GB
with open('train.jsonl', 'w') as f:
    for line in ds:
        json.dump(line, f)
        f.write('\n')

# reformat data
# (notebook shell commands; in a terminal, run them without the leading "!")
!wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py
!python reformat_data.py train.jsonl

# Split file into three chunks
input_file = "train.jsonl"
output_files = ["train_1.jsonl", "train_2.jsonl", "train_3.jsonl"]

# open the output files
output_file_objects = [open(file, "w") for file in output_files]

# counter for output files
counter = 0
with open(input_file, "r") as f_in:
    # read the input file line by line
    for line in f_in:
        # parse the line as JSON
        data = json.loads(line)
        # write the data to the current output file
        output_file_objects[counter].write(json.dumps(data) + "\n")
        # move to the next output file (round-robin)
        counter = (counter + 1) % 3

# close the output files
for file in output_file_objects:
    file.close()

# now you should see three jsonl files under 500MB