Using the Ollama Modelfile

A Modelfile is the file used to create and share models with Ollama. It plays a role similar to the Dockerfile that Docker uses to build images.

Format

The Modelfile format:

# comment
INSTRUCTION arguments

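Instruction	Description
`FROM` (required)	Defines the base model to use.
`PARAMETER`	Sets the parameters that control how Ollama runs the model.
`TEMPLATE`	The full prompt template to be sent to the model.
`SYSTEM`	Specifies the system message that will be set in the template.
`ADAPTER`	Defines the (Q)LoRA adapters to apply to the model.
`LICENSE`	Specifies the legal license.
`MESSAGE`	Specifies the message history.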
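
To make the `INSTRUCTION arguments` layout concrete, here is a minimal Python sketch that splits Modelfile text into (instruction, argument) pairs. It is a simplified illustration, not Ollama's actual parser: the function name `parse_modelfile` is hypothetical, and it only handles comments, blank lines, single-line arguments and triple-quoted blocks.

```python
def parse_modelfile(text):
    """Return a list of (INSTRUCTION, argument) pairs from Modelfile text."""
    entries = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        i += 1
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0]
        arg = parts[1].strip() if len(parts) > 1 else ""
        if arg.startswith('"""'):
            body = arg[3:]
            # Collect following lines until the closing triple quote appears.
            while '"""' not in body and i < len(lines):
                body += "\n" + lines[i]
                i += 1
            arg = body.split('"""', 1)[0].strip("\n")
        # Instruction names are upper-cased so that lookups ignore case.
        entries.append((name.upper(), arg))
    return entries


sample = 'FROM qwen:7b\n# comment\nPARAMETER temperature 0.6\nSYSTEM """\nYou are x.\n"""\n'
for instruction, argument in parse_modelfile(sample):
    print(instruction, repr(argument))
```

The sketch keeps each argument as a raw string; a real tool would go on to validate the parameter names and values.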

Example Modelfile for a customized model

See: Customize a model
A Modelfile can be used to adjust a large model's prompt and parameters.
Example: a Modelfile based on Alibaba's Qwen 7B

FROM qwen:7b

# Set the temperature to 0.6 (higher is more creative, lower is more coherent)
PARAMETER temperature 0.6

# Set the context window size in tokens
PARAMETER num_ctx 8192

# Set the system message
SYSTEM """
You are an AI assistant named x, developed and provided by 谢先斌.
You are good at speaking Chinese and telling jokes.
"""

Rebuild the model

$ ollama create qwenhi -f ./Modelfile
transferring model data
using existing layer sha256:87f26aae09c7f052de93ff98a2282f05822cc6de4af1a2a159c5bd1acbd10ec4
using existing layer sha256:7c7b8e244f6aa1ac8c32b74f56d42c41a0364dd2dabed8d9c6030a862e805b54
using existing layer sha256:1da0581fd4ce92dcf5a66b1da737cf215d8dcf25aa1b98b44443aaf7173155f5
using existing layer sha256:d9735bf21cb7479889ae27f1b34f43a0173fa97286f36c808a9439be88657e83
using existing layer sha256:59eda4b87a1b3455735f4c59d45d86eb71556568f0b3d748c92bff9a7720e3d7
using existing layer sha256:b742e5414ad161e36e4731e5dfd125733810cc6a8d9f58a343f663a42612533b
writing manifest
success
$ ollama run qwenhi
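>>> Who are you?
I am an AI assistant developed by 谢先斌; you can call me x. I am mainly good at communicating in Chinese and telling jokes. Do you have any questions, or is there anything I can help with?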

>>> /bye
...

View the model's Modelfile

ollama show qwenhi --modelfile

Usage

`Modelfile` basics

An example Modelfile that creates a Mario blueprint:

FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

To use this:

1. Save it as a file (e.g. Modelfile).
2. ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>
3. ollama run choose-a-model-name
4. Start using the model!
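
The same steps can be scripted. The Python sketch below writes the Mario Modelfile to disk and assembles the `ollama create` and `ollama run` commands shown above. The model name `mario` and the helper names are hypothetical; `ollama create` is executed only if the `ollama` CLI is found on the PATH, and the interactive `ollama run` command is returned but never executed.

```python
import shutil
import subprocess
from pathlib import Path

MODELFILE = """FROM llama3.2
PARAMETER temperature 1
PARAMETER num_ctx 4096
SYSTEM You are Mario from super mario bros, acting as an assistant.
"""


def build_commands(model_name, modelfile_path):
    """Return the create and run commands as argument lists."""
    return [
        ["ollama", "create", model_name, "-f", str(modelfile_path)],
        ["ollama", "run", model_name],
    ]


def create_model(model_name, workdir):
    """Write the Modelfile into workdir, then run `ollama create` if possible."""
    path = Path(workdir) / "Modelfile"
    path.write_text(MODELFILE, encoding="utf-8")
    create_cmd, _run_cmd = build_commands(model_name, path)
    if shutil.which("ollama") is None:
        return None  # CLI not installed; only the file is written
    return subprocess.run(create_cmd, check=True)


for command in build_commands("mario", "./Modelfile"):
    print(" ".join(command))
```

Calling `create_model("mario", ".")` would perform steps 1 and 2; step 3 is best run by hand in a terminal because it opens an interactive session.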

To view the Modelfile of a given model, use the ollama show --modelfile command.

ollama show --modelfile llama3.2

Output:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"

Instructions

FROM (Required)

The FROM instruction defines the base model to use when creating a model.

FROM <model name>:<tag>

Build from an existing model

FROM llama3.2

Available base models: https://github.com/ollama/ollama#model-library
Additional models can be found at: https://ollama.com/library
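
As an illustration of the `<model name>:<tag>` form, this small Python sketch splits a model reference into its name and tag. The helper `split_model_ref` is hypothetical, and falling back to `latest` when no tag is given is an assumption based on the `ollama show` output above, where `llama3.2` is reported as `llama3.2:latest`.

```python
def split_model_ref(ref):
    """Split 'name:tag' into (name, tag); the tag defaults to 'latest'."""
    # Only look for the tag in the last path segment, so that a colon
    # earlier in the reference (e.g. a host:port prefix) is left alone.
    head, sep, last = ref.rpartition("/")
    name, colon, tag = last.partition(":")
    return (head + sep + name, tag if (colon and tag) else "latest")


for ref in ("llama3.2", "qwen:7b"):
    print(ref, "->", split_model_ref(ref))
```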

Build from a Safetensors model

FROM <model directory>

The model directory should contain the Safetensors weights for a supported architecture. Currently supported model architectures:

Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2)
Mistral (including Mistral 1, Mistral 2, and Mixtral)
Gemma (including Gemma 1 and Gemma 2)
Phi3

Build from a GGUF file

FROM ./ollama-model.gguf

The GGUF file location should be specified as an absolute path or as a path relative to the Modelfile location.
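
The path rule can be pictured with a short Python sketch. The helper `resolve_from_path` and the example paths are hypothetical; it only illustrates "absolute, or relative to the Modelfile" using POSIX-style paths and is not how Ollama itself resolves files.

```python
from pathlib import PurePosixPath


def resolve_from_path(modelfile_path, from_arg):
    """Resolve a FROM path against the directory that holds the Modelfile."""
    base_dir = PurePosixPath(modelfile_path).parent
    # Joining with an absolute path discards base_dir, so absolute
    # arguments pass through unchanged.
    return base_dir / from_arg


print(resolve_from_path("/models/demo/Modelfile", "./ollama-model.gguf"))
print(resolve_from_path("/models/demo/Modelfile", "/data/ollama-model.gguf"))
```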

PARAMETER

The PARAMETER instruction defines a parameter that can be set when the model is run.

PARAMETER <parameter> <parametervalue>

Valid parameters and values

Parameter	Description	Value Type	Example Usage
mirostat	Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)	int	mirostat 0
mirostat_eta	Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)	float	mirostat_eta 0.1
mirostat_tau	Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)	float	mirostat_tau 5.0
num_ctx	Sets the size of the context window used to generate the next token. (Default: 2048)	int	num_ctx 4096
repeat_last_n	Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)	int	repeat_last_n 64
repeat_penalty	Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)	float	repeat_penalty 1.1
temperature	The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)	float	temperature 0.7
seed	Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)	int	seed 42
stop	Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate `stop` parameters in a modelfile.	string	stop "AI assistant:"
num_predict	Maximum number of tokens to predict when generating text. (Default: -1, infinite generation)	int	num_predict 42
top_k	Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)	int	top_k 40
top_p	Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)	float	top_p 0.9
min_p	Alternative to top_p that aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, tokens with a probability less than 0.045 are filtered out. (Default: 0.0)	float	min_p 0.05
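
The min_p arithmetic from the last row can be checked with a few lines of Python. This is a sketch of the rule as described in the table, not the sampler Ollama actually runs; the function name `min_p_filter` and the token probabilities are made up for illustration.

```python
def min_p_filter(probs, p):
    """Keep tokens whose probability is at least p times the top probability."""
    threshold = p * max(probs.values())
    return {token: prob for token, prob in probs.items() if prob >= threshold}


# Hypothetical distribution: the most likely token has probability 0.9,
# so with p=0.05 the cut-off is about 0.045.
probs = {"the": 0.9, "a": 0.05, "an": 0.03, "of": 0.02}
print(min_p_filter(probs, 0.05))
```

Because the cut-off scales with the top probability, the filter is strict when the model is confident and permissive when the distribution is flat.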

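Parameter notes:

temperature: the temperature of the model; increasing it makes the model answer more creatively.
- The higher the temperature, the more creative the output; the lower the temperature, the more coherent.
num_ctx: sets the size of the context window used to generate the next token. Default: 2048.
top_k: reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. Default: 40.
- An integer, usually set between 0 and 100.
- A lower top_k value reduces the probability that the LLM generates nonsense.
top_p: works together with top_k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) produces more focused and conservative text. Default: 0.9.
- A floating-point value between 0 and 1.
- A higher value, up to 1.0, lets the LLM consider a wider range of possible next tokens, which allows more creativity.
seed: sets the random number seed used for generation. Setting it to a specific number makes the model generate the same text for the same prompt.
num_predict: the maximum number of tokens to predict when generating text.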
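
To see how temperature, top_k and top_p shape the candidate tokens, here is a textbook-style Python sketch. It is an assumption-laden illustration rather than Ollama's sampler: the function names are hypothetical, and the real implementation may apply the filters in a different order or with extra steps.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.9))
print(softmax([2.0, 1.0, 0.0], temperature=0.5))
```

A smaller top_k or top_p shrinks the candidate set, which is why lower values give more conservative text.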

TEMPLATE

TEMPLATE is the full prompt template passed to the model. It may optionally include a system message, a user's message and the response from the model. Note: the syntax may be model specific. Templates use Go template syntax.

Template Variables

Variable	Description
`{{ .System }}`	The system message used to specify custom behavior.
`{{ .Prompt }}`	The user prompt message.
`{{ .Response }}`	The response from the model. When generating a response, text after this variable is omitted.

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM

The SYSTEM instruction specifies the system message to be used in the template.

SYSTEM """<system message>"""

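ADAPTER

The ADAPTER instruction specifies a fine-tuned LoRA adapter to apply to the base model. The value of the adapter should be an absolute path or a path relative to the Modelfile. The base model should be specified with a FROM instruction. If the base model is not the same as the base model that the adapter was tuned from, the behaviour will be erratic.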

Safetensor adapter

ADAPTER <path to safetensor adapter>

Currently supported Safetensor adapters:

Llama (including Llama 2, Llama 3, and Llama 3.1)
Mistral (including Mistral 1, Mistral 2, and Mixtral)
Gemma (including Gemma 1 and Gemma 2)

GGUF adapter

ADAPTER ./ollama-lora.gguf

LICENSE

The LICENSE instruction lets you specify the legal license under which the model used with this Modelfile is shared or distributed.

LICENSE """
<license text>
"""

MESSAGE

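The MESSAGE instruction lets you specify a message history for the model to use when responding. Repeating the MESSAGE instruction several times builds up a conversation that guides the model to answer in a similar way.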

MESSAGE <role> <message>

Valid roles

Role	Description
system	Alternate way of providing the SYSTEM message for the model.
user	An example message of what the user could have asked.
assistant	An example message of how the model should respond.

Example conversation

MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
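
The conversation above can be read as an ordered list of role/content pairs. The Python sketch below extracts that list from single-line MESSAGE instructions. The function name `parse_messages` and the dictionary layout are assumptions made for illustration, not part of Ollama.

```python
VALID_ROLES = {"system", "user", "assistant"}


def parse_messages(modelfile_text):
    """Collect single-line MESSAGE instructions as role/content dictionaries."""
    messages = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].upper() == "MESSAGE":
            role, content = parts[1], parts[2]
            if role not in VALID_ROLES:
                raise ValueError(f"unknown role: {role}")
            messages.append({"role": role, "content": content})
    return messages


history = """MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
"""
for message in parse_messages(history):
    print(message["role"], "->", message["content"])
```

Keeping the messages in file order matters, since the alternating user and assistant turns are what teach the model the expected answer style.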

Notes

The Modelfile is not case sensitive. In the examples, uppercase instructions are used to make them easier to distinguish from arguments.
Instructions can be in any order. In the examples, the FROM instruction is first to keep it easily readable.

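Further reading

Ollama models use the OCI format and can be pushed to a distribution/distribution registry (reference).
Reference: a specialized coding-assistant model built on Qwen3-4B by modifying the Modelfile.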
