FROM qwen:7b
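# Set the temperature to 0.6 (higher is more creative, lower is more coherent)
PARAMETER temperature 0.6
# Set the context window size in tokens
PARAMETER num_ctx 8192
# Set the system message
SYSTEM """
You are an AI assistant named x, developed and provided by 谢先斌.
You are good at speaking Chinese and telling jokes.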
"""
Rebuild the model
$ ollama create qwenhi -f ./Modelfile
transferring model data
using existing layer sha256:87f26aae09c7f052de93ff98a2282f05822cc6de4af1a2a159c5bd1acbd10ec4
using existing layer sha256:7c7b8e244f6aa1ac8c32b74f56d42c41a0364dd2dabed8d9c6030a862e805b54
using existing layer sha256:1da0581fd4ce92dcf5a66b1da737cf215d8dcf25aa1b98b44443aaf7173155f5
using existing layer sha256:d9735bf21cb7479889ae27f1b34f43a0173fa97286f36c808a9439be88657e83
using existing layer sha256:59eda4b87a1b3455735f4c59d45d86eb71556568f0b3d748c92bff9a7720e3d7
using existing layer sha256:b742e5414ad161e36e4731e5dfd125733810cc6a8d9f58a343f663a42612533b
writing manifest
success
$ ollama run qwenhi
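>>> Who are you?
I am an AI assistant developed by 谢先斌; you can call me x. I am mainly good at communicating in Chinese and telling jokes. Do you have a question, or is there anything I can help with?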
>>> /bye
...
View the model's Modelfile
ollama show qwenhi --modelfile
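A Modelfile like the one built in this walkthrough can also be generated from a script. The following is a minimal sketch, not part of Ollama: `render_modelfile` and its arguments are hypothetical names introduced for illustration, and it covers only the FROM, PARAMETER, and SYSTEM instructions used above.

```python
def render_modelfile(base, parameters, system):
    """Build Modelfile text from a base model, a parameter dict, and a system message.

    Hypothetical helper for illustration; it assumes the system message
    does not itself contain a triple double-quote sequence.
    """
    lines = [f"FROM {base}"]
    for name, value in parameters.items():
        # One PARAMETER instruction per setting, e.g. "PARAMETER temperature 0.6"
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes allow a multi-line system message
    lines.append(f'SYSTEM """\n{system}\n"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile(
        "qwen:7b",
        {"temperature": 0.6, "num_ctx": 8192},
        "You are an AI assistant named x.",
    )
    print(text)
```

The printed text can be saved as ./Modelfile and passed to ollama create with the -f flag, as shown earlier.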
Usage
Modelfile basics
An example of a Modelfile creating a Mario blueprint:
FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096
# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
To use it:
Save it as a file (e.g. Modelfile).
ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>
ollama run choose-a-model-name
Start using the model!
To view the Modelfile of a given model, use the ollama show --modelfile command.
ollama show --modelfile llama3.2
Output:
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
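The header comments in this output suggest replacing the blob-path FROM line with a model tag before reusing the file. Below is a minimal sketch of that step, assuming the output has already been captured as a string. `rebase_modelfile` is a hypothetical helper, not an Ollama command.

```python
def rebase_modelfile(show_output):
    """Swap the first FROM line for the one suggested in the "# FROM ..." comment.

    Hypothetical helper: returns the text unchanged when no suggestion is found.
    """
    lines = show_output.splitlines()

    # Look for the commented-out suggestion, e.g. "# FROM llama3.2:latest"
    suggested = None
    for line in lines:
        if line.startswith("# FROM "):
            suggested = line[2:]  # drop the leading "# "
            break
    if suggested is None:
        return show_output

    result = []
    replaced = False
    for line in lines:
        # Instruction names are not case sensitive, so compare in upper case
        if not replaced and line.upper().startswith("FROM "):
            result.append(suggested)
            replaced = True
        else:
            result.append(line)
    return "\n".join(result)
```

The comment lines are left in place; only the first real FROM instruction is rewritten.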
mirostat_eta
Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)
float
mirostat_eta 0.1
mirostat_tau
Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)
float
mirostat_tau 5.0
num_ctx
Sets the size of the context window used to generate the next token. (Default: 2048)
int
num_ctx 4096
repeat_last_n
Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
int
repeat_last_n 64
repeat_penalty
Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
float
repeat_penalty 1.1
temperature
The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)
float
temperature 0.7
seed
Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)
int
seed 42
stop
Sets the stop sequences to use. When one of these patterns is encountered, the LLM stops generating text and returns. Multiple stop patterns may be set by specifying multiple separate stop parameters in a Modelfile.
string
stop "AI assistant:"
num_predict
Maximum number of tokens to predict when generating text. (Default: -1, infinite generation)
int
num_predict 42
top_k
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
int
top_k 40
top_p
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
float
top_p 0.9
min_p
An alternative to top_p that aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, tokens with a probability below 0.045 are filtered out. (Default: 0.0)
float
min_p 0.05
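To make the sampling parameters above more concrete, here is a minimal pure-Python sketch of temperature scaling and of the top_k, top_p, and min_p filters. It follows the textbook definitions and is an assumption about the general technique, not the code Ollama actually runs. All function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=0.8):
    """Convert raw scores to probabilities; requires temperature > 0.

    Lower temperatures sharpen the distribution, higher ones flatten it.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def top_k_filter(probs, k=40):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def min_p_filter(probs, min_p=0.05):
    """Drop tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs)
    return renormalize([p if p >= threshold else 0.0 for p in probs])


if __name__ == "__main__":
    # Mirrors the min_p example: top probability 0.9, so the cutoff is 0.045
    probs = [0.9, 0.05, 0.04, 0.01]
    print(min_p_filter(probs, min_p=0.05))
```

In the example at the bottom, the tokens with probabilities 0.04 and 0.01 fall below the 0.045 cutoff and are removed, while 0.9 and 0.05 survive and are rescaled.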
system
Alternate way of providing the SYSTEM message for the model.
user
An example message of what the user could have asked.
assistant
An example message of how the model should respond.
Example conversation
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
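The MESSAGE lines above form an ordered conversation history. As a rough sketch, they can be read into a list of role/content dictionaries. That shape is a common chat convention assumed here for illustration, and `parse_messages` is a hypothetical helper rather than Ollama's own parser. It handles single-line messages only.

```python
VALID_ROLES = {"system", "user", "assistant"}


def parse_messages(modelfile_text):
    """Collect single-line MESSAGE instructions as role/content dicts, in order."""
    history = []
    for line in modelfile_text.splitlines():
        # Split into at most three parts: instruction, role, and the message text
        parts = line.strip().split(maxsplit=2)
        if len(parts) == 3 and parts[0].upper() == "MESSAGE":
            role, content = parts[1].lower(), parts[2]
            if role not in VALID_ROLES:
                raise ValueError(f"unknown role: {role}")
            history.append({"role": role, "content": content})
    return history


if __name__ == "__main__":
    example = """MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no"""
    for message in parse_messages(example):
        print(message["role"], "->", message["content"])
```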
Notes
The Modelfile is not case sensitive. In the examples, uppercase instructions are used to make them easier to distinguish from arguments.
Instructions can be in any order. In the examples, the FROM instruction is first to keep it easily readable.
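These two notes can be illustrated with a small reader that upper-cases instruction names and makes no assumption about their order. This is a sketch only: `parse_instructions` is a hypothetical helper, it handles single-line instructions, and it does not reproduce Ollama's actual parsing rules.

```python
def parse_instructions(text):
    """Return (INSTRUCTION, argument) pairs with the instruction name upper-cased."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, argument = line.partition(" ")
        # Only the instruction name is normalized; the argument keeps its case
        pairs.append((name.upper(), argument.strip()))
    return pairs


if __name__ == "__main__":
    lower = parse_instructions("from llama3.2\nparameter temperature 1")
    upper = parse_instructions("PARAMETER temperature 1\nFROM llama3.2")
    # Same instructions, regardless of letter case or order
    print(sorted(lower) == sorted(upper))
```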