Qwen-7B 🤖 | 🤗  | Qwen-7B-Chat 🤖 | 🤗  |  Demo  |  Report
Qwen-7B is the 7B-parameter version of the large language model series Qwen (abbr. Tongyi Qianwen), developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of diverse data, including web text, professional books, code, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. This repository is the one for Qwen-7B.
The features of Qwen-7B include:
For more details about the open-source model of Qwen-7B, please refer to the Github code repository.
To run Qwen-7B, please make sure that your PyTorch version is 1.12 or later, and then execute the following pip commands to install the dependent libraries.
pip install modelscope -U
pip install transformers_stream_generator
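The PyTorch version requirement can be checked programmatically before loading the model. The following is a minimal sketch: `meets_min_version` is a hypothetical helper (it is not part of PyTorch or ModelScope), and it assumes version strings that start with `major.minor`, such as those reported by `torch.__version__`.

```python
import re

def meets_min_version(version: str, minimum: tuple = (1, 12)) -> bool:
    """Return True if `version` (e.g. "1.12.0+cu113") is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (2, 0) >= (1, 12) holds.
    return (major, minor) >= minimum

# Literal strings are used here; in practice pass torch.__version__.
print(meets_min_version("1.12.0+cu113"))  # True
print(meets_min_version("1.11.0"))        # False
```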
In addition, it is recommended to install the flash-attention library for higher efficiency and lower memory usage.
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
pip install csrc/layer_norm
pip install csrc/rotary
You can easily call the model with the following code:
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig
# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B", revision = 'v.1.0.4', trust_remote_code=True)
# We recommend checking the support of BF16 first. Run the command below:
# import torch
# torch.cuda.is_bf16_supported()
# use bf16
model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision = 'v.1.0.4',trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision = 'v.1.0.4', trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="cpu", revision = 'v.1.0.4', trust_remote_code=True).eval()
# use fp32
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B", device_map="auto", revision='v.1.0.4', trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained("qwen/Qwen-7B", revision='v.1.0.4', trust_remote_code=True)  # Different generation lengths, top_p, and other hyperparameters can be specified here
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)  # follow the model's device so the CPU-only option also works
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...
Code link: https://github.com/modelscope/swift/tree/main/examples/pytorch/llm
Script for fine-tuning (SFT) qwen-7b with QLoRA (requires 16 GB of GPU memory):
git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/llm
python src/llm_sft.py \
--model_type qwen-7b \
--sft_type lora \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--batch_size 1 \
--weight_decay 0. \
--learning_rate 1e-4 \
--gradient_accumulation_steps 16 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token'
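Two quantities implied by the flags above are worth making explicit. The sketch below is plain arithmetic on the values from the command. It assumes a single device and the standard LoRA scaling factor (alpha divided by rank); whether the training script applies exactly this scaling is an assumption, not something stated here.

```python
# Values copied from the command above.
batch_size = 1
gradient_accumulation_steps = 16
lora_rank = 64
lora_alpha = 32

# Samples contributing to each optimizer step on a single device.
effective_batch_size = batch_size * gradient_accumulation_steps

# Standard LoRA scaling applied to the low-rank update: alpha / r.
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 16
print(lora_scaling)          # 0.5
```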
For more usage instructions, please refer to our GitHub repo.
The details of the model architecture of Qwen-7B are listed as follows:
Hyperparameter | Value |
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
vocab size | 151851 |
sequence length | 2048 |
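As a rough sanity check, the hyperparameters in the table can be turned into a back-of-the-envelope parameter count. This is only a sketch: the FFN width (`ffn_hidden`) is not given in this document and is an illustrative assumption, as are the choices of a three-matrix gated FFN, untied input and output embeddings, and ignoring biases and normalization weights.

```python
# Values from the table above.
n_layers = 32
n_heads = 32
d_model = 4096
vocab_size = 151851

# ASSUMPTION: not stated in this document; illustrative FFN width only.
ffn_hidden = 11008

head_dim = d_model // n_heads             # per-head dimension
embedding_params = vocab_size * d_model   # one embedding matrix
attention_params = 4 * d_model * d_model  # Q, K, V, and output projections (biases ignored)
ffn_params = 3 * d_model * ffn_hidden     # gate, up, and down projections of a gated FFN

# ASSUMPTION: input and output embeddings are separate (untied) matrices.
total = 2 * embedding_params + n_layers * (attention_params + ffn_params)

print(head_dim)               # 128
print(embedding_params)       # 621981696
print(f"{total / 1e9:.2f}B")  # 7.72B
```

Under these assumptions the embeddings alone account for roughly 1.2B parameters, a direct consequence of the large vocabulary.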
For position encoding, FFN activation function, and normalization methods, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU as the activation function, and RMSNorm for normalization (flash-attention can optionally be installed for acceleration).
For tokenization, whereas current mainstream open-source models rely mainly on Chinese and English vocabularies, Qwen-7B uses a vocabulary of over 150K tokens, built on cl100k_base, the BPE vocabulary used by GPT-4. It prioritizes efficient encoding of Chinese, English, and code data, and is also friendlier to other languages, enabling users to directly enhance the capability for some languages without expanding the vocabulary. It segments numbers into single digits and calls the tiktoken tokenizer library for efficient tokenization.
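To illustrate what segmenting numbers by single digit means, here is a minimal sketch using a regular expression. It only mimics the digit-splitting step on raw text. It is not the Qwen-7B tokenizer, and `split_digits` is a hypothetical helper name.

```python
import re

def split_digits(text: str) -> list:
    """Split text so that every digit becomes its own piece."""
    # \d matches one digit; \D+ matches a maximal run of non-digit characters.
    return re.findall(r"\d|\D+", text)

print(split_digits("Qwen-7B has 2048 positions"))
# ['Qwen-', '7', 'B has ', '2', '0', '4', '8', ' positions']
```

Splitting this way means a number such as 2048 is always represented by the same four digit pieces, regardless of how often the full number appears in the training data.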
We randomly sampled 1 million documents for each language to test and compare the encoding compression rates of different models (with XLM-R, which supports 100 languages, as the base value of 1). The specific performance is shown in the figure above.
As can be seen, while ensuring the efficient decoding of Chinese, English, and code, Qwen-7B also achieves a high compression rate for many other languages (such as th, he, ar, ko, vi, ja, tr, id, pl, ru, nl, pt, it, de, es, fr etc.), equipping the model with strong scalability as well as high training and inference efficiency in these languages.
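The normalization against XLM-R can be sketched as follows. The exact definition of the compression rate is not given here, so this example assumes it is the ratio of token counts on the same text (lower meaning fewer tokens than the baseline). The tokenizer names and counts are made up purely for illustration.

```python
def relative_compression(token_counts: dict, baseline: str = "XLM-R") -> dict:
    """Normalize token counts so that the baseline tokenizer has value 1.0."""
    base = token_counts[baseline]
    return {name: count / base for name, count in token_counts.items()}

# Hypothetical token counts for the same sample of documents.
counts = {"XLM-R": 1000, "tokenizer_a": 800, "tokenizer_b": 1250}
ratios = relative_compression(counts)
print(ratios)  # {'XLM-R': 1.0, 'tokenizer_a': 0.8, 'tokenizer_b': 1.25}
```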
For pre-training data, on the one hand, Qwen-7B uses part of the open-source generic corpus. On the other hand, it uses a massive amount of accumulated web corpus and high-quality text content. The corpus exceeds 2.2T tokens after deduplication and filtering, encompassing web text, encyclopedias, books, code, mathematics, and various other domains.
C-Eval is a common benchmark for evaluating the Chinese common-sense capability of pre-trained models. It covers 52 subjects in four major directions: humanities, social sciences, STEM, and other specialties. Following standard practice, we use the development set samples as the few-shot examples and evaluate the 5-shot accuracy of the Qwen-7B pre-trained model on the validation and test sets.
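The few-shot setup described above can be sketched as a prompt builder that prepends development-set examples to each evaluated question. The prompt template and the helper name `build_few_shot_prompt` are assumptions for illustration; the actual template used in the evaluation is not given here.

```python
def build_few_shot_prompt(dev_examples, question, k=5):
    """Prepend k solved development-set examples to an unsolved question."""
    shots = dev_examples[:k]
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    # The evaluated question is left open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)

# Placeholder (question, answer) pairs standing in for development-set samples.
dev = [(f"dev question {i}", f"dev answer {i}") for i in range(6)]
prompt = build_few_shot_prompt(dev, "validation question")
print(prompt)
```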
The accuracy comparison of Qwen-7B and the other models on the C-Eval validation set is shown as follows:
Model | Avg. |
Alpaca-7B | 28.9 |
Vicuna-7B | 31.2 |
ChatGLM-6B | 37.1 |
Baichuan-7B | 42.7 |
ChatGLM2-6B | 50.9 |
InternLM-7B | 53.4 |
ChatGPT | 53.5 |
Claude-v1.3 | 55.5 |
Qwen-7B | 60.8 |
The performance comparison of Qwen-7B and other models on the C-Eval test set is shown in the following table:
Model | Avg. | Avg. (Hard) | STEM | Social Sciences | Humanities | Others |
ChatGLM-6B | 38.9 | 29.2 | 33.3 | 48.3 | 41.3 | 38.0 |
Chinese-Alpaca-Plus-13B | 41.5 | 30.5 | 36.6 | 49.7 | 43.1 | 41.2 |
Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
WestlakeLM-19B | 44.6 | 34.9 | 41.6 | 51.0 | 44.3 | 44.5 |
AndesLM-13B | 46.0 | 29.7 | 38.1 | 61.0 | 51.0 | 41.9 |
BatGPT-15B-sirius | 47.0 | 31.9 | 42.7 | 57.5 | 48.6 | 43.6 |
ChatGLM2-6B | 51.7 | 37.1 | 48.6 | 60.5 | 51.3 | 49.8 |
InternLM-7B | 52.8 | 37.1 | 48.0 | 67.4 | 55.4 | 45.8 |
Baichuan-13B | 53.6 | 36.7 | 47.0 | 66.8 | 57.3 | 49.8 |
Claude-v1.3 | 54.2 | 39.0 | 51.9 | 61.7 | 52.1 | 53.7 |
ChatGPT | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
Qwen-7B | 59.6 | 41.0 | 52.8 | 74.1 | 63.1 | 55.2 |
As can be seen, Qwen-7B achieves the best performance out of all existing models with similar scale and even surpasses larger-scale models.
MMLU is currently one of the most recognized benchmarks for evaluating English comprehension abilities, covering 57 subtasks across different academic fields and difficulty levels. The 5-shot MMLU accuracy of Qwen-7B is shown in the following table:
Model | Avg. | STEM | Social Sciences | Humanities | Others |
LLaMA-7B | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
Baichuan-7B | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
LLaMA2-7B | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
LLaMA-13B | 46.9 | 35.8 | 53.8 | 45.0 | 53.3 |
ChatGLM2-6B | 47.9 | 41.2 | 54.4 | 43.7 | 54.5 |
InternLM-7B | 51.0 | - | - | - | - |
Baichuan-13B | 51.6 | 41.6 | 60.9 | 47.4 | 58.5 |
LLaMA2-13B | 54.8 | 44.1 | 62.6 | 52.8 | 61.1 |
ChatGLM2-12B | 56.2 | 48.2 | 65.1 | 52.6 | 60.9 |
Qwen-7B | 56.7 | 47.6 | 65.9 | 51.5 | 64.7 |
In terms of English, Qwen-7B also surpasses other similar open-source pre-trained models, and is competitive when compared to larger versions of other models.
We compared the code capabilities of pre-trained models on HumanEval, and the results are as follows:
Model | Pass@1 |
Baichuan-7B | 9.2 |
ChatGLM2-6B | 9.2 |
InternLM-7B | 10.4 |
LLaMA-7B | 10.5 |
LLaMA2-7B | 12.8 |
Baichuan-13B | 12.8 |
LLaMA-13B | 15.8 |
MPT-7B | 18.3 |
LLaMA2-13B | 18.3 |
Qwen-7B | 24.4 |
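For readers unfamiliar with the metric, Pass@1 is the probability that a single generated program passes the unit tests. How the numbers above were computed is not stated here; the sketch below shows the unbiased pass@k estimator commonly used with HumanEval, under the assumption of n samples per problem of which c are correct.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 the estimate reduces to the fraction of correct samples, c / n.
score = pass_at_k(n=10, c=3, k=1)
print(round(score, 3))  # 0.3
```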
We compared the math capabilities of pre-trained models on GSM8K (8-shot), and the results are as follows:
Model | Acc. |
MPT-7B | 6.8 |
Falcon-7B | 6.8 |
Baichuan-7B | 9.7 |
LLaMA-7B | 11.0 |
LLaMA2-7B | 14.6 |
LLaMA-13B | 17.8 |
Baichuan-13B | 26.6 |
LLaMA2-13B | 28.7 |
InternLM-7B | 31.2 |
ChatGLM2-6B | 32.4 |
ChatGLM2-12B | 40.9 |
Qwen-7B | 51.6 |
We compared the translation capabilities of pre-trained models on the WMT22 zh-en and en-zh datasets (5-shot BLEU), and the results are as follows:
Model | Avg. | zh-en | en-zh |
InternLM-7B | 11.8 | 9.0 | 14.5 |
LLaMA-7B | 12.7 | 16.7 | 8.7 |
LLaMA-13B | 15.8 | 19.5 | 12.0 |
LLaMA2-7B | 19.9 | 21.9 | 17.9 |
Bloom-7B | 20.3 | 19.1 | 21.4 |
LLaMA2-13B | 23.3 | 22.4 | 24.2 |
PolyLM-13B | 23.6 | 20.2 | 27.0 |
Baichuan-7B | 24.6 | 22.6 | 26.6 |
Qwen-7B | 27.5 | 24.3 | 30.6 |
We introduce NTK-aware interpolation, LogN attention scaling, Window attention, etc. to extend the context length to over 8K tokens. We conduct language modeling experiments on the arXiv dataset with the PPL evaluation. Results are demonstrated below:
(To use NTK interpolation and LogN scaling, please set use_dynamic_ntk and use_logn_attn to true in config.json.)
Perplexity (PPL) by sequence length:
Model | 1024 | 2048 | 4096 | 8192 | 16384 |
Qwen-7B | 4.23 | 3.78 | 39.35 | 469.81 | 2645.09 |
+ dynamic_ntk | 4.23 | 3.78 | 3.59 | 3.66 | 5.71 |
+ dynamic_ntk + logn | 4.23 | 3.78 | 3.58 | 3.56 | 4.62 |
+ dynamic_ntk + logn + window_attn | 4.23 | 3.78 | 3.58 | 3.49 | 4.32 |
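The two scaling techniques named above can be sketched numerically. The formulas below are one common formulation of dynamic NTK-aware interpolation and LogN attention scaling, not necessarily the exact ones used by Qwen-7B. The RoPE base of 10000 is an assumption, the head dimension of 128 follows from d_model / n_heads, and the training length of 2048 is the sequence length from the architecture table.

```python
import math

def ntk_alpha(seq_len: int, train_len: int = 2048) -> int:
    """Factor that grows in steps once seq_len exceeds the training length."""
    context_value = math.log2(seq_len / train_len) + 1
    return max(2 ** math.ceil(context_value) - 1, 1)

def scaled_rope_base(seq_len: int, head_dim: int = 128,
                     base: float = 10000.0, train_len: int = 2048) -> float:
    """NTK-aware RoPE base: base * alpha ** (dim / (dim - 2))."""
    return base * ntk_alpha(seq_len, train_len) ** (head_dim / (head_dim - 2))

def logn_scale(position: int, train_len: int = 2048) -> float:
    """LogN scaling: log base train_len of the position beyond train_len, else 1."""
    return math.log(position, train_len) if position > train_len else 1.0

for length in (1024, 2048, 4096, 8192, 16384):
    print(length, ntk_alpha(length), round(logn_scale(length), 4))
```

Within the training length both factors equal 1, so behavior is unchanged up to 2048 tokens, which is consistent with the identical PPL values in the first two columns of the table.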
To load the model in lower precision, e.g., 4 bits and 8 bits, we provide examples to show how to load by adding quantization configuration:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from transformers import BitsAndBytesConfig
import torch
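# Pick ONE of the two configurations below; as written, the later assignment wins.
# int8
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
# int4 (NF4); the arguments below reconstruct a truncated call and are an assumption
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16)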
pipeline_ins = pipeline(
task=Tasks.text_generation, model='qwen/Qwen-7B', device_map='auto', quantization_config=quantization_config)
text = '蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是'
result = pipeline_ins(text)
With this method, you can load Qwen-7B in NF4 and Int8, which reduces memory usage. We provide related statistics of model performance below. We find that quantization degrades effectiveness slightly but significantly increases inference efficiency and reduces memory costs.
Precision | MMLU | Memory |
BF16 | 56.7 | 16.2G |
Int8 | 52.8 | 10.1G |
NF4 | 48.9 | 7.4G |
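The trade-off in the table can be quantified with simple arithmetic. The sketch below uses only the numbers reported above and reads the memory unit "G" as gigabytes, which is an assumption.

```python
# (precision, MMLU score, memory in GB) copied from the table above.
rows = [("BF16", 56.7, 16.2), ("Int8", 52.8, 10.1), ("NF4", 48.9, 7.4)]
_, base_mmlu, base_mem = rows[0]

summary = {}
for name, mmlu, mem in rows[1:]:
    memory_saved_pct = round(100 * (base_mem - mem) / base_mem, 1)
    mmlu_drop = round(base_mmlu - mmlu, 1)
    summary[name] = (memory_saved_pct, mmlu_drop)
    print(f"{name}: {memory_saved_pct}% less memory, {mmlu_drop} MMLU points lower")

# Int8: 37.7% less memory, 3.9 MMLU points lower
# NF4: 54.3% less memory, 7.8 MMLU points lower
```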
We provide evaluation scripts for reproducing the performance of our model; see the link for details.
Our code and checkpoints are open for research use, and commercial use is also permitted. Check LICENSE for more details about the license.
If you would like to leave a message for either our research team or product team, feel free to send an email to qianwen_opensource@alibabacloud.com.