ChatGLM2-6B-32K

💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Slack and WeChat

介绍

ChatGLM2-6B-32K在ChatGLM2-6B的基础上进一步强化了对于长文本的理解能力，能够更好的处理最多32K长度的上下文。具体地，我们基于位置插值（Positional Interpolation）的方法对位置编码进行了更新，并在对话阶段使用 32K 的上下文长度训练。在实际的使用中，如果您面临的上下文长度基本在 8K 以内，我们推荐使用ChatGLM2-6B；如果您需要处理超过 8K 的上下文长度，我们推荐使用ChatGLM2-6B-32K。

ChatGLM2-6B-32K是开源中英双语对话模型 ChatGLM2-6B 的加长版本，在保留了初代模型对话流畅、部署门槛较低等众多优秀特性的基础之上，ChatGLM2-6B-32k 引入了如下新特性：

更强大的性能：基于 ChatGLM 初代模型的开发经验，我们全面升级了 ChatGLM2-6B-32K 的基座模型。ChatGLM2-6B-32K 使用了 GLM 的混合目标函数，经过了 1.4T 中英标识符的预训练与人类偏好对齐训练。
更长的上下文：基于 FlashAttention 技术，我们将基座模型的上下文长度（Context Length）由 ChatGLM-6B 的 2K 扩展到了 32K，并在对话阶段使用 32K 的上下文长度训练，允许更多轮次的对话。
更高效的推理：基于 Multi-Query Attention 技术，ChatGLM2-6B-32K 有更高效的推理速度和更低的显存占用：在官方的模型实现下，推理速度相比初代提升了 42%，INT4 量化下，6G 显存支持的对话长度由 1K 提升到了 8K。
更开放的协议：ChatGLM2-6B-32K 权重对学术研究完全开放，在填写问卷进行登记后亦允许免费商业使用。

The ChatGLM2-6B-32K further strengthens the ability to understand long texts based on the ChatGLM2-6B, and can better handle up to 32K context length. Specifically, we have updated the position encoding based on the method of Positional Interpolation, and trained with a 32K context length during the dialogue alignment. In practical use, if the context length you are dealing with is generally within 8K, we recommend using ChatGLM2-6B; if you need to handle a context length exceeding 8K, we recommend using ChatGLM2-6B-32K.

ChatGLM2-6B-32K is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

Stronger Performance: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B-32K. ChatGLM2-6B-32K uses the hybrid objective function of GLM, and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training.
Longer Context: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 32K during the dialogue alignment, allowing for more rounds of dialogue.
More Efficient Inference: Based on Multi-Query Attention technique, ChatGLM2-6B-32K has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K.
More Open License: ChatGLM2-6B-32K weights are completely open for academic research, and free commercial use is also allowed after completing the questionnaire.

软件依赖

pip install protobuf transformers==4.30.2 cpm_kernels torch>=2.0 gradio mdtex2html sentencepiece accelerate

代码调用

可以通过如下代码调用 ChatGLM-6B-32K 模型来生成对话：

# 备注：最新模型版本要求modelscope的master

# 当前可用版本：
# Step1: git clone https://github.com/modelscope/modelscope.git
# Step2：cd modelscope
# Step3: pip install .

from modelscope.utils.constant import Tasks
from modelscope import Model
from modelscope.pipelines import pipeline
model = Model.from_pretrained('ZhipuAI/chatglm2-6b-32k', device_map='auto')
pipe = pipeline(task=Tasks.chat, model=model)
inputs = {'text':'你好', 'history': []}
result = pipe(inputs)
inputs = {'text':'介绍下清华大学', 'history': result['history']}
result = pipe(inputs)
print(result)

关于更多的使用说明，包括如何运行命令行和网页版本的 DEMO，以及使用模型量化以节省显存，请参考我们的 Github Repo。

For more instructions, including how to run CLI and web demos, and model quantization, please refer to our Github Repo.

Change Log

v1.0

协议

本仓库的代码依照 Apache-2.0 协议开源，ChatGLM2-6B-32K 模型的权重的使用则需要遵循 Model License。

引用

如果你觉得我们的工作有帮助的话，请考虑引用下列论文，ChatGLM2-6B 的论文会在近期公布，敬请期待～

@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}

@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}