💻 Github Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]
ChatGLM2-6B-32K builds on ChatGLM2-6B and further strengthens its ability to understand long texts, handling contexts of up to 32K in length more effectively. Specifically, we updated the position encoding using Positional Interpolation and trained with a 32K context length during dialogue alignment. In practice, if your context length generally stays within 8K, we recommend ChatGLM2-6B; if you need to handle contexts longer than 8K, we recommend ChatGLM2-6B-32K.
ChatGLM2-6B-32K is the long-context version of the open-source bilingual (Chinese-English) chat model ChatGLM2-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model while adding the long-context support described above.
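To make the idea of Positional Interpolation more concrete, here is a minimal sketch for rotary-style position encodings: position indices are scaled down so that a longer sequence maps back into the position range seen during training. This is an illustration of the general technique, not the model's actual implementation; the trained length of 8192, the head dimension of 64, and the function names are assumptions chosen for the example.

```python
def interpolation_scale(trained_len, target_len):
    """Factor by which positions are compressed; never stretches positions."""
    return min(1.0, trained_len / target_len)


def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotary angles for one position across dim // 2 frequency pairs.

    With scale < 1 the position is interpolated (compressed) before the
    angles are computed, which is the core of Positional Interpolation.
    """
    half = dim // 2
    scaled_position = position * scale
    return [scaled_position * base ** (-2.0 * i / dim) for i in range(half)]


# Illustrative values only (assumptions, not taken from the model config).
trained_len = 8192
target_len = 32768

scale = interpolation_scale(trained_len, target_len)
last_position_angles = rope_angles(target_len - 1, dim=64, scale=scale)

# The last position of the long sequence now falls inside the trained range.
print(scale, last_position_angles[0])
```

With these numbers every position is multiplied by 0.25, so position 32767 is treated like position 8191.75 and stays inside the range the encoding was originally trained on.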
pip install protobuf transformers==4.30.2 cpm_kernels "torch>=2.0" gradio mdtex2html sentencepiece accelerate
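After installing, it can be useful to confirm which versions were actually picked up. The following is a small sketch using only the Python standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the install command above.

```python
from importlib import metadata

# Package names copied from the pip command; adjust if your environment differs.
REQUIRED = [
    "protobuf", "transformers", "cpm_kernels", "torch",
    "gradio", "mdtex2html", "sentencepiece", "accelerate",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

The script only reports versions; comparing them against the pins (`transformers==4.30.2`, `torch>=2.0`) is left to the reader.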
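The ChatGLM2-6B-32K model can be called with the following code to generate a conversation:
# Note: the latest model version requires the master branch of modelscope.
# A currently working way to install it:
# Step 1: git clone https://github.com/modelscope/modelscope.git
# Step 2: cd modelscope
# Step 3: pip install .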
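from modelscope.utils.constant import Tasks
from modelscope import Model
from modelscope.pipelines import pipeline
# Load the model weights and build a chat pipeline around them.
model = Model.from_pretrained('ZhipuAI/chatglm2-6b-32k', device_map='auto')
pipe = pipeline(task=Tasks.chat, model=model)
# First turn: "你好" means "Hello"; the history starts empty.
inputs = {'text': '你好', 'history': []}
result = pipe(inputs)
# Second turn: "介绍下清华大学" means "Introduce Tsinghua University".
# Passing the previous history keeps the conversation context.
inputs = {'text': '介绍下清华大学', 'history': result['history']}
result = pipe(inputs)
print(result)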
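The guideline from the introduction (ChatGLM2-6B for contexts within 8K, ChatGLM2-6B-32K beyond that) could be expressed as a small helper that picks a model id from an estimated context length. This is a sketch under assumptions: "8K" and "32K" are read as 8192 and 32768 tokens, the id `ZhipuAI/chatglm2-6b` for the shorter-context model is assumed rather than taken from this document, and `choose_model_id` is a hypothetical name.

```python
SHORT_CONTEXT_LIMIT = 8 * 1024    # assumption: "8K" read as 8192 tokens
LONG_CONTEXT_LIMIT = 32 * 1024    # assumption: "32K" read as 32768 tokens

SHORT_MODEL_ID = "ZhipuAI/chatglm2-6b"      # assumed id, verify before use
LONG_MODEL_ID = "ZhipuAI/chatglm2-6b-32k"   # id used in the example above


def choose_model_id(num_tokens: int) -> str:
    """Return the recommended model id for an estimated context length."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if num_tokens > LONG_CONTEXT_LIMIT:
        raise ValueError("context exceeds the 32K limit of both models")
    if num_tokens <= SHORT_CONTEXT_LIMIT:
        return SHORT_MODEL_ID
    return LONG_MODEL_ID


if __name__ == "__main__":
    for length in (2_000, 8_192, 20_000):
        print(length, choose_model_id(length))
```

The returned id can then be passed to `Model.from_pretrained` in place of the hard-coded string.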
For more instructions, including how to run the CLI and web demos and how to use model quantization to save GPU memory, please refer to our GitHub repo.
The code in this repository is open-sourced under the Apache-2.0 license, while use of the ChatGLM2-6B-32K model weights must comply with the Model License.
If you find our work helpful, please consider citing the following papers. The ChatGLM2-6B paper will be released soon, so stay tuned.
@article{zeng2022glm,
title={{GLM-130B}: An Open Bilingual Pre-trained Model},
author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
journal={arXiv preprint arXiv:2210.02414},
year={2022}
}
@inproceedings{du2022glm,
title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={320--335},
year={2022}
}