StableLM-Tuned-Alpha

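Model Description

StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models built on top of the StableLM-Base-Alpha models and further fine-tuned on various chat and instruction-following datasets.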

Usage

Get started chatting with StableLM-Tuned-Alpha by using the following code snippet:

from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline

# Build a text-generation pipeline for the 7B tuned checkpoint (device='cuda' requires a GPU).
pipe = pipeline(task=Tasks.text_generation, model='AI-ModelScope/stablelm-tuned-alpha-7b', model_revision='v1.0.2', device='cuda')


system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"
result = pipe(prompt)
print(result)
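
The prompt in the snippet above is a plain string: a system block followed by `<|USER|>` and `<|ASSISTANT|>` markers. For multi-turn chats, a small helper can assemble that string. The sketch below is illustrative only: `build_prompt` and its arguments are hypothetical names, and concatenating earlier turns in order with no separator is an assumption inferred from the single-turn example, not something documented here.

```python
from typing import List, Tuple

USER, ASSISTANT = "<|USER|>", "<|ASSISTANT|>"


def build_prompt(system_prompt: str, history: List[Tuple[str, str]], user_message: str) -> str:
    """Assemble a prompt string (hypothetical helper, not part of any library).

    `history` holds completed (user, assistant) turns. The returned string ends
    with the assistant marker so the model generates the next reply.
    Assumption: turns are concatenated directly, with no separator between them.
    """
    parts = [system_prompt]
    for user_turn, assistant_turn in history:
        parts.append(f"{USER}{user_turn}{ASSISTANT}{assistant_turn}")
    parts.append(f"{USER}{user_message}{ASSISTANT}")
    return "".join(parts)


# Shortened system prompt for illustration; use the full one from the snippet above.
system_prompt = "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
single_turn = build_prompt(system_prompt, [], "What's your mood today?")
multi_turn = build_prompt(system_prompt, [("Hi!", "Hello! How can I help?")], "Tell me a joke.")
print(multi_turn)
```

The resulting string can be passed to `pipe(...)` in place of `prompt`.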

Model Details

Developed by: Stability AI
Model type: StableLM-Tuned-Alpha models are auto-regressive language models based on the NeoX transformer architecture.
Language(s): English
Library: HuggingFace Transformers
License: Fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0), in line with the original non-commercial license specified by Stanford Alpaca.
Contact: For questions and comments about the model, please email lm@stability.ai

Training

Parameters	Hidden Size	Layers	Heads	Sequence Length
3B	4096	16	32	4096
7B	6144	16	48	4096
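
As a rough sanity check on the table, the per-head dimension and an approximate parameter count can be derived from the hidden size, layer count, and head count. This is a back-of-the-envelope sketch, not the authors' accounting: the `12 * layers * hidden_size^2` estimate assumes a standard transformer block with a 4x-wide feed-forward network and ignores embeddings, biases, and layer norms, none of which is stated in the table. The function names are hypothetical.

```python
# (hidden_size, layers, heads) copied from the table above.
CONFIGS = {"3B": (4096, 16, 32), "7B": (6144, 16, 48)}


def head_dim(hidden_size: int, heads: int) -> int:
    """Width of each attention head; the hidden size must divide evenly."""
    if hidden_size % heads != 0:
        raise ValueError("hidden_size must be divisible by heads")
    return hidden_size // heads


def approx_block_params(hidden_size: int, layers: int) -> int:
    """Approximate non-embedding parameter count.

    Assumption: per layer, 4*d^2 for the attention projections plus 8*d^2
    for a feed-forward network with 4x expansion.
    """
    return 12 * layers * hidden_size ** 2


for name, (hidden, layers, heads) in CONFIGS.items():
    estimate = approx_block_params(hidden, layers) / 1e9
    print(f"{name}: head_dim={head_dim(hidden, heads)}, ~{estimate:.2f}B non-embedding params")
```

Under these assumptions both configurations use 128 dimensions per head, and the estimates land close to the nominal 3B and 7B sizes.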

Training Dataset

StableLM-Tuned-Alpha models are fine-tuned on a combination of five datasets:
Alpaca: a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine;
GPT4All Prompt Generations: about 400k prompts and responses generated by GPT-3.5-Turbo (as stated in the title of the GPT4All citation below);
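Anthropic HH: preferences about AI assistant helpfulness and harmlessness;
DataBricks Dolly: 15k instruction/response pairs written by Databricks employees in the capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization;
and ShareGPT Vicuna (English subset): a dataset of conversations retrieved from ShareGPT.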

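Training Procedure

Models are learned via supervised fine-tuning on the aforementioned datasets, trained in mixed precision (FP16), and optimized with AdamW. We outline the following hyperparameters: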

Parameters	Batch Size	Learning Rate	Warm-up	Weight Decay	Betas
3B	256	2e-5	50	0.01	(0.9, 0.99)
7B	128	2e-5	100	0.01	(0.9, 0.99)
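
The hyperparameters above can be captured in a small config object together with a warm-up schedule. The sketch below is an illustration under stated assumptions: the table does not say what unit the warm-up uses or what shape it takes, so linear warm-up over optimizer steps followed by a constant rate is assumed. `FinetuneConfig` and `lr_at_step` are hypothetical names, not part of any training codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FinetuneConfig:
    batch_size: int
    learning_rate: float
    warmup_steps: int  # assumption: measured in optimizer steps
    weight_decay: float
    betas: Tuple[float, float]


# Values copied from the hyperparameter table above.
CONFIGS = {
    "3B": FinetuneConfig(256, 2e-5, 50, 0.01, (0.9, 0.99)),
    "7B": FinetuneConfig(128, 2e-5, 100, 0.01, (0.9, 0.99)),
}


def lr_at_step(cfg: FinetuneConfig, step: int) -> float:
    """Learning rate at a 1-indexed step.

    Assumption: linear warm-up to the peak rate, then constant.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return cfg.learning_rate * min(1.0, step / cfg.warmup_steps)


cfg = CONFIGS["7B"]
print(lr_at_step(cfg, 50), lr_at_step(cfg, 100))
```

In a PyTorch-based setup, `learning_rate`, `betas`, and `weight_decay` would correspond to the `lr`, `betas`, and `weight_decay` arguments of `torch.optim.AdamW`.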

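Use and Limitations

Intended Use

These models are intended to be used by the open-source community in chat-like applications, in adherence with the CC BY-NC-SA-4.0 license.

Limitations and Bias

Although the aforementioned datasets help to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly.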

Acknowledgements

This work would not have been possible without the help of Dakota Mahan (@dmayhem93).

Citations

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@misc{vicuna2023,
    title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality},
    url = {https://vicuna.lmsys.org},
    author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
    month = {March},
    year = {2023}
}

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}