Introduction to MAOE

What’s New

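January 2023: MAOE (More thAn One Encoder) is released for the first time. It supports: 1) fine-grained entity recognition (try it out); 2) use as an encoder for continued fine-tuning, in the same way as BERT (see the development tool introduction); 3) few-shot entity recognition model training in combination with AdaSeq.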

Model Description

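The model is still under active iteration and the related paper is still in its anonymity period, so only a brief introduction is given here. The full details will follow.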

MAOE (More thAn One Encoder) combines a pre-trained model and a task model in a single model. It can be used in the following scenarios:

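As a pre-trained model (encoder): combine it with a downstream task head and continue fine-tuning. The downstream task can be any sequence understanding task, such as named entity recognition or entity typing.
As a fine-grained entity recognition model: MAOE has built-in fine-grained entity recognition and is designed to recognize more than 1,000 entity types (300+ are supported at present).
As a zero-shot entity recognition model: MAOE has built-in zero-shot entity recognition; given only the entity categories, it extracts the matching entities.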

Intended Use and Scope

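This model is mainly used to produce named entity recognition results for Chinese input sentences. You can try it with your own Chinese sentences; see the code examples for how to call it.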

Limitations and Possible Bias

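The model supports a large number of entity types, and some of them may perform poorly for reasons related to the training data. Evaluate the model yourself before deciding how to use it.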

How to Use

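After installing ModelScope, you can try these capabilities with the official tool AdaSeq. By default, a single sentence should not exceed a length of 256.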
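
Because of the default 256 length limit, longer documents need to be split before they are passed to the model. The following is a minimal sketch of one way to do that, not part of ModelScope or AdaSeq. It assumes the limit is counted in characters, and the helper name `split_text` is hypothetical. Each chunk is returned with its offset in the original text so that entity positions can be mapped back.

```python
import re
from typing import List, Tuple

MAX_LEN = 256  # default limit from this card; counting in characters is an assumption


def split_text(text: str, max_len: int = MAX_LEN) -> List[Tuple[int, str]]:
    """Split text into (offset, chunk) pairs with len(chunk) <= max_len.

    Hypothetical helper: breaks after sentence-ending punctuation first,
    then hard-wraps any sentence that is still too long.
    """
    chunks = []
    pos = 0
    # Zero-width split right after sentence-ending punctuation or a newline,
    # so the pieces concatenate back to the original text.
    for piece in re.split(r'(?<=[。！？!?；;\n])', text):
        if piece.strip():
            for i in range(0, len(piece), max_len):
                chunks.append((pos + i, piece[i:i + max_len]))
        pos += len(piece)
    return chunks


if __name__ == '__main__':
    for offset, chunk in split_text('姚明是中国篮协主席。他曾效力于休斯顿火箭队。'):
        print(offset, chunk)
```

An entity found in a chunk at `start` and `end` corresponds to `offset + start` and `offset + end` in the original text.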

Use as an Encoder
Follow the AdaSeq model training examples and simply set the embedder parameter in the configuration file to MAOE. See examples/bert_crf/configs/maoe_example.yaml:

model:
  type: sequence-labeling-model
  embedder:
    model_name_or_path: damo/nlp_maoe_named-entity-recognition_chinese-base-general
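
If the configuration is built or patched in Python rather than edited by hand, the same change can be expressed on a plain dictionary. This is only an illustrative sketch: the function name `use_maoe_embedder` is hypothetical, and it assumes the configuration has already been loaded into a nested dict that mirrors the YAML keys shown above.

```python
import copy

# Model ID taken from the configuration shown in this card.
MAOE_ID = 'damo/nlp_maoe_named-entity-recognition_chinese-base-general'


def use_maoe_embedder(config: dict, model_id: str = MAOE_ID) -> dict:
    """Return a copy of config whose model.embedder points at MAOE.

    Hypothetical helper; other keys in the config are left untouched.
    """
    patched = copy.deepcopy(config)
    model = patched.setdefault('model', {})
    model.setdefault('embedder', {})['model_name_or_path'] = model_id
    return patched


if __name__ == '__main__':
    base = {
        'model': {
            'type': 'sequence-labeling-model',
            'embedder': {'model_name_or_path': 'bert-base-chinese'},
        }
    }
    print(use_maoe_embedder(base)['model']['embedder'])
```

Returning a copy keeps the original configuration available, which is convenient when comparing MAOE against another encoder in the same script.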

Few-shot Entity Recognition Model Training

Follow the AdaSeq model training example examples/twostage_ner/configs/msra.yaml:

Specify the training data

dataset:
  data_file:
    train: https://www.modelscope.cn/api/v1/datasets/damo/msra_ner/repo/files?Revision=master&FilePath=train.txt
    test: https://www.modelscope.cn/api/v1/datasets/damo/msra_ner/repo/files?Revision=master&FilePath=test.txt

Change the encoder

model:
  type: sequence-labeling-model
  embedder:
    model_name_or_path: damo/nlp_maoe_named-entity-recognition_chinese-base-general

You can also try it quickly in a ModelScope Studio: https://www.modelscope.cn/studios/damo/maoe_fsl_ner/summary

Fine-grained Entity Recognition

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

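# Not referenced directly below; importing it is presumably what makes
# AdaSeq's pipelines available to ModelScope, so keep this line.
from adaseq import pipelines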

pipeline_ins = pipeline(
    Tasks.named_entity_recognition, 'damo/nlp_maoe_named-entity-recognition_general'
)
print(pipeline_ins('姚明是中国篮协主席。'))
# {'output': [{'span': '姚明', 'type': '人物/人物', 'start': 0, 'end': 2}, {'span': '中国篮协主席', 'type': '人物/指代/职位', 'start': 3, 'end': 9}]}
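
The result is a dictionary with an `output` list of entities, each carrying `span`, `type`, `start` and `end`. As a sketch of typical post-processing, the snippet below groups the spans by type and checks that the offsets match the input text. The helper names are hypothetical, and the sample result is copied from the output shown above rather than produced by calling the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_type(result: dict) -> Dict[str, List[str]]:
    """Group entity spans by their type label (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity in result.get('output', []):
        grouped[entity['type']].append(entity['span'])
    return dict(grouped)


def offsets_match(text: str, result: dict) -> bool:
    """Check that text[start:end] equals the reported span for every entity."""
    return all(
        text[e['start']:e['end']] == e['span'] for e in result.get('output', [])
    )


if __name__ == '__main__':
    text = '姚明是中国篮协主席。'
    # Sample result copied from the example output in this card.
    result = {'output': [
        {'span': '姚明', 'type': '人物/人物', 'start': 0, 'end': 2},
        {'span': '中国篮协主席', 'type': '人物/指代/职位', 'start': 3, 'end': 9},
    ]}
    print(group_by_type(result))
    print(offsets_match(text, result))
```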

Entity Labels

The model currently supports 300+ entity types (the target is 1000+). See the configuration.json file for the full label set.
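
The types in the example output of this card, such as '人物/指代/职位', look like slash-separated paths from a coarse category down to a fine one. Assuming that reading is correct (the card does not state it explicitly), a fine-grained label can be reduced to a coarser one as sketched below. Both helper names are hypothetical.

```python
from typing import List


def label_levels(label: str) -> List[str]:
    """Split a label such as '人物/指代/职位' into its levels.

    Assumes '/' separates hierarchy levels, which is inferred from the
    example output in this card rather than documented.
    """
    return [part for part in label.split('/') if part]


def coarsen(label: str, depth: int = 1) -> str:
    """Keep only the first `depth` levels of a label."""
    return '/'.join(label_levels(label)[:depth])


if __name__ == '__main__':
    print(label_levels('人物/指代/职位'))
    print(coarsen('人物/指代/职位'))
    print(coarsen('人物/指代/职位', depth=2))
```

Coarsening labels this way can help when only a handful of broad categories are needed, or when evaluating fine-grained types that have little training data.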