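This method uses a Transformer-CRF model with XLM-RoBERTa as the pretrained backbone. Relevant sentences retrieved with an external tool are added as extra context, and the model is trained with Multi-view Training.
The model structure is shown in the figure below:
For details, see the paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning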
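To make the "extra context" idea concrete, here is a minimal sketch of how a retrieval-augmented input view might be assembled. It is an illustration, not the model's actual preprocessing. The helper name `build_retrieval_view`, the `max_context` cap, and the use of `</s>` as the separator are assumptions made for this example. In the referenced paper, only the tokens of the original sentence receive labels, and the retrieved text serves purely as context.

```python
def build_retrieval_view(sentence, retrieved, sep_token='</s>', max_context=3):
    """Join the input sentence with retrieved sentences (hypothetical helper).

    sep_token and max_context are assumptions for this sketch; the real
    model's separator and context budget may differ.
    """
    # Keep at most max_context retrieved sentences and drop empty ones.
    context = [text.strip() for text in retrieved[:max_context] if text.strip()]
    # The original sentence always comes first so its tokens can be labeled.
    return f' {sep_token} '.join([sentence] + context)


plain_view = 'Confidence in the pound is widely expected to take another sharp dive.'
retrieval_view = build_retrieval_view(
    plain_view,
    ['The pound fell against the dollar.', 'Trade figures are due next week.'],
)
print(retrieval_view)
```

Multi-view training, as described in the paper, uses both the plain view and the retrieval view of the same sentence and encourages the two to yield consistent predictions.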
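This model produces chunking results for English input sentences; in ModelScope it is exposed through the named entity recognition pipeline. You can try it with your own English sentences. See the code example for how to call it.
Once ModelScope is installed, the chunking capability is ready to use. By default, a single input sentence should not exceed a length of 512.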
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
ner_pipeline = pipeline(Tasks.named_entity_recognition, 'damo/nlp_raner_chunking_english-large')
result = ner_pipeline('Confidence in the pound is widely expected to take another sharp dive if trade figures for September.')
print(result)
# {'output': [{'type': 'NP', 'start': 0, 'end': 10, 'span': 'Confidence'}, {'type': 'PP', 'start': 11, 'end': 13, 'span': 'in'}, {'type': 'NP', 'start': 14, 'end': 23, 'span': 'the pound'}, {'type': 'VP', 'start': 24, 'end': 50, 'span': 'is widely expected to take'}, {'type': 'NP', 'start': 51, 'end': 69, 'span': 'another sharp dive'}, {'type': 'SBAR', 'start': 70, 'end': 72, 'span': 'if'}, {'type': 'NP', 'start': 73, 'end': 78, 'span': 'trade'}, {'type': 'NP', 'start': 79, 'end': 86, 'span': 'figures'}, {'type': 'PP', 'start': 87, 'end': 90, 'span': 'for'}, {'type': 'NP', 'start': 91, 'end': 101, 'span': 'September.'}]}
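The sample output above can be post-processed without calling the model again. The sketch below hard-codes that output, checks that each `span` equals `sentence[start:end]` (so `end` is exclusive in this sample), and groups the spans by chunk type. The helper name `group_chunks` is hypothetical and not part of ModelScope.

```python
from collections import defaultdict

sentence = ('Confidence in the pound is widely expected to take another '
            'sharp dive if trade figures for September.')

# Output copied from the example above, so no model call is needed here.
result = {'output': [
    {'type': 'NP', 'start': 0, 'end': 10, 'span': 'Confidence'},
    {'type': 'PP', 'start': 11, 'end': 13, 'span': 'in'},
    {'type': 'NP', 'start': 14, 'end': 23, 'span': 'the pound'},
    {'type': 'VP', 'start': 24, 'end': 50, 'span': 'is widely expected to take'},
    {'type': 'NP', 'start': 51, 'end': 69, 'span': 'another sharp dive'},
    {'type': 'SBAR', 'start': 70, 'end': 72, 'span': 'if'},
    {'type': 'NP', 'start': 73, 'end': 78, 'span': 'trade'},
    {'type': 'NP', 'start': 79, 'end': 86, 'span': 'figures'},
    {'type': 'PP', 'start': 87, 'end': 90, 'span': 'for'},
    {'type': 'NP', 'start': 91, 'end': 101, 'span': 'September.'},
]}


def group_chunks(text, chunks):
    """Group chunk spans by type after validating their character offsets."""
    grouped = defaultdict(list)
    for chunk in chunks:
        # Offsets are character positions with an exclusive end.
        if text[chunk['start']:chunk['end']] != chunk['span']:
            raise ValueError(f'offset mismatch for chunk: {chunk}')
        grouped[chunk['type']].append(chunk['span'])
    return dict(grouped)


grouped = group_chunks(sentence, result['output'])
print(grouped['NP'])
print(grouped['VP'])
```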
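This model was trained on the conll2000 dataset. Chunking quality may drop on domain-specific English text, so please evaluate it on your own data before deciding how to use it.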
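Chunk type | Label |
---|---|
Noun phrase | NP |
Verb phrase | VP |
Prepositional phrase | PP |
Adverb phrase | ADVP |
Subordinate clause | SBAR |
Adjective phrase | ADJP |
Particle | PRT |
Conjunction phrase | CONJP |
Interjection | INTJ |
List marker | LST |
Unlike coordinated phrase | UCP |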
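CoNLL-2000 annotates chunks with per-token tags such as `B-NP`, `I-NP`, and `O`, and a sequence labeler predicts tags of this kind before they are merged into typed spans. The sketch below shows one way to do that merge. Whether this model uses plain BIO or a richer scheme internally is an assumption here, and `bio_to_chunks` is a hypothetical helper, not a ModelScope function.

```python
def bio_to_chunks(tokens, tags):
    """Merge per-token BIO tags into (label, text) chunks.

    An I- tag that does not continue the current chunk is treated
    leniently as the start of a new chunk.
    """
    chunks = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition('-')
        if prefix == 'I' and label == current_label:
            # Continue the chunk that is currently open.
            current_tokens.append(token)
            continue
        if current_label is not None:
            # Close the open chunk before starting anything new.
            chunks.append((current_label, ' '.join(current_tokens)))
        if prefix in ('B', 'I'):
            current_label, current_tokens = label, [token]
        else:
            # 'O' means the token is outside any chunk.
            current_label, current_tokens = None, []
    if current_label is not None:
        chunks.append((current_label, ' '.join(current_tokens)))
    return chunks


tokens = ['Confidence', 'in', 'the', 'pound', '.']
tags = ['B-NP', 'B-PP', 'B-NP', 'I-NP', 'O']
print(bio_to_chunks(tokens, tags))
```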
Evaluation results on the conll2000 test set:
Dataset | Precision | Recall | F1 |
---|---|---|---|
conll2000 | 97.15 | 97.21 | 97.18 |
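As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported values agree with that definition. The sketch below recomputes F1 from the table. It also shows exact-match span scoring on a made-up toy example; the assumption that the table was scored this way follows common CoNLL-2000 practice and is not stated by the model card.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(gold, predicted):
    """Exact-match span scoring: a chunk counts only if its start, end,
    and type all match a gold chunk."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Recompute F1 from the precision and recall reported in the table.
reported_f1 = f1_score(97.15, 97.21)
print(round(reported_f1, 2))

# Toy example with made-up spans: one predicted NP has the wrong start.
gold = {(0, 10, 'NP'), (11, 13, 'PP'), (14, 23, 'NP')}
predicted = {(0, 10, 'NP'), (11, 13, 'PP'), (18, 23, 'NP')}
print(span_prf(gold, predicted))
```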
If you find this model helpful, please consider citing the following papers:
@inproceedings{wang-etal-2021-improving,
title = "Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning",
author = "Wang, Xinyu and
Jiang, Yong and
Bach, Nguyen and
Wang, Tao and
Huang, Zhongqiang and
Huang, Fei and
Tu, Kewei",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.142",
pages = "1800--1812",
}
@inproceedings{wang-etal-2022-damo,
title = "{DAMO}-{NLP} at {S}em{E}val-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition",
author = "Wang, Xinyu and
Shen, Yongliang and
Cai, Jiong and
Wang, Tao and
Wang, Xiaobin and
Xie, Pengjun and
Huang, Fei and
Lu, Weiming and
Zhuang, Yueting and
Tu, Kewei and
Lu, Wei and
Jiang, Yong",
booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.semeval-1.200",
pages = "1457--1468",
}