cino-large-v2 - 创业邦

CINO: Pre-trained Language Models for Chinese Minority Languages

Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual capabilities for language understanding.
Multilingual PLMs have progressed rapidly in recent years.
However, few PLMs have been built for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.

To address the absence of Chinese minority PLMs, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora, such as

Please read our GitHub repository for more details (Chinese): https://github.com/ymcui/Chinese-Minority-PLM

You may also be interested in:

Chinese MacBERT: https://github.com/ymcui/MacBERT
Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer

More resources by HFL: https://github.com/ymcui/HFL-Anthology

Code example

from modelscope.models import Model
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

if __name__ == '__main__':
    task = Tasks.fill_mask
    # Chinese for "Paris is the capital of <mask>", with one character masked.
    sentence1 = '巴黎是<mask>国的首都。'
    model_id = 'dienstag/cino-large-v2'
    # Load the model by its ModelScope model ID.
    model = Model.from_pretrained(model_id)
    # Build a fill-mask pipeline around the loaded model.
    pipeline_ins = pipeline(task=task,
                            model=model,
                            model_revision='v1.0.0')
    print(pipeline_ins(input=sentence1))
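
The example above passes a sentence in which one character has been replaced by the `<mask>` placeholder. As a minimal sketch of how such inputs could be prepared from complete sentences, the snippet below uses a hypothetical helper, `mask_first`, which is not part of ModelScope and is named here only for illustration. It assumes the mask placeholder is the literal string `<mask>`, as in the example sentence.

```python
MASK_TOKEN = "<mask>"  # placeholder string used in the example sentence above


def mask_first(sentence: str, target: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first occurrence of `target` in `sentence` with the mask token.

    Hypothetical helper for illustration only; it is not a ModelScope API.
    """
    if not target:
        raise ValueError("target must be a non-empty string")
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    # count=1 ensures only the first occurrence is masked.
    return sentence.replace(target, mask_token, 1)


if __name__ == "__main__":
    # Rebuild the input used above from the complete sentence.
    print(mask_first("巴黎是法国的首都。", "法"))
```

The resulting string can then be passed to the pipeline as `input`, in the same way as `sentence1` above.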