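OFASys is an open-source AI library for unified multi-modal, multi-task learning, developed by the M6 team at DAMO Academy. With this system we trained a model, OFA+, which is the first to support unified training and inference across 7 modalities (including images, text, speech, video, and motion) and more than 20 multi-modal tasks.
Note: OFASys is still iterating quickly and currently implements the ModelScope interface in a relatively standalone way, so it needs its own installation environment.
# ModelScope notebooks already ship with modelscope, so it does not need to be installed there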
# !pip install modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
!pip install https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ofasys-0.1.0-py3-none-any.whl
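Because OFASys is installed separately from ModelScope, it can help to confirm that the wheel actually landed in the active environment before importing it. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `ofasys` is assumed from the wheel file name above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None when the wheel is not installed.
print(installed_version("ofasys"))
```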
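# Imported for its side effects (the name is otherwise unused); it appears to
# register the 'my-ofasys-task' pipeline with ModelScope.
from ofasys import ms_wrapper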
from modelscope.pipelines import pipeline
pipe = pipeline('my-ofasys-task', model="damo/ofasys_multimodal_multitask_pretrain_base_en", model_revision='v1.0.0')
# Image captioning
instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ic.jpeg"}
output = pipe(data, instruction=instruction)
print(output.text) # "a man and woman sitting in front of a laptop computer"
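Every example in this README drives the same pipeline with a different instruction string. Judging only from those strings, an instruction has an input side and an output side separated by ` -> `, and bracketed slots such as `[IMAGE:img]` name a modality, an optional key in the `data` dict, and optional comma-separated attributes. The sketch below parses that visible syntax for illustration. It is not OFASys's own parser, the names `Slot`, `parse_instruction` and `input_keys` are hypothetical, and the reading of the syntax is an assumption drawn from the examples.

```python
import re
from typing import Dict, List, NamedTuple


class Slot(NamedTuple):
    modality: str       # e.g. "IMAGE", "TEXT", "BOX"
    name: str           # key looked up in the data dict; "" if the slot is unnamed
    options: List[str]  # trailing comma-separated attributes, kept as raw strings


# Matches bracketed slots such as [IMAGE:img] or [BOX:patch_boxes,add_bos,add_eos].
# Angle-bracket markers like <BOS> are deliberately not matched.
_SLOT = re.compile(r"\[([A-Z]+)(?::([^,\]]+))?((?:,[^,\]]+)*)\]")


def _parse_side(text: str) -> List[Slot]:
    slots = []
    for modality, name, opts in _SLOT.findall(text):
        options = [o for o in opts.split(",") if o]
        slots.append(Slot(modality, name, options))
    return slots


def parse_instruction(instruction: str) -> Dict[str, List[Slot]]:
    """Split an instruction at ' -> ' and list the slots on each side."""
    source, sep, target = instruction.partition(" -> ")
    if not sep:
        raise ValueError("instruction has no ' -> ' separator")
    return {"input": _parse_side(source), "output": _parse_side(target)}


def input_keys(instruction: str) -> List[str]:
    """Names of the input-side slots, i.e. the keys the data dict is assumed to need."""
    return [s.name for s in parse_instruction(instruction)["input"] if s.name]


captioning = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
print(input_keys(captioning))
print(parse_instruction(captioning)["output"])
```

Under this reading, the captioning instruction needs only the `img` key, which matches the `data` dict passed to the pipeline above.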
# Visual grounding: find the region described by the text in 'cap'
instruction = '[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/vg.jpg", "cap": "hand"}
output = pipe(data, instruction=instruction)
output.save_box("output.jpg")
# Text summarization
instruction = '<BOS> what is the summary of article " [TEXT:src] "? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
               "presidential run-off election after a reformed communist won the first round of voting ."}
output = pipe(data, instruction=instruction)
print(output.text) # "polish opposition endorses walesa in presidential run-off"
# Table-to-text: describe a tripleset in natural language
instruction = '<BOS> structured knowledge: " [STRUCT:database,uncased] " . how to describe the tripleset ? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {
    'database': [
        ['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
        ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
        ['5,457,831', 'YEAR', '2012'],
        ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
        ['Atlanta', 'COUNTRY', 'United States'],
    ]
}
output = pipe(data, instruction=instruction, beam_size=1)
print(output.text) # "atlanta is the metropolitan area in the united states in 2012."
# Text-to-SQL: a batch of three questions that share one database schema
instruction = '<BOS> " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. <EOS> -> <BOS> [TEXT:tgt] <EOS>'
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id']
]
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'What are the locations and names of all stations with capacity between 5000 and 10000?', 'database': database}
]
output = pipe(data, instruction=instruction)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select location, name from stadium where capacity between 5000 and 10000"
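Generated SQL is only useful if it refers to tables and columns that exist. As a rough sanity check, the schema list from the example can be turned into an empty in-memory SQLite database and each generated query compiled against it. This is a sketch under stated assumptions, not part of OFASys: the helper names `build_schema` and `is_valid_sql` are hypothetical, the layout (database name first, then `[table, "col , col , ..."]` pairs) is inferred from the example, and column types are unknown, so columns are declared untyped.

```python
import sqlite3
from typing import List


def build_schema(database: List[List[str]]) -> sqlite3.Connection:
    """Create empty tables in an in-memory SQLite database.

    database[0] holds the database name; every later entry is assumed to be
    [table_name, "col_a , col_b , ..."].
    """
    conn = sqlite3.connect(":memory:")
    for table, columns in database[1:]:
        cols = ", ".join(f'"{c.strip()}"' for c in columns.split(","))
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
    return conn


def is_valid_sql(conn: sqlite3.Connection, query: str) -> bool:
    """Return True if SQLite can compile the query against the schema."""
    try:
        # EXPLAIN compiles the statement without running it against any data.
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False


database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id'],
]
conn = build_schema(database)
print(is_valid_sql(conn, "select name, country, age from singer order by age desc"))
```

The check only confirms that a query parses and that its table and column names resolve; it says nothing about whether the query answers the question.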
# Video captioning
instruction = '[VIDEO:video] <BOS> what does the video describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'video': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/video7021.mp4'}
output = pipe(data, instruction=instruction)
print(output.text) # "a baseball player is hitting a ball"
# Speech recognition
instruction = '[AUDIO:wav] <BOS> what is the text corresponding to the voice? <EOS> -> [TEXT:text,preprocess=text_phone,add_bos,add_eos]'
data = {'wav': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/1272-128104-0001.flac'}
output = pipe(data, instruction=instruction)
print(output.text) # "nor is mister klohs manner less interesting than his manner"
# Text-to-image generation
instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = pipe(data, instruction=instruction)
output[0].save_image('0.png')
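The training datasets have their own limitations and may introduce bias. Please evaluate the model yourself before deciding how to use it.
If you find OFASys useful and like our work, you are welcome to cite: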
@article{bai2022ofasys,
  author  = {Jinze Bai and
             Rui Men and
             Hao Yang and
             Xuancheng Ren and
             Kai Dang and
             Yichang Zhang and
             Xiaohuan Zhou and
             Peng Wang and
             Sinan Tan and
             An Yang and
             Zeyu Cui and
             Yu Han and
             Shuai Bai and
             Wenbin Ge and
             Jianxin Ma and
             Junyang Lin and
             Jingren Zhou and
             Chang Zhou},
  title   = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
  journal = {CoRR},
  volume  = {abs/2212.04408},
  year    = {2022}
}