OFASys Multi-Modal Multi-Task Pretrained Model - English - General Domain - Base
OFASys is a multi-modal multi-task learning system designed to make multi-modal tasks declarative, modular, and task-scalable. With OFASys, you can easily: 1. introduce a new multi-modal task/dataset by defining a declarative one-line instruction; 2. develop new, or reuse existing, modality-specific components; 3. jointly train multiple multi-modal tasks without manually handling multi-modal data collation.




Documentation | Paper | Blog | GitHub



What is OFASys

OFASys is an open-source AI library for unified multi-modal multi-task learning, developed by the M6 team at DAMO Academy. With this system, we trained a single model, OFA+, which for the first time supports unified training and inference across 7 modalities (including image, text, speech, video, and motion) and more than 20 multi-modal tasks.

Getting started with OFASys

Setup

Note: OFASys is still iterating rapidly and currently implements the ModelScope interface in a relatively standalone way, so it needs to be installed into its own environment.
  • First, install ModelScope and OFASys:
# modelscope is pre-installed in ModelScope notebooks, so the next line can be skipped there
# !pip install modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
!pip install https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ofasys-0.1.0-py3-none-any.whl
  • Then load the model:
from ofasys import ms_wrapper
from modelscope.pipelines import pipeline
pipe = pipeline('my-ofasys-task', model="damo/ofasys_multimodal_multitask_pretrain_base_en", model_revision='v1.0.0')
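Every task below is driven by a one-line instruction: modality slots such as [IMAGE:img] or [TEXT:cap] name the inputs before the -> and the outputs after it, and the data dict supplies a value for each named input slot. The sketch below shows how such an instruction decomposes into slots; parse_slots is a hypothetical helper written here for illustration only, not part of the OFASys API.

```python
import re

def parse_slots(instruction):
    """Split an OFASys-style instruction on '->' and list the
    (MODALITY, slot_name) pairs on each side. Illustration only."""
    src, tgt = instruction.split('->')
    pattern = r'\[([A-Z]+)(?::(\w+))?'  # matches e.g. [IMAGE:img] or [IMAGE,...]
    return {
        'inputs': re.findall(pattern, src),
        'outputs': re.findall(pattern, tgt),
    }

slots = parse_slots('[IMAGE:img] <BOS> what does the image describe? <EOS> '
                    '-> <BOS> [TEXT:cap] <EOS>')
print(slots['inputs'])   # [('IMAGE', 'img')]
print(slots['outputs'])  # [('TEXT', 'cap')]
```

Here the IMAGE slot named img must appear as a key in data, and the TEXT slot named cap is what the model generates.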

Exploring the tasks

Image Captioning

instruction = '[IMAGE:img] <BOS> what does the image describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/ic.jpeg"}
output = pipe(data, instruction=instruction)
print(output.text) # "a man and woman sitting in front of a laptop computer"

Visual Grounding

instruction = '[IMAGE:img] <BOS> which region does the text " [TEXT:cap] " describe? <EOS> -> [BOX:patch_boxes,add_bos,add_eos]'
data = {'img': "https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/vg.jpg", "cap": "hand"}
output = pipe(data, instruction=instruction)
output.save_box("output.jpg")

Text Summarization

instruction = '<BOS> what is the summary of article " [TEXT:src] "? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
        "presidential run-off election after a reformed communist won the first round of voting ."}
output = pipe(data, instruction=instruction)
print(output.text) # "polish opposition endorses walesa in presidential run-off"

Table-to-Text Generation

instruction = '<BOS> structured knowledge: " [STRUCT:database,uncased] "  . how to describe the tripleset ? <EOS> -> <BOS> [TEXT:tgt] <EOS>'
data = {
    'database': [
        ['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
        ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
        ['5,457,831', 'YEAR', '2012'],
        ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
        ['Atlanta', 'COUNTRY', 'United States'],
    ]
}
output = pipe(data, instruction=instruction, beam_size=1)
print(output.text) # "atlanta is the metropolitan area in the united states in 2012."

Text-to-SQL Generation

instruction = '<BOS> " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. <EOS> -> <BOS> [TEXT:tgt] <EOS>'
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id']
]
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'What are the locations and names of all stations with capacity between 5000 and 10000?', 'database': database}
]
output = pipe(data, instruction=instruction)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select location, name from stadium where capacity between 5000 and 10000"
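Since the outputs are plain SQL strings, one way to sanity-check them is to execute them against a toy in-memory SQLite instance of the schema above. This is a sketch for illustration only: the column types are guesses (OFASys only sees column names), and OFASys itself never executes the SQL.

```python
import sqlite3

# In-memory SQLite database mirroring the concert_singer tables used above.
# Column types are illustrative guesses.
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE stadium (stadium_id INTEGER, location TEXT, name TEXT,
                          capacity INTEGER, highest INTEGER, lowest INTEGER,
                          average INTEGER);
    CREATE TABLE singer (singer_id INTEGER, name TEXT, country TEXT,
                         song_name TEXT, song_release_year TEXT,
                         age INTEGER, is_male INTEGER);
''')

generated_sql = [
    "select name, country, age from singer order by age desc",
    "select distinct country from singer where age > 20",
    "select location, name from stadium where capacity between 5000 and 10000",
]
for sql in generated_sql:
    conn.execute(sql)  # raises sqlite3.OperationalError if the SQL is invalid
print("all generated queries parsed and executed")
```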

Video Captioning

instruction = '[VIDEO:video] <BOS> what does the video describe? <EOS> -> <BOS> [TEXT:cap] <EOS>'
data = {'video': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/video7021.mp4'}
output = pipe(data, instruction=instruction)
print(output.text) # "a baseball player is hitting a ball"

Speech-to-Text Generation

instruction = '[AUDIO:wav] <BOS> what is the text corresponding to the voice? <EOS> -> [TEXT:text,preprocess=text_phone,add_bos,add_eos]'
data = {'wav': 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/ofasys/1272-128104-0001.flac'}
output = pipe(data, instruction=instruction)
print(output.text) # "nor is mister klohs manner less interesting than his manner"
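When reference transcripts are available, word error rate (WER) is the standard way to gauge transcription quality. Below is a minimal sketch of the metric, an illustrative helper written here and not part of the OFASys API:

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance divided by reference length.
    Simple illustrative implementation."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("his manner", "his matter"))  # 0.5
```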

Text-to-Image Generation

instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = pipe(data, instruction=instruction)
output[0].save_image('0.png')

Limitations and possible biases

The training data has its own limitations and may introduce biases; please evaluate the model on your own use case before deciding how to deploy it.

Related papers and citation

If you find OFASys useful and like our work, please cite:

@article{bai2022ofasys,
  author    = {
      Jinze Bai and 
      Rui Men and 
      Hao Yang and 
      Xuancheng Ren and 
      Kai Dang and 
      Yichang Zhang and 
      Xiaohuan Zhou and 
      Peng Wang and 
      Sinan Tan and 
      An Yang and 
      Zeyu Cui and 
      Yu Han and 
      Shuai Bai and 
      Wenbin Ge and 
      Jianxin Ma and 
      Junyang Lin and 
      Jingren Zhou and 
      Chang Zhou},
  title     = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
  journal   = {CoRR},
  volume    = {abs/2212.04408},
  year      = {2022}
}