
Want to ask questions about an image but worried you won't get a good answer? The OFA model has you covered! Simply provide any image of yours together with a question, and you will receive an answer within 3 seconds. An online demo is available on the right side of this page; feel free to try it!
This series also includes other models, which you are welcome to try.
Getting OFA running takes just the few lines of code below, it really is that easy! If that is still not convenient enough, click the Notebook button in the upper-right corner: we provide a ready-to-use environment (both CPU and GPU are supported), and you only need to paste the code below into the notebook to start playing with OFA!
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.outputs import OutputKeys
from modelscope.preprocessors.multi_modal import OfaPreprocessor

# Build a visual question answering pipeline around the OFA checkpoint.
model = 'damo/ofa_visual-question-answering_pretrain_huge_en'
preprocessor = OfaPreprocessor(model_dir=model)
ofa_pipe = pipeline(
    Tasks.visual_question_answering,
    model=model,
    model_revision='v1.0.1',
    preprocessor=preprocessor)

# Ask a question about an image: the input is a dict with an image
# (URL or local path) and the question text.
image = 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/visual-question-answering/visual_question_answering.png'
text = 'what is grown on the plant?'
inputs = {'image': image, 'text': text}
result = ofa_pipe(inputs)
print(result[OutputKeys.TEXT])  # ' money'
```
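The pipeline accepts either an image URL, as above, or a local file path. A minimal follow-up sketch, reusing `ofa_pipe` from above (the file name and question here are hypothetical):

```python
# Hypothetical local file and question; any JPEG/PNG on disk works the same way.
local_inputs = {'image': './my_photo.jpg', 'text': 'what color is the car?'}
print(ofa_pipe(local_inputs)[OutputKeys.TEXT])
```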
OFA (One-For-All) is a unified multimodal pretrained model that unifies modalities (cross-modal, vision, language) and tasks (such as image generation, visual grounding, image captioning, image classification, and text generation) within a simple sequence-to-sequence learning framework. For details, see our ICML 2022 paper OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, and our official GitHub repository https://github.com/OFA-Sys/OFA.
GitHub | Paper | Blog
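Because every OFA task is cast as sequence-to-sequence generation, switching tasks mostly means switching the pipeline task and checkpoint while the calling code stays the same. A minimal sketch, assuming the English image-captioning checkpoint `damo/ofa_image-caption_coco_large_en` is available on ModelScope (that id is an assumption, not part of this model card):

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.outputs import OutputKeys

# Same pipeline API as VQA above, different task and checkpoint
# (the checkpoint id below is an assumed example).
caption_pipe = pipeline(
    Tasks.image_captioning,
    model='damo/ofa_image-caption_coco_large_en')
image = 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/visual-question-answering/visual_question_answering.png'
print(caption_pipe(image)[OutputKeys.CAPTION])
```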
| Model | Params-en | Params-zh | Backbone | Hidden size | Intermediate size | Num. of heads | Enc layers | Dec layers | 
|---|---|---|---|---|---|---|---|---|
| OFATiny | 33M | - | ResNet50 | 256 | 1024 | 4 | 4 | 4 | 
| OFAMedium | 93M | - | ResNet101 | 512 | 2048 | 8 | 4 | 4 | 
| OFABase | 180M | 160M | ResNet101 | 768 | 3072 | 12 | 6 | 6 | 
| OFALarge | 470M | 440M | ResNet152 | 1024 | 4096 | 16 | 12 | 12 | 
| OFAHuge | 930M | - | ResNet152 | 1280 | 5120 | 16 | 24 | 12 | 
On the visual question answering (VQA) task, OFA matches the performance of the recent large model CoCa on the VQA 2.0 benchmark (see the VQA 2.0 leaderboard). The results are as follows:
| Model | test-dev | test-std |
|---|---|---|
| OFABase | 78.0 | 78.1 |
| OFALarge | 80.4 | 80.7 |
| OFAHuge | 82.0 | 82.0 |
This model is trained on the pretraining dataset only. For details on the model and finetuning, please refer to Section 1.4 of the OFA Tutorial.
```python
import tempfile
from modelscope.msdatasets import MsDataset
from modelscope.metainfo import Trainers
from modelscope.trainers import build_trainer

# Load the trial VQA dataset and rename its columns to the keys
# expected by the OFA preprocessor.
train_dataset = MsDataset(
    MsDataset.load('vqa_trial', subset_name='vqa_trial', split='train').remap_columns({
        'image': 'image',
        'question': 'text',
        'answer': 'answer'
    }))
test_dataset = MsDataset(
    MsDataset.load('vqa_trial', subset_name='vqa_trial', split='test').remap_columns({
        'image': 'image',
        'question': 'text',
        'answer': 'answer'
    }))

def cfg_modify_fn(cfg):
    # Checkpoint every 2 epochs, log every iteration, and keep the
    # run small for demonstration purposes.
    cfg.train.hooks = [{
        'type': 'CheckpointHook',
        'interval': 2
    }, {
        'type': 'TextLoggerHook',
        'interval': 1
    }, {
        'type': 'IterTimerHook'
    }]
    cfg.train.dataloader.batch_size_per_gpu = 1
    cfg.train.max_epochs = 2
    return cfg

args = dict(
    model='damo/ofa_visual-question-answering_pretrain_huge_en',
    model_revision='v1.0.1',
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    cfg_modify_fn=cfg_modify_fn,
    # mkdtemp() keeps the directory alive for the whole run, whereas
    # TemporaryDirectory().name is deleted as soon as the object is
    # garbage-collected.
    work_dir=tempfile.mkdtemp())
trainer = build_trainer(name=Trainers.ofa, default_args=args)
trainer.train()
```
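After training finishes, checkpoints are written to `work_dir`. As a quick sanity check, the trainer's standard `evaluate()` method can score the model on the eval split configured above (a minimal sketch; evaluation settings follow the defaults in the model's configuration):

```python
# Run evaluation on the eval_dataset passed to the trainer above.
metrics = trainer.evaluate()
print(metrics)
```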
To keep it broadly applicable, this OFA model is the pretrained checkpoint and has not been finetuned on VQA 2.0, so evaluating it directly on VQA may yield less than ideal results. We are continuously working to bring better models to our users; if you have specific needs, you are welcome to join the ModelScope community group and contact us!
If you find OFA useful and like our work, please cite our paper:
```bibtex
@article{wang2022ofa,
  author    = {Peng Wang and
               An Yang and
               Rui Men and
               Junyang Lin and
               Shuai Bai and
               Zhikang Li and
               Jianxin Ma and
               Chang Zhou and
               Jingren Zhou and
               Hongxia Yang},
  title     = {OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence
               Learning Framework},
  journal   = {CoRR},
  volume    = {abs/2202.03052},
  year      = {2022}
}
```