If you want to locate an object in an image, simply provide a text description of it, e.g. "a blue turtle-like pokemon with round head", and the OFA model will draw a bounding box around the object. An online demo is available on the right side of this page; feel free to try it out!
Other models in this series are also available; you are welcome to try them.

Playing with OFA takes only the few lines of code below. If you would like an even more convenient setup, click the Notebook button in the upper-right corner: we provide a pre-configured environment (CPU or GPU available), and you only need to run the provided code in the notebook to get OFA working.
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.outputs import OutputKeys

# Build a visual grounding pipeline with the distilled OFA model
img_visual_grounding = pipeline(
    Tasks.visual_grounding,
    model='damo/ofa_visual-grounding_refcoco_distilled_en')

image = 'https://xingchen-data.oss-cn-zhangjiakou.aliyuncs.com/maas/visual-grounding/visual_grounding.png'
text = 'a blue turtle-like pokemon with round head'

# The pipeline takes both the image and the text query as input
result = img_visual_grounding({'image': image, 'text': text})
print(result[OutputKeys.BOXES])
```
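If you want to visualize the prediction, the sketch below (not part of the official example) downloads the image and draws the returned box with Pillow. It assumes the pipeline returns pixel coordinates in [x0, y0, x1, y1] order; please verify the actual output format before relying on it.

```python
# Minimal visualization sketch; assumes `image` and `result` from the code above.
import requests
from io import BytesIO
from PIL import Image, ImageDraw

response = requests.get(image)
img = Image.open(BytesIO(response.content)).convert('RGB')

draw = ImageDraw.Draw(img)
for box in result[OutputKeys.BOXES]:   # assumed: a list of [x0, y0, x1, y1] boxes
    draw.rectangle(list(map(float, box)), outline='red', width=3)

img.save('visual_grounding_result.jpg')
```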
OFA (One-For-All) is a unified multimodal pretrained model that uses a simple sequence-to-sequence learning framework to unify modalities (cross-modal, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation). For details, see our ICML 2022 paper, OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, and our official GitHub repository: https://github.com/OFA-Sys/OFA.
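To give a rough idea of how grounding fits a sequence-to-sequence formulation, the sketch below quantizes a bounding box into discrete location tokens that a seq2seq model can generate as its target sequence, in the spirit of OFA's unified framework. The bin count and token naming here are illustrative assumptions, not the model's actual vocabulary.

```python
# Illustrative only: casting a bounding box as a short token sequence.
NUM_BINS = 1000  # assumed number of quantization bins

def box_to_location_tokens(box, img_w, img_h, num_bins=NUM_BINS):
    """Quantize [x0, y0, x1, y1] pixel coordinates into discrete location tokens."""
    x0, y0, x1, y1 = box
    tokens = []
    for value, size in ((x0, img_w), (y0, img_h), (x1, img_w), (y1, img_h)):
        bin_id = min(int(value / size * num_bins), num_bins - 1)
        tokens.append(f'<bin_{bin_id}>')
    return tokens

# For a 512x512 image, the target sequence for the text query becomes
# something like ['<bin_193>', '<bin_360>', '<bin_416>', '<bin_929>']
print(box_to_location_tokens([99.0, 184.5, 213.0, 476.0], 512, 512))
```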
This model is the tiny version (33M parameters) obtained by distilling the large OFA model, making it easy to deploy on devices with limited storage and compute. For details on the distillation framework, see the paper Knowledge Distillation of Transformer-based Language Models Revisited and our official GitHub repository: https://github.com/OFA-Sys/OFA-Compress.
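For intuition, the snippet below is a minimal, generic sketch of logit-based knowledge distillation (temperature-scaled KL divergence against the teacher plus a hard-label loss). It is not the exact OFA-Compress training recipe; the function name and hyperparameters are illustrative assumptions.

```python
# Generic logit-distillation sketch, not the OFA-Compress recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine soft-label KL loss (from the teacher) with hard-label cross-entropy."""
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction='batchmean',
    ) * (temperature ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```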
The model's performance is as follows:
| Task | RefCOCO | RefCOCO+ | RefCOCOg |
|---|---|---|---|
| Metric | Acc@0.5 | Acc@0.5 | Acc@0.5 |
| Split | val / test-a / test-b | val / test-a / test-b | val-u / test-u |
| OFA-tiny (directly finetuned) | 80.20 / 84.07 / 75.00 | 68.22 / 75.13 / 57.66 | 72.02 / 69.74 |
| OFA-distill-tiny (distilled) | 81.29 / 85.18 / 75.29 | 71.28 / 77.08 / 61.13 | 72.08 / 71.67 |
The model is trained on the RefCOCO dataset.
Under development; stay tuned.
The training data has inherent limitations and may introduce biases; please evaluate the model on your own use case before deciding how to use it.
If you find this model helpful, please consider citing the related papers below:
```
@article{Lu2022KnowledgeDO,
  author  = {Chengqiang Lu and
             Jianwei Zhang and
             Yunfei Chu and
             Zhengyu Chen and
             Jingren Zhou and
             Fei Wu and
             Haiqing Chen and
             Hongxia Yang},
  title   = {Knowledge Distillation of Transformer-based Language Models Revisited},
  journal = {ArXiv},
  volume  = {abs/2206.14366},
  year    = {2022}
}

@article{wang2022ofa,
  author  = {Peng Wang and
             An Yang and
             Rui Men and
             Junyang Lin and
             Shuai Bai and
             Zhikang Li and
             Jianxin Ma and
             Chang Zhou and
             Jingren Zhou and
             Hongxia Yang},
  title   = {OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
  journal = {CoRR},
  volume  = {abs/2202.03052},
  year    = {2022}
}
```