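A reproduction of "Exploring Plain Vision Transformer Backbones for Object Detection" (ViTDet), trained on the COCO dataset.
The model has broad applicability: it can localize most foreground objects in an image that belong to the 80 COCO categories.
With the ModelScope framework, you can run the model on an input image through a simple Pipeline call: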
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build an object-detection pipeline backed by this COCO-trained ViTDet model.
object_detect = pipeline(Tasks.image_object_detection, model='damo/cv_vit_object-detection_coco')

# Input image (here given as a URL).
img_path = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_detection.jpg'
result = object_detect(img_path)
print(result)
```
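The structure of `result` is not spelled out in this card. The sketch below post-processes it under a stated assumption: that the pipeline returns a dict with parallel `scores`, `labels`, and `boxes` entries, with each box given as `[x1, y1, x2, y2]`. Compare this against what `print(result)` actually shows before relying on it. The helper `filter_detections` and its `score_thr` parameter are hypothetical names introduced for illustration, not part of the ModelScope API.

```python
from typing import Any, Dict, List, Sequence


def filter_detections(result: Dict[str, Any], score_thr: float = 0.5) -> List[Dict[str, Any]]:
    """Keep detections with confidence >= score_thr, sorted by score (highest first).

    Assumption (hypothetical format): result['scores'], result['labels'] and
    result['boxes'] are parallel sequences of equal length.
    """
    scores: Sequence[float] = result.get('scores', [])
    labels: Sequence[Any] = result.get('labels', [])
    boxes: Sequence[Sequence[float]] = result.get('boxes', [])

    kept = []
    for score, label, box in zip(scores, labels, boxes):
        if float(score) >= score_thr:
            kept.append({
                'label': label,
                'score': float(score),
                # Convert to plain floats so lists and NumPy arrays are handled alike.
                'box': [float(v) for v in box],
            })

    # Highest-confidence detections first.
    kept.sort(key=lambda d: d['score'], reverse=True)
    return kept


if __name__ == '__main__':
    # Made-up values in the assumed format, for illustration only.
    fake_result = {
        'scores': [0.92, 0.31, 0.77],
        'labels': ['person', 'dog', 'car'],
        'boxes': [[10, 20, 110, 220], [5, 5, 50, 40], [200, 80, 380, 190]],
    }
    for det in filter_detections(fake_result, score_thr=0.5):
        print(det['label'], round(det['score'], 2), det['box'])
```

Using `zip` stops at the shortest sequence, so a missing key simply yields no detections instead of raising an error.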
Backbone | Pretrain | box mAP | mask mAP | Remark |
---|---|---|---|---|
ViT-Base | ImageNet-1k | 51.6 | 45.9 | official |
ViT-Base | ImageNet-1k | 51.1 | 45.5 | unofficial |
ViT-Base | ImageNet-1k | 51.4 | 45.7 | modelscope |
@article{Li2022ExploringPV,
title={Exploring Plain Vision Transformer Backbones for Object Detection},
author={Yanghao Li and Hanzi Mao and Ross B. Girshick and Kaiming He},
journal={ArXiv},
year={2022},
volume={abs/2203.16527}
}