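Given an input video, the model outputs, for every frame, the instance segmentation masks, categories, scores (dummy scores), bounding boxes, and tracking IDs.
Instance segmentation aims to segment the "things" in an image. "Things" are countable objects such as people, cars, and cats.
As shown in the figure above, the model consists of three parts: the backbone, the neck, and the KernelUpdateHeads.
The model has a fairly broad scope of application and can segment most objects of interest in an image (the 40 categories of the YouTube DataSet).
On the ModelScope framework, you only need to provide an input video and call the model through a simple Pipeline.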
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Model ID on ModelScope and a sample input video
model_id = 'damo/cv_swinb_video-instance-segmentation'
input_url = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/videos/kitti-step_testing_image_02_0000.mp4'

# Build the video instance segmentation pipeline and run it on the video
seg_pipeline = pipeline(Tasks.video_instance_segmentation, model=model_id)
result = seg_pipeline(input_url)
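The exact layout of `result` is not documented here, so before relying on any particular key it can help to look at its structure. The following is a minimal sketch of such an inspection helper. `describe_result` is a hypothetical name introduced for illustration and is not part of ModelScope. It makes no assumption about key names and only reports container types, lengths, and a `shape` attribute where one exists.

```python
from collections.abc import Mapping, Sequence


def describe_result(obj, depth=0, max_depth=2):
    """Summarize the nested structure of an arbitrary object.

    Hypothetical helper, not a ModelScope API. It reports container types,
    lengths, and the `shape` attribute of array-like objects, and it only
    descends into the first element of each sequence.
    """
    # Array-like objects (e.g. NumPy arrays) expose a `shape` attribute.
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return f"{type(obj).__name__}(shape={tuple(shape)})"

    if isinstance(obj, Mapping):
        if depth >= max_depth:
            return f"{type(obj).__name__}(len={len(obj)})"
        return {str(k): describe_result(v, depth + 1, max_depth)
                for k, v in obj.items()}

    # Treat lists and tuples as sequences, but not strings or bytes.
    if isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        summary = f"{type(obj).__name__}(len={len(obj)})"
        if obj and depth < max_depth:
            return {summary: describe_result(obj[0], depth + 1, max_depth)}
        return summary

    return type(obj).__name__
```

After running the pipeline, `print(describe_result(result))` would show which top-level entries are present and how large they are, which is a reasonable first step before indexing into masks, boxes, or tracking IDs.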
The main preprocessing steps at test time are as follows:
Backbone | Pretrain | AP | AR_10 |
---|---|---|---|
swinb (deformable fpn) | ImageNet-21K | 54.1 | 59.9 |
@inproceedings{li2022video,
title={Video k-net: A simple, strong, and unified baseline for video segmentation},
author={Li, Xiangtai and Zhang, Wenwei and Pang, Jiangmiao and Chen, Kai and Cheng, Guangliang and Tong, Yunhai and Loy, Chen Change},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18847--18857},
year={2022}
}
git clone https://www.modelscope.cn/damo/cv_swinb_video-instance-segmentation.git