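Given an input video, you only need to mark the target with a rectangular box in the first frame. The single-object tracking algorithm then follows that target through the whole video and outputs its bounding box in every frame.
This model is a single-object tracking framework based on ProContEXT. It is trained with ViT as the backbone network and belongs to the One-Stream family of single-object trackers.
The algorithm performs feature extraction and template-to-search matching jointly through the attention mechanism, which yields high tracking accuracy.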
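The one-stream idea can be illustrated with a minimal sketch: template tokens and search-region tokens are concatenated into one sequence, and a single self-attention pass updates both, so feature extraction and matching happen in the same operation. The sketch below is an assumption-laden toy, not the ProContEXT implementation: it uses one head, no learned projections (queries, keys and values are the tokens themselves), and the name `joint_self_attention` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(template_tokens, search_tokens):
    """Toy single-head self-attention over one concatenated token stream.

    Simplification (assumption): no learned Q/K/V projections are used,
    so each token serves as its own query, key and value.
    """
    tokens = template_tokens + search_tokens  # one stream for both inputs
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for query in tokens:
        # Every token attends to template AND search tokens at once.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        updated.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    split = len(template_tokens)
    return updated[:split], updated[split:]


# Hypothetical 2-D tokens: two template tokens, three search tokens.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
new_template, new_search = joint_self_attention(template, search)
```

In a two-stream tracker the template and search features would be extracted separately and matched afterwards; here each search token is already mixed with template information after one attention pass.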
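The model is intended for video single-object tracking and currently reaches state-of-the-art (SOTA) results on the open TrackingNet and GOT-10k datasets.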
from modelscope.utils.cv.image_utils import show_video_tracking_result
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
# Build the tracking pipeline from the ModelScope model ID.
video_single_object_tracking = pipeline(
    Tasks.video_single_object_tracking,
    model='damo/cv_vitb_video-single-object-tracking_procontext')
video_path = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/videos/dog.avi"
# Initial bounding box of the target in the first frame, as [x1, y1, x2, y2].
init_bbox = [414, 343, 514, 449]
# Track the target through the video; the input is a (video, box) tuple.
result = video_single_object_tracking((video_path, init_bbox))
# Draw the tracked boxes on the video and save the result to a local file.
show_video_tracking_result(video_path, result[OutputKeys.BOXES], "./tracking_result.avi")
print("result is : ", result[OutputKeys.BOXES])
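The example above expects the initial box in corner format `[x1, y1, x2, y2]`. Annotations are often stored as `[x, y, width, height]` instead, so a small conversion step may be needed before calling the pipeline. The helpers below are a sketch with hypothetical names (`xyxy_to_xywh`, `xywh_to_xyxy`); they are not part of ModelScope.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"expected x2 > x1 and y2 > y1, got {box}")
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    if w <= 0 or h <= 0:
        raise ValueError(f"expected positive width and height, got {box}")
    return [x, y, x + w, y + h]


init_bbox = [414, 343, 514, 449]          # corner format, as in the example above
print(xyxy_to_xywh(init_bbox))            # [414, 343, 100, 106]
print(xywh_to_xyxy([414, 343, 100, 106])) # [414, 343, 514, 449]
```

Checking the box before tracking catches swapped corners early, which would otherwise produce an empty or inverted target region.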
The model's objective metrics on the GOT-10k and TrackingNet test sets are as follows:
| Tracker | GOT-10k (AO, %) | TrackingNet (AUC, %) |
|---|---|---|
| ProContEXT | 74.6 | 84.6 |
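For readers unfamiliar with the AO column: average overlap is the mean intersection-over-union (IoU) between predicted and ground-truth boxes over the frames of a sequence. The sketch below shows the idea for a single sequence with boxes in `[x1, y1, x2, y2]` format. It is a simplified illustration with hypothetical function names, not the official GOT-10k evaluation toolkit, which also averages across sequences and applies its own protocol details.

```python
def iou_xyxy(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predicted, ground_truth):
    """Mean IoU over the frames of one sequence (simplified AO)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty box lists of equal length")
    overlaps = [iou_xyxy(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(overlaps) / len(overlaps)


# Hypothetical two-frame sequence: a perfect hit, then a half-shifted box.
gt = [[0, 0, 10, 10], [0, 0, 10, 10]]
pred = [[0, 0, 10, 10], [5, 0, 15, 10]]
ao = average_overlap(pred, gt)  # (1 + 1/3) / 2
print(f"AO = {100 * ao:.1f}%")
```

The table reports AO as a percentage, so a value of 74.6 corresponds to a mean IoU of 0.746.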
The model is mainly based on the following paper:
@article{ProContEXT,
author = {Jin{-}Peng Lan and
Zhi{-}Qi Cheng and
Jun{-}Yan He and
Chenyang Li and
Bin Luo and
Xu Bao and
Wangmeng Xiang and
Yifeng Geng and
Xuansong Xie},
title = {ProContEXT: Exploring Progressive Context Transformer for Tracking},
journal = {CoRR},
volume = {abs/2210.15511},
year = {2022},
url = {https://doi.org/10.48550/arXiv.2210.15511},
doi = {10.48550/arXiv.2210.15511},
}
git clone https://www.modelscope.cn/damo/cv_vitb_video-single-object-tracking_procontext.git