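Introduction to Robust Image Recognition

Robust image recognition requires a model to return the correct classification even when the input image is noisy or out-of-distribution.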

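Model Description

Discrete Adversarial Training (DAT) is a robust training method for vision models. DAT uses VQ-GAN to discretize each training image into a set of visual words, then adversarially modifies those visual words to produce discrete adversarial examples for training. Models trained on discrete adversarial examples gain substantially in robustness without losing accuracy on clean inputs, and they recognize adversarial examples and out-of-distribution natural images more reliably. DAT applies to multiple vision tasks (classification, detection, etc.) and multiple training settings (training from scratch, fine-tuning, etc.).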
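The discretize-then-attack idea can be illustrated with a toy sketch. This is not the EasyRobust implementation: the VQ-GAN is replaced by a hypothetical four-entry codebook, the vision model by a hypothetical linear classifier, and the attack is a single signed-gradient step followed by re-quantization. All names and values below are assumptions made for illustration.

```python
import math

# Hypothetical toy codebook standing in for a VQ-GAN codebook:
# each "visual word" is a 2-D vector.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def encode(patches):
    """Map each patch to the index of its nearest codebook entry (squared L2)."""
    def nearest(patch):
        return min(
            range(len(CODEBOOK)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(patch, CODEBOOK[k])),
        )
    return [nearest(p) for p in patches]


def decode(tokens):
    """Map visual-word indices back to their codebook vectors."""
    return [CODEBOOK[t] for t in tokens]


def logit(patches, weights, bias):
    """Toy linear binary classifier; a positive logit means class 1."""
    return bias + sum(
        w * x for wp, p in zip(weights, patches) for w, x in zip(wp, p)
    )


def input_gradient(patches, weights, bias, label):
    """Gradient of the logistic loss with respect to the input patches."""
    prob = 1.0 / (1.0 + math.exp(-logit(patches, weights, bias)))
    return [[(prob - label) * w for w in wp] for wp in weights]


def sign(x):
    return (x > 0) - (x < 0)


def discrete_adversarial_tokens(patches, weights, bias, label, step):
    """One signed-gradient step on the reconstruction, then re-quantize."""
    recon = decode(encode(patches))
    grad = input_gradient(recon, weights, bias, label)
    perturbed = [
        [x + step * sign(g) for x, g in zip(p, gp)] for p, gp in zip(recon, grad)
    ]
    return encode(perturbed)


patches = [(0.1, 0.9), (0.9, 0.1)]    # toy "image" made of two patches
weights = [(1.0, -1.0), (-1.0, 1.0)]  # hypothetical classifier weights
clean = encode(patches)
adv = discrete_adversarial_tokens(patches, weights, 0.0, label=0, step=0.6)
print(clean, adv)
```

In this toy setting the clean tokens decode to an input the classifier assigns to class 0, while the adversarial tokens decode to one it assigns to class 1. A training step would then fit the model on `decode(adv)` paired with the original label.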

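This model is a ViT-H/14 pre-trained with Masked AutoEncoder (MAE) and then fine-tuned with Discrete Adversarial Training (DAT) to obtain a robust image recognition model. It was trained on the ImageNet-1K dataset only, using the EasyRobust framework.

Paper: http://arxiv.org/abs/2209.07735

Training code: https://github.com/alibaba/easyrobust/tree/main/examples/imageclassification/imagenet/dat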

Using the Model

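from modelscope.pipelines import pipeline

# Build an image-classification pipeline backed by the DAT fine-tuned model.
robust_image_classification = pipeline('image-classification', model='aaig/easyrobust-models')
# Run inference on a local image file and print the raw prediction.
result = robust_image_classification('data/test/images/bird.JPEG')
print(result)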
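To pick the most likely classes out of `result`, a small helper such as the following may be convenient. It assumes the pipeline returns a dict with parallel `labels` and `scores` lists; that layout is an assumption to check against the printed output, and `top_k` is a hypothetical helper, not part of ModelScope.

```python
def top_k(result, k=3):
    """Return the k highest-scoring (label, score) pairs.

    Assumes `result` holds parallel 'labels' and 'scores' lists;
    verify this against the actual pipeline output.
    """
    pairs = zip(result['labels'], result['scores'])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Placeholder dict with made-up values, used only to show the call.
example = {'labels': ['class_a', 'class_b', 'class_c'], 'scores': [0.1, 0.7, 0.2]}
print(top_k(example, k=2))
```

With the real pipeline, `top_k(result)` would be called on the dict returned above.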

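Training Data

ImageNet (ImageNet-1K)

Training Procedure

Training through the ModelScope interface is not currently supported. For the detailed training procedure, refer to the implementation in EasyRobust: https://github.com/alibaba/easyrobust/tree/main/examples/imageclassification/imagenet/dat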

Evaluation and Results

Model	ImageNet-Val	ImageNet-V2	ImageNet-C (mCE↓)	ImageNet-R	ImageNet-A	ImageNet-Sketch	Stylized-ImageNet	Files
MAE-ViT-H	87.02%	78.82%	31.40	65.61%	68.92%	50.03%	32.77%	ckpt
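The corruption column reports mean corruption error (mCE), where lower is better, while the other columns report accuracy. Assuming the column follows the standard ImageNet-C definition, each corruption's error summed over severities is divided by the same sum for a baseline model (AlexNet in the original benchmark), and the ratios are averaged. A minimal sketch with made-up error rates, where `mean_corruption_error` is a hypothetical helper:

```python
def mean_corruption_error(model_err, baseline_err):
    """Compute mCE in percent.

    Both arguments map a corruption name to a list of top-1 error rates,
    one per severity level, for the evaluated model and the baseline.
    """
    ratios = []
    for name, errors in model_err.items():
        # Corruption error: model error normalized by baseline error.
        ratios.append(sum(errors) / sum(baseline_err[name]))
    return 100.0 * sum(ratios) / len(ratios)


# Made-up error rates for two corruptions at two severities each.
model_err = {'noise': [0.2, 0.4], 'blur': [0.1, 0.3]}
baseline_err = {'noise': [0.6, 0.6], 'blur': [0.4, 0.4]}
print(round(mean_corruption_error(model_err, baseline_err), 2))
```

A value below 100 means the model makes fewer errors under corruption than the baseline does.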

Related Papers and Citation

If you find this model helpful, please consider citing the following paper:

@inproceedings{mao2022enhance,
  title={Enhance the Visual Representation via Discrete Adversarial Training},
  author={Mao, Xiaofeng and Chen, YueFeng and Duan, Ranjie and Zhu, Yao and Qi, Gege and Ye, Shaokai and Li, Xiaodan and Zhang, Rong and others},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}