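Given a panoramic RGB image of an indoor space, the indoor layout estimation algorithm outputs the room's wall lines, ceiling lines, and floor lines.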
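The PanoViT network consists of a backbone, a panoramic Vision Transformer encoder, an edge enhancement module, and a layout prediction module. A panoramic image is fed to the backbone to extract multi-scale feature maps, and to the edge enhancement module to obtain an edge-enhanced map. The panoramic Vision Transformer encoder takes the original image, the edge-enhanced map, and the multi-scale feature maps as input, and outputs feature vectors that the layout prediction module uses to estimate the room layout. The network structure is shown in the figure.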
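The data flow described above can be sketched as a single function. This is only an illustration of how the four stages connect: the function name `panovit_forward` and its four callable parameters are hypothetical placeholders, not the actual PanoViT implementation or any ModelScope API.

```python
from typing import Any, Callable


def panovit_forward(
    image: Any,
    backbone: Callable[[Any], Any],
    edge_module: Callable[[Any], Any],
    encoder: Callable[[Any, Any, Any], Any],
    layout_head: Callable[[Any], Any],
) -> Any:
    """Hypothetical sketch of the PanoViT data flow (all stage names are placeholders)."""
    # The backbone extracts multi-scale feature maps from the panorama.
    multi_scale_features = backbone(image)
    # The edge enhancement module produces an edge-enhanced map from the same panorama.
    edge_map = edge_module(image)
    # The encoder consumes the original image, the edge map, and the multi-scale features.
    feature_vectors = encoder(image, edge_map, multi_scale_features)
    # The layout prediction module turns the feature vectors into a room layout.
    return layout_head(feature_vectors)
```

Passing the stages in as callables keeps the sketch independent of any particular framework; in a real model each stage would be a network module.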
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
# Task name and model ID on ModelScope
task = Tasks.indoor_layout_estimation
model_id = 'damo/cv_panovit_indoor-layout-estimation'
# Path to the input panoramic image
input_location = 'data/test/images/indoor_layout_estimation.png'
# Build the pipeline and run inference
estimator = pipeline(task, model=model_id)
result = estimator(input_location)
# Save the layout visualization to disk
layout_vis = result[OutputKeys.LAYOUT]
cv2.imwrite('layout.jpg', layout_vis)
The input image should match the Matterport3D dataset format: a 512×1024 panoramic image.
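As a minimal sketch of how an arbitrary panorama might be brought to this size before inference, the helper below checks the 2:1 aspect ratio and resizes with OpenCV. The helper names `is_two_to_one` and `load_panorama` and the tolerance value are assumptions made for illustration; whether the pipeline already resizes its input internally is not stated here.

```python
TARGET_HEIGHT = 512
TARGET_WIDTH = 1024


def is_two_to_one(height: int, width: int, tol: float = 0.01) -> bool:
    """Return True if width/height is within `tol` of 2.0 (hypothetical helper)."""
    return abs(width / height - 2.0) <= tol


def load_panorama(path: str):
    """Load an image and resize it to 512x1024 (hypothetical helper, not part of ModelScope)."""
    import cv2  # imported lazily so the shape check above works without OpenCV

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(f'Could not read image: {path}')
    height, width = image.shape[:2]
    if not is_two_to_one(height, width):
        raise ValueError(f'Expected a 2:1 panorama, got {width}x{height}')
    if (height, width) != (TARGET_HEIGHT, TARGET_WIDTH):
        # cv2.resize takes the target size as (width, height)
        image = cv2.resize(image, (TARGET_WIDTH, TARGET_HEIGHT),
                           interpolation=cv2.INTER_AREA)
    return image
```

The resized image can be written back with `cv2.imwrite` and its path passed to the estimator as in the usage example.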
Results on Matterport3D:
@article{shen2022panovit,
title={PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image},
author={Shen, Weichao and Dong, Yuan and Chen, Zonghao and Zhao, Zhengyi and Gao, Yang and Liu, Zhu},
journal={arXiv preprint arXiv:2212.12156},
year={2022}
}