Vector-Quantization-Based Compression of Neural Radiance Fields

Mainstream NeRF methods capable of real-time inference, such as Plenoxels and DVGO, suffer from very large storage costs: a single model can take over a hundred megabytes, which hinders practical deployment of NeRF rendering. This project compresses NeRF models with vector quantization, greatly reducing model size while preserving good reconstruction quality.

Pipeline

Clone with HTTP

 git clone https://www.modelscope.cn/DAMOXR/cv_nerf_3d-reconstruction_vector-quantize-compression.git

Method Description

By analyzing what current NeRF models actually store, we found that they contain a large number of redundant voxels. The method therefore first estimates the importance of each voxel and applies a scene-adaptive threshold to separate important voxels from unimportant ones. The important voxels are then compressed with vector quantization, which reduces the model's storage size.
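
As a rough illustration, the following is a minimal, hypothetical PyTorch sketch of this "score, threshold, then quantize" idea. The tensor names, quantile-based threshold, and plain k-means codebook below are illustrative stand-ins, not the actual VQRF implementation.

import torch

def compress_voxels(features, importance, keep_ratio=0.4, codebook_size=4096, iters=10):
    """Toy version: prune unimportant voxels, vector-quantize the rest.

    features:   (N, D) per-voxel feature vectors
    importance: (N,)   per-voxel importance scores (e.g. accumulated ray weights)
    """
    # Scene-adaptive threshold: keep roughly the top `keep_ratio` fraction of voxels.
    thresh = torch.quantile(importance, 1.0 - keep_ratio)
    keep = importance >= thresh          # important voxels get vector-quantized
    feats = features[keep]               # the rest would be pruned or stored coarsely

    # Plain k-means as a stand-in for codebook learning.
    codebook = feats[torch.randperm(feats.shape[0])[:codebook_size]].clone()
    for _ in range(iters):
        assign = torch.cdist(feats, codebook).argmin(dim=1)   # nearest code per voxel
        for k in range(codebook.shape[0]):
            members = feats[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(dim=0)
    assign = torch.cdist(feats, codebook).argmin(dim=1)

    # What gets stored: a small codebook plus one integer index per kept voxel,
    # instead of a full D-dimensional feature vector for every voxel.
    return codebook, assign, keep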

Rendering results on multiple datasets also show that compression has almost no visible impact on quality.

Compare

Usage and Scope

Scope: two dataset types are currently supported, nerf_synthetic and LLFF.

Runtime environment:

  • The model runs on GPU only; it has been tested on V100, A100, and RTX 3090. Actual speed depends on the GPU.
  • GPU memory requirement: at least 8 GB (a quick environment check is sketched right after this list).
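
If you want to verify the environment before launching the pipeline, a minimal check might look like the sketch below; it assumes a PyTorch-based setup and only inspects device 0.

import torch

# The model is GPU-only: make sure a CUDA device with enough memory is visible.
assert torch.cuda.is_available(), 'A CUDA-capable GPU is required.'
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024 ** 3
print(f'{props.name}: {total_gb:.1f} GB')
assert total_gb >= 8, 'At least 8 GB of GPU memory is recommended.'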

How to Use

Inference code example

Blender dataset:

import os
from modelscope.msdatasets import MsDataset
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

model_id = 'DAMOXR/cv_nerf_3d-reconstruction_vector-quantize-compression'
pretrained_model = 'ficus_demo.pt'
data_dir = MsDataset.load('nerf_recon_dataset', namespace='damo',
                    split='train').config_kwargs['split_config']['train']
nerf_synthetic_dataset = os.path.join(data_dir, 'nerf_synthetic')
blender_scene = 'ficus'
data_dir = os.path.join(nerf_synthetic_dataset, blender_scene)

nerf_vq_compression = pipeline(Tasks.nerf_recon_vq_compression, 
                                 model=model_id,
                                 dataset_name='blender',
                                 data_dir=data_dir,
                                 downsample=1,
                                 ndc_ray=False,
                                 ckpt_path=pretrained_model
                                 )
render_dir = os.path.join('./exp/', blender_scene)

# Evaluate on the test split to compute metrics such as PSNR and SSIM; N_vis is the number of test images
nerf_vq_compression(dict(test_mode='evaluation_test', render_dir=render_dir, N_vis=5))

# For rendering novel views, N_vis is the number of views
nerf_vq_compression(dict(test_mode='render_path', render_dir=render_dir, N_vis=30))

LLFF dataset:

import os
from modelscope.msdatasets import MsDataset
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

model_id = 'DAMOXR/cv_nerf_3d-reconstruction_vector-quantize-compression'
pretrained_model = 'fern_demo.pt'
data_dir = MsDataset.load(
    'DAMOXR/nerf_llff_data',
    subset_name='default',
    split='test',
).config_kwargs['split_config']['test']
nerf_llff = os.path.join(data_dir, 'nerf_llff_data')
llff_scene = 'fern'
data_dir = os.path.join(nerf_llff, llff_scene)

nerf_vq_compression = pipeline(Tasks.nerf_recon_vq_compression, 
                              model=model_id,
                              dataset_name='llff',
                              data_dir=data_dir,
                              downsample=4,
                              ndc_ray=True,
                              ckpt_path=pretrained_model
                              )
render_dir = os.path.join('./exp/', llff_scene)

# Evaluate on the test split to compute metrics such as PSNR and SSIM; N_vis is the number of test images
nerf_vq_compression(dict(test_mode='evaluation_test', render_dir=render_dir, N_vis=5))

# For rendering novel views, N_vis is the number of views
nerf_vq_compression(dict(test_mode='render_path', render_dir=render_dir, N_vis=10))
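
The two snippets differ mainly in the dataset-specific pipeline arguments: LLFF scenes are forward-facing real captures, which is why the example enables the NDC ray parameterization (ndc_ray=True) and downsamples the high-resolution images by a factor of 4, while the synthetic Blender scenes use ndc_ray=False at full resolution. The demo checkpoints (ficus_demo.pt, fern_demo.pt) each appear to correspond to a single scene, so the checkpoint and data_dir should be switched together when changing scenes.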

Evaluation and Results

Blender dataset

Before compression:

Scene PSNR SSIM LPIPS_ALEX LPIPS_VGG SIZE (MB)
chair 35.7760 0.9846 0.0095 0.0215 67.8907
drums 26.0150 0.9370 0.0494 0.0718 67.7655
ficus 34.1024 0.9826 0.0122 0.0222 70.7913
hotdog 37.5654 0.9828 0.0129 0.0301 84.0902
lego 36.5248 0.9834 0.0072 0.0178 68.8282
materials 30.1193 0.9523 0.0259 0.0584 84.6609
mic 34.9041 0.9885 0.0078 0.0144 66.6982
ship 30.7121 0.8941 0.0831 0.1376 70.9092
Mean 33.2149 0.9632 0.0260 0.0467 72.7043

After VQRF compression:

Scene PSNR SSIM LPIPS_ALEX LPIPS_VGG SIZE (MB)
chair 35.1320 0.9806 0.0185 0.0355 5.6044
drums 25.9492 0.9316 0.0624 0.0983 4.9731
ficus 33.8504 0.9813 0.0136 0.0288 6.0117
hotdog 37.2531 0.9807 0.0154 0.0367 9.6663
lego 35.9736 0.9808 0.0087 0.0242 6.1286
materials 30.0846 0.9511 0.0279 0.0617 8.2645
mic 34.3908 0.9861 0.0128 0.0241 3.9262
ship 30.5233 0.8908 0.0852 0.1430 7.0612
Mean 32.8946 0.9604 0.0306 0.0565 6.4545

LLFF dataset

Before compression:

Scene PSNR SSIM LPIPS_ALEX LPIPS_VGG SIZE (MB)
fern 24.9985 0.7981 0.1595 0.2513 179.9157
flower 28.1521 0.8574 0.1043 0.1775 179.8139
room 32.1206 0.9514 0.0770 0.1622 179.8682
leaves 21.1095 0.7439 0.1425 0.2215 179.6977
horns 28.3430 0.8835 0.1027 0.1801 179.8110
trex 27.6879 0.9107 0.0793 0.2019 179.8421
fortress 31.4531 0.8982 0.0666 0.1425 179.8664
orchids 19.8818 0.6468 0.1914 0.2786 179.9021
Mean 26.7183 0.8362 0.1154 0.2019 179.8396

After VQRF compression:

Scene PSNR SSIM LPIPS_ALEX LPIPS_VGG SIZE (MB)
fern 24.7626 0.7861 0.1725 0.2687 16.4782
flower 27.8500 0.8421 0.1183 0.2032 16.6604
room 31.7010 0.9446 0.0900 0.1842 16.6127
leaves 20.9996 0.7262 0.1580 0.2559 16.4771
horns 27.8376 0.8613 0.1289 0.2195 16.3832
trex 27.1907 0.8979 0.0930 0.2256 16.4172
fortress 31.0268 0.8764 0.1027 0.1895 17.7717
orchids 19.7541 0.6346 0.2073 0.2984 16.3075
Mean 26.3903 0.8212 0.1338 0.2306 16.6385
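
For a quick sense of the trade-off, the "Mean" rows above can be summarized with a few lines of arithmetic (values copied verbatim from the tables, sizes in MB):

# Compression ratio and mean PSNR change, computed from the tables above.
blender_before, blender_after = 72.7043, 6.4545      # mean model size (MB)
llff_before, llff_after = 179.8396, 16.6385          # mean model size (MB)

print(f'Blender: {blender_before / blender_after:.1f}x smaller, '
      f'PSNR {32.8946 - 33.2149:+.2f} dB')           # ~11.3x, about -0.32 dB
print(f'LLFF:    {llff_before / llff_after:.1f}x smaller, '
      f'PSNR {26.3903 - 26.7183:+.2f} dB')           # ~10.8x, about -0.33 dB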

Citation

@inproceedings{li2023compressing,
  title={Compressing Volumetric Radiance Fields to 1 {MB}},
  author={Li, Lingzhi and Shen, Zhen and Wang, Zhongshu and Shen, Li and Bo, Liefeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4222--4231},
  year={2023}
}