This model generates 360° panoramic images from text: given an input description, it produces a 360° panorama end to end.
The model is a multi-stage text-to-image diffusion pipeline that returns a 360° panoramic image matching the input description. Only English input is supported.
For example, the input "A living room." may produce an image like the following:
The input "The Mountains." may produce an image like the following:
The input "The Times Square." may produce an image like the following:
The model is built on Stable Diffusion v2.1, ControlNet v1.1, and diffusers.
On the ModelScope framework, the 360° panorama generation model can be used through a simple pipeline call: just provide the input text.
conda create -n panogen python=3.8
conda activate panogen
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install modelscope
pip install "modelscope[cv]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
Real-ESRGAN official installation guide: https://github.com/xinntao/Real-ESRGAN#installation
pip install realesrgan==0.3.0
pip install -U diffusers==0.18.0
pip install xformers==0.0.16
pip install triton accelerate transformers
import cv2
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Text prompt describing the desired panorama (English only).
prompt = "The living room."
input = {
    'prompt': prompt,
}

# Build the text-to-360-panorama pipeline from the ModelScope model hub.
txt2panoimg = pipeline(Tasks.text_to_360panorama_image,
                       model='damo/cv_diffusion_text-to-360panorama-image_generation')

# Run generation and save the resulting panorama image.
output = txt2panoimg(input)[OutputKeys.OUTPUT_IMG]
cv2.imwrite('result.png', output)
Pipeline initialization parameters
Pipeline call parameters (default values):
"num_inference_steps": 20,
"guidance_scale": 7.5,
"add_prompt": "photorealistic, trend on artstation, ((best quality)), ((ultra high res))",
"negative_prompt": "persons, complex texture, small objects, sheltered, blur, worst quality, low quality, zombie, logo, text, watermark, username, monochrome, complex lighting",
"seed": -1,
"upscale": True,
"refinement": True
This approach treats 360° panoramas as a style of image: a style model was fine-tuned with the DreamBooth method on roughly 2,000 360° panorama images, for a total of 40 training epochs; a minimal sketch of such a fine-tuning step is shown below.
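
For illustration only, here is a minimal sketch of a standard diffusers-style fine-tuning step on panorama images. It is not the training code used for this model: it omits DreamBooth's prior-preservation loss, and the base checkpoint ID, instance prompt, and hyperparameters below are assumptions.

import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

base = "stabilityai/stable-diffusion-2-1"  # assumed base checkpoint
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

# Only the UNet is updated; the VAE and text encoder stay frozen.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def train_step(images, prompt="a 360 degree equirectangular panorama photo"):
    # images: batch of panorama images scaled to [-1, 1], shape [B, 3, H, W]
    latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device).long()
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    ids = tokenizer([prompt] * latents.shape[0], padding="max_length",
                    max_length=tokenizer.model_max_length, truncation=True,
                    return_tensors="pt").input_ids
    encoder_hidden_states = text_encoder(ids)[0]

    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    # SD 2.1 checkpoints may use epsilon or v-prediction; match the scheduler's setting.
    if noise_scheduler.config.prediction_type == "v_prediction":
        target = noise_scheduler.get_velocity(latents, noise, timesteps)
    else:
        target = noise
    loss = F.mse_loss(noise_pred.float(), target.float())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()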
This model draws on several open-source projects:
Panorama training data sources
If you find this model helpful, please consider citing the related papers below:
@article{ruiz2022dreambooth,
  title={DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation},
  author={Ruiz, Nataniel and Li, Yuanzhen and Jampani, Varun and Pritch, Yael and Rubinstein, Michael and Aberman, Kfir},
  journal={arXiv preprint arXiv:2208.12242},
  year={2022}
}

@misc{von-platen-etal-2022-diffusers,
  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
  title = {Diffusers: State-of-the-art diffusion models},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/diffusers}}
}

@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}