This is an image generation model: given an input image, a chosen control type, and a text prompt describing the desired output, the network extracts the corresponding control signal from the input image and generates a high-quality image.
ControlNet adds conditional control to pretrained large diffusion models, learning task-specific conditions in an end-to-end manner. It can augment large diffusion models such as Stable Diffusion so that they generate images conditioned on edge maps, segmentation maps, keypoints, and other inputs.
ControlNet supports control signals such as edge maps, segmentation maps, scribbles, and human pose; this project lets you select among these control signals to generate the corresponding images.
import cv2
import torch
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Input image URL, text prompt, and output path
input_location = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_segmentation.jpg'
prompt = 'person'
output_image_path = './result.png'

input = {
    'image': input_location,
    'prompt': prompt
}

# Build the controllable image generation pipeline
pipe = pipeline(
    Tasks.controllable_image_generation,
    model='dienstag/cv_controlnet_controllable-image-generation_nine-annotators')

# Run inference with HED (soft edge) control and save the result
output = pipe(input, control_type='hed')[OutputKeys.OUTPUT_IMG]
cv2.imwrite(output_image_path, output)
print('pipeline: the output image path is {}'.format(output_image_path))
Inference parameter requirements: control_type accepts one of canny, hough, hed, depth, normal, pose, seg, fake_scribble, or scribble.
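For illustration, a minimal sketch of switching the control signal, reusing the pipe and input objects from the snippet above (the output filename here is arbitrary):

# Same pipeline, different control signal, e.g. Canny edges
output_canny = pipe(input, control_type='canny')[OutputKeys.OUTPUT_IMG]
cv2.imwrite('./result_canny.png', output_canny)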
Input requirements: in the input dict, 'image' is a required field, while 'prompt' is optional; the prompt can also be passed as an extra argument when calling the pipeline, or left empty:
output = pipe(input, prompt='hot air balloon')[OutputKeys.OUTPUT_IMG]
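As a sketch of the other two options mentioned above (variable names here are illustrative), 'prompt' can be omitted from the input dict or set to an empty string:

# 'prompt' omitted from the input dict (it is an optional field)
input_no_prompt = {'image': input_location}
output = pipe(input_no_prompt, control_type='hed')[OutputKeys.OUTPUT_IMG]

# 'prompt' set to an empty string
input_empty_prompt = {'image': input_location, 'prompt': ''}
output = pipe(input_empty_prompt, control_type='hed')[OutputKeys.OUTPUT_IMG]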
"image_resolution": 512,
"strength": 1.0,
"guess_mode": False,
"ddim_steps": 20,
"scale": 9.0,
"num_samples": 1,
"eta": 0.0,
"a_prompt": "best quality, extremely detailed",
"n_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"
The construction of this model drew on a number of open-source projects.
If you find this model helpful, please consider citing the related paper below:
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}