Efficient Tuning of Generative Diffusion Models: ControlLora

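Efficiently tune a generative diffusion model. With the ControlLora-Tuner module, only a small number of parameters need to be trained to build a conditional generation model customized for your own scenario.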

Model Description

The base diffusion model is the pretrained Stable-Diffusion-v1-5. The trainable tuning module (ControlLora-Tuner) accounts for about 0.1% of the total model parameters.
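
To give a feel for why a LoRA-style tuner can be so small, here is a rough sketch of the parameter arithmetic for a single linear layer. The layer sizes and rank are hypothetical and are not taken from this model; the roughly 0.1% figure above refers to the whole model, not to this toy layer.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (frozen, trainable) parameter counts for one linear layer
    with a LoRA-style low-rank update W + B @ A (biases ignored)."""
    frozen = d_in * d_out              # original weight W, kept frozen
    trainable = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
    return frozen, trainable


# Hypothetical sizes, chosen only for illustration.
frozen, trainable = lora_param_counts(d_in=1024, d_out=1024, rank=4)
fraction = trainable / (frozen + trainable)
print(f"trainable fraction of this layer: {fraction:.4%}")
```

Because only a subset of layers typically receives such an update, the trainable share of a full model can be considerably smaller than the share computed for one adapted layer.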

Intended Use and Scope

How to Use

Built on the ModelScope framework, the model can be invoked quickly through a predefined Pipeline.

Code Example

Note: the input 'cond' (condition image) currently supports only square images, i.e. images whose width and height are equal.

from modelscope.pipelines import pipeline

sd_control_lora_pipeline = pipeline('efficient-diffusion-tuning', 
                                    'damo/multi-modal_efficient-diffusion-tuning-control-lora')
inputs = {'prompt': 'pale golden rod circle with old lace background',
          'cond': 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/efficient_diffusion_tuning_sd_control_lora_source.png'}
result = sd_control_lora_pipeline(inputs)
print(f'Output: {result}.')
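
Since 'cond' must be a square image, a non-square input has to be cropped or padded first. The helper below is a minimal sketch of one option, a centered square crop; the function name `center_square_box` is hypothetical and is not part of ModelScope.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of the largest centered square
    that fits inside an image of the given width and height."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


# Example: crop box for a 640x480 image.
box = center_square_box(640, 480)
print(box)
```

The returned tuple uses the (left, upper, right, lower) order expected by Pillow's `Image.crop`, so it can be applied to an image before it is used as the condition input.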

Training Data

fill50k, used to train the ControlLora-Tuner.

Model Training and Validation

The following procedure trains and validates the SD-ControlLora model on the fill50k dataset.

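import tempfile

from modelscope.metainfo import Trainers
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.utils.constant import DownloadMode

# Model ID
model_id = 'damo/multi-modal_efficient-diffusion-tuning-control-lora'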

# Load the training set (only the first 100 samples are kept)
train_dataset = MsDataset.load(
        'controlnet_dataset_condition_fill50k',
        namespace='damo',
        split='train',
        download_mode=DownloadMode.FORCE_REDOWNLOAD).to_hf_dataset(  # noqa
        ).select(range(100))  # noqa

# Load the validation set (only the first 20 samples are kept)
eval_dataset = MsDataset.load(
        'controlnet_dataset_condition_fill50k',
        namespace='damo',
        split='validation',
        download_mode=DownloadMode.FORCE_REDOWNLOAD).to_hf_dataset(  # noqa
        ).select(range(20))  # noqa

tmp_dir = tempfile.mkdtemp()  # use a temporary directory as the working directory
max_epochs = 1  # number of training epochs

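# Modify the training configuration
def cfg_modify_fn(cfg):
    cfg.train.max_epochs = max_epochs                 # maximum number of training epochs
    cfg.train.lr_scheduler.T_max = max_epochs         # learning-rate scheduler parameter
    cfg.model.inference = False                       # model state: training, not inference
    return cfg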

# Build the trainer
kwargs = dict(
    model=model_id,                 # model ID
    work_dir=tmp_dir,               # working directory
    train_dataset=train_dataset,    # training set
    eval_dataset=eval_dataset,      # validation set
    cfg_modify_fn=cfg_modify_fn     # callback that modifies the training configuration
)
trainer = build_trainer(name=Trainers.vision_efficient_tuning, default_args=kwargs)

# Run training
trainer.train()

# Run evaluation
result = trainer.evaluate()
print('result:', result)
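
The configuration callback sets `lr_scheduler.T_max` to the number of epochs. The parameter name suggests a cosine-annealing schedule like PyTorch's `CosineAnnealingLR`, but the actual scheduler type is defined in the model's configuration file, which is not shown here, so treat that as an assumption. Under that assumption, the sketch below shows how `T_max` shapes the learning rate; the base learning rate is a hypothetical value.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float, t_max: int,
                        eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing for 0 <= epoch <= t_max:
    starts at base_lr and decays to eta_min at epoch == t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


base_lr = 1e-4  # hypothetical value, not taken from the model config
t_max = 1       # mirrors max_epochs = 1 in the training script
schedule = [cosine_annealing_lr(e, base_lr, t_max) for e in range(t_max + 1)]
print(schedule)
```

Keeping `T_max` equal to `max_epochs` means the decay finishes exactly when training ends, which is presumably why the callback updates both values together.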

Related Papers and Citation

If this model is helpful to you, please cite the following related papers:

@inproceedings{hu2021lora,
  title={{LoRA}: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward and Shen, Yelong and Wallis, Phil and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Lu and Chen, Weizhu},
  booktitle={ICLR},
  year={2021}
}

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@software{wu2023controllora,
    author = {Wu Hecong},
    month = {2},
    title = {{ControlLoRA: A Light Neural Network To Control Stable Diffusion Spatial Information}},
    url = {https://github.com/HighCWu/ControlLoRA},
    version = {1.0.0},
    year = {2023}
}