unidiffuser-v1

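UniDiffuser is a unified diffusion framework that fits all distributions relevant to a set of multi-modal data in a single transformer.
By setting the appropriate timesteps, UniDiffuser can perform image, text, text-to-image, image-to-text, and image-text pair generation without additional overhead.

Specifically, UniDiffuser employs a transformer variant called U-ViT, which parameterizes the joint noise prediction network. The other components serve as encoders and decoders for the different modalities: a pretrained image autoencoder from Stable Diffusion, a pretrained image ViT-B/32 CLIP encoder, a pretrained text ViT-L CLIP encoder, and a GPT-2 text decoder that we finetuned ourselves.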
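As a rough illustration of what "setting the appropriate timesteps" can mean, the sketch below maps each task to a pair of timesteps, one per modality, that would be passed to the joint noise prediction network. The convention used here (timestep 0 for a clean conditioning modality, the maximum timestep for a modality that should be ignored, equal timesteps for joint generation) follows our reading of the UniDiffuser paper. The function name, the task labels, and the value T = 1000 are assumptions made for this example and are not taken from this card.

```python
from typing import Tuple

T_MAX = 1000  # assumed number of diffusion steps; not stated in this card


def timesteps_for_task(task: str, t: int, t_max: int = T_MAX) -> Tuple[int, int]:
    """Return (t_image, t_text) for one denoising step of the given task.

    `t` is the current timestep of the modality being generated.
    Task labels are hypothetical names chosen for this sketch.
    """
    if not 0 <= t <= t_max:
        raise ValueError(f"t must be in [0, {t_max}], got {t}")
    if task == "joint":          # image-text pair: both modalities share the same noise level
        return t, t
    if task == "text_to_image":  # text is the clean condition, image is denoised
        return t, 0
    if task == "image_to_text":  # image is the clean condition, text is denoised
        return 0, t
    if task == "image":          # image only: text is held at pure noise
        return t, t_max
    if task == "text":           # text only: image is held at pure noise
        return t_max, t
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("image", "text", "text_to_image", "image_to_text", "joint"):
        print(name, timesteps_for_task(name, 500))
```

Because the task is selected only by the timestep pair, the same network weights serve all five generation modes, which is why no extra overhead is needed.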

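We provide two versions of UniDiffuser:

UniDiffuser-v0: This version is trained on LAION-5B, which contains noisy web data of text-image pairs.
UniDiffuser-v1: This version is resumed from UniDiffuser-v0 and further trained on a set of less noisy internal text-image pairs. It takes a flag as input to distinguish web data from internal data during training.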
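To make the idea of a data-source flag concrete, here is a minimal sketch of tagging training pairs with their origin. The card does not say how the flag is encoded, so the integer values (0 for web data, 1 for internal data), the field name `data_type`, and the helper `tag_pairs` are all hypothetical choices for illustration only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding of the data-source flag (not specified in this card).
DATA_TYPE_FLAG: Dict[str, int] = {"web": 0, "internal": 1}


def tag_pairs(
    pairs: List[Tuple[str, str]], source: str
) -> List[Dict[str, Union[str, int]]]:
    """Attach the data-source flag to each (text, image_path) pair."""
    flag = DATA_TYPE_FLAG[source]  # raises KeyError for an unknown source
    return [{"text": text, "image": image, "data_type": flag} for text, image in pairs]


if __name__ == "__main__":
    batch = tag_pairs([("a cat", "cat.png")], "web") + tag_pairs(
        [("a dog", "dog.png")], "internal"
    )
    print(batch)
```

A model conditioned on such a flag can then be asked at sampling time for the style of either data source by fixing the flag to the corresponding value.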

Download

We provide download links for the UniDiffuser files: this link, and this link.
The files are:

autoencoder_kl.pth is the weight of the image autoencoder converted from Stable Diffusion.
caption_decoder.pth is the weight of the finetuned GPT-2 text decoder.
uvit_v0.pth/uvit_v1.pth is the weight of U-ViT for UniDiffuser-v0/UniDiffuser-v1.

Note that UniDiffuser-v0 and UniDiffuser-v1 share the same autoencoder_kl.pth and caption_decoder.pth, so you only need to download them once.
The other components will be downloaded automatically.
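The file list above can be turned into a small pre-flight check that reports which weights are still missing for a given version. This is only a sketch: the file names come from the list above, while the helper names and the directory passed in (for example a local `models` folder) are assumptions and not part of the UniDiffuser codebase.

```python
from pathlib import Path
from typing import List

# Files shared by UniDiffuser-v0 and UniDiffuser-v1 (names from the list above).
SHARED_FILES = ["autoencoder_kl.pth", "caption_decoder.pth"]


def required_files(version: str) -> List[str]:
    """Return the weight files needed for 'v0' or 'v1'."""
    if version not in ("v0", "v1"):
        raise ValueError(f"version must be 'v0' or 'v1', got {version!r}")
    return SHARED_FILES + [f"uvit_{version}.pth"]


def missing_files(model_dir: str, version: str) -> List[str]:
    """Return the required files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required_files(version) if not (root / name).is_file()]


if __name__ == "__main__":
    # "models" is a hypothetical local directory used only for this example.
    print(missing_files("models", "v1"))
```

Switching from v0 to v1 with this layout only requires adding uvit_v1.pth, since the two shared files are reused.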

Usage

Use the model with the UniDiffuser codebase.

Model Details

Model type: Diffusion-based multi-modal generation model
Language(s): English
License: agpl-3.0
Model Description: This is a model that can perform image, text, text-to-image, image-to-text, and image-text pair generation. Its main component is a U-ViT, which parameterizes the joint noise prediction network. Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from Stable Diffusion, a pretrained image ViT-B/32 CLIP encoder, a pretrained text ViT-L CLIP encoder, and a GPT-2 text decoder finetuned by ourselves.
Resources for more information: GitHub Repository, Paper.

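Direct Use

The model should be used in accordance with the agpl-3.0 license. Possible uses include:

Safe deployment of models that have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Excluded uses are described below.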

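Misuse, Malicious Use, and Out-of-Scope Use

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive, as well as content that propagates historical or current stereotypes.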

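Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is beyond the scope of its abilities.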

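Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
Intentionally promoting or propagating discriminatory content or harmful stereotypes.
Impersonating individuals without their consent.
Sexual content without the consent of the people who might see it.
Mis- and disinformation.
Representations of egregious violence and gore.
Sharing copyrighted or licensed material in violation of its terms of use.
Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.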