SDXL consists of a two-step pipeline for latent diffusion:
First, we use a base model to generate latents of the desired output size.
In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as “img2img”)
to the latents generated in the first step, using the same prompt.
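The second step can be illustrated with a small sketch of the SDEdit idea: partially re-noise the base latents, then run only the tail of the denoising schedule. This is a toy illustration, not the SDXL implementation. The names `sdedit_start_index`, `add_noise`, `refine`, the `strength` parameter, and the `denoise_step` callback are all hypothetical, and the latents are a plain list of floats rather than a tensor.

```python
import math
import random


def sdedit_start_index(num_steps: int, strength: float) -> int:
    """Index of the first denoising step to run for a given strength.

    strength=1.0 runs every step (start from almost pure noise);
    strength=0.0 runs no steps (the input latents are kept as they are).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_to_run = min(int(num_steps * strength), num_steps)
    return num_steps - steps_to_run


def add_noise(latents, alpha_bar, rng):
    """Forward-diffuse latents to the noise level given by alpha_bar."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * x + sigma * rng.gauss(0.0, 1.0) for x in latents]


def refine(latents, denoise_step, alpha_bars, strength, rng):
    """Second stage: partially re-noise base latents, then denoise them.

    alpha_bars is ordered from the noisiest step (index 0) to the cleanest.
    denoise_step(x, i) stands in for one step of the high-resolution model.
    """
    num_steps = len(alpha_bars)
    start = sdedit_start_index(num_steps, strength)
    if start == num_steps:
        return list(latents)  # strength 0: nothing to refine
    x = add_noise(latents, alpha_bars[start], rng)
    for i in range(start, num_steps):
        x = denoise_step(x, i)
    return x
```

With 40 steps and a strength of 0.25, for example, only the last 10 steps are run, so the refiner changes fine detail while keeping the composition produced by the base model.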
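The base model can be run through ModelScope:

import cv2
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build a text-to-image pipeline backed by the SDXL 0.9 base model.
pipe = pipeline(task=Tasks.text_to_image_synthesis,
                model='AI-ModelScope/stable-diffusion-xl-base-0.9',
                model_revision='v1.0.0')

# Generate an image from the prompt and write the first result to disk.
prompt = 'a dog'
output = pipe({'text': prompt})
cv2.imwrite('result.png', output['output_imgs'][0])
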
The model is intended for research purposes only. Possible research areas and tasks include
Excluded uses are described below.
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope.