Date: 2023-11-30

Images generated using Stable Diffusion XL Turbo.

Stable Diffusion XL Turbo / Benj Edwards

On Tuesday, Stability AI launched Stable Diffusion XL Turbo, an AI image-synthesis model that can rapidly generate imagery based on a written prompt. It is so fast, in fact, that the company is touting it as "real-time" image generation, since it can also quickly transform images from a source like a webcam.

The primary innovation of SDXL Turbo lies in its ability to generate image output in a single step, a significant reduction from the 20–50 steps required by its predecessor. Stability credits this leap in efficiency to a technique called adversarial diffusion distillation (ADD). ADD combines two ingredients: score distillation, where the model learns from existing image-synthesis models, and adversarial loss, which enhances the model's ability to distinguish between real and generated images, thereby improving the realism of the output.
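To make the idea of combining two training signals concrete, here is a toy sketch in plain Python. It is not Stability AI's code and not the formulation from the research paper. The function names, the mean-squared-error distillation term, and the `weight` parameter are all assumptions chosen for illustration. It only shows how a distillation term and an adversarial term could be added into one number that training tries to reduce.

```python
# Toy illustration only. Every name and the default weight are hypothetical,
# not taken from Stability AI's code or paper.

def adversarial_term(disc_scores_on_generated):
    """Generator-side adversarial term.

    Lower (better) when a discriminator gives generated samples
    higher "looks real" scores.
    """
    n = len(disc_scores_on_generated)
    return -sum(disc_scores_on_generated) / n


def distillation_term(student_output, teacher_output):
    """Mean squared difference between student and teacher outputs (assumed metric)."""
    n = len(student_output)
    return sum((s - t) ** 2 for s, t in zip(student_output, teacher_output)) / n


def combined_loss(disc_scores, student_output, teacher_output, weight=1.0):
    """Weighted sum of the two terms; `weight` is an assumed hyperparameter."""
    return adversarial_term(disc_scores) + weight * distillation_term(
        student_output, teacher_output
    )


if __name__ == "__main__":
    loss = combined_loss([0.5, 0.5], [1.0, 2.0], [1.0, 4.0], weight=0.5)
    print(f"combined loss: {loss:.2f}")
```

In a real system, the lists would be tensors produced by neural networks, and the loss would be minimized by gradient descent.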

Stability detailed the inner workings of the model in a research paper released Tuesday that focuses on the ADD technique. One of the claimed advantages of SDXL Turbo is its similarity to generative adversarial networks (GANs), particularly in producing image outputs in a single step.

A promotional Stable Diffusion XL Turbo video from Stability AI.

SDXL Turbo images are not as detailed as SDXL images generated at higher step counts, so it is not considered a replacement for the previous model. But for the speed savings involved, the results are surprising.

To try it out, we ran SDXL Turbo locally on an Nvidia RTX 3060 using Automatic1111 (the weights drop in just like SDXL weights), and it could generate a 3-step 1024×1024 image in about 4 seconds, versus 26.4 seconds for a 20-step SDXL image with similar detail. Smaller images generate much faster (less than a second for 512×768), and of course, a stronger graphics card like an RTX 3090 or 4090 will allow much faster generation times as well. Contrary to Stability's marketing, we found that SDXL Turbo images have the best detail at about 3–5 steps per image.
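As a quick back-of-the-envelope check on those timings, the short Python snippet below computes the overall speedup and the rough time per step. The variable names are ours, and "about 4 seconds" is treated as exactly 4.0 for the sake of arithmetic. The totals also include fixed overhead such as text encoding and image decoding, so the per-step figures are only approximate.

```python
# Timings quoted in the article (RTX 3060, 1024x1024 output).
turbo_seconds = 4.0   # "about 4 seconds", treated as exactly 4.0 here
turbo_steps = 3
sdxl_seconds = 26.4
sdxl_steps = 20

speedup = sdxl_seconds / turbo_seconds
turbo_per_step = turbo_seconds / turbo_steps
sdxl_per_step = sdxl_seconds / sdxl_steps

print(f"overall speedup: {speedup:.1f}x")
print(f"SDXL Turbo: {turbo_per_step:.2f} s per step")
print(f"SDXL:       {sdxl_per_step:.2f} s per step")
```

On these numbers, each step costs roughly the same in both models, which suggests that the speed gain in this test comes from needing far fewer steps rather than from faster individual steps.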

SDXL Turbo's generation speed is where the "real-time" claim comes from. Stability AI says that on an Nvidia A100 (a powerful AI-tuned GPU), the model can generate a 512×512 image in 207 ms, including encoding, a single denoising step, and decoding. If coherency issues can be resolved, such speeds could lead to real-time generative AI video filters or experimental video game graphics generation. In this context, coherency means maintaining the same subject across multiple frames or generations.
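To put the 207 ms figure in video terms, the snippet below converts it to frames per second. This is simple arithmetic on Stability's claimed number. It assumes images are generated one after another with no batching or pipelining, which is our simplification.

```python
# Stability AI's claimed latency for one 512x512 image on an Nvidia A100.
latency_ms = 207

# Frames per second if images are generated strictly one at a time.
frames_per_second = 1000 / latency_ms

print(f"{frames_per_second:.1f} frames per second")
```

That works out to just under 5 frames per second, well short of the 24 to 60 frames per second typical of video and games, so real-time video use would need faster hardware or further optimization.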

Screenshot of the unofficial SDXL Turbo demo page on Hugging Face. It produced the obligatory cat with a beer.

Ars Technica

Currently, SDXL Turbo is available under a non-commercial research license, which limits its use to personal, non-commercial purposes. This move has already drawn some criticism in the Stable Diffusion community, but Stability AI has expressed openness to commercial applications and invites interested parties to contact the company for more information.

Meanwhile, Stability AI itself has faced internal management issues, with one investor recently calling on CEO Emad Mostaque to resign. Stability's management is reportedly exploring a possible sale of the company to a larger entity, but that hasn't slowed the pace of Stability's releases. Just last week, the firm announced Stable Video Diffusion, which can turn still images into short video clips.

Stability AI offers a beta demonstration of SDXL Turbo's capabilities on its image-editing platform, Clipdrop. You can also try an unofficial live demo on Hugging Face for free. Obviously, all the usual caveats apply, including the lack of provenance for the training data and the potential for misuse. Even with those unresolved issues, technological progress in AI image synthesis is certainly not slowing down.

