Stable Diffusion XL (SDXL) Beta Is Out!

A new beta version of the Stable Diffusion XL model recently became available. The developers at Stability AI promise better face generation, better image composition, and a better understanding of prompts. The most exciting part is that it can create legible text.

Does SDXL deliver on the promise? Is the output really that much better?

Let’s learn more and find out!

What is Stable Diffusion XL (SDXL)?

Stable Diffusion XL, or SDXL for short, is the latest AI image generation model from Stability AI and is currently in beta. It’s claimed to offer improved composition and face generation, and it can also generate legible text within images.

The model has not been released as open source yet, but it is available to Stability AI API customers and DreamStudio users. The former group also includes the Nightcafe Studio and Clipdrop apps.

According to the announcement, Stable Diffusion XL is tuned for more photorealistic images, more pleasing aesthetics, and new capabilities like text generation within images.

Like all the previous models, Stable Diffusion XL also supports inpainting and outpainting.

SDXL improvements over older models

Since the model is already available for image generation, I thought it would be fair to check whether the claims are true.

I have tested Stable Diffusion XL and here are the actual improvements and drawbacks that I’ve found:

Legible text

To my surprise, SDXL does indeed generate legible text! Sure, it’s not always perfect, but even at this level of coherence it’s a breakthrough.

As far as I know, SDXL is the only currently available image generation model that can generate text within an image.

We’ve seen some mind-blowing examples from the Deep Floyd IF model, but it still hasn’t been released.

Improved human anatomy and image composition

With SD 2.1, I had to use fine-tuned models and very long, very specific prompts with highly detailed negative prompt lists to get humans and objects with correct anatomy and composition.

With SDXL, this is much less of an issue. Stable Diffusion XL generates much more coherent images of humans. They’re still not perfect, and the model is still far behind Midjourney, but the progress is significant.

Better prompt understanding

Stable Diffusion XL follows prompts much more accurately than the previous models, especially compared to the v1.5 model. The older models would consistently ignore parts of complex prompts, whereas SDXL understands them much better and generates more detailed results.


