Stable Cascade: AI model that’s 243% better than Stable Diffusion
Stability AI has launched Stable Cascade, a new text-to-image model built on the Würstchen architecture. It is available today as a research preview under a non-commercial license.
According to Stability AI, Stable Cascade sets new benchmarks for quality, flexibility, fine-tuning, and efficiency. Its core focus is making high-quality text-to-image generation more accessible by lowering hardware requirements.
Stable Cascade Architecture
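The model is built from three distinct stages (A, B, and C). Stage C turns user inputs, such as a text prompt, into compact 24×24 latents, and Stages A and B then decode those latents into the final image. A 1024×1024 image is therefore represented at 24×24, a compression factor of roughly 42 per side. Stable Diffusion, by comparison, compresses 1024×1024 images only to 128×128, a factor of 8. Working in this much smaller latent space makes inference faster and training cheaper: Stability AI cites a 16x reduction in training cost compared with a similarly sized Stable Diffusion model.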
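The compression figures above can be checked with a few lines of arithmetic. This is only a sketch of the resolution numbers quoted in the text: it ignores the number of latent channels, and it does not derive the 16x training-cost figure, which is a separate reported claim. The function name is made up for illustration.

```python
def spatial_compression(image_side: int, latent_side: int) -> float:
    """Per-side compression factor when an image is mapped to a square latent."""
    return image_side / latent_side


# Resolutions quoted in the article (channel counts are deliberately ignored).
cascade_factor = spatial_compression(1024, 24)
sd_factor = spatial_compression(1024, 128)

cascade_positions = 24 * 24      # 576 latent positions
sd_positions = 128 * 128         # 16384 latent positions

print(f"Stable Cascade:   {cascade_factor:.1f}x per side")   # 42.7x
print(f"Stable Diffusion: {sd_factor:.1f}x per side")        # 8.0x
print(f"Latent positions ratio: {sd_positions / cascade_positions:.1f}x")  # 28.4x
```

The last line shows that a 24×24 latent has roughly 28 times fewer spatial positions than a 128×128 one, which gives a feel for why the generation stage is cheaper to run.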
For the highest-quality output, the 3.6 billion parameter variant of Stage C is recommended.
Stages A and B, by contrast, handle image compression and decoding, and can optionally be fine-tuned for additional control. For Stage B, the 1.5 billion parameter variant is the best at reconstructing fine details.
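To get a rough sense of what these parameter counts mean for memory, here is a back-of-the-envelope estimate. It assumes the 3.6 billion figure refers to Stage C and the 1.5 billion figure to Stage B, that weights are stored in 16-bit precision (2 bytes per parameter), and it counts weights only. Stage A, the text encoder, and activations are left out, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Parameter counts quoted in the article; 2 bytes/param assumes fp16 or bf16.
stage_c_gb = weight_memory_gb(3.6e9)
stage_b_gb = weight_memory_gb(1.5e9)
total_gb = stage_c_gb + stage_b_gb

print(f"Stage C weights: ~{stage_c_gb:.1f} GB")
print(f"Stage B weights: ~{stage_b_gb:.1f} GB")
print(f"Combined:        ~{total_gb:.1f} GB (weights only)")
```

Under these assumptions the two large variants account for about 10 GB of weights, so a sizeable share of any VRAM budget goes to everything else the pipeline needs at run time.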
Stable Cascade Requirements
Stable Cascade needs approximately 20 GB of VRAM, which can be reduced by using the smaller variants. Inference is supported through the diffusers library, and checkpoints and scripts for the model are available on the Stability GitHub page. It’s also available
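For readers who want to try inference through diffusers, the following is a minimal sketch rather than an official recipe. It assumes a recent diffusers release that ships StableCascadePriorPipeline and StableCascadeDecoderPipeline, the Hugging Face repository IDs shown below, a CUDA GPU, and the accelerate package for CPU offloading. The step counts and guidance values are illustrative starting points, and the generate helper is a name invented for this example.

```python
# Assumed Hugging Face repository IDs for the checkpoints.
PRIOR_ID = "stabilityai/stable-cascade-prior"   # Stage C (text -> compact latents)
DECODER_ID = "stabilityai/stable-cascade"       # Stages B and A (latents -> image)


def generate(prompt: str, out_path: str = "cascade.png") -> str:
    """Run the prior and decoder pipelines in sequence and save one image."""
    # Imported lazily so this file can be loaded without the GPU stack installed.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        PRIOR_ID, variant="bf16", torch_dtype=torch.bfloat16
    )
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        DECODER_ID, variant="bf16", torch_dtype=torch.float16
    )

    # Keep idle submodules on the CPU to lower peak VRAM (needs accelerate + CUDA).
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    # Step 1: the prior turns the prompt into compact image embeddings.
    prior_output = prior(
        prompt=prompt,
        height=1024,
        width=1024,
        guidance_scale=4.0,
        num_inference_steps=20,
    )

    # Step 2: the decoder turns those embeddings into a full-resolution image.
    image = decoder(
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt,
        guidance_scale=0.0,
        num_inference_steps=10,
        output_type="pil",
    ).images[0]

    image.save(out_path)
    return out_path
```

Calling generate("a photo of a lighthouse at dusk") would download the checkpoints on first use and write cascade.png to the working directory. Nothing runs until the function is called.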
Stable Cascade Performance
Performance comparisons show that Stable Cascade does exceptionally well on prompt alignment and aesthetic quality, and that it offers faster inference than Stable Diffusion XL despite having more parameters.
Stable Cascade shows an improvement of more than 240% in aesthetic quality compared with the Stable Diffusion XL model.
It also follows prompts more closely. In tests, Stable Cascade was more than 10% better at prompt alignment than SDXL.
Stable Cascade can also generate images about twice as fast as the base Stable Diffusion XL model. It does not match the speed of SDXL Turbo, though.
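Speed claims like these are easy to check on your own hardware. The helper below is a generic timing sketch, not the benchmark behind the figures above. The two stand-in functions just sleep and would be replaced with real generation calls. The names median_latency and speedup are invented for this example.

```python
import time
from statistics import median
from typing import Callable


def median_latency(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds per call, after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-ins for two generation calls; swap in real pipeline invocations.
slow_model = lambda: time.sleep(0.02)
fast_model = lambda: time.sleep(0.01)

ratio = speedup(median_latency(slow_model), median_latency(fast_model))
print(f"Candidate is about {ratio:.1f}x faster than the baseline")
```

When timing GPU code, wall-clock measurements are only meaningful if pending GPU work has finished before the clock stops, for example by calling torch.cuda.synchronize() in PyTorch.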
Additional features include generating image variations by extracting embeddings from an existing image, and image-to-image generation, which starts from an existing image with noise added to it. The model also supports inpainting/outpainting, Canny edge control, and 2x super resolution, as shown in the code examples on Stability’s GitHub.
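The idea of starting generation from a noised version of an existing image can be illustrated with a toy example. This is a generic variance-preserving mix of signal and Gaussian noise written in plain Python. It is not Stable Cascade’s actual noise schedule, and the strength parameter and add_noise name are assumptions made for illustration.

```python
import math
import random


def add_noise(latent: list[float], strength: float, rng: random.Random) -> list[float]:
    """Blend a latent with Gaussian noise; strength 0 keeps it, 1 replaces it."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    keep = math.sqrt(1.0 - strength)   # weight on the original signal
    mix = math.sqrt(strength)          # weight on the fresh noise
    return [keep * x + mix * rng.gauss(0.0, 1.0) for x in latent]


rng = random.Random(0)                 # fixed seed for repeatable runs
latent = [0.5, -1.0, 0.25, 2.0]        # toy stand-in for an encoded image

lightly_noised = add_noise(latent, 0.2, rng)   # stays close to the input
heavily_noised = add_noise(latent, 0.9, rng)   # mostly noise
print(lightly_noised)
print(heavily_noised)
```

A low strength keeps the result close to the source image, while a high strength gives the model more freedom to depart from it.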
Does the Stable Cascade license allow for commercial use?
No. Commercial use of Stable Cascade is prohibited under its current license. Those with commercial needs might consider options such as the Stability AI Membership page or accessing APIs through Stability AI’s Developer Platform.