🌈 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

Haoyu Li1*, Tingyan Wen1*, Lin Qi2†, Zhe Wu2, Yihuang Chen2, Xing Zhou2, Lifei Zhu2, XueQian Wang1, Kai Zhang1†

1 Tsinghua University    2 Central Media Technology Institute, Huawei

(* Equal Contribution   † Corresponding Author)

1.x-Distill teaser image

✨ 1.x-Distill mitigates diversity and quality degradation in extreme few-step distribution matching distillation, enabling practical 1.x-step generation.

📌 Abstract

Diffusion models produce high-quality text-to-image results, but their iterative denoising is computationally expensive. Distribution Matching Distillation (DMD) has emerged as a promising path to few-step distillation, but it suffers from diversity collapse and fidelity degradation when reduced to two steps or fewer. We present 1.x-Distill, the first fractional-step distillation framework that breaks the integer-step constraint of prior few-step methods and establishes 1.x-step generation as a practical regime for distilled diffusion models. Specifically, we first analyze the overlooked role of teacher CFG in DMD and introduce a simple yet effective modification that suppresses mode collapse. Then, to improve performance under extreme step budgets, we introduce Stagewise Focused Distillation, a two-stage strategy that learns coarse structure through diversity-preserving distribution matching and refines details with inference-consistent adversarial distillation. Furthermore, we design a lightweight compensation module for Distill–Cache co-Training, which naturally incorporates block-level caching into our distillation pipeline. Experiments on SD3-Medium and SD3.5-Large show that 1.x-Distill surpasses prior few-step methods, achieving better quality and diversity at 1.67 and 1.74 effective NFEs, respectively, with up to a 33× speedup over the original 28×2-NFE sampling.

💡 Motivation

🎨 Diversity issue in DMD

DMD-like methods have emerged as strong baselines for few-step generation, yet under very small step budgets they become increasingly mode-seeking: the student is biased toward dominant modes, which reduces structural diversity and yields less varied generations across samples.

🔍 Quality issue at extreme few steps

When the denoising process is compressed to two steps or fewer, each step must shoulder too much semantic and visual responsibility. This causes an optimization mismatch and often results in blurry textures, missing details, or unstable visual quality.

🚀 Our goal

We aim to push distribution matching distillation into the practical 1.x-NFE regime while preserving diversity and improving image quality. This motivates a unified framework that explicitly addresses diversity, quality, and efficiency rather than optimizing only for speed.

Diversity-preserving · High-quality · Fractional-step · Efficient caching

🧠 Method Overview

Overview of the 1.x-Distill framework

We integrate the guidance-control strategy and the cache design into a unified two-stage training framework. Stage I: train the generator with the DMD loss; within the DMD framework, we apply importance sampling over the diffusion timestep t and control the guidance according to the sampled t when computing the real score. Stage II: train the generator with a pixel-space adversarial loss; our GAN framework predicts x̂0 along the generator's inference path, which naturally incorporates the block-cache design, and the generator and the compensation MLP are jointly optimized in this stage.
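The exact proposal distribution used for timestep importance sampling is not spelled out here. As a minimal PyTorch sketch, the snippet below draws t from a logit-normal proposal (the sampler used to train SD3, a plausible but assumed choice) and returns the importance weights that keep a Stage I loss unbiased under a uniform target over t:

```python
import torch

def sample_timesteps(batch_size: int, mean: float = 0.0, std: float = 1.0):
    """Draw diffusion timesteps t in (0, 1) from a logit-normal proposal q(t)
    (hypothetical choice; the method only states that t is importance-sampled)
    and return the weights p(t)/q(t) for a uniform target p(t) = 1."""
    u = mean + std * torch.randn(batch_size)
    t = torch.sigmoid(u)                      # logit-normal sample in (0, 1)
    logit = torch.log(t) - torch.log1p(-t)    # equals u; recomputed for clarity
    # Density of the logit-normal: N(logit(t); mean, std) / (t * (1 - t)).
    q = torch.exp(-0.5 * ((logit - mean) / std) ** 2) / (
        std * (2 * torch.pi) ** 0.5 * t * (1.0 - t))
    return t, 1.0 / q                         # timesteps and importance weights
```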

🎨 Diversity: Controlling Guidance in DMD

We revisit the role of teacher CFG in distribution matching and show that overly strong guided supervision at high-noise timesteps can prematurely drive the student toward dominant modes. We therefore introduce timestep-aware guidance control to improve mode coverage and alleviate diversity collapse.
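To illustrate what timestep-aware guidance control can look like, the sketch below tapers the teacher's CFG scale at high-noise timesteps (t → 1 in the rectified-flow convention of SD3) when forming the real score. The schedule shape, `base_scale`, and `t_high` are all illustrative assumptions, not the method's exact values, and `teacher(x_t, t, c)` is an assumed interface:

```python
import torch

def guidance_scale(t, base_scale: float = 7.0, min_scale: float = 1.0,
                   t_high: float = 0.7):
    """Hypothetical schedule: full guidance for t <= t_high, then a linear
    taper toward min_scale at t = 1, where strong guided supervision tends
    to drive the student toward dominant modes."""
    ramp = ((t - t_high) / (1.0 - t_high)).clamp(0.0, 1.0)
    return base_scale + ramp * (min_scale - base_scale)

def real_score(teacher, x_t, t, cond, uncond):
    """Teacher ("real") prediction with t-dependent CFG, as used for the
    real-score term of the DMD loss (sketch only)."""
    s = guidance_scale(t).view(-1, *([1] * (x_t.dim() - 1)))
    out_c = teacher(x_t, t, cond)
    out_u = teacher(x_t, t, uncond)
    return out_u + s * (out_c - out_u)
```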

🔍 Quality: Stagewise Focused Distillation

We divide training into two complementary stages. Stage I emphasizes coarse structure learning and diversity-preserving distribution matching, while Stage II performs detail-focused adversarial refinement to recover realism and fine-grained visual quality.
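A minimal skeleton of this two-stage schedule is sketched below, assuming `dmd_loss` and `adv_loss` are callables implementing the Stage I distribution-matching objective (with the guidance control above) and the Stage II pixel-space adversarial objective; the interleaved fake-score and discriminator updates, EMA, and optimizer hyperparameters are elided or assumed:

```python
import itertools
import torch

def train_sfd(generator, cache_mlp, dmd_loss, adv_loss,
              data_loader, stage1_iters: int, stage2_iters: int, lr: float = 1e-5):
    """Stagewise Focused Distillation schedule (sketch under the assumptions
    stated above)."""
    batches = itertools.cycle(data_loader)
    # Stage I: distribution matching for coarse structure and diversity.
    opt = torch.optim.AdamW(generator.parameters(), lr=lr)
    for _ in range(stage1_iters):
        loss = dmd_loss(generator, next(batches))   # uses t-aware guidance
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Stage II: adversarial refinement along the cached inference path;
    # the compensation MLP is optimized jointly with the generator.
    opt = torch.optim.AdamW(
        list(generator.parameters()) + list(cache_mlp.parameters()), lr=lr)
    for _ in range(stage2_iters):
        loss = adv_loss(generator, cache_mlp, next(batches))
        opt.zero_grad()
        loss.backward()
        opt.step()
```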

⚡ Efficiency: Distill–Cache co-Training

We incorporate block-level caching into distilled inference by learning a lightweight reuse-error compensation module. This makes caching compatible with the distilled generator and enables efficient 1.x-step generation beyond conventional integer-step sampling.
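One plausible realization of this compensation module is sketched below, assuming the cache granularity is a transformer block's residual and the compensation is a two-layer MLP (both assumptions): on a full step the block runs normally and its residual is cached; on a reuse step the cached residual plus a learned correction replaces the full block forward, so a reuse step costs far less than 1 NFE.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Block-level caching with learned reuse-error compensation (sketch;
    which blocks are cached and the MLP design are assumptions)."""

    def __init__(self, block: nn.Module, dim: int, hidden: int = 256):
        super().__init__()
        self.block = block
        self.mlp = nn.Sequential(              # lightweight compensation module
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        nn.init.zeros_(self.mlp[-1].weight)    # start as plain cache reuse
        nn.init.zeros_(self.mlp[-1].bias)
        self.cache = None                      # previous step's residual (per batch)

    def forward(self, x: torch.Tensor, reuse: bool = False) -> torch.Tensor:
        if reuse and self.cache is not None:
            return x + self.cache + self.mlp(x)   # cheap: skip the full block
        y = self.block(x)
        self.cache = (y - x).detach()             # cache the block residual
        return y
```

Zero-initializing the MLP's last layer makes a reuse step start out as plain cache reuse, so Stage II training only has to learn the error that reuse introduces rather than the block's full function.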

Guidance control illustration

Guidance control for preserving diversity in distribution matching distillation.

Caching design figure

Cache-aware design for efficient fractional-step generation.
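For concreteness, the speedup quoted in the abstract follows directly from the quoted NFE counts: teacher sampling costs 28 steps × 2 model evaluations per step (CFG), while the cached SD3-Medium generator costs 1.67 effective NFE:

```latex
% Speedup implied by the quoted numbers (teacher: 28 steps x 2 evals for CFG):
\text{speedup} \;=\; \frac{28 \times 2\ \text{NFE}}{1.67\ \text{NFE}} \;\approx\; 33.5\times
```

The same arithmetic gives 56 / 1.74 ≈ 32× for SD3.5-Large, consistent with the "up to 33×" claim.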

🖼️ Qualitative Results

Our 4-step SFD model already yields cleaner and more appealing generations than existing baselines. With caching additionally enabled, 1.x-Distill remains strong at only 1.67 effective NFE, preserving coherent structure while improving realism and fine-detail rendering.

Qualitative comparison on SD3-Medium

Qualitative comparison on SD3-Medium.

Diversity comparison figure

Diversity comparison on SD3-Medium.

Additional qualitative comparison on SD3.5-Large

Qualitative comparison on SD3.5-Large.

📊 Quantitative Evaluation

Across COCO-10K, DPG-Bench, LPIPS diversity, and a user study, 1.x-Distill achieves a strong quality–diversity–efficiency trade-off under aggressive step compression.

Quantitative comparison on COCO-10K

Quantitative comparison on COCO-10K.

DPG-Bench and LPIPS evaluation tables

DPG-Bench and LPIPS diversity evaluation.

User study figure

User study of 1.x-Distill against representative few-step baselines.