🌈 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

Haoyu Li1*, Tingyan Wen1*, Lin Qi2†, Zhe Wu2, Yihuang Chen2, Xing Zhou2, Lifei Zhu2, XueQian Wang1, Kai Zhang1†

1 Tsinghua University    2 Central Media Technology Institute, Huawei

(* Equal Contribution   † Corresponding Author)

1.x-Distill teaser image

✨ 1.x-Distill mitigates diversity and quality degradation in extreme few-step distribution matching distillation, enabling practical 1.x-step generation.

📌 Abstract

Diffusion models produce high-quality text-to-image results, but their iterative denoising is computationally expensive. Distribution Matching Distillation (DMD) has emerged as a promising path to few-step distillation, but it suffers from diversity collapse and fidelity degradation when reduced to two steps or fewer. We present 1.x-Distill, the first fractional-step distillation framework that breaks the integer-step constraint of prior few-step methods and establishes 1.x-step generation as a practical regime for distilled diffusion models. Specifically, we first analyze the overlooked role of teacher CFG in DMD and introduce a simple yet effective modification that suppresses mode collapse. Then, to improve performance under extreme step budgets, we introduce Stagewise Focused Distillation, a two-stage strategy that learns coarse structure through diversity-preserving distribution matching and refines details with inference-consistent adversarial distillation. Furthermore, we design a lightweight compensation module for Distill–Cache co-Training, which naturally incorporates block-level caching into our distillation pipeline. Experiments on SD3-Medium and SD3.5-Large show that 1.x-Distill surpasses prior few-step methods, achieving better quality and diversity at 1.67 and 1.74 effective NFEs, respectively, with up to a 33× speedup over the original 28×2-NFE sampling.

💡 Motivation

🎨 Diversity issue in DMD

DMD-like methods have emerged as strong baselines for few-step generation, yet under very small step budgets they become increasingly mode-seeking: the student is biased toward dominant modes, which reduces structural diversity and yields less varied generations across samples.

🔍 Quality issue at extreme few steps

When the denoising process is compressed to two steps or fewer, each step must shoulder too much semantic and visual responsibility. This causes an optimization mismatch and often results in blurry textures, missing details, or unstable visual quality.

🚀 Our goal

We aim to push distribution matching distillation into the practical 1.x-NFE regime while preserving diversity and improving image quality. This motivates a unified framework that explicitly addresses diversity, quality, and efficiency rather than optimizing only for speed.

Diversity-preserving · High-quality · Fractional-step · Efficient caching

🧠 Method Overview

Overview of the 1.x-Distill framework

We integrate the guidance-control strategy and the cache design into a unified two-stage training framework. Stage I: train the generator with the DMD loss; within the DMD framework, we apply importance sampling over the diffusion timestep t and control the guidance according to the sampled t when computing the real score. Stage II: train the generator with a pixel-space adversarial loss; our GAN framework predicts x̂0 along the generator's inference path, which naturally incorporates the block-cache design, and the generator and the compensation MLP are jointly optimized in this stage.
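The exact proposal distribution used for timestep importance sampling is not spelled out here. As a minimal PyTorch sketch, the snippet below draws t from a logit-normal proposal (the sampler used to train SD3, a plausible but assumed choice) and returns the importance weights that keep a Stage I loss unbiased under a uniform target over t:

```python
import torch

def sample_timesteps(batch_size: int, mean: float = 0.0, std: float = 1.0):
    """Draw diffusion timesteps t in (0, 1) from a logit-normal proposal q(t)
    (hypothetical choice; the method only states that t is importance-sampled)
    and return the weights p(t)/q(t) for a uniform target p(t) = 1."""
    u = mean + std * torch.randn(batch_size)
    t = torch.sigmoid(u)                      # logit-normal sample in (0, 1)
    logit = torch.log(t) - torch.log1p(-t)    # equals u; recomputed for clarity
    # Density of the logit-normal: N(logit(t); mean, std) / (t * (1 - t)).
    q = torch.exp(-0.5 * ((logit - mean) / std) ** 2) / (
        std * (2 * torch.pi) ** 0.5 * t * (1.0 - t))
    return t, 1.0 / q                         # timesteps and importance weights
```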

🎨 Diversity: Controlling Guidance in DMD

We revisit the role of teacher CFG in distribution matching and show that overly strong guided supervision at high-noise timesteps can prematurely drive the student toward dominant modes. We therefore introduce timestep-aware guidance control to improve mode coverage and alleviate diversity collapse.
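To illustrate what timestep-aware guidance control can look like, the sketch below tapers the teacher's CFG scale at high-noise timesteps (t → 1 in the rectified-flow convention of SD3) when forming the real score. The schedule shape, `base_scale`, and `t_high` are all illustrative assumptions, not the method's exact values, and `teacher(x_t, t, c)` is an assumed interface:

```python
import torch

def guidance_scale(t, base_scale: float = 7.0, min_scale: float = 1.0,
                   t_high: float = 0.7):
    """Hypothetical schedule: full guidance for t <= t_high, then a linear
    taper toward min_scale at t = 1, where strong guided supervision tends
    to drive the student toward dominant modes."""
    ramp = ((t - t_high) / (1.0 - t_high)).clamp(0.0, 1.0)
    return base_scale + ramp * (min_scale - base_scale)

def real_score(teacher, x_t, t, cond, uncond):
    """Teacher ("real") prediction with t-dependent CFG, as used for the
    real-score term of the DMD loss (sketch only)."""
    s = guidance_scale(t).view(-1, *([1] * (x_t.dim() - 1)))
    out_c = teacher(x_t, t, cond)
    out_u = teacher(x_t, t, uncond)
    return out_u + s * (out_c - out_u)
```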

🔍 Quality: Stagewise Focused Distillation

We divide training into two complementary stages. Stage I emphasizes coarse structure learning and diversity-preserving distribution matching, while Stage II performs detail-focused adversarial refinement to recover realism and fine-grained visual quality.
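A minimal skeleton of this two-stage schedule is sketched below, assuming `dmd_loss` and `adv_loss` are callables implementing the Stage I distribution-matching objective (with the guidance control above) and the Stage II pixel-space adversarial objective; the interleaved fake-score and discriminator updates, EMA, and optimizer hyperparameters are elided or assumed:

```python
import itertools
import torch

def train_sfd(generator, cache_mlp, dmd_loss, adv_loss,
              data_loader, stage1_iters: int, stage2_iters: int, lr: float = 1e-5):
    """Stagewise Focused Distillation schedule (sketch under the assumptions
    stated above)."""
    batches = itertools.cycle(data_loader)
    # Stage I: distribution matching for coarse structure and diversity.
    opt = torch.optim.AdamW(generator.parameters(), lr=lr)
    for _ in range(stage1_iters):
        loss = dmd_loss(generator, next(batches))   # uses t-aware guidance
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Stage II: adversarial refinement along the cached inference path;
    # the compensation MLP is optimized jointly with the generator.
    opt = torch.optim.AdamW(
        list(generator.parameters()) + list(cache_mlp.parameters()), lr=lr)
    for _ in range(stage2_iters):
        loss = adv_loss(generator, cache_mlp, next(batches))
        opt.zero_grad()
        loss.backward()
        opt.step()
```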

⚡ Efficiency: Distill–Cache co-Training

We incorporate block-level caching into distilled inference by learning a lightweight reuse-error compensation module. This makes caching compatible with the distilled generator and enables efficient 1.x-step generation beyond conventional integer-step sampling.
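One plausible realization of this compensation module is sketched below, assuming the cache granularity is a transformer block's residual and the compensation is a two-layer MLP (both assumptions): on a full step the block runs normally and its residual is cached; on a reuse step the cached residual plus a learned correction replaces the full block forward, so a reuse step costs far less than 1 NFE.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Block-level caching with learned reuse-error compensation (sketch;
    which blocks are cached and the MLP design are assumptions)."""

    def __init__(self, block: nn.Module, dim: int, hidden: int = 256):
        super().__init__()
        self.block = block
        self.mlp = nn.Sequential(              # lightweight compensation module
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        nn.init.zeros_(self.mlp[-1].weight)    # start as plain cache reuse
        nn.init.zeros_(self.mlp[-1].bias)
        self.cache = None                      # previous step's residual (per batch)

    def forward(self, x: torch.Tensor, reuse: bool = False) -> torch.Tensor:
        if reuse and self.cache is not None:
            return x + self.cache + self.mlp(x)   # cheap: skip the full block
        y = self.block(x)
        self.cache = (y - x).detach()             # cache the block residual
        return y
```

Zero-initializing the MLP's last layer makes a reuse step start out as plain cache reuse, so Stage II training only has to learn the error that reuse introduces rather than the block's full function.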

Guidance control illustration

Guidance control for preserving diversity in distribution matching distillation.

Caching design figure

Cache-aware design for efficient fractional-step generation.
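For concreteness, the speedup quoted in the abstract follows directly from the quoted NFE counts: teacher sampling costs 28 steps × 2 model evaluations per step (CFG), while the cached SD3-Medium generator costs 1.67 effective NFE:

```latex
% Speedup implied by the quoted numbers (teacher: 28 steps x 2 evals for CFG):
\text{speedup} \;=\; \frac{28 \times 2\ \text{NFE}}{1.67\ \text{NFE}} \;\approx\; 33.5\times
```

The same arithmetic gives 56 / 1.74 ≈ 32× for SD3.5-Large, consistent with the "up to 33×" claim.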

🖼️ Qualitative Results

Our 4-step SFD model already yields cleaner and more appealing generations than existing baselines. With caching additionally enabled, 1.x-Distill remains strong at only 1.67 effective NFE, preserving coherent structure while improving realism and fine-detail rendering.

Qualitative comparison on SD3-Medium

Qualitative comparison on SD3-Medium.

Diversity comparison figure

Diversity comparison on SD3-Medium.

Additional qualitative comparison on SD3.5-Large

Qualitative comparison on SD3.5-Large.

📊 Quantitative Evaluation

Across COCO-10K, DPG-Bench, LPIPS diversity, and a user study, 1.x-Distill achieves a strong quality–diversity–efficiency trade-off under aggressive step compression.

Quantitative comparison on COCO-10K

Quantitative comparison on COCO-10K.

DPG-Bench and LPIPS evaluation tables

DPG-Bench and LPIPS diversity evaluation.

User study figure

User study of 1.x-Distill against representative few-step baselines.