EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

Yixiong Yang1,*, Tao Wu2,*, Senmao Li3, Shiqi Yang3,†, Yaxing Wang3, Joost van de Weijer2, Kai Wang4,5,2,✉
1 Harbin Institute of Technology (Shenzhen), China 2 Computer Vision Center, Universitat Autònoma de Barcelona, Spain
3 VCIP, CS, Nankai University, China 4 City University of Hong Kong (Dongguan), China 5 City University of Hong Kong, HK SAR, China
* Equal contribution. † Visiting researcher at Nankai University. ✉ Corresponding author.

Abstract

Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images in a single step. However, personalizing these models with novel concepts remains challenging because one-step models have limited capacity to capture new concept distributions. We propose EchoDistill, a bidirectional concept distillation framework for one-step diffusion personalization (1-SDP). In an end-to-end training process, a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously: the concept is first distilled from the teacher to the student, and then echoed back from the student to the teacher. During EchoDistill, the two models share a text encoder to ensure consistent semantic understanding. The student is further optimized with adversarial losses to align with the real image distribution and with alignment losses to remain consistent with the teacher's outputs. Building on this, our bidirectional echoing refinement strategy lets the student exploit its fast generation to provide feedback to the teacher. This collaborative distillation mechanism enhances the student's ability to personalize novel concepts while also improving the generative quality of the teacher. Our experiments demonstrate that EchoDistill significantly outperforms existing personalization methods in the 1-SDP setup, establishing a new paradigm for rapid and effective personalization of T2I diffusion models.
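To make the training recipe concrete, below is a minimal PyTorch sketch of the three-phase loop the abstract describes: teacher learning from real images (with the shared text encoder), teacher-to-student distillation, and the student-to-teacher echo. This is an illustration under loud assumptions: TinyDenoiser, the toy denoising objective, the optimizers, and the echo loss are stand-ins invented for this sketch, not the authors' implementation, and the real teacher runs multiple denoising steps rather than one forward pass.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Stand-in for a diffusion U-Net conditioned on a text embedding (assumption)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, text_emb):
        return self.net(torch.cat([x, text_emb], dim=-1))

dim = 16
text_encoder = nn.Linear(dim, dim)   # shared between teacher and student
teacher, student = TinyDenoiser(dim), TinyDenoiser(dim)
opt_teacher = torch.optim.Adam(
    list(teacher.parameters()) + list(text_encoder.parameters()), lr=1e-4)
opt_student = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(100):
    tokens = torch.randn(4, dim)      # placeholder prompt tokens
    real = torch.randn(4, dim)        # placeholder concept images
    noise = torch.randn_like(real)

    # (1) Teacher learns the new concept from real images via a toy
    #     denoising loss; gradients also update the shared text encoder.
    teacher_loss = F.mse_loss(teacher(real + noise, text_encoder(tokens)), real)
    opt_teacher.zero_grad(); teacher_loss.backward(); opt_teacher.step()

    # (2) Distillation: the one-step student's output is aligned with the
    #     teacher's denoised prediction (teacher frozen for this step).
    with torch.no_grad():
        target = teacher(noise, text_encoder(tokens))
    student_out = student(noise, text_encoder(tokens).detach())
    align_loss = F.mse_loss(student_out, target)
    opt_student.zero_grad(); align_loss.backward(); opt_student.step()

    # (3) Echo back: the student's fast samples provide extra training
    #     signal for the teacher (one plausible reading of the paper's
    #     bidirectional echoing refinement, not its exact objective).
    with torch.no_grad():
        echo = student(torch.randn_like(noise), text_encoder(tokens))
    echo_loss = F.mse_loss(
        teacher(echo + torch.randn_like(echo), text_encoder(tokens)), echo)
    opt_teacher.zero_grad(); echo_loss.backward(); opt_teacher.step()

The key design point this sketch tries to capture is that the teacher and student never diverge semantically: both condition on the same text encoder, which is only updated through the teacher's and echo losses.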

Method

Overview of EchoDistill

Figure 1. Overview of EchoDistill. The student and teacher jointly learn the new concept with a shared text encoder. The teacher learns from real images (green), and the text encoder is updated accordingly. The student is optimized with two objectives (gold): an adversarial loss to match the real data distribution and alignment losses to match the teacher's denoised outputs. The discriminators are trained to distinguish the student's outputs from real images.
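The student-side objectives in Figure 1 can be written out as a short sketch. The discriminator architecture, the non-saturating GAN formulation, and the weights lambda_adv and lambda_align are illustrative assumptions; the paper's exact losses and weighting may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

def student_losses(student_out, teacher_out, discriminator,
                   lambda_adv=0.1, lambda_align=1.0):
    """Combined student loss: match real data (adversarial) and the teacher (alignment)."""
    # Non-saturating generator loss: the student tries to make the
    # discriminator label its outputs as real.
    logits_fake = discriminator(student_out)
    adv = F.binary_cross_entropy_with_logits(
        logits_fake, torch.ones_like(logits_fake))
    # Alignment loss: pull the one-step output toward the teacher's
    # denoised prediction (teacher detached, so only the student updates).
    align = F.mse_loss(student_out, teacher_out.detach())
    return lambda_adv * adv + lambda_align * align

def discriminator_loss(student_out, real, discriminator):
    """Discriminator learns to separate student outputs from real images."""
    logits_real = discriminator(real)
    logits_fake = discriminator(student_out.detach())
    return (F.binary_cross_entropy_with_logits(
                logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(
                logits_fake, torch.zeros_like(logits_fake)))

# Toy usage with a linear discriminator on flattened 16-d features.
disc = nn.Linear(16, 1)
s_out, t_out, real = torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 16)
print(student_losses(s_out, t_out, disc).item())
print(discriminator_loss(s_out, real, disc).item())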

Results

Comparisons with existing methods

Figure 2. Qualitative comparison between our method and existing personalization methods.

CustomConcept101 Part 1

Figure 3. Qualitative results on CustomConcept101 dataset (Part 1).

CustomConcept101 Part 2

Figure 4. Qualitative results on CustomConcept101 dataset (Part 2).

BibTeX

@misc{yang2025echodistill,
  title         = {EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization},
  author        = {Yixiong Yang and Tao Wu and Senmao Li and Shiqi Yang and Yaxing Wang and Joost van de Weijer and Kai Wang},
  eprint        = {XXXX.XXXXX},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  year          = {2025}
}