DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani
Carnegie Mellon University

We present DemoDiffusion: a simple and scalable method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring any paired human-robot data. DemoDiffusion refines a re-targeted human demonstration trajectory using a pre-trained generalist diffusion policy, efficiently bridging the embodiment gap and enabling the robot to succeed even on tasks where the pre-trained generalist policy fails entirely.
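To make the core idea concrete, below is a minimal, self-contained sketch of the refinement step as suggested by the abstract: the kinematically retargeted human trajectory is noised to an intermediate diffusion step and then denoised by the pre-trained diffusion policy, so the output stays close to the demonstration while landing on the policy's learned action manifold. All names here (NoisePredictor, K_START, refine) are illustrative stand-ins rather than the paper's actual code, and the dummy noise predictor stands in for the pre-trained generalist policy.

# Hypothetical sketch of DemoDiffusion-style trajectory refinement.
# Key idea: instead of denoising an action chunk from pure Gaussian noise,
# noise the retargeted human trajectory to an intermediate step K_START < K,
# then run the policy's reverse diffusion from there.
import torch

K = 100                   # total diffusion steps of the pre-trained policy (assumed)
K_START = 30              # intermediate step controlling how much the policy may deviate
HORIZON, ACT_DIM = 16, 7  # action-chunk length and action dimensionality (assumed)

# Standard DDPM quantities (linear beta schedule for brevity).
betas = torch.linspace(1e-4, 2e-2, K)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class NoisePredictor(torch.nn.Module):
    """Dummy stand-in for the pre-trained policy's noise-prediction network."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(ACT_DIM + 1, ACT_DIM)

    def forward(self, a_k, k, obs):
        # A real policy would condition on camera images / proprioception (obs).
        k_feat = torch.full_like(a_k[..., :1], k / K)
        return self.net(torch.cat([a_k, k_feat], dim=-1))

eps_model = NoisePredictor()

def refine(retargeted_actions, obs):
    """Partially noise the retargeted trajectory, then denoise it with the policy."""
    # Forward-noise the retargeted chunk directly to step K_START.
    ab = alpha_bars[K_START]
    a_k = ab.sqrt() * retargeted_actions + (1 - ab).sqrt() * torch.randn_like(retargeted_actions)
    # Reverse diffusion from K_START down to 0 (standard DDPM update).
    for k in range(K_START, -1, -1):
        eps = eps_model(a_k, k, obs)
        mean = (a_k - betas[k] / (1 - alpha_bars[k]).sqrt() * eps) / alphas[k].sqrt()
        noise = torch.randn_like(a_k) if k > 0 else torch.zeros_like(a_k)
        a_k = mean + betas[k].sqrt() * noise
    return a_k

# Example usage: zeros stand in for end-effector actions retargeted from hand tracking.
retargeted = torch.zeros(HORIZON, ACT_DIM)
robot_actions = refine(retargeted, obs=None)
print(robot_actions.shape)  # torch.Size([16, 7])

The choice of K_START trades off fidelity against correction: a small K_START keeps the output close to the retargeted human trajectory, while a larger K_START gives the policy more freedom to correct embodiment mismatch.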

Overview

Experiments

1. Task Rollouts (robot footage at 2× speed)

[Videos: eight tasks, each shown as a Human Demonstration clip alongside the resulting DemoDiffusion rollout.]

2. Comparison to Baselines (robot footage at 2× speed)

[Videos: nine tasks, each comparing Pi-0, Kinematic Retargeting, and DemoDiffusion rollouts side by side.]


3. Zero-Shot Generalization to New Objects (robot footage at 2× speed)

[Videos: three tasks with novel objects, each comparing Pi-0, Kinematic Retargeting, and DemoDiffusion rollouts.]



4. Detailed Results for Hand Reconstruction and Robot Execution

Task: Drag the Basket to the Right

BibTeX

@misc{park2025demodiffusiononeshothumanimitation,
      title={DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy},
      author={Sungjae Park and Homanga Bharadhwaj and Shubham Tulsiani},
      year={2025},
      eprint={2506.20668},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.20668},
}

Acknowledgements

We appreciate the helpful discussions with Yanbo Xu, Qitao Zhao, and Lucas Wu. We would also like to thank Kenny Shaw, Tony Tao, Jiahui Yang, Andrew Wang, Jason Liu, Hengkai Pan, and Mohan Kumar Srirama for helping set up the hardware. This work was supported by gift awards from CISCO and Google.