¹SJTU ScaleLab†, ²HKU MMLab†, ³Shanghai AI Lab, ⁴D-Robotics, ⁵SZU, ⁶THU, ⁷TeleAI, ⁸FDU, ⁹USTC, ¹⁰SUSTech, ¹¹SYSU, ¹²CSU, ¹³NEU, ¹⁴HKU-Shanghai ICRC, ¹⁵NJU, ¹⁶Lumina EAI
*Equal contribution
✉Corresponding authors †Equally leading organizations
Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to automatically generate task-level execution code. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height, and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments and pre-collect over 100,000 domain-randomized expert trajectories. Empirical evaluation shows a 10.9% gain in code generation success rate and improved generalization to novel real-world conditions. A vision-language-action (VLA) model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on real-world tasks in unseen scenes, while zero-shot models trained exclusively on our synthetic data attain a 228% relative gain, demonstrating strong generalization without real-world supervision. We release the data generator, benchmark, pre-collected dataset, and code to support scalable research in robust bimanual manipulation.
[RoboTwin 1.0, CVPR 2025 Highlight] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
[CVPR 2025 Challenge @ MEIS Workshop] The technical report is coming soon!
[RoboTwin early version, ECCV 2024 MAAS Workshop Best Paper] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)
[19th Challenge Cup (挑战杯) Official Competition Task] Link to the competition task
RoboTwin 2.0 is a scalable framework for data generation and benchmarking in bimanual robotic manipulation. It integrates an expert data generation pipeline and a 50-task benchmark built on the RoboTwin Object Dataset (731 objects, 147 categories). A multimodal language model agent enables automatic task program synthesis, while flexible dual-arm configurations facilitate scalable and diverse data collection. Policies trained on RoboTwin 2.0 data demonstrate improved robustness and generalization to unseen environments.
To enhance both manipulation capability and visual understanding, we construct a large-scale object dataset with rich semantic annotations, called RoboTwin-OD, covering 147 categories and 731 diverse objects. Specifically, this includes 534 instances across 111 categories with custom-generated and optimized meshes, 153 objects from 27 categories in Objaverse, and 44 articulated object instances from 9 categories in SAPIEN PartNet-Mobility. Objects from all sources, including Objaverse, are used for cluttered scene construction, with Objaverse specifically serving to further increase the visual and semantic diversity of distractor objects. Additionally, we develop a comprehensive surface and background texture library using generative AI and human-in-the-loop verification to ensure both diversity and realism. The dataset is available at https://huggingface.co/datasets/TianxingChen/RoboTwin2.0/tree/main/objects.
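As a minimal sketch of how these assets could be pulled locally (assuming only the dataset repo id and the objects/ folder visible in the URL above; patterns and paths may need adjusting to the actual repository layout), the object library can be downloaded with the standard huggingface_hub client:

# Minimal sketch: fetch the RoboTwin-OD object assets from Hugging Face.
# Assumes the dataset repo "TianxingChen/RoboTwin2.0" and its "objects/" folder,
# as suggested by the URL above; the local target directory is arbitrary.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="TianxingChen/RoboTwin2.0",
    repo_type="dataset",
    allow_patterns=["objects/**"],  # download only the object library
    local_dir="./RoboTwin-OD",      # hypothetical local directory
)
print(f"RoboTwin-OD assets saved to {local_path}")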
Building on our automated task generation framework, embodiment-adaptive behavior synthesis, and the large-scale object asset library RoboTwin-OD, we construct a suite of over 50 dual-arm collaborative manipulation tasks. In addition, we support data collection and evaluation across five distinct robot platforms, enabling comprehensive benchmarking of manipulation policies. The complete task set is available at http://robotwin-platform.github.io/doc/tasks/.
To improve policy robustness to real-world environmental variability, we apply domain randomization across five key dimensions: (1) cluttered placement of task-irrelevant objects, (2) background textures, (3) lighting conditions, (4) tabletop heights, and (5) diverse language instructions. This systematic diversification enriches the training data distribution and significantly improves generalization to unseen scenarios.
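As an illustrative sketch only (the class and function names below are hypothetical, not the framework's actual API, and the value ranges are placeholders), a per-episode configuration covering these five randomization axes could be sampled as follows:

import random
from dataclasses import dataclass

@dataclass
class SceneRandomization:
    # Hypothetical per-episode configuration spanning the five randomized axes.
    num_distractors: int       # (1) task-irrelevant clutter objects
    background_texture: str    # (2) background / tabletop texture asset
    light_intensity: float     # (3) lighting condition
    table_height_m: float      # (4) tabletop height in meters
    instruction: str           # (5) language instruction variant

def sample_randomization(task_name, texture_pool, instruction_templates):
    # Placeholder ranges for illustration; the ranges used in RoboTwin 2.0 may differ.
    return SceneRandomization(
        num_distractors=random.randint(0, 8),
        background_texture=random.choice(texture_pool),
        light_intensity=random.uniform(0.3, 1.5),
        table_height_m=random.uniform(0.70, 0.85),
        instruction=random.choice(instruction_templates).format(task=task_name),
    )

Sampling a fresh configuration of this kind for every collected trajectory is what broadens the training distribution beyond a single canonical scene.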
This paper presented RoboTwin 2.0, a scalable simulation framework for generating diverse, high-fidelity expert data to support robust bimanual manipulation. Our system integrates MLLM-based task generation, embodiment-adaptive behavior synthesis, and comprehensive domain randomization to address key limitations in prior synthetic datasets.
By leveraging an annotated object library and automating trajectory generation, RoboTwin 2.0 produces data with rich visual, linguistic, and physical diversity while minimizing manual engineering effort. Experiments demonstrate its effectiveness in improving policy robustness to cluttered environments, generalization to unseen tasks, and cross-embodiment manipulation.
These findings highlight the importance of scalable, automated generation of semantically rich, domain-randomized data for learning robust manipulation policies. RoboTwin 2.0 provides a foundation for unified benchmarks and scalable sim-to-real pipelines, with future work focusing on real-world deployment and multi-object task complexity.
[2.0 Version] RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
@misc{chen2025robotwin20scalabledata,
title={RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation},
author={Tianxing Chen and Zanxin Chen and Baijun Chen and Zijian Cai and Yibin Liu and Qiwei Liang and Zixuan Li and Xianliang Lin and Yiheng Ge and Zhenyu Gu and Weiliang Deng and Yubin Guo and Tian Nian and Xuanbing Xie and Qiangyu Chen and Kailun Su and Tianling Xu and Guodong Liu and Mengkang Hu and Huan-ang Gao and Kaixuan Wang and Zhixuan Liang and Yusen Qin and Xiaokang Yang and Ping Luo and Yao Mu},
year={2025},
eprint={2506.18088},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2506.18088},
}
[1.0 Version] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins. Accepted to CVPR 2025 (Highlight)
@InProceedings{Mu_2025_CVPR,
author = {Mu, Yao and Chen, Tianxing and Chen, Zanxin and Peng, Shijia and Lan, Zhiqian and Gao, Zeyu and Liang, Zhixuan and Yu, Qiaojun and Zou, Yude and Xu, Mingkun and Lin, Lunkai and Xie, Zhiqiang and Ding, Mingyu and Luo, Ping},
title = {RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {27649-27660}
}
[Early Version] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version). Accepted to ECCV Workshop 2024 (Best Paper Award)
@article{mu2024robotwin,
title={RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)},
author={Mu, Yao and Chen, Tianxing and Peng, Shijia and Chen, Zanxin and Gao, Zeyu and Zou, Yude and Lin, Lunkai and Xie, Zhiqiang and Luo, Ping},
journal={arXiv preprint arXiv:2409.02920},
year={2024}
}