¹SJTU ScaleLab†, ²HKU MMLab†, ³Shanghai AI Lab, ⁴D-Robotics, ⁵SZU, ⁶THU, ⁷TeleAI, ⁸FDU, ⁹USTC, ¹⁰SUSTech, ¹¹SYSU, ¹²CSU, ¹³NEU, ¹⁴HKU-Shanghai ICRC, ¹⁵NJU, ¹⁶Lumina EAI
*Equal contribution
✉Corresponding authors †Equally leading organizations
Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to automatically generate task-level execution code. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height, and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments and pre-collect over 100,000 domain-randomized expert trajectories. Empirical evaluation shows a 10.9% gain in code generation success rate and improved generalization to novel real-world conditions. A vision-language-action (VLA) model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on real-world tasks in unseen scenes, while zero-shot models trained exclusively on our synthetic data attain a 228% relative gain, demonstrating strong generalization without real-world supervision. We release the data generator, benchmark, pre-collected dataset, and code to support scalable research in robust bimanual manipulation.
[RoboTwin 1.0, CVPR 2025 Highlight] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
[CVPR 2025 Challenge @ MEIS Workshop] The technical report is coming soon!
[RoboTwin early version, ECCV 2024 MAAS Workshop Best Paper] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)
[19th Challenge Cup (挑战杯) Official Competition Task] Link to the competition task
RoboTwin 2.0 is a scalable framework for data generation and benchmarking in bimanual robotic manipulation. It integrates an expert data generation pipeline and a 50-task benchmark built on the RoboTwin Object Dataset (731 objects, 147 categories). A multimodal language model agent enables automatic task program synthesis, while flexible dual-arm configurations facilitate scalable and diverse data collection. Policies trained on RoboTwin 2.0 data demonstrate improved robustness and generalization to unseen environments.
To enhance both manipulation capability and visual understanding, we construct a large-scale object dataset with rich semantic annotations, called RoboTwin-OD, covering 147 categories and 731 diverse objects. Specifically, this includes 534 instances across 111 categories with custom-generated and optimized meshes, 153 objects from 27 categories in Objaverse, and 44 articulated object instances from 9 categories in SAPIEN PartNet-Mobility. Objects from all sources, including Objaverse, are used for cluttered scene construction, with Objaverse specifically serving to further increase the visual and semantic diversity of distractor objects. Additionally, we develop a comprehensive surface and background texture library using generative AI and human-in-the-loop verification to ensure both diversity and realism. The dataset is available at https://huggingface.co/datasets/TianxingChen/RoboTwin2.0/tree/main/objects.
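As a minimal sketch of how these assets could be pulled locally (assuming only the dataset repo id and the objects/ folder visible in the URL above; patterns and paths may need adjusting to the actual repository layout), the object library can be downloaded with the standard huggingface_hub client:

# Minimal sketch: fetch the RoboTwin-OD object assets from Hugging Face.
# Assumes the dataset repo "TianxingChen/RoboTwin2.0" and its "objects/" folder,
# as suggested by the URL above; the local target directory is arbitrary.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="TianxingChen/RoboTwin2.0",
    repo_type="dataset",
    allow_patterns=["objects/**"],  # download only the object library
    local_dir="./RoboTwin-OD",      # hypothetical local directory
)
print(f"RoboTwin-OD assets saved to {local_path}")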
Building on our automated task generation framework, embodiment-adaptive behavior synthesis, and the large-scale object asset library RoboTwin-OD, we construct a suite of over 50 dual-arm collaborative manipulation tasks. In addition, we support data collection and evaluation across five distinct robot platforms, enabling comprehensive benchmarking of manipulation policies. The complete task set is available at http://robotwin-platform.github.io/doc/tasks/.
To improve policy robustness to real-world environmental variability, we apply domain randomization across five key dimensions: (1) cluttered placement of task-irrelevant objects, (2) background textures, (3) lighting conditions, (4) tabletop heights, and (5) diverse language instructions. This systematic diversification enriches the training data distribution and significantly improves generalization to unseen scenarios.
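As an illustrative sketch only (the class and function names below are hypothetical, not the framework's actual API, and the value ranges are placeholders), a per-episode configuration covering these five randomization axes could be sampled as follows:

import random
from dataclasses import dataclass

@dataclass
class SceneRandomization:
    # Hypothetical per-episode configuration spanning the five randomized axes.
    num_distractors: int       # (1) task-irrelevant clutter objects
    background_texture: str    # (2) background / tabletop texture asset
    light_intensity: float     # (3) lighting condition
    table_height_m: float      # (4) tabletop height in meters
    instruction: str           # (5) language instruction variant

def sample_randomization(task_name, texture_pool, instruction_templates):
    # Placeholder ranges for illustration; the ranges used in RoboTwin 2.0 may differ.
    return SceneRandomization(
        num_distractors=random.randint(0, 8),
        background_texture=random.choice(texture_pool),
        light_intensity=random.uniform(0.3, 1.5),
        table_height_m=random.uniform(0.70, 0.85),
        instruction=random.choice(instruction_templates).format(task=task_name),
    )

Sampling a fresh configuration of this kind for every collected trajectory is what broadens the training distribution beyond a single canonical scene.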
This paper presented RoboTwin 2.0, a scalable simulation framework for generating diverse, high-fidelity expert data to support robust bimanual manipulation. Our system integrates MLLM-based task generation, embodiment-adaptive behavior synthesis, and comprehensive domain randomization to address key limitations in prior synthetic datasets.
By leveraging an annotated object library and automating trajectory generation, RoboTwin 2.0 produces data with rich visual, linguistic, and physical diversity while minimizing manual engineering effort. Experiments demonstrate its effectiveness in improving policy robustness to cluttered environments, generalization to unseen tasks, and cross-embodiment manipulation.
These findings highlight the importance of scalable, automated generation of semantically rich, domain-randomized data for learning robust manipulation policies. RoboTwin 2.0 provides a foundation for unified benchmarks and scalable sim-to-real pipelines, with future work focusing on real-world deployment and multi-object task complexity.
[2.0 Version] RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
@misc{chen2025robotwin20scalabledata,
title={RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation},
author={Tianxing Chen and Zanxin Chen and Baijun Chen and Zijian Cai and Yibin Liu and Qiwei Liang and Zixuan Li and Xianliang Lin and Yiheng Ge and Zhenyu Gu and Weiliang Deng and Yubin Guo and Tian Nian and Xuanbing Xie and Qiangyu Chen and Kailun Su and Tianling Xu and Guodong Liu and Mengkang Hu and Huan-ang Gao and Kaixuan Wang and Zhixuan Liang and Yusen Qin and Xiaokang Yang and Ping Luo and Yao Mu},
year={2025},
eprint={2506.18088},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2506.18088},
}
[1.0 Version] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins. Accepted to CVPR 2025 (Highlight)
@InProceedings{Mu_2025_CVPR,
author = {Mu, Yao and Chen, Tianxing and Chen, Zanxin and Peng, Shijia and Lan, Zhiqian and Gao, Zeyu and Liang, Zhixuan and Yu, Qiaojun and Zou, Yude and Xu, Mingkun and Lin, Lunkai and Xie, Zhiqiang and Ding, Mingyu and Luo, Ping},
title = {RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {27649-27660}
}
[Early Version] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version). Accepted to ECCV Workshop 2024 (Best Paper Award)
@article{mu2024robotwin,
title={RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)},
author={Mu, Yao and Chen, Tianxing and Peng, Shijia and Chen, Zanxin and Gao, Zeyu and Zou, Yude and Lin, Lunkai and Xie, Zhiqiang and Luo, Ping},
journal={arXiv preprint arXiv:2409.02920},
year={2024}
}