LLaVA-VLA¶
Contributed by IRPN Lab, HKUST(GZ)
Email: songwenxuan0115@gmail.com, sunxiaoquan@hust.edu.cn
1. Environment Setup¶
See LLaVA-VLA installation for more details.
2. Download Model¶
Please download the corresponding model from the model zoo.
3. Collect RoboTwin Data¶
See RoboTwin Tutorial (Usage Section) for more details.
4. Generate Images and Data¶
First, create the pictures and training_data folders in the policy/LLaVA-VLA directory:
mkdir pictures training_data
Then run the helper scripts:
cd scripts/helper
bash image_extraction.sh ${task_name} ${task_config}
# bash image_extraction.sh grab_roller demo_randomized
# bash image_extraction.sh all demo_randomized
# For task_name, you can select a single task (e.g., grab_roller) or "all" (edit the task_list in the script).
bash process_data.sh ${task_name} ${task_config} ${future_chunk}
# bash process_data.sh grab_roller demo_randomized 5
# bash process_data.sh all demo_randomized 5
# For task_name, you can select a single task (e.g., grab_roller) or "all" (edit the task_list in the script).
# future_chunk: The number of output steps in the future (default is 5).
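To give an intuition for future_chunk: with a value of 5, each training sample is labeled with the next 5 actions of the trajectory (action chunking). The sketch below is a hypothetical illustration of this idea, not the actual code inside process_data.sh; the function name chunk_actions and the tail-padding behavior are assumptions.

```python
def chunk_actions(actions, future_chunk=5):
    """For each timestep t, the label is the next `future_chunk` actions.

    The tail of the trajectory is padded by repeating the last action so
    every chunk has the same length. (Hypothetical sketch; the real
    preprocessing in process_data.sh may handle the tail differently.)
    """
    chunks = []
    for t in range(len(actions)):
        chunk = actions[t:t + future_chunk]
        chunk += [actions[-1]] * (future_chunk - len(chunk))  # pad the tail
        chunks.append(chunk)
    return chunks

# Example: a toy "trajectory" of 7 scalar actions
labels = chunk_actions(list(range(7)), future_chunk=5)
print(labels[0])   # [0, 1, 2, 3, 4]
print(labels[5])   # [5, 6, 6, 6, 6]
```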
training_data
├── ${task_1}
│   ├── ${task_config_1}
│   │   ├── episode0.json
│   │   └── episode1.json
│   └── ${task_config_2}
│       ├── episode0.json
│       └── episode1.json
├── ${task_2}
│   └── ...
└── ...
pictures
├── ${task_1}
│   ├── ${task_config_1}
│   │   └── episode0
│   │       ├── 01.jpg
│   │       └── 02.jpg
│   └── ${task_config_2}
│       └── episode0
│           ├── 01.jpg
│           └── ...
├── ${task_2}
│   └── ...
└── ...
5. Merge JSON and Generate YAML File¶
In this step, merge all the JSON files generated by the previous process_data step into a single JSON file, then generate the training YAML file:
python llava/process_data/merge_json.py
# please replace `yourpath` in the script with your actual path!
python llava/process_data/yaml_general.py
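Conceptually, the merge step walks the training_data tree and concatenates every episode*.json into one file. The following is a minimal hypothetical sketch of that idea, for orientation only; still run merge_json.py as shown above, since the real script's path handling and sample format may differ.

```python
import glob
import json
import os
import tempfile

def merge_episode_jsons(root):
    """Concatenate all episode*.json files under root into one list.

    Hypothetical sketch: assumes each episode file holds either a list of
    samples or a single sample dict.
    """
    merged = []
    pattern = os.path.join(root, "**", "episode*.json")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as f:
            data = json.load(f)
        merged.extend(data if isinstance(data, list) else [data])
    return merged

# Demo on a throwaway directory mimicking training_data/${task}/${task_config}/
tmp = tempfile.mkdtemp()
ep_dir = os.path.join(tmp, "grab_roller", "demo_randomized")
os.makedirs(ep_dir)
for i in range(3):
    with open(os.path.join(ep_dir, f"episode{i}.json"), "w") as f:
        json.dump([{"episode": i}], f)

merged = merge_episode_jsons(tmp)
with open(os.path.join(tmp, "merged.json"), "w") as f:
    json.dump(merged, f)
print(len(merged))  # 3
```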
6. Pre-Training¶
Before starting the training, please replace yourpath with your actual path, then run:
bash calvin_finetune_obs.sh
7. Fine-tuning¶
Please change MODEL_NAME_OR_PATH to the checkpoint generated in the previous step. For the dataset you fine-tune on, regenerate the ACTION_STAT file and modify JSON_PATH accordingly. Then run:
bash calvin_finetune_obs.sh
8. Eval on RoboTwin¶
You need to modify the corresponding paths in the deploy_policy.yml file:
1. model_path: Path to the checkpoint.
2. action_stat: Path to dataset_statistic.yaml.
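For reference, the relevant entries in deploy_policy.yml look roughly like this; the paths below are placeholders you must replace with your own, and any other keys in the file should be left as-is:

```yaml
# deploy_policy.yml (excerpt; paths are placeholders)
model_path: /yourpath/checkpoints/your-finetuned-model   # checkpoint from step 7
action_stat: /yourpath/dataset_statistic.yaml            # regenerated in step 7
```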
bash eval.sh ${gpu_id}
# bash eval.sh 0
The evaluation results will be saved in the eval_result directory under the project root.
9. Citation¶
If you find our work useful for your research and applications, please cite using this BibTeX:
@article{pdvla,
title={Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding},
author={Song, Wenxuan and Chen, Jiayi and Ding, Pengxiang and Zhao, Han and Zhao, Wei and Zhong, Zhide and Ge, Zongyuan and Ma, Jun and Li, Haoang},
journal={arXiv preprint arXiv:2503.02310},
year={2025}
}