# Configuration Tutorial
All configuration files are stored in the `task_config` folder and follow the standard YAML format.
You can run `bash task_config/create_task_config.sh ${task_config_name}` to create a new task configuration.
## 1. ✅ Minimal Example
Each episode's data is stored in a single HDF5 file. In particular, images are stored as encoded bit streams. To recover an image, decode the buffer:

```python
import cv2
import numpy as np

image = cv2.imdecode(np.frombuffer(image_bit, np.uint8), cv2.IMREAD_COLOR)
```
Below is a minimal configuration file to start a typical data collection session:
```yaml
render_freq: 0
episode_num: 100
use_seed: false
save_freq: 15
embodiment:
  - aloha-agilex
language_num: 100
domain_randomization:
  random_background: true
  cluttered_table: true
  clean_background_rate: 0.02
  random_head_camera_dis: 0
  random_table_height: 0.03
  random_light: true
  crazy_random_light_rate: 0.02
  random_embodiment: false
camera:
  head_camera_type: D435
  wrist_camera_type: D435
  collect_head_camera: true
  collect_wrist_camera: true
data_type:
  rgb: true
  third_view: false
  depth: false
  pointcloud: false
  observer: false
  endpose: false
  qpos: true
  mesh_segmentation: false
  actor_segmentation: false
pcd_down_sample_num: 1024
pcd_crop: true
dual_arm: true
save_path: ./data
clear_cache_freq: 5
collect_data: true
eval_video_log: true
```
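Since these files are plain YAML, they can be loaded and inspected with any YAML parser. A standalone sketch using a fragment of the file above (assumes PyYAML is installed):

```python
import yaml

# A fragment of the minimal configuration shown above
config_text = """
episode_num: 100
use_seed: false
embodiment:
  - aloha-agilex
camera:
  head_camera_type: D435
"""

config = yaml.safe_load(config_text)
print(config["episode_num"], config["embodiment"])
```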
## 2. 🔧 Configuration Breakdown

### 2.1 🎯 Task & Embodiment Settings
| Field | Type | Required | Description |
|---|---|---|---|
| `embodiment` | list | ✅ | List of robot embodiment(s). For a dual-arm robot, use `[name]`, e.g. `[aloha-agilex]`; to combine two single-arm robots, use `[left, right, interval]`, e.g. `embodiment: [piper, franka-panda, 0.6]` or `embodiment: [franka-panda, franka-panda, 0.8]`. The interval specifies the distance between the arms (typically 0.6–0.8 meters). Available embodiments: `ur5-wsg`, `ARX-X5`, `franka-panda`, `piper`, `aloha-agilex` (dual-arm). |
| `dual_arm` | bool | optional | Whether to use both arms. Default: `true`. |
| `use_seed` | bool | ✅ | Whether to use a predefined seed list from `data/${task_name}/${task_config}/seed.txt`. If `false`, the system automatically explores viable seeds. |
| `episode_num` | int | ✅ | Number of successful episodes to collect. |
| `language_num` | int | optional | For language-conditioned task planning, the number of language descriptions to sample per task. |
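For instance, the two forms of `embodiment` described above look like this (values taken from the table; shown as alternative fragments, not one file):

```yaml
# Option A: one dual-arm robot
embodiment:
  - aloha-agilex

# Option B: two single-arm robots, 0.6 m apart
embodiment: [piper, franka-panda, 0.6]
```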
### 2.2 🧠 Domain Randomization

Configure task variation to improve generalization.
```yaml
domain_randomization:
  random_background: true
  cluttered_table: true
  clean_background_rate: 0.02
  random_head_camera_dis: 0
  random_table_height: 0.03
  random_light: true
  crazy_random_light_rate: 0.02
  random_embodiment: false
```
| Field | Type | Description |
|---|---|---|
| `random_background` | bool | Enable random textures for the table and background. |
| `cluttered_table` | bool | Add distractor objects to the table to simulate a cluttered environment. |
| `clean_background_rate` | float | Ratio of clean backgrounds (e.g. `0.02` = 2%). Only effective if `random_background` is `true`. |
| `random_head_camera_dis` | float | Random displacement applied to the head camera position (in meters). |
| `random_table_height` | float | Random variation in the table height (in meters). |
| `random_light` | bool | Enable randomized lighting during simulation. |
| `crazy_random_light_rate` | float | Probability of applying extreme lighting. Only effective if `random_light` is `true`. |
| `random_embodiment` | bool | Enable embodiment randomization (experimental; not yet fully supported). |
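The two rate fields act as per-episode probabilities gated by their parent flags. An illustrative sketch of that logic (not the project's actual sampling code; `pick_background` and `rng` are hypothetical names):

```python
import random

def pick_background(random_background, clean_background_rate, rng=random.random):
    """Choose the background for one episode.

    With random_background disabled, the background is always clean;
    otherwise it is clean with probability clean_background_rate
    (e.g. 0.02 -> 2% of episodes) and randomized the rest of the time.
    """
    if not random_background:
        return "clean"
    return "clean" if rng() < clean_background_rate else "randomized"

print(pick_background(True, 0.02))
```

`crazy_random_light_rate` would gate extreme lighting under `random_light` in the same way.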
### 2.3 📷 Camera Configuration
```yaml
camera:
  head_camera_type: D435
  wrist_camera_type: D435
  collect_head_camera: true
  collect_wrist_camera: true
```
| Field | Type | Description |
|---|---|---|
| `head_camera_type` | str | Camera used for global observation. Options: see `task_config/_camera_config.yml`. |
| `wrist_camera_type` | str | Camera used for the close-up view. |
| `collect_head_camera` | bool | Whether to collect head-view data. |
| `collect_wrist_camera` | bool | Whether to collect wrist-view data. |
### 2.4 📦 Data Collection Settings

| Field | Type | Description |
|---|---|---|
| `collect_data` | bool | Enable actual data saving. |
| `save_freq` | int | Save every N simulation steps; each step corresponds to 0.004 s of real time. |
| `save_path` | str | Directory in which to save data. Default: `./data`. |
| `clear_cache_freq` | int | Frequency (in episodes) at which the SAPIEN scene cache is cleared. This helps manage GPU memory usage, especially when domain randomization is enabled and many diverse assets accumulate in memory. A smaller value (e.g. `1`) clears more often but incurs additional time cost. |
| `eval_video_log` | bool | Save evaluation videos for replay. |
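Since each step corresponds to 0.004 s, `save_freq` determines the effective sampling rate of the saved data. A quick check with the value from the minimal example:

```python
SIM_STEP_S = 0.004  # one simulation step = 0.004 s of real time
save_freq = 15      # value from the minimal example above

interval_s = save_freq * SIM_STEP_S  # time between saved frames (~0.06 s)
rate_hz = 1.0 / interval_s           # effective sampling rate (~16.7 Hz)

print(f"saving every {interval_s:.3f} s -> {rate_hz:.1f} Hz")
```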
### 2.5 💾 Data Type
Specify which data to collect in each episode:
```yaml
data_type:
  rgb: true
  third_view: false
  depth: false
  pointcloud: false
  observer: false
  endpose: false
  qpos: true
  mesh_segmentation: false
  actor_segmentation: false
```
| Type | Description |
|---|---|
| `rgb` | RGB images from multiple views. |
| `third_view` | Third-person video. |
| `depth` | Depth images from the cameras (in mm). |
| `pointcloud` | Merged point cloud of the scene. |
| `observer` | Observer-view RGB frames. |
| `endpose` | End-effector pose and gripper opening ratio. |
| `qpos` | Robot joint angles. |
| `mesh_segmentation` | Per-object segmentation from meshes. |
| `actor_segmentation` | Per-actor segmentation from RGB. |
#### 2.5.1 Note

`endpose` yields a dict containing `left_endpose`, `left_gripper`, `right_endpose`, and `right_gripper`. The `left_endpose` and `right_endpose` entries are lists of 7 elements representing the world-frame position and orientation of the end-effectors, in the order `x, y, z, qw, qx, qy, qz`. The `left_gripper` and `right_gripper` entries are floats representing the opening ratio of the gripper, ranging from 0 to 1. The end-effector rotation convention is consistent across all embodiments, with the x-axis pointing across the gripper and the z-axis pointing across the camera.
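Unpacking one of these dicts is straightforward. A small sketch using the keys and ordering described above (the numeric values here are illustrative, not from a real episode):

```python
# Example endpose dict with the keys described above; values are made up
endpose = {
    "left_endpose": [0.3, 0.2, 0.9, 1.0, 0.0, 0.0, 0.0],   # x, y, z, qw, qx, qy, qz
    "left_gripper": 0.5,                                    # opening ratio in [0, 1]
    "right_endpose": [0.3, -0.2, 0.9, 1.0, 0.0, 0.0, 0.0],
    "right_gripper": 1.0,
}

def split_endpose(pose):
    """Split a 7-element endpose into position (x, y, z) and quaternion (qw, qx, qy, qz)."""
    return pose[:3], pose[3:]

left_pos, left_quat = split_endpose(endpose["left_endpose"])
print(left_pos, left_quat, endpose["left_gripper"])
```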
### 2.6 🔍 Point Cloud Settings

| Field | Type | Description |
|---|---|---|
| `pcd_down_sample_num` | int | Target point count for FPS (farthest point sampling); set `0` to keep all points. |
| `pcd_crop` | bool | Whether to crop out the table/walls based on known transforms. |
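For context, farthest point sampling repeatedly picks the point farthest from the already-selected set, which preserves scene coverage better than uniform random subsampling. A minimal, naive sketch of the idea (not the project's implementation):

```python
import numpy as np

def farthest_point_sample(points, n_samples):
    """Naive farthest point sampling over an (N, 3) array.

    Starts from the first point, then greedily adds the point with the
    largest distance to the current selection until n_samples are chosen.
    """
    n = points.shape[0]
    if n_samples <= 0 or n_samples >= n:
        return points  # mirrors "0 keeps all points"
    selected = [0]
    # Distance from every point to the nearest selected point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_samples - 1):
        idx = int(np.argmax(dist))
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [1.0, 1.0, 0.0]])
sampled = farthest_point_sample(pts, 2)
print(sampled)
```

This runs in O(N × n_samples); production code typically uses an optimized (often GPU) variant.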
### 2.7 🎥 Rendering

| Field | Type | Description |
|---|---|---|
| `render_freq` | int | Render the visualization every N steps. Set to `0` to disable; this is recommended for servers without a display. To visualize the task, set it to a small value such as `20`. |
## 3. 📌 Notes

- All task names must correspond to files in `env/<task_name>.py`.
- For available embodiments and cameras, refer to:
  - `task_config/_embodiment_config.yml`
  - `task_config/_camera_config.yml`
- The system supports both dual-arm and single-arm setups.
- Seeds, if used, are located in `task_config/seeds/`.