Common Issues
1. Modify `clear_cache_freq` to Reduce GPU Memory Pressure
If your GPU memory is insufficient, especially if the program consistently runs out of memory after several episodes during data collection or evaluation, try adjusting the `clear_cache_freq` parameter in the corresponding task configuration. Note that a smaller `clear_cache_freq` value reduces GPU memory usage but may also slow down runtime performance (for more details, see Configurations).
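As a concrete illustration, the snippet below is a minimal sketch of lowering `clear_cache_freq` in a task config file. The config path (`task_config/demo_randomized.yml`) and the assumption that `clear_cache_freq` is a top-level YAML key are examples only; adjust them to match the config you actually pass to the collection or evaluation scripts.

```python
# Minimal sketch: lower clear_cache_freq in a task config file.
# Assumptions: the config is YAML and clear_cache_freq is a top-level key;
# the path below is only an example.
import yaml

config_path = "task_config/demo_randomized.yml"  # example path, use your own config

with open(config_path, "r") as f:
    cfg = yaml.safe_load(f)

# Smaller values free GPU memory more often but slow the run;
# 1 clears the cache after every episode.
cfg["clear_cache_freq"] = 1

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

print(f"clear_cache_freq set to {cfg['clear_cache_freq']} in {config_path}")
```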
2. `[svulkan2] [error] OIDN Error: invalid handle`
Try the workaround described in SAPIEN issue https://github.com/haosulab/SAPIEN/issues/243, or set `clear_cache_freq` to 1.
3. Stuck While Collecting Data and Evaluating
Please check your GPU model. According to user feedback and known issues reported on SAPIEN (see https://github.com/haosulab/SAPIEN/issues/219), Ampere/Hopper-series GPUs (e.g., A100, H100) may occasionally hang unexpectedly during data collection.
You can run `python data/process_stuck.py ${task_name} ${task_config} ${seed_id}` to skip the affected seed and then rerun the data collection script. The `${seed_id}` can be identified from the terminal output: for example, if the run hangs at `saving: episode = 478 index = 105`, then `478` is the `${seed_id}`. The script replaces the affected seed (in `data/${task_name}/${task_config}/seed.txt`) and its trajectory data (in `data/${task_name}/${task_config}/_traj_data/`) with the last seed and episode data. IMPORTANT: this may leave the dataset with too few episodes, so increase the `episode_num` parameter in the task config to collect more seeds and ensure the final dataset has enough episodes.
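For concreteness, the snippet below is a minimal sketch of pulling the `${seed_id}` out of the stuck log line and invoking `data/process_stuck.py` with it. The task name and config are placeholders, and the regular expression assumes the `saving: episode = ... index = ...` format quoted above.

```python
# Minimal sketch: extract the stuck seed id from the terminal output and
# rerun data/process_stuck.py for it. The task name and config below are
# placeholders; replace them with the ones from your own run.
import re
import subprocess

task_name = "your_task_name"      # placeholder task name
task_config = "demo_randomized"   # placeholder config name

# Example of the line printed when collection hangs.
stuck_line = "saving: episode = 478 index = 105"

match = re.search(r"episode\s*=\s*(\d+)", stuck_line)
if match is None:
    raise ValueError("could not find an episode number in the output line")
seed_id = match.group(1)  # here: "478"

# Equivalent to: python data/process_stuck.py ${task_name} ${task_config} ${seed_id}
subprocess.run(
    ["python", "data/process_stuck.py", task_name, task_config, seed_id],
    check=True,
)
```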
4. 50-Series GPUs
See the related GitHub issue.
5. Join the RoboTwin Community
Consider joining the WeChat Community to stay connected and receive updates.