🚀 Deploy Your Policy

To deploy and evaluate your policy, you need to modify the following three files:

  • deploy_policy.py defines get_model for loading the policy model, encode_obs for observation preprocessing (modification may not be necessary), and the eval control loop, which acquires observations and executes the actions returned by get_action.

  • deploy_policy.yml specifies the input parameters, which are passed into the get_model function as usr_args to help locate, define, and load your model.

  • eval.sh passes parameters after --overrides to overwrite those in deploy_policy.yml, so you can try different settings without manually editing the YAML file each time.

# policy/Your_Policy/deploy_policy.py
# import packages and module here


def encode_obs(observation):  # Post-Process Observation
    obs = observation
    # ...
    return obs


def get_model(usr_args):  # from deploy_policy.yml and eval.sh (overrides)
    Your_Model = None
    # ...
    return Your_Model  # return your policy model


def eval(TASK_ENV, model, observation):
    """
    All the function interfaces below are just examples
    You can modify them according to your implementation
    But we strongly recommend keeping the code logic unchanged
    """
    obs = encode_obs(observation)  # Post-Process Observation
    instruction = TASK_ENV.get_instruction()

    if len(model.obs_cache) == 0:
        # Force an update of the observation at the first frame to avoid an
        # empty observation window; `obs_cache` here can be modified
        model.update_obs(obs)

    actions = model.get_action()  # Get Action according to observation chunk

    for action in actions:  # Execute each step of the action
        # see https://robotwin-platform.github.io/doc/control-robot.md for more details
        TASK_ENV.take_action(action, action_type='qpos') # joint control: [left_arm_joints + left_gripper + right_arm_joints + right_gripper]
        # TASK_ENV.take_action(action, action_type='ee') # endpose control: [left_end_effector_pose (xyz + quaternion) + left_gripper + right_end_effector_pose + right_gripper]
        # TASK_ENV.take_action(action, action_type='delta_ee') # delta endpose control: [left_end_effector_delta (xyz + quaternion) + left_gripper + right_end_effector_delta + right_gripper]
        observation = TASK_ENV.get_obs()
        obs = encode_obs(observation)
        model.update_obs(obs)  # Update Observation, `update_obs` here can be modified


def reset_model(model):
    # Clean the model cache at the beginning of every evaluation episode, such as the observation window
    pass

1. 🔧 deploy_policy.yml

You are free to add any parameters needed in deploy_policy.yml to specify your model setup (e.g., checkpoint path, model type, architecture details). The entire YAML content will be passed to deploy_policy.py as usr_args, which will be available in the get_model() function.
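
A minimal sketch of what deploy_policy.yml might look like. Every key and value below is a placeholder for whatever your get_model() actually needs, except policy_name and ckpt_setting, which eval.sh overrides by default:

# policy/Your_Policy/deploy_policy.yml (illustrative sketch; keys are hypothetical)
policy_name: Your_Policy
ckpt_setting: null            # e.g. a checkpoint tag or path, usually supplied via eval.sh
model_path: ./checkpoints     # hypothetical: where your weights live
obs_horizon: 2                # hypothetical: architecture-specific parameter
action_chunk_size: 16         # hypothetical: inference-time parameter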


2. 🖥️ eval.sh

Update the script to pass additional arguments to override default values in deploy_policy.yml.

#!/bin/bash

policy_name=Your_Policy
task_name=${1}
task_config=${2}
ckpt_setting=${3}
seed=${4}
gpu_id=${5}
# [TODO] Add your custom command-line arguments here

export CUDA_VISIBLE_DEVICES=${gpu_id}
echo -e "\033[33mgpu id (to use): ${gpu_id}\033[0m"

cd ../.. # move to project root

python script/eval_policy.py --config policy/$policy_name/deploy_policy.yml \
    --overrides \
    --task_name ${task_name} \
    --task_config ${task_config} \
    --ckpt_setting ${ckpt_setting} \
    --seed ${seed} \
    --policy_name ${policy_name} 
    # [TODO] Add your custom arguments here

3. 🧠 deploy_policy.py

You need to implement the following functions in deploy_policy.py (some of them, as noted below, are methods on your model object rather than top-level functions):

3.1 encode_obs(obs: dict) -> dict

Optional. This function is used to preprocess the raw environment observation (e.g., color channel normalization, reshaping, etc.). If not needed, it can be left unchanged.
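
A minimal sketch of one possible preprocessing step; the observation keys used here ("head_camera" / "rgb") are assumptions about your data layout, not a fixed interface:

import numpy as np

def encode_obs(observation):  # Post-Process Observation
    obs = observation
    # Hypothetical example: convert an HWC uint8 image to CHW float32 in [0, 1].
    cam = obs.get("head_camera", {})
    if "rgb" in cam:
        img = np.asarray(cam["rgb"])
        cam["rgb"] = np.transpose(img, (2, 0, 1)).astype(np.float32) / 255.0
    return obs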


3.2 get_model(usr_args: dict) -> Any

Required. This function receives the full configuration from deploy_policy.yml via usr_args and must return the initialized model. You can define your own loading logic here, including parsing checkpoints and network parameters.
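
A minimal sketch assuming a PyTorch model; YourPolicyModel, its import path, and the usr_args keys used here (ckpt_setting, device, model_kwargs) are hypothetical placeholders for your own code:

import torch
from your_policy.model import YourPolicyModel  # hypothetical import

def get_model(usr_args):
    # usr_args holds every key from deploy_policy.yml, plus any eval.sh overrides.
    device = usr_args.get("device", "cuda")
    model = YourPolicyModel(**usr_args.get("model_kwargs", {}))
    state_dict = torch.load(usr_args["ckpt_setting"], map_location=device)
    model.load_state_dict(state_dict)
    model.to(device).eval()
    return model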


3.3 eval(TASK_ENV, model, observation) -> None

Required. The main evaluation loop. Given the current environment instance, the model, and the raw observation (a dictionary), this function must compute the next action(s) and execute them in the environment via TASK_ENV.take_action. The natural language instruction (a string) is retrieved inside the function with TASK_ENV.get_instruction().
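
The template at the top of this page already implements this loop; the condensed variant below only illustrates where a language-conditioned policy might consume the instruction (set_language is a hypothetical method name, not part of the template interface):

def eval(TASK_ENV, model, observation):
    obs = encode_obs(observation)
    instruction = TASK_ENV.get_instruction()
    # Hypothetical: condition a language-based policy on the instruction.
    if hasattr(model, "set_language"):
        model.set_language(instruction)
    if len(model.obs_cache) == 0:
        model.update_obs(obs)
    for action in model.get_action():
        TASK_ENV.take_action(action, action_type='qpos')
        model.update_obs(encode_obs(TASK_ENV.get_obs()))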


3.4 update_obs(obs: dict) -> None

Optional. Called as a method of your model (model.update_obs(obs) in the template) to update internal state or an observation buffer. Useful if your model requires a history of frames or a memory-based context (see the combined sketch after 3.6).


3.5 get_action() -> Any

Optional. Called as a method of your model (model.get_action() in the template); returns the action, or a chunk of actions, to be executed based on the model's current observation window. This is useful if action computation is separated from the evaluation loop.


3.6 reset_model(model) -> None

Optional but recommended. This function is called before the evaluation of each episode, allowing you to reset model state such as recurrent memory, history buffers, or context encodings.
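
A minimal sketch of how 3.4–3.6 fit together, mirroring the calls made by the eval loop above; YourPolicyModel, its constructor arguments, and the 14-dimensional dummy action (6 joints + 1 gripper per arm under qpos control) are all placeholders:

from collections import deque
import numpy as np

class YourPolicyModel:  # hypothetical wrapper; names and sizes are placeholders
    def __init__(self, obs_horizon=2, action_chunk_size=16):
        self.obs_cache = deque(maxlen=obs_horizon)   # observation window read by eval()
        self.action_chunk_size = action_chunk_size

    def update_obs(self, obs):
        # Append the newest processed observation to the history window.
        self.obs_cache.append(obs)

    def get_action(self):
        # Run inference on the current window and return a chunk of actions.
        return self.infer(list(self.obs_cache))

    def infer(self, obs_window):
        # Placeholder forward pass: a dummy chunk of zero actions.
        return np.zeros((self.action_chunk_size, 14))


def reset_model(model):
    # Clear per-episode state such as the observation window.
    model.obs_cache.clear()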


4. ✔️ Run eval.sh

bash eval.sh ...(input parameters you define)
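
For example, with the five positional arguments defined in eval.sh above (the values shown are placeholders):

bash eval.sh <task_name> <task_config> <ckpt_setting> <seed> <gpu_id>
# e.g. bash eval.sh your_task_name your_task_config your_ckpt_setting 0 0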

5. 📌 Notes

  • The variable instruction is a string containing the language command describing the task. You can choose how (or whether) to use it.
  • Your policy should be compatible with the input/output format expected by the simulator.