Reward Wrappers
This module defines custom reward-shaping wrappers built on Gymnasium’s RewardWrapper.
Each wrapper modifies an environment’s reward function to improve learning dynamics.
Descriptions
PositionDelayWrapper: Delays reward until the agent crosses a predefined position (position_delay). Also includes a control cost penalty to discourage erratic actions.
Classes
- class objectrl.utils.environment.reward_wrappers.PositionDelayWrapper(env: Env, position_delay: float = 2, ctrl_w: float = 0.001)
Bases:
RewardWrapper
A Gymnasium wrapper that modifies the reward function based on position delay and control cost. This wrapper delays reward until the agent reaches a certain position (position_delay). It also penalizes large control signals to encourage smoother actions.
- env
The environment to wrap.
- Type:
gym.Env
- position_delay
Minimum x-position the agent must reach before receiving reward.
- Type:
float
- ctrl_w
Weight for the control cost penalty term.
- Type:
float
- __init__(env: Env, position_delay: float = 2, ctrl_w: float = 0.001) -> None
Initialize the PositionDelayWrapper.
- Parameters:
env (gym.Env) – The environment to wrap.
position_delay (float) – Minimum x-position the agent must reach before receiving reward.
ctrl_w (float) – Weight for the control cost penalty term.
- Returns:
None
- step(action: ndarray) -> tuple
Step the wrapped environment and replace its reward with a custom one: a forward movement reward that is withheld until the agent passes position_delay, minus a control cost.
- Parameters:
action (np.ndarray) – Action taken by the agent.
- Returns:
- (observation, modified_reward, terminated, truncated, info)
info["x_pos"]: Current x-position of the agent.
info["action_norm"]: Squared norm of the action.
- Return type:
tuple
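As a rough illustration of the step logic described above, the following dependency-free sketch mimics the wrapper on a stub environment. It is not the objectrl implementation. StubEnv, DelayedRewardSketch, the incoming info keys "x_position" and "x_velocity", the strict `>` threshold, and the use of forward velocity as the movement reward are all assumptions made for this example.

```python
class StubEnv:
    """Hypothetical stand-in for a Gymnasium environment (not part of objectrl)."""

    def __init__(self):
        self.x = 0.0

    def step(self, action):
        # Assumption: the first action component moves the agent along x.
        dx = float(action[0])
        self.x += dx
        info = {"x_position": self.x, "x_velocity": dx}
        # Same 5-tuple layout as Gymnasium's Env.step.
        return [self.x], dx, False, False, info


class DelayedRewardSketch:
    """Hypothetical wrapper that replaces the reward as described above."""

    def __init__(self, env, position_delay=2.0, ctrl_w=0.001):
        self.env = env
        self.position_delay = position_delay
        self.ctrl_w = ctrl_w

    def step(self, action):
        # The wrapped environment's own reward is discarded.
        observation, _, terminated, truncated, info = self.env.step(action)
        x_pos = info["x_position"]
        action_norm = sum(float(a) ** 2 for a in action)  # squared norm
        # Forward reward is withheld until the agent passes position_delay.
        forward = info["x_velocity"] if x_pos > self.position_delay else 0.0
        reward = forward - self.ctrl_w * action_norm
        # Expose the diagnostics listed in the Returns section.
        info["x_pos"] = x_pos
        info["action_norm"] = action_norm
        return observation, reward, terminated, truncated, info


env = DelayedRewardSketch(StubEnv(), position_delay=2.0, ctrl_w=0.001)
obs, reward, terminated, truncated, info = env.step([1.5, 0.0])
```

In this sketch the first step leaves the agent below the threshold, so only the control cost contributes to the reward; later steps past the threshold add the forward term.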
- reward(observation: ndarray, action: ndarray) -> float
Compute the modified reward based on position delay and control penalty.
- Parameters:
observation (np.ndarray) – Current observation (unused here, but kept for compatibility).
action (np.ndarray) – Action taken by the agent.
- Returns:
Modified reward value.
- Return type:
float
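One plausible form of the modified reward is a gated forward term minus a weighted squared action norm. The sketch below is an assumption about the formula, not the library's code: the function name shaped_reward, the forward_reward argument, and the strict `>` comparison are hypothetical, and the real method takes (observation, action) as documented above.

```python
def shaped_reward(x_pos, forward_reward, action, position_delay=2.0, ctrl_w=0.001):
    """Hypothetical reward: gated forward term minus control cost."""
    # Squared L2 norm of the action, matching the "action_norm" description.
    action_norm = sum(float(a) ** 2 for a in action)
    # Assumption: the forward term counts only once x_pos exceeds the threshold.
    gated = forward_reward if x_pos > position_delay else 0.0
    return gated - ctrl_w * action_norm


before = shaped_reward(x_pos=1.0, forward_reward=5.0, action=[1.0, 2.0])
after = shaped_reward(x_pos=3.0, forward_reward=5.0, action=[1.0, 2.0])
```

With the default ctrl_w, the control cost is small relative to the forward term, so the penalty mainly matters while the agent is still short of position_delay and receives no other reward.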