Reward Wrappers

This module defines custom reward-shaping wrappers built on Gymnasium’s RewardWrapper. Each wrapper modifies the environment’s reward signal to improve learning dynamics.

Descriptions

  • PositionDelayWrapper: Withholds the forward-movement reward until the agent crosses a predefined x-position (position_delay), and applies a control-cost penalty to discourage erratic actions.

Classes

class objectrl.utils.environment.reward_wrappers.PositionDelayWrapper(env: Env, position_delay: float = 2, ctrl_w: float = 0.001)

Bases: RewardWrapper

A Gymnasium wrapper that replaces the environment’s reward with one based on a position threshold and a control cost. The forward-movement reward is delayed until the agent’s x-position reaches position_delay, and large control signals are penalized to encourage smoother actions.

env

The environment to wrap.

Type:

gym.Env

position_delay

Minimum x-position the agent must reach before receiving reward.

Type:

float

ctrl_w

Weight for the control cost penalty term.

Type:

float

__init__(env: Env, position_delay: float = 2, ctrl_w: float = 0.001) → None

Initialize the PositionDelayWrapper.

Parameters:
  • env (gym.Env) – The environment to wrap.

  • position_delay (float) – Minimum x-position the agent must reach before receiving reward.

  • ctrl_w (float) – Weight for the control cost penalty term.

Returns:

None

step(action: ndarray) → tuple

Take a step in the environment, modifying the reward. The environment’s reward is replaced with a custom one that combines delayed forward movement reward and a control cost.

Parameters:

action (np.ndarray) – Action taken by the agent.

Returns:

(observation, modified_reward, terminated, truncated, info)
  • info["x_pos"]: Current x-position of the agent.

  • info["action_norm"]: Squared norm of the action.

Return type:

tuple
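
The step flow described above can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not the library's implementation: ToyEnv and DelayedStepSketch are hypothetical stand-ins, the toy environment exposes its x-position as the first observation entry, and the "forward movement reward" is assumed to be the per-step change in x-position, counted only once x exceeds position_delay.

```python
class ToyEnv:
    """Hypothetical 1-D environment: each step moves x by the first action entry."""

    def __init__(self):
        self.x = 0.0

    def step(self, action):
        self.x += action[0]
        # Same 5-tuple shape as a Gymnasium step: obs, reward, terminated, truncated, info.
        return [self.x], 1.0, False, False, {}


class DelayedStepSketch:
    """Hypothetical stand-in for the wrapper's step logic (not the real class)."""

    def __init__(self, env, position_delay=2.0, ctrl_w=0.001):
        self.env = env
        self.position_delay = position_delay
        self.ctrl_w = ctrl_w
        self.prev_x = 0.0

    def step(self, action):
        # The environment's own reward is discarded and replaced below.
        obs, _env_reward, terminated, truncated, info = self.env.step(action)
        x_pos = obs[0]  # assumption: the toy env reports x in the observation
        action_norm = sum(a * a for a in action)  # squared norm of the action
        progress = x_pos - self.prev_x
        self.prev_x = x_pos
        # Assumed gating rule: forward progress counts only past the threshold.
        forward = progress if x_pos > self.position_delay else 0.0
        reward = forward - self.ctrl_w * action_norm
        info = dict(info, x_pos=x_pos, action_norm=action_norm)
        return obs, reward, terminated, truncated, info


wrapped = DelayedStepSketch(ToyEnv())
rewards = [wrapped.step([1.5])[1] for _ in range(2)]
_, _, _, _, last_info = wrapped.step([1.0])
```

In this sketch the first step (x = 1.5) yields only the control penalty, while the second (x = 3.0) also earns the forward term because the agent has passed the threshold of 2.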

reward(observation: ndarray, action: ndarray) → float

Compute the modified reward based on position delay and control penalty.

Parameters:
  • observation (np.ndarray) – Current observation (unused here, but kept for compatibility).

  • action (np.ndarray) – Action taken by the agent.

Returns:

Modified reward value.

Return type:

float
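
As a rough illustration of how a position-delayed reward with a control penalty might be computed, consider the sketch below. The exact formula is not given in this reference, so everything here is an assumption: delayed_reward is a hypothetical helper, the forward term is passed in explicitly, the threshold comparison is assumed to be strict, and the control cost is assumed to be ctrl_w times the squared norm of the action.

```python
from typing import Sequence


def delayed_reward(
    x_pos: float,
    forward_reward: float,
    action: Sequence[float],
    position_delay: float = 2.0,
    ctrl_w: float = 0.001,
) -> float:
    """Hypothetical shaping rule: gate the forward term, always charge control cost."""
    # Squared L2 norm of the action.
    action_norm = sum(a * a for a in action)
    # Assumption: the forward term counts only once x_pos exceeds the threshold.
    gated = forward_reward if x_pos > position_delay else 0.0
    return gated - ctrl_w * action_norm


# Same action and forward term, evaluated before and after the threshold.
before = delayed_reward(x_pos=1.0, forward_reward=0.5, action=[1.0, -1.0])
after = delayed_reward(x_pos=3.0, forward_reward=0.5, action=[1.0, -1.0])
```

Under these assumptions, an agent that has not yet reached position_delay sees only the (negative) control cost, so the two calls differ by exactly the forward term.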