Reward Wrappers

This module defines custom reward-shaping wrappers built on Gymnasium’s RewardWrapper. Each wrapper modifies the environment’s reward signal to improve learning dynamics.

Descriptions

PositionDelayWrapper: Delays reward until the agent’s x-position reaches a predefined threshold (position_delay). It also applies a control cost penalty, weighted by ctrl_w, to discourage erratic actions.

Classes

class objectrl.utils.environment.reward_wrappers.PositionDelayWrapper(env: Env, position_delay: float = 2, ctrl_w: float = 0.001)

Bases: RewardWrapper

A Gymnasium wrapper that modifies the reward function based on position delay and control cost. This wrapper delays reward until the agent reaches a certain position (position_delay). It also penalizes large control signals to encourage smoother actions.

env

The environment to wrap.

Type: gym.Env

position_delay

Minimum x-position the agent must reach before receiving reward.

Type: float

ctrl_w

Weight for the control cost penalty term.

Type: float

__init__(env: Env, position_delay: float = 2, ctrl_w: float = 0.001) → None

Initialize the PositionDelayWrapper.

Parameters:

env (gym.Env) – The environment to wrap.
position_delay (float) – Minimum x-position the agent must reach before receiving reward.
ctrl_w (float) – Weight for the control cost penalty term.

Returns:

None

step(action: ndarray) → tuple

Take a step in the environment, modifying the reward. The environment’s reward is replaced with a custom one that combines delayed forward movement reward and a control cost.

Parameters:

action (np.ndarray) – Action taken by the agent.

Returns:

(observation, modified_reward, terminated, truncated, info)

info["x_pos"]: Current x-position of the agent.
info["action_norm"]: Squared norm of the action.

Return type:

tuple
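
The flow described for step() can be illustrated with a small standalone sketch. It is not the objectrl implementation: ToyEnv, DelayedRewardSketch, and the "x_position" key read from the inner environment are hypothetical names chosen for illustration, and the gating rule (raw reward counted only once x_pos is at least position_delay) is an assumption based on the description above. The sketch uses only the standard library, so it does not subclass Gymnasium's RewardWrapper.

```python
from typing import Sequence


class ToyEnv:
    """Hypothetical stand-in for a locomotion environment (not part of objectrl)."""

    def __init__(self) -> None:
        self.x_pos = 0.0

    def step(self, action: Sequence[float]):
        # Move forward by the first action component; the raw reward is that displacement.
        dx = float(action[0])
        self.x_pos += dx
        info = {"x_position": self.x_pos}  # assumed key name
        return [self.x_pos], dx, False, False, info


class DelayedRewardSketch:
    """Illustrative wrapper: gates reward on position and subtracts a control cost."""

    def __init__(self, env: ToyEnv, position_delay: float = 2.0, ctrl_w: float = 0.001) -> None:
        self.env = env
        self.position_delay = position_delay
        self.ctrl_w = ctrl_w

    def step(self, action: Sequence[float]):
        obs, raw_reward, terminated, truncated, info = self.env.step(action)
        x_pos = info["x_position"]
        action_norm = sum(a * a for a in action)  # squared norm of the action
        # Assumption: the raw reward counts only once x_pos has reached position_delay.
        forward = raw_reward if x_pos >= self.position_delay else 0.0
        reward = forward - self.ctrl_w * action_norm
        # Expose the two diagnostic entries documented for the returned info dict.
        info = dict(info, x_pos=x_pos, action_norm=action_norm)
        return obs, reward, terminated, truncated, info


env = DelayedRewardSketch(ToyEnv(), position_delay=2.0, ctrl_w=0.001)
first = env.step([1.0, 0.0])   # x_pos = 1.0, still below the threshold
second = env.step([1.0, 0.0])  # x_pos = 2.0, threshold reached
```

In this toy run, the first step yields only the control penalty, while the second step also includes the raw reward because the threshold has been reached.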

reward(observation: ndarray, action: ndarray) → float

Compute the modified reward based on position delay and control penalty.

Parameters:

observation (np.ndarray) – Current observation (unused here, but kept for compatibility).
action (np.ndarray) – Action taken by the agent.

Returns:

Modified reward value.

Return type:

float
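
One plausible reading of the reward computation is a gated forward term minus a weighted squared action norm. The function below is a minimal sketch under that assumption, not the library's code: position_delayed_reward, its x_pos and forward_reward arguments, and the use of >= for the threshold test are all hypothetical choices made for illustration.

```python
from typing import Sequence


def position_delayed_reward(
    x_pos: float,
    forward_reward: float,
    action: Sequence[float],
    position_delay: float = 2.0,
    ctrl_w: float = 0.001,
) -> float:
    """Hypothetical reward: forward term gated by position, minus a control cost."""
    # Control cost: ctrl_w times the squared norm of the action.
    ctrl_cost = ctrl_w * sum(a * a for a in action)
    # Assumption: the forward term is withheld until x_pos reaches position_delay.
    gated = forward_reward if x_pos >= position_delay else 0.0
    return gated - ctrl_cost


# Below the threshold only the control cost remains; past it the forward term is added.
before = position_delayed_reward(x_pos=1.0, forward_reward=1.5, action=[1.0, -1.0])
after = position_delayed_reward(x_pos=3.0, forward_reward=1.5, action=[1.0, -1.0])
```

With the default ctrl_w of 0.001 the penalty is small relative to the forward term, so it mainly discourages large actions while the agent is still short of the threshold and receiving no other reward.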
