Twin Delayed DDPG (TD3)

Tags: off-policy, deterministic, twin-critics, delayed-update

Paper: Addressing Function Approximation Error in Actor-Critic Methods (Fujimoto, van Hoof, and Meger, ICML 2018)

Pseudocode
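As a reference point, the core TD3 update from the paper can be summarized as follows. This is the published algorithm, not a transcription of ObjectRL's code. The mapping of symbols to configuration fields is our assumption: σ corresponds to policy_noise, σ̃ to target_policy_noise, c to target_policy_noise_clip, d to policy_delay, τ to tau, and [a_min, a_max] are the environment's action bounds. The squared-error critic loss matches the default loss = "MSELoss".

```latex
\begin{aligned}
&\text{Exploration:} && a = \mathrm{clip}\big(\pi_\phi(s) + \epsilon,\ a_{\min},\ a_{\max}\big),\quad \epsilon \sim \mathcal{N}(0, \sigma^2) \\
&\text{Smoothing:} && \tilde a = \mathrm{clip}\big(\pi_{\phi'}(s') + \tilde\epsilon,\ a_{\min},\ a_{\max}\big),\quad \tilde\epsilon = \mathrm{clip}\big(\mathcal{N}(0, \tilde\sigma^2),\ -c,\ c\big) \\
&\text{Target:} && y = r + \gamma\,(1 - \mathrm{done})\,\min_{j=1,2} Q_{\theta'_j}(s', \tilde a) \\
&\text{Critic loss:} && \mathcal{L}(\theta_i) = \mathbb{E}\big[(Q_{\theta_i}(s, a) - y)^2\big],\quad i = 1, 2 \\
&\text{Every } d \text{ critic updates:} && \nabla_\phi J = \mathbb{E}\big[\nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_\phi(s)}\,\nabla_\phi \pi_\phi(s)\big] \\
& && \theta'_i \leftarrow \tau\,\theta_i + (1 - \tau)\,\theta'_i,\qquad \phi' \leftarrow \tau\,\phi + (1 - \tau)\,\phi'
\end{aligned}
```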

Configuration

TD3-specific configuration (defined in config/model_configs/):

```python
from dataclasses import dataclass, field

# ActorNet, CriticNet, TD3Actor and TD3Critic are ObjectRL classes;
# their imports are omitted here.


@dataclass
class ActorNoiseConfig:
    """
    Configuration for noise added to actor actions in TD3.

    Attributes:
        policy_noise (float): Std dev of the Gaussian exploration noise added to actions during training.
        target_policy_noise (float): Std dev of the smoothing noise added to target policy actions.
        target_policy_noise_clip (float): Target policy noise is clipped to [-target_policy_noise_clip, target_policy_noise_clip].
    """

    policy_noise: float = 0.1
    target_policy_noise: float = 0.2
    target_policy_noise_clip: float = 0.5


@dataclass
class TD3ActorConfig:
    """
    Configuration for the TD3 actor network.

    Attributes:
        arch (type): Actor network architecture class.
        actor_type (type): Actor class type.
        has_target (bool): Whether the actor has a target network.
    """

    arch: type = ActorNet
    actor_type: type = TD3Actor
    has_target: bool = True


@dataclass
class TD3CriticConfig:
    """
    Configuration for the TD3 critic network ensemble.

    Attributes:
        arch (type): Critic network architecture class.
        critic_type (type): Critic class type.
    """

    arch: type = CriticNet
    critic_type: type = TD3Critic


@dataclass
class TD3Config:
    """
    Main TD3 algorithm configuration.

    Attributes:
        name (str): Algorithm identifier.
        noise (ActorNoiseConfig): Noise parameters for exploration and target policy smoothing.
        loss (str): Name of the loss function used to train the critics.
        policy_delay (int): Number of critic updates per actor update.
        tau (float): Polyak averaging coefficient for target network updates.
        actor (TD3ActorConfig): Actor network configuration.
        critic (TD3CriticConfig): Critic network configuration.
    """

    name: str = "td3"
    noise: ActorNoiseConfig = field(default_factory=ActorNoiseConfig)
    loss: str = "MSELoss"
    policy_delay: int = 2
    tau: float = 0.005

    actor: TD3ActorConfig = field(default_factory=TD3ActorConfig)
    critic: TD3CriticConfig = field(default_factory=TD3CriticConfig)

    def __post_init__(self):
        # Accept a plain dict for the noise block and convert it to ActorNoiseConfig.
        if isinstance(self.noise, dict):
            self.noise = ActorNoiseConfig(**self.noise)
```
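The dict-to-dataclass conversion in __post_init__ is what allows the noise block to be overridden with a plain mapping. The following sketch checks that behavior on a trimmed, hypothetical copy of the configuration (TD3ConfigSketch, with the actor and critic fields removed so it runs without ObjectRL installed); it illustrates the dataclass mechanics only and is not how ObjectRL itself loads configurations.

```python
from dataclasses import dataclass, field


@dataclass
class ActorNoiseConfig:
    policy_noise: float = 0.1
    target_policy_noise: float = 0.2
    target_policy_noise_clip: float = 0.5


@dataclass
class TD3ConfigSketch:
    # Hypothetical trimmed copy of TD3Config: actor/critic fields dropped
    # so the example does not depend on ObjectRL classes.
    name: str = "td3"
    noise: ActorNoiseConfig = field(default_factory=ActorNoiseConfig)
    loss: str = "MSELoss"
    policy_delay: int = 2
    tau: float = 0.005

    def __post_init__(self):
        # Same conversion as TD3Config: a plain dict (for example, one read
        # from a config file) is promoted to an ActorNoiseConfig.
        if isinstance(self.noise, dict):
            self.noise = ActorNoiseConfig(**self.noise)


# Override only the target smoothing noise; other noise fields keep their defaults.
cfg = TD3ConfigSketch(noise={"target_policy_noise": 0.3}, policy_delay=3)
print(cfg.noise)
# ActorNoiseConfig(policy_noise=0.1, target_policy_noise=0.3, target_policy_noise_clip=0.5)

# default_factory gives every config its own noise object, so mutating one
# instance does not leak into another.
a, b = TD3ConfigSketch(), TD3ConfigSketch()
a.noise.policy_noise = 0.2
print(b.noise.policy_noise)  # 0.1
```

Using field(default_factory=...) rather than a shared default instance is what keeps the second check true: each config gets a fresh ActorNoiseConfig.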

UML Diagram

We use the UML diagram to illustrate the relationships between the classes in our TD3 implementation.

The diagram shows how the TD3Actor and TD3Critic classes inherit from the base classes Actor and CriticEnsemble, respectively. The TwinDelayedDeepDeterministicPolicyGradient class inherits from ActorCritic, which in turn inherits from Agent.

For each class, we illustrate the attributes and methods that are crucial to TD3. Specifically:

The get_bellman_target() method of the TD3Critic class computes the TD3-style Bellman target used to train the critics.
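A minimal, framework-agnostic sketch of the target this method is described as computing, following the paper's clipped double-Q rule. The function name td3_bellman_target and its arguments are hypothetical (ObjectRL's get_bellman_target() has its own signature), NumPy arrays stand in for batched tensors, and the two next-state Q-values are assumed to already be evaluated at the smoothed target action.

```python
import numpy as np


def td3_bellman_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target critics evaluated at (s', a~),
    where a~ is the smoothed target action.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    # Element-wise minimum of the twin critics counters the overestimation
    # bias of bootstrapping from a single critic.
    q_min = np.minimum(q1_next, q2_next)
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


y = td3_bellman_target(
    reward=[1.0, 0.5],
    done=[0.0, 1.0],
    q1_next=[2.0, 5.0],
    q2_next=[3.0, 4.0],
    gamma=0.5,
)
print(y.tolist())  # [2.0, 0.5]
```

The first sample bootstraps from min(2.0, 3.0) = 2.0, giving 1.0 + 0.5 * 2.0 = 2.0; the second is terminal, so its target is just the reward.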

The act(), act_target(), and loss() methods of the TD3Actor class implement TD3-style action selection for the online and target policies, and the update of the actor's policy.
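The following sketch spells out what these three responsibilities amount to in the TD3 paper, under stated assumptions. The functions act, act_target and actor_loss here are hypothetical stand-ins, not ObjectRL's signatures: policy outputs and first-critic values are passed in as plain NumPy arrays, the action bounds low/high are assumed, and actor_loss only computes the loss value (a real implementation would backpropagate it through Q1 into the actor's parameters with an autodiff framework). The noise arguments correspond to the ActorNoiseConfig fields.

```python
import numpy as np


def act(action, policy_noise, low, high, rng):
    """Exploration: add N(0, policy_noise^2) noise, then clip to the action bounds."""
    noisy = action + rng.normal(0.0, policy_noise, size=np.shape(action))
    return np.clip(noisy, low, high)


def act_target(target_action, target_policy_noise, noise_clip, low, high, rng):
    """Target policy smoothing: clipped Gaussian noise on the target actor's action."""
    noise = rng.normal(0.0, target_policy_noise, size=np.shape(target_action))
    noise = np.clip(noise, -noise_clip, noise_clip)  # keep a~ close to pi'(s')
    return np.clip(target_action + noise, low, high)


def actor_loss(q1_values):
    """Loss value only: minimizing -mean Q1(s, pi(s)) maximizes the first critic's estimate."""
    return -float(np.mean(q1_values))


rng = np.random.default_rng(0)
pi_s = np.array([0.95, -0.2])        # stand-in for the actor output pi(s)
target_pi = np.array([0.3, -0.99])   # stand-in for the target actor output pi'(s')

a = act(pi_s, 0.1, -1.0, 1.0, rng)                          # action sent to the environment
a_tilde = act_target(target_pi, 0.2, 0.5, -1.0, 1.0, rng)   # action used in the critic target
print(actor_loss([1.0, 3.0]))  # -2.0
```

Clipping the smoothing noise before adding it means the target action never drifts more than noise_clip from the target policy's output, even when target_policy_noise is large.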

Classes

class objectrl.models.td3.TD3Actor(config: MainConfig, dim_state: int, dim_act: int)

Bases: Actor

TD3 actor network with action noise for exploration and target policy smoothing.

Parameters:

config (MainConfig) – Configuration object.
dim_state (int) – Observation space dimensions.
dim_act (int) – Action space dimensions.
