Deep Deterministic Policy Gradient (DDPG)

off-policy, deterministic

Paper: Continuous Control with Deep Reinforcement Learning

Pseudocode
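For reference, the core updates of DDPG from the paper (Lillicrap et al., 2016) are summarized below. The notation is: actor mu_phi, critic Q_theta, their target copies mu_phi' and Q_theta', discount gamma, soft-update coefficient tau, and a minibatch of N replay-buffer transitions (s_i, a_i, r_i, s'_i, d_i), where d_i flags episode termination. This summarizes the published algorithm and is not a transcription of this library's code. The termination mask (1 - d_i) is a common implementation detail and is assumed here.

```latex
% Exploration: deterministic action plus exploration noise \mathcal{N}_t (e.g. Ornstein-Uhlenbeck)
a_t = \mu_\phi(s_t) + \mathcal{N}_t

% Critic: Bellman target from the target networks, then mean-squared error
y_i = r_i + \gamma \, (1 - d_i) \, Q_{\theta'}\big(s'_i, \mu_{\phi'}(s'_i)\big)
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - Q_\theta(s_i, a_i)\big)^2

% Actor: deterministic policy gradient (move the action in the direction that increases Q)
\nabla_\phi J \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q_\theta(s_i, a)\big|_{a = \mu_\phi(s_i)} \, \nabla_\phi \mu_\phi(s_i)

% Target networks: Polyak averaging with coefficient \tau
\theta' \leftarrow \tau \theta + (1 - \tau)\,\theta', \qquad \phi' \leftarrow \tau \phi + (1 - \tau)\,\phi'
```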

Configuration

DDPG-specific configuration (in config/model_configs/):

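from dataclasses import dataclass, field

# ActorNet, DDPGActor, CriticNet and DDPGCritic are library classes;
# their import paths are not shown in this excerpt.


@dataclass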
class ActorNoiseConfig:
    """
    Configuration for the Ornstein-Uhlenbeck process used to add noise to actions during training.

    Attributes:
        mu (float): Long-run mean of the noise process.
        theta (float): Speed of mean reversion.
        sigma (float): Volatility (scale of the random perturbations).
        dt (float): Time step size.
        x0 (float | None): Initial state of the noise process (optional).
    """

    mu: float = 0
    theta: float = 0.15
    sigma: float = 0.2
    dt: float = 1e-2
    x0: float | None = None


@dataclass
class DDPGActorConfig:
    """
    Configuration for the DDPG actor network.

    Attributes:
        arch (type): Class for the actor network architecture.
        actor_type (type): Actor class to be used.
        has_target (bool): Whether the actor maintains a target network.
    """

    arch: type = ActorNet
    actor_type: type = DDPGActor
    has_target: bool = True


@dataclass
class DDPGCriticConfig:
    """
    Configuration for the DDPG critic network.

    Attributes:
        arch (type): Class for the critic network architecture.
        critic_type (type): Critic class to be used.
        n_members (int): Number of critic networks in the ensemble.
    """

    arch: type = CriticNet
    critic_type: type = DDPGCritic
    n_members: int = 1


@dataclass
class DDPGConfig:
    """
    Top-level configuration for the Deep Deterministic Policy Gradient (DDPG) agent.

    Attributes:
        name (str): Name of the algorithm.
        noise (ActorNoiseConfig): Noise configuration for exploration.
        loss (str): Loss function for critic training.
        policy_delay (int): Number of critic updates per actor update.
        tau (float): Soft update coefficient (Polyak averaging) for the target networks.
        actor (DDPGActorConfig): Configuration for the actor.
        critic (DDPGCriticConfig): Configuration for the critic.
    """

    name: str = "ddpg"
    noise: ActorNoiseConfig = field(default_factory=ActorNoiseConfig)
    loss: str = "MSELoss"
    policy_delay: int = 1
    tau: float = 0.005
    actor: DDPGActorConfig = field(default_factory=DDPGActorConfig)
    critic: DDPGCriticConfig = field(default_factory=DDPGCriticConfig)

    def __post_init__(self) -> None:
        """
        Converts `noise` from a dictionary to an ActorNoiseConfig if needed.
        Useful when loading from a JSON or dict-based config file.

        Args:
            None
        Returns:
            None
        """
        if isinstance(self.noise, dict):
            self.noise = ActorNoiseConfig(**self.noise)
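Because __post_init__ converts a dictionary-valued noise field into an ActorNoiseConfig, a config loaded from a parsed JSON or dict-based file ends up with a proper dataclass for that field. The sketch below reproduces ActorNoiseConfig and a reduced, hypothetical MiniDDPGConfig (only name, noise and tau) so that it runs without the library's network classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ActorNoiseConfig:
    mu: float = 0
    theta: float = 0.15
    sigma: float = 0.2
    dt: float = 1e-2
    x0: float | None = None


@dataclass
class MiniDDPGConfig:
    # Hypothetical stand-in for DDPGConfig: keeps only the fields needed
    # to demonstrate the dict-to-dataclass conversion in __post_init__.
    name: str = "ddpg"
    noise: ActorNoiseConfig = field(default_factory=ActorNoiseConfig)
    tau: float = 0.005

    def __post_init__(self) -> None:
        # Same conversion as in DDPGConfig: a plain dict becomes an ActorNoiseConfig.
        if isinstance(self.noise, dict):
            self.noise = ActorNoiseConfig(**self.noise)


# A config as it might arrive from a parsed file: `noise` is a plain dict,
# and any keys it omits fall back to the ActorNoiseConfig defaults.
raw = {"name": "ddpg", "noise": {"sigma": 0.3, "theta": 0.1}, "tau": 0.01}
cfg = MiniDDPGConfig(**raw)
print(cfg.noise)  # ActorNoiseConfig(mu=0, theta=0.1, sigma=0.3, dt=0.01, x0=None)
```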

UML Diagram

The UML diagram illustrates the relationships between the classes in our DDPG implementation.

The diagram shows that DDPGActor and DDPGCritic inherit from the base classes Actor and CriticEnsemble, respectively, and that DeepDeterministicPolicyGradient inherits from ActorCritic, which in turn inherits from Agent.
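The inheritance relations can be mirrored as a bare Python skeleton. Only the class names and the DDPG-specific method names (act, loss, get_bellman_target) come from this page; the method parameters are placeholders, and all bodies are omitted, so this is not the library's actual code.

```python
class Agent:
    """Top-level agent interface (body omitted in this sketch)."""


class ActorCritic(Agent):
    """Agent built from an actor and a critic."""


class DeepDeterministicPolicyGradient(ActorCritic):
    """The DDPG agent."""


class Actor:
    """Base actor class."""


class DDPGActor(Actor):
    # Placeholder signatures; the real ones are not shown on this page.
    def act(self, state): ...

    def loss(self, *args): ...


class CriticEnsemble:
    """Base class for an ensemble of critics (sized by n_members)."""


class DDPGCritic(CriticEnsemble):
    # Placeholder signature; the real one is not shown on this page.
    def get_bellman_target(self, *args): ...
```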

For each class, the diagram shows the attributes and methods that are essential to DDPG:

get_bellman_target() in DDPGCritic computes the Bellman target used to train the critic, following DDPG.

act() and loss() in DDPGActor select actions and compute the loss used to update the actor's policy, following DDPG.
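The two DDPG-specific computations can be sketched in plain Python, with scalar states and actions and ordinary functions standing in for the networks. This illustrates the standard DDPG formulas, not the library's get_bellman_target() or loss(): the function names, the scalar types, the termination mask and the default discount gamma = 0.99 are all assumptions.

```python
from typing import Callable, Sequence

# Scalar states and actions keep the sketch dependency-free (an assumption).
State = float
Action = float


def bellman_target(
    rewards: Sequence[float],
    next_states: Sequence[State],
    dones: Sequence[float],
    target_actor: Callable[[State], Action],
    target_critic: Callable[[State, Action], float],
    gamma: float = 0.99,
) -> list[float]:
    """DDPG target: y_i = r_i + gamma * (1 - d_i) * Q'(s'_i, mu'(s'_i))."""
    return [
        r + gamma * (1.0 - d) * target_critic(s_next, target_actor(s_next))
        for r, s_next, d in zip(rewards, next_states, dones)
    ]


def actor_loss(
    states: Sequence[State],
    actor: Callable[[State], Action],
    critic: Callable[[State, Action], float],
) -> float:
    """Negative mean Q of the actor's own actions; minimizing it increases Q."""
    return -sum(critic(s, actor(s)) for s in states) / len(states)


# Toy linear "networks", purely for illustration.
def toy_actor(s: State) -> Action:
    return 2.0 * s


def toy_critic(s: State, a: Action) -> float:
    return s + a


targets = bellman_target([1.0, 1.0], [1.0, 1.0], [0.0, 1.0], toy_actor, toy_critic)
loss = actor_loss([1.0, 2.0], toy_actor, toy_critic)
print(targets, loss)
```

The second transition is terminal (d = 1), so its target reduces to the reward alone; the first bootstraps from the target critic's value of the target actor's action.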

Exploration Noise

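We use the OrnsteinUhlenbeckNoise class to add noise to the actor's actions and encourage exploration. The original DDPG paper uses this process for continuous action spaces because its temporal correlation yields smoother, more persistent exploration than independent per-step noise, which helps in physical control problems with inertia.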
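The process behind this noise is the Ornstein-Uhlenbeck stochastic differential equation. Below are its standard continuous-time form, the usual Euler-Maruyama discretization with step dt, and what the ActorNoiseConfig defaults (theta = 0.15, sigma = 0.2, dt = 0.01) imply. Whether the library uses exactly this discretization is an assumption.

```latex
% Continuous-time OU process (W_t is a Wiener process)
dx_t = \theta(\mu - x_t)\,dt + \sigma\,dW_t

% Euler-Maruyama step with step size \Delta t
x_{t+\Delta t} = x_t + \theta(\mu - x_t)\,\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, I)

% Continuous-time stationary standard deviation (per dimension) with the defaults
\sigma_\infty = \frac{\sigma}{\sqrt{2\theta}} = \frac{0.2}{\sqrt{0.3}} \approx 0.365

% Relaxation time: 1/\theta = 1/0.15 \approx 6.67 time units, i.e. about 667 steps of \Delta t = 0.01
```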

class objectrl.models.ddpg.OrnsteinUhlenbeckNoise(dim_act: int, mu: float = 0, theta: float = 0.15, sigma: float = 0.2, dt: float = 0.01, x0: float | None = None)

Bases: object

Implements an Ornstein-Uhlenbeck process to generate temporally correlated noise. Commonly used in DDPG to add exploration noise to continuous actions.

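Parameters:
    dim_act (int): Dimensionality of the action space.
    mu (float): Long-run mean of the noise process.
    theta (float): Speed of mean reversion.
    sigma (float): Volatility (scale of the random perturbations).
    dt (float): Time step size.
    x0 (float | None): Initial state of the noise process (optional).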
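A minimal re-implementation helps make the constructor arguments concrete. This is a hypothetical sketch, not the library's class: it uses Python lists and the standard-library RNG, adds a hypothetical seed argument for reproducibility, and assumes that the state restarts at zero when x0 is None.

```python
import math
import random


class OrnsteinUhlenbeckNoiseSketch:
    """Hypothetical OU noise generator mirroring the documented arguments."""

    def __init__(self, dim_act, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, x0=None, seed=None):
        self.dim_act = dim_act
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.x0 = x0
        self.rng = random.Random(seed)  # `seed` is a hypothetical extra argument
        self.reset()

    def reset(self):
        # Restart from x0 if given, otherwise from zero (assumed convention).
        start = self.x0 if self.x0 is not None else 0.0
        self.x = [start] * self.dim_act

    def __call__(self):
        # One Euler-Maruyama step per action dimension:
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.x = [
            xi
            + self.theta * (self.mu - xi) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for xi in self.x
        ]
        return list(self.x)


noise = OrnsteinUhlenbeckNoiseSketch(dim_act=2, seed=0)
policy_action = [0.1, -0.4]  # stand-in for the actor's deterministic output
noisy_action = [a + n for a, n in zip(policy_action, noise())]
print(noisy_action)
```

In a training loop, reset() would typically be called at the start of each episode, and the noisy action clipped to the environment's action bounds before being executed.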
