Optimistic Actor-Critic (OAC)#

optimistic exploration

Paper: Better Exploration with Optimistic Actor-Critic

Pseudocode#

Configuration#

Specific configuration for the OAC algorithm (in config/model_configs/).

import math
from dataclasses import dataclass, field


@dataclass
class OACActorConfig:
    """
    Configuration for the OAC Actor network.

    Attributes:
        arch (type): The architecture class used for the actor network.
        actor_type (type): The actor implementation class (OACActor).
    """

    arch: type = ActorNetProbabilistic
    actor_type: type = OACActor


@dataclass
class OACCriticConfig:
    """
    Configuration for the OAC Critic network.

    Attributes:
        arch (type): The architecture class used for the critic network.
        critic_type (type): The critic implementation class (OACCritic).
    """

    arch: type = CriticNet
    critic_type: type = OACCritic


@dataclass
class CriticNoiseConfig:
    """
    Configuration for Gaussian noise added to critic target actions.

    Attributes:
        sigma_target (float): Standard deviation of Gaussian noise.
        noise_clamp (float): Maximum absolute value to clamp the noise.
    """

    sigma_target: float = math.sqrt(0.2)
    noise_clamp: float = 0.5


@dataclass
class ActorExplorationConfig:
    """
    Configuration for optimistic exploration noise in OAC.

    Attributes:
        delta (float): Uncertainty scaling factor for exploration.
        beta_ub (float): Upper bound multiplier on critic std deviation.
    """

    delta: float = 0.1
    beta_ub: float = 4.66


@dataclass
class OACConfig:
    """
    Top-level configuration for the OAC (Optimistic Actor Critic) agent.

    Attributes:
        name (str): Agent name identifier.
        loss (str): Loss function used for training.
        policy_delay (int): Number of critic updates per actor update.
        tau (float): Polyak averaging coefficient for target networks.
        noise (CriticNoiseConfig): Configuration for Gaussian target noise.
        exploration (ActorExplorationConfig): Config for exploration noise.
        target_entropy (float or None): Target policy entropy.
        alpha (float): Entropy regularization coefficient.
        actor (OACActorConfig): Actor network configuration.
        critic (OACCriticConfig): Critic network configuration.
    """

    name: str = "oac"
    loss: str = "MSELoss"
    policy_delay: int = 1
    tau: float = 0.005
    noise: CriticNoiseConfig = field(default_factory=CriticNoiseConfig)
    exploration: ActorExplorationConfig = field(default_factory=ActorExplorationConfig)
    target_entropy: float | None = None
    alpha: float = 1.0
    actor: OACActorConfig = field(default_factory=OACActorConfig)
    critic: OACCriticConfig = field(default_factory=OACCriticConfig)

    def __post_init__(self):
        if isinstance(self.noise, dict):
            self.noise = CriticNoiseConfig(**self.noise)
        if isinstance(self.exploration, dict):
            self.exploration = ActorExplorationConfig(**self.exploration)
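
The `__post_init__` hook above lets nested settings be passed as plain dictionaries, for example when they come from a parsed YAML or JSON file. The following is a minimal, self-contained sketch of that behavior. The classes `NoiseCfg`, `ExplorationCfg`, and `AgentCfg` are hypothetical stand-ins for `CriticNoiseConfig`, `ActorExplorationConfig`, and `OACConfig`, reduced to a few fields so the example runs without the library.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NoiseCfg:  # hypothetical stand-in for CriticNoiseConfig
    sigma_target: float = math.sqrt(0.2)
    noise_clamp: float = 0.5


@dataclass
class ExplorationCfg:  # hypothetical stand-in for ActorExplorationConfig
    delta: float = 0.1
    beta_ub: float = 4.66


@dataclass
class AgentCfg:  # hypothetical stand-in for OACConfig (subset of fields)
    tau: float = 0.005
    noise: NoiseCfg = field(default_factory=NoiseCfg)
    exploration: ExplorationCfg = field(default_factory=ExplorationCfg)

    def __post_init__(self):
        # Promote plain dicts to their dataclass types, as OACConfig does.
        if isinstance(self.noise, dict):
            self.noise = NoiseCfg(**self.noise)
        if isinstance(self.exploration, dict):
            self.exploration = ExplorationCfg(**self.exploration)


# Override a single nested field; unspecified fields keep their defaults.
cfg = AgentCfg(exploration={"delta": 0.05})
```

After construction, `cfg.exploration` is an `ExplorationCfg` instance with `delta` overridden and `beta_ub` left at its default, and `cfg.noise` is the untouched default.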

UML Diagram#

The UML diagram illustrates the relationships between the classes in our OAC implementation.

It shows that the OACActor and OACCritic classes inherit from the base classes Actor and CriticEnsemble, respectively. The OptimisticActorCritic class inherits from ActorCritic, which in turn inherits from Agent.
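
The inheritance relationships described above can be written down as a skeleton. This is only an illustration of the hierarchy: the class bodies are empty stubs, not the library's definitions, and the real classes carry constructors and methods that are not shown here.

```python
# Empty stubs that mirror the inheritance hierarchy only.
class Agent:
    pass


class ActorCritic(Agent):
    pass


class OptimisticActorCritic(ActorCritic):
    pass


class Actor:
    pass


class OACActor(Actor):
    pass


class CriticEnsemble:
    pass


class OACCritic(CriticEnsemble):
    pass


# Method resolution order of the agent class, from most to least specific.
agent_mro = [cls.__name__ for cls in OptimisticActorCritic.__mro__]
```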

The diagram lists the attributes and methods of each class that matter most for OAC. Specifically:

The get_bellman_target() method in the OACCritic class computes the Bellman target using actions perturbed by Gaussian noise, following TD3-style target smoothing.

The OptimisticNoise class computes exploration noise by shifting the mean of the actor's action distribution in the direction of the Q-value upper confidence bound.

The select_action() method in the OptimisticActorCritic class is modified to optionally use the optimistic mean for exploration, enabling uncertainty-aware policy execution.
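
The three pieces above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function names and signatures are hypothetical, actions are plain lists, the policy is assumed to be a diagonal Gaussian, tanh squashing is omitted, and the gradient of the upper bound with respect to the action is passed in rather than computed by automatic differentiation. The upper bound and the mean shift follow the closed form given in the OAC paper, where the shift has length sqrt(2 * delta) in the norm induced by the policy covariance.

```python
import math
import random


def smooth_target_action(action, sigma_target, noise_clamp, rng, low=-1.0, high=1.0):
    """Perturb a target action with clipped Gaussian noise (TD3-style smoothing)."""
    smoothed = []
    for a in action:
        eps = rng.gauss(0.0, sigma_target)
        eps = max(-noise_clamp, min(noise_clamp, eps))  # clamp the noise
        smoothed.append(max(low, min(high, a + eps)))   # keep the action in bounds
    return smoothed


def q_upper_bound(q1, q2, beta_ub):
    """Upper confidence bound from two critic estimates: mean + beta_ub * std."""
    q_mean = 0.5 * (q1 + q2)
    q_std = 0.5 * abs(q1 - q2)
    return q_mean + beta_ub * q_std


def optimistic_mean(mean, std, grad_q_ub, delta):
    """Shift a diagonal-Gaussian mean along the gradient of the Q upper bound."""
    # Sigma @ grad for the diagonal covariance Sigma = diag(std ** 2).
    scaled = [s * s * g for s, g in zip(std, grad_q_ub)]
    # Norm of the gradient under Sigma: sqrt(grad^T Sigma grad).
    norm = math.sqrt(sum(g * sg for g, sg in zip(grad_q_ub, scaled)))
    if norm == 0.0:
        return list(mean)  # no preferred direction, keep the original mean
    step = math.sqrt(2.0 * delta) / norm
    return [m + step * sg for m, sg in zip(mean, scaled)]


def select_action(mean, std, grad_q_ub, delta, rng, optimistic=True):
    """Sample an action, optionally centred on the optimistic mean."""
    if optimistic:
        centre = optimistic_mean(mean, std, grad_q_ub, delta)
    else:
        centre = list(mean)
    return [rng.gauss(c, s) for c, s in zip(centre, std)]


rng = random.Random(0)
noisy = smooth_target_action([0.0, 0.9], math.sqrt(0.2), 0.5, rng)
shifted = optimistic_mean([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], delta=0.5)
```

With unit standard deviations, a gradient of (3, 4), and delta = 0.5, the shift has length 1 and points along the gradient, so `shifted` is approximately (0.6, 0.8). A larger `delta` moves the exploration mean further from the target policy, and a larger `beta_ub` weights critic disagreement more heavily in the upper bound.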

Classes#

class objectrl.models.oac.OptimisticNoise(beta_ub: float, delta: float)[source]#

Bases: object

Computes optimistic exploration noise as described in the OAC algorithm.

beta_ub#
