Soft Actor Critic (SAC)#

off-policy stochastic

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Pseudocode#

Configuration#

Specific configuration for the SAC algorithm (in config/model_configs/).#

from dataclasses import dataclass, field


@dataclass
class SACActorConfig:
    """
    Configuration for the SAC actor network.

    Attributes:
        arch (type): Neural network architecture class for the actor.
        actor_type (type): Actor class type.
    """

    arch: type = ActorNetProbabilistic
    actor_type: type = SACActor


@dataclass
class SACCriticConfig:
    """
    Configuration for the SAC critic network ensemble.

    Attributes:
        arch (type): Neural network architecture class for the critic.
        critic_type (type): Critic class type.
    """

    arch: type = CriticNet
    critic_type: type = SACCritic


@dataclass
class SACConfig:
    """
    Main SAC algorithm configuration class.

    Attributes:
        name (str): Algorithm identifier.
        loss (str): Loss function used for critic training.
        policy_delay (int): Number of critic updates per actor update.
        tau (float): Polyak averaging coefficient for target network updates.
        target_entropy (float | None): Target entropy for automatic temperature tuning.
        alpha (float): Initial temperature parameter.
        actor (SACActorConfig): Actor configuration.
        critic (SACCriticConfig): Critic configuration.
    """

    name: str = "sac"
    loss: str = "MSELoss"
    policy_delay: int = 1
    tau: float = 0.005
    target_entropy: float | None = None
    alpha: float = 1.0

    actor: SACActorConfig = field(default_factory=SACActorConfig)
    critic: SACCriticConfig = field(default_factory=SACCriticConfig)
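To make the tau and policy_delay fields concrete, here is a minimal pure-Python sketch of how such hyperparameters are conventionally used. Polyak averaging moves each target parameter a fraction tau toward its online counterpart, and the actor is updated once every policy_delay critic updates. The helper names polyak_update and should_update_actor are hypothetical and are not part of ObjectRL; the library's own update code may differ.

```python
def polyak_update(target, online, tau=0.005):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def should_update_actor(critic_step, policy_delay=1):
    """True on every policy_delay-th critic update (steps counted from 1)."""
    return critic_step % policy_delay == 0


# Toy parameters standing in for network weights (illustrative values only).
target_params = [0.0, 1.0]
online_params = [1.0, 3.0]
new_target = polyak_update(target_params, online_params, tau=0.005)
# new_target is approximately [0.005, 1.01]
```

With the default policy_delay of 1, should_update_actor returns True at every step, so the actor and critic are updated at the same rate.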

UML Diagram#

The UML diagram illustrates the relationships between the classes in our SAC implementation.

The SACActor and SACCritic classes inherit from the base classes Actor and CriticEnsemble, respectively. The SoftActorCritic class inherits from ActorCritic, which in turn inherits from Agent.

The diagram lists the attributes and methods of each class that matter most for SAC. Specifically:

get_bellman_target() in the SACCritic class computes the entropy-regularized (soft) Bellman target for the critic.

update_alpha(), loss(), and update() in the SACActor class update the actor's policy and the temperature parameter in SAC style.
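As a rough illustration of what the critic target and the actor loss compute, the sketch below follows the commonly used SAC formulation for a single transition and a small batch. It is not ObjectRL's code: the function names, the argument names, the discount factor gamma, and the choice of taking the minimum over ensemble members are assumptions made for this example.

```python
def soft_bellman_target(reward, done, next_q_values, next_log_prob, alpha, gamma=0.99):
    """Soft Bellman target for one transition.

    next_q_values: target-critic estimates Q'(s', a') from each ensemble member,
                   with a' sampled from the current policy.
    next_log_prob: log pi(a' | s') for that sampled action.
    """
    # Pessimistic ensemble estimate plus the entropy bonus (-alpha * log pi).
    soft_value = min(next_q_values) - alpha * next_log_prob
    # No bootstrapping from terminal states (done == 1.0).
    return reward + gamma * (1.0 - done) * soft_value


def actor_loss(log_probs, q_values, alpha):
    """Batch mean of alpha * log pi(a|s) - Q(s, a)."""
    total = sum(alpha * lp - q for lp, q in zip(log_probs, q_values))
    return total / len(log_probs)


y = soft_bellman_target(reward=1.0, done=0.0, next_q_values=[2.0, 3.0],
                        next_log_prob=-1.0, alpha=0.5, gamma=0.9)
# y is approximately 1.0 + 0.9 * (2.0 + 0.5) = 3.25

loss = actor_loss(log_probs=[-1.0, -3.0], q_values=[2.0, 4.0], alpha=0.5)
# loss == -4.0
```

Minimizing this actor loss pushes the policy toward actions with high Q-values while the alpha term rewards keeping the policy's entropy high.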

Classes#

class objectrl.models.sac.SACActor(config: MainConfig, dim_state: int, dim_act: int)#

Bases: Actor

Soft Actor network with automatic temperature tuning.

Parameters:

config (MainConfig) – Configuration object with hyperparameters.
dim_state (int) – Observation space dimensions.
dim_act (int) – Action space dimensions.
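For readers unfamiliar with automatic temperature tuning, the following pure-Python sketch shows one gradient step on the temperature under the usual SAC objective, parameterized through log(alpha). It is an illustration only: update_log_alpha is a hypothetical helper, the learning rate is arbitrary, and the fallback target_entropy = -dim_act is a common heuristic that we assume here, not a confirmed ObjectRL default.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr):
    """One gradient-descent step on
    L(log_alpha) = -log_alpha * mean(log_pi + target_entropy)."""
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_gap  # dL/d(log_alpha)
    return log_alpha - lr * grad


dim_act = 2
target_entropy = -float(dim_act)   # assumed fallback when target_entropy is None
log_alpha = math.log(1.0)          # alpha = 1.0, the SACConfig default
log_probs = [-3.0, -3.0]           # sampled log pi(a|s); entropy estimate is 3.0

new_log_alpha = update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.1)
new_alpha = math.exp(new_log_alpha)
# The entropy estimate (3.0) exceeds the target (-2.0), so alpha shrinks below 1.0.
```

When the policy's entropy falls below the target instead, the same step increases alpha, which strengthens the entropy bonus in the actor and critic objectives.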
