Distributional Soft Actor Critic (DSAC)

Tags: distributional RL, quantile regression

Paper: DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning

Pseudocode

Configuration

Configuration specific to the DSAC algorithm (defined in config/model_configs/).

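from dataclasses import dataclass, field
from typing import Literal

# ActorNetProbabilistic, DSACActor, QuantileCriticNet and DSACCritic are
# provided by the library; their import paths are not shown here.


@dataclass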
class CriticLossConfig:
    """
    Configuration for the DSAC critic loss.

    Attributes:
        kappa (float): Huber loss threshold.
    """

    kappa: float = 1.0


@dataclass
class DSACActorConfig:
    """
    Configuration for the DSAC Actor network.

    Attributes:
        arch (type): Architecture class for the actor network.
        actor_type (type): Actor class type.
        has_target (bool): Whether to use a target network.
    """

    arch: type = ActorNetProbabilistic
    actor_type: type = DSACActor
    has_target: bool = True


@dataclass
class DSACCriticConfig:
    """
    Configuration for the DSAC Critic network.

    Attributes:
        arch (type): Architecture class for the critic network.
        critic_type (type): Critic class type.
        n_quantiles (int): Number of quantiles predicted per critic (also the network output dimension).
        has_target (bool): Whether to use a target network.
        n_members (int): Number of critic members.
        tau_type (Literal["fix", "iqn"]): How quantile fractions are generated: "fix" for uniform fractions, "iqn" for IQN-style sampling.
    """

    arch: type = QuantileCriticNet
    critic_type: type = DSACCritic
    norm: bool = True
    n_quantiles: int = 8
    has_target: bool = True
    n_members: int = 2
    tau_type: Literal["fix", "iqn"] = "iqn"

    def __post_init__(self):
        self.dim_out = self.n_quantiles


@dataclass
class DSACConfig:
    """
    Main DSAC algorithm configuration.

    Attributes:
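        name (str): Name of the algorithm.
        lossparams (CriticLossConfig): Parameters of the critic loss (the Huber threshold kappa).
        loss (str): Name of the loss function used.
        policy_delay (int): Delay for policy updates.
        tau (float): Soft update coefficient for the target networks.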
        target_entropy (float | None): Target entropy for the policy.
        learnable_alpha (bool): Whether the temperature parameter alpha is learnable.
        alpha (float): Initial value of the temperature parameter alpha.
        actor (DSACActorConfig): Configuration for the actor network.
        critic (DSACCriticConfig): Configuration for the critic network.
    """

    name: str = "dsac"
    lossparams: CriticLossConfig = field(default_factory=CriticLossConfig)
    loss: str = "DSACLoss"
    policy_delay: int = 1
    tau: float = 0.005
    target_entropy: float | None = None
    learnable_alpha: bool = True
    alpha: float = 1.0

    actor: DSACActorConfig = field(default_factory=DSACActorConfig)
    critic: DSACCriticConfig = field(default_factory=DSACCriticConfig)

    def __post_init__(self):
        if isinstance(self.lossparams, dict):
            self.lossparams = CriticLossConfig(**self.lossparams)
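
The `__post_init__` hook in `DSACConfig` allows `lossparams` to be passed as a plain dict, for instance when overrides come from a parsed config file, and converts it into a `CriticLossConfig`. The sketch below reproduces only that pattern with trimmed stand-in classes; `LossCfg` and `AlgoCfg` are hypothetical names used for illustration, not library classes.

```python
from dataclasses import dataclass, field


@dataclass
class LossCfg:  # hypothetical stand-in for CriticLossConfig
    kappa: float = 1.0


@dataclass
class AlgoCfg:  # hypothetical, trimmed stand-in for DSACConfig
    lossparams: LossCfg = field(default_factory=LossCfg)
    tau: float = 0.005

    def __post_init__(self):
        # Accept a plain dict and coerce it into the typed loss config.
        if isinstance(self.lossparams, dict):
            self.lossparams = LossCfg(**self.lossparams)


default_cfg = AlgoCfg()
override_cfg = AlgoCfg(lossparams={"kappa": 0.5})
print(default_cfg.lossparams.kappa, override_cfg.lossparams.kappa)
```

Using `field(default_factory=...)` gives every config instance its own nested object, so changing `kappa` on one instance does not leak into another.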

UML Diagram

The UML diagram illustrates the relationships between the classes in our DSAC implementation.

It shows that DSACActor and DSACCritic inherit from SACActor and SACCritic, respectively. The DistributionalSoftActorCritic class inherits from ActorCritic, which in turn inherits from Agent.

For each class, the diagram lists the attributes and methods that matter most for DSAC. Specifically:

DSACActor adapts the SAC actor to support both fixed and learnable entropy temperature alpha.

When learnable_alpha=False, the temperature is frozen and no optimizer is maintained for it. The actor loss is modified to use quantile-weighted Q-values produced by the distributional critic.
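
The two ideas above (a temperature that may be frozen, and an actor loss built on quantile-weighted Q-values) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the library's code: `Temperature`, `quantile_weighted_q`, and `actor_loss` are hypothetical names, the temperature objective is assumed to be the standard SAC form J(alpha) = -alpha * (E[log pi] + target_entropy), and each quantile value is assumed to be weighted by the width of its quantile interval.

```python
import math


class Temperature:
    """Hypothetical helper holding the entropy temperature alpha."""

    def __init__(self, alpha=1.0, learnable=True, lr=0.1):
        self.log_alpha = math.log(alpha)  # optimize in log space so alpha > 0
        self.learnable = learnable
        self.lr = lr

    @property
    def alpha(self):
        return math.exp(self.log_alpha)

    def update(self, mean_log_prob, target_entropy):
        """One gradient step on J(alpha) = -alpha * (E[log pi] + target_entropy)."""
        if not self.learnable:
            return  # frozen temperature: nothing to optimize
        # d/d(log_alpha) of -exp(log_alpha) * c equals -alpha * c
        grad = -self.alpha * (mean_log_prob + target_entropy)
        self.log_alpha -= self.lr * grad


def quantile_weighted_q(quantile_values, tau):
    """Collapse quantile values to a scalar Q, weighting by interval width."""
    edges = [0.0] + list(tau)
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(w * z for w, z in zip(widths, quantile_values))


def actor_loss(alpha, log_prob, quantile_values, tau):
    """Per-sample SAC-style actor objective: alpha * log pi(a|s) - Q(s, a)."""
    return alpha * log_prob - quantile_weighted_q(quantile_values, tau)


frozen = Temperature(alpha=0.2, learnable=False)
frozen.update(mean_log_prob=-1.0, target_entropy=-2.0)

learned = Temperature(alpha=1.0, learnable=True, lr=0.1)
learned.update(mean_log_prob=-1.0, target_entropy=-2.0)

q = quantile_weighted_q([1.0, 2.0, 3.0, 4.0], [0.25, 0.5, 0.75, 1.0])
print(frozen.alpha, learned.alpha, q)
```

In this example the policy entropy (1.0) exceeds the target (-2.0), so the learnable temperature decreases after the step, while the frozen one keeps its initial value.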

DSACCritic implements quantile-based value estimation. The get_tau() method generates quantile fractions either uniformly (tau_type="fix") or with an IQN-style sampling strategy (tau_type="iqn"). The Q() and Q_t() methods evaluate the ensemble at the midpoints of these quantile fractions using torch.vmap, returning full value distributions.
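
A dependency-free sketch of quantile fraction generation follows. It is an assumption about how a `get_tau()`-style routine can work, not the library's implementation: the function signature is hypothetical, and the `+ 0.1` offset that keeps sampled fractions from collapsing together is borrowed from common DSAC reference code rather than confirmed by this page.

```python
import random
from itertools import accumulate


def get_tau(n_quantiles, tau_type="iqn", rng=None):
    """Return (tau, tau_hat, widths) for one state-action pair.

    tau     : cumulative quantile fractions, ending at 1.0
    tau_hat : midpoints of consecutive fractions (where the critic is evaluated)
    widths  : interval widths tau_i - tau_{i-1}
    """
    rng = rng or random.Random()
    if tau_type == "fix":
        # Uniform fractions: every interval has the same width.
        widths = [1.0 / n_quantiles] * n_quantiles
    elif tau_type == "iqn":
        # Random widths, offset so no interval becomes vanishingly small
        # (assumed detail), then normalized to sum to one.
        raw = [rng.random() + 0.1 for _ in range(n_quantiles)]
        total = sum(raw)
        widths = [r / total for r in raw]
    else:
        raise ValueError(f"unknown tau_type: {tau_type!r}")

    tau = list(accumulate(widths))
    edges = [0.0] + tau
    tau_hat = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return tau, tau_hat, widths


fixed_tau, fixed_tau_hat, _ = get_tau(4, "fix")
print(fixed_tau)      # [0.25, 0.5, 0.75, 1.0]
print(fixed_tau_hat)  # [0.125, 0.375, 0.625, 0.875]

sampled_tau, sampled_tau_hat, _ = get_tau(8, "iqn", random.Random(0))
```

With "fix" the fractions are identical on every call, whereas "iqn" draws a fresh, strictly increasing set each time.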

The get_bellman_target() method computes entropy-regularized distributional Bellman targets by applying SAC's clipped minimum across the ensemble's quantile outputs. The update() method performs quantile regression to align the predicted quantile distribution with the target distribution.
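
The target construction and the regression step can be sketched for a single transition in plain Python. Everything here is illustrative and hedged: the function names and argument layout are hypothetical, the minimum is assumed to be taken element-wise per quantile across ensemble members, and the loss is the standard quantile Huber loss with threshold `kappa` averaged over target samples, which may differ in weighting from the library's `DSACLoss`.

```python
def bellman_target(reward, done, gamma, alpha, next_log_prob, next_quantiles):
    """Entropy-regularized target quantiles for one transition.

    next_quantiles has shape [n_members][n_quantiles].
    """
    # Element-wise minimum over ensemble members, one value per quantile.
    z_min = [min(member_vals) for member_vals in zip(*next_quantiles)]
    not_done = 1.0 - float(done)
    return [
        reward + gamma * not_done * (z - alpha * next_log_prob) for z in z_min
    ]


def huber(delta, kappa=1.0):
    """Huber loss: quadratic within kappa, linear beyond it."""
    a = abs(delta)
    return 0.5 * delta * delta if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred, target, tau_hat, kappa=1.0):
    """Sum over predicted quantiles, mean over target samples."""
    total = 0.0
    for theta, tau in zip(pred, tau_hat):
        for y in target:
            delta = y - theta
            # Asymmetric weight: under-estimates are penalized by tau,
            # over-estimates by (1 - tau).
            weight = abs(tau - (1.0 if delta < 0 else 0.0))
            total += weight * huber(delta, kappa) / kappa
    return total / len(target)


target = bellman_target(
    reward=1.0, done=False, gamma=0.5, alpha=0.5, next_log_prob=-2.0,
    next_quantiles=[[0.0, 2.0], [1.0, 1.0]],
)
print(target)  # [1.5, 2.0]

loss = quantile_huber_loss(pred=[1.0, 2.0], target=[1.0, 2.0], tau_hat=[0.25, 0.75])
print(loss)  # 0.125
```

The asymmetric weight is what makes each output converge to a specific quantile: a high `tau_hat` penalizes under-estimation more than over-estimation, pushing that output toward the upper tail of the target distribution.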

Classes

class objectrl.models.dsac.DSACActor(config, dim_state, dim_act)

Bases: SACActor

Distributional Soft Actor-Critic (DSAC) Actor.
