Distributional Soft Actor Critic (DSAC)
=======================================

*distributional rl*, *quantile regression*

**Paper**: `DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning `_

Pseudocode
----------

.. pdf-include:: ../../_static/pseudocodes/dsac.pdf
   :width: 100%

Configuration
-------------

.. literalinclude:: ../../../objectrl/config/model_configs/dsac.py
   :language: python
   :start-after: [start-config]
   :end-before: [end-config]
   :caption: Specific configuration for the DSAC algorithm (in config/model_configs/).

UML Diagram
-----------

.. figure:: ../../_static/imgs/dsac.png
   :width: 100%
   :align: center
   :alt: UML diagram for the DSAC algorithm.

   UML diagram for the DSAC algorithm.

The UML diagram illustrates the relationships between the classes in our DSAC implementation.

The diagram shows that ``DSACActor`` and ``DSACCritic`` inherit from ``SACActor`` and ``SACCritic``, respectively, while ``DistributionalSoftActorCritic`` inherits from ``ActorCritic``, which in turn inherits from ``Agent``.
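As a reading aid, the hierarchy described above can be mirrored with empty stand-in classes. This is only a sketch: the stubs below reuse the class names from the diagram but are hypothetical placeholders with no behavior, not the objectrl implementations, and any intermediate base classes the real code may have are omitted.

.. code-block:: python

   # Hypothetical empty stand-ins that mirror only the inheritance shown in the
   # UML diagram; they are NOT the objectrl classes.
   class Agent: ...
   class ActorCritic(Agent): ...
   class SACActor: ...
   class SACCritic: ...

   class DSACActor(SACActor): ...
   class DSACCritic(SACCritic): ...
   class DistributionalSoftActorCritic(ActorCritic): ...

   # Walk each DSAC class's method resolution order, skipping the class itself
   for cls in (DSACActor, DSACCritic, DistributionalSoftActorCritic):
       print(cls.__name__, "->", " -> ".join(c.__name__ for c in cls.__mro__[1:]))

   # Prints:
   # DSACActor -> SACActor -> object
   # DSACCritic -> SACCritic -> object
   # DistributionalSoftActorCritic -> ActorCritic -> Agent -> object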

For each class, the diagram highlights the attributes and methods that are essential to DSAC. Specifically:

``DSACActor`` adapts the SAC actor to support either a fixed or a learnable entropy temperature ``alpha``.

When ``learnable_alpha=False``, the temperature is frozen and no optimizer is created for it. The actor loss is modified to use quantile-weighted Q-values produced by the distributional critic.
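The following PyTorch sketch illustrates both behaviors. It assumes a SAC-style temperature stored as ``log_alpha``, a critic that returns quantiles with shape ``[ensemble, batch, n_quantiles]``, and quantile weights equal to the widths of the quantile intervals. The names ``Temperature`` and ``dsac_actor_loss``, the minimum over the ensemble in the actor loss, and the optimizer settings are illustrative assumptions, not the objectrl API.

.. code-block:: python

   import torch
   import torch.nn as nn


   class Temperature(nn.Module):
       """Hypothetical entropy-temperature holder (not the objectrl API)."""

       def __init__(self, alpha: float, learnable_alpha: bool,
                    target_entropy: float, lr: float = 3e-4):
           super().__init__()
           self.target_entropy = target_entropy
           log_alpha = torch.tensor(float(alpha)).log()
           if learnable_alpha:
               # Learnable: log(alpha) is a parameter with its own optimizer
               self.log_alpha = nn.Parameter(log_alpha)
               self.optim = torch.optim.Adam([self.log_alpha], lr=lr)
           else:
               # Frozen: stored as a buffer, so no optimizer is created
               self.register_buffer("log_alpha", log_alpha)
               self.optim = None

       @property
       def alpha(self) -> torch.Tensor:
           return self.log_alpha.exp()

       def update(self, log_prob: torch.Tensor) -> None:
           if self.optim is None:  # fixed temperature: nothing to update
               return
           # Standard SAC temperature loss
           loss = -(self.log_alpha * (log_prob.detach() + self.target_entropy)).mean()
           self.optim.zero_grad()
           loss.backward()
           self.optim.step()


   def dsac_actor_loss(quantiles: torch.Tensor, weights: torch.Tensor,
                       log_prob: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
       """quantiles: [ensemble, batch, N]; weights: [batch, N], rows sum to 1."""
       q = (quantiles * weights).sum(dim=-1)  # quantile-weighted Q: [ensemble, batch]
       q = q.min(dim=0).values                # assumed SAC-style minimum over the ensemble
       return (alpha.detach() * log_prob - q).mean()

For example, ``Temperature(0.2, learnable_alpha=False, target_entropy=-1.0).optim`` is ``None`` and its ``update()`` call is a no-op, matching the frozen case described above.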

``DSACCritic`` implements quantile-based value estimation. Its ``get_tau()`` method generates quantile fractions either on a fixed uniform grid or with an IQN-style sampling strategy. The ``Q()`` and ``Q_t()`` methods use ``torch.vmap`` to evaluate the ensemble at the midpoints of these fractions, returning full value distributions rather than scalar Q-values.
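A minimal PyTorch sketch of these two steps follows. ``get_tau`` returns the fraction boundaries, their midpoints, and the interval widths; its ``"iqn"`` branch draws random positive increments and normalizes them, which is one common way to randomize the fractions. The small cosine-embedding network and the ``torch.func`` ensembling (``stack_module_state``, ``functional_call``, ``vmap``; PyTorch 2.0 or later) stand in for the actual critic architecture, which this page does not specify. All names and shapes here are assumptions.

.. code-block:: python

   import copy

   import torch
   import torch.nn as nn
   from torch.func import functional_call, stack_module_state


   def get_tau(batch_size: int, n_quantiles: int, mode: str = "fixed"):
       """Hypothetical helper returning (tau, tau_hat, widths)."""
       if mode == "fixed":
           # Uniform grid 0, 1/N, ..., 1 shared by every sample
           tau = torch.linspace(0.0, 1.0, n_quantiles + 1).expand(batch_size, -1)
       elif mode == "iqn":
           # Random positive increments (+0.1 keeps fractions apart), normalized to sum to 1
           inc = torch.rand(batch_size, n_quantiles) + 0.1
           inc = inc / inc.sum(dim=-1, keepdim=True)
           tau = torch.cat([torch.zeros(batch_size, 1), inc.cumsum(dim=-1)], dim=-1)
       else:
           raise ValueError(f"unknown mode: {mode}")
       tau_hat = 0.5 * (tau[:, :-1] + tau[:, 1:])  # midpoints, [batch, N]
       widths = tau[:, 1:] - tau[:, :-1]           # interval widths, [batch, N]
       return tau, tau_hat, widths


   class QuantileNet(nn.Module):
       """Hypothetical ensemble member: Z(s, a; tau_hat) via a cosine embedding."""

       def __init__(self, in_dim: int, hidden: int = 64, n_cos: int = 32):
           super().__init__()
           self.base = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
           self.tau_embed = nn.Sequential(nn.Linear(n_cos, hidden), nn.ReLU())
           self.head = nn.Linear(hidden, 1)
           self.register_buffer("freqs", torch.arange(n_cos, dtype=torch.float32) * torch.pi)

       def forward(self, sa: torch.Tensor, tau_hat: torch.Tensor) -> torch.Tensor:
           h = self.base(sa)                                              # [batch, hidden]
           cos = torch.cos(tau_hat.unsqueeze(-1) * self.freqs)            # [batch, N, n_cos]
           phi = self.tau_embed(cos)                                      # [batch, N, hidden]
           return self.head(h.unsqueeze(1) * phi).squeeze(-1)             # [batch, N]


   def make_ensemble(n_members: int, in_dim: int):
       members = [QuantileNet(in_dim) for _ in range(n_members)]
       params, buffers = stack_module_state(members)  # each leaf gains a leading ensemble dim
       base = copy.deepcopy(members[0]).to("meta")     # stateless template for functional_call

       def call(p, b, sa, tau_hat):
           return functional_call(base, (p, b), (sa, tau_hat))

       def ensemble(sa: torch.Tensor, tau_hat: torch.Tensor) -> torch.Tensor:
           # Batch over member parameters; inputs are shared -> [ensemble, batch, N]
           return torch.vmap(call, in_dims=(0, 0, None, None))(params, buffers, sa, tau_hat)

       return ensemble

With ``mode="fixed"`` and four quantiles, the midpoints are 0.125, 0.375, 0.625, and 0.875, and every width is 0.25.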

The ``get_bellman_target()`` method computes entropy-regularized distributional Bellman targets by applying SAC's clipped minimum across the ensemble's quantile outputs. The ``update()`` method then performs quantile regression to align the predicted quantile distribution with the target distribution.
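The target computation and the regression step can be sketched as follows. The sketch assumes target-critic quantiles at the next state-action pair with shape ``[ensemble, batch, N]`` and uses the standard quantile Huber loss from QR-DQN/IQN with threshold ``kappa``. Whether objectrl uses exactly this loss, this reduction, or these (hypothetical) function and argument names is an assumption.

.. code-block:: python

   import torch


   def soft_distributional_target(reward, done, next_quantiles, next_log_prob,
                                  alpha: float, gamma: float = 0.99) -> torch.Tensor:
       """Hypothetical helper. reward, done, next_log_prob: [batch];
       next_quantiles: [ensemble, batch, N] from the target critic at (s', a')."""
       z_next = next_quantiles.min(dim=0).values              # clipped min, quantile by quantile
       soft_z = z_next - alpha * next_log_prob.unsqueeze(-1)  # entropy regularization
       not_done = 1.0 - done.unsqueeze(-1)
       return (reward.unsqueeze(-1) + gamma * not_done * soft_z).detach()  # [batch, N]


   def quantile_huber_loss(pred, target, tau_hat, kappa: float = 1.0) -> torch.Tensor:
       """pred: [batch, N] at fractions tau_hat [batch, N]; target: [batch, M]."""
       u = target.unsqueeze(1) - pred.unsqueeze(2)  # pairwise errors [batch, N, M]
       abs_u = u.abs()
       huber = torch.where(abs_u <= kappa, 0.5 * u.pow(2), kappa * (abs_u - 0.5 * kappa))
       # |tau_i - 1{u_ij < 0}| penalizes over- and under-estimation asymmetrically
       weight = (tau_hat.unsqueeze(2) - (u.detach() < 0).float()).abs()
       # Sum over predicted quantiles, average over target samples and the batch
       return (weight * huber / kappa).sum(dim=1).mean()

The asymmetric weight is what makes this quantile regression rather than plain Huber regression: a prediction at fraction ``tau_hat`` is pulled toward the ``tau_hat``-quantile of the target samples instead of their mean.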

Classes
-------

.. autoclass:: objectrl.models.dsac.DSACActor
   :undoc-members:
   :show-inheritance:
   :private-members:
   :members:
   :exclude-members: _abc_impl

.. autoclass:: objectrl.models.dsac.DSACCritic
   :undoc-members:
   :show-inheritance:
   :private-members:
   :members:
   :exclude-members: _abc_impl

.. autoclass:: objectrl.models.dsac.DistributionalSoftActorCritic
   :undoc-members:
   :show-inheritance:
   :private-members:
   :members:
   :exclude-members: _abc_impl