Distributional Random Network Distillation (DRND)

exploration distributional RL

Paper: Exploration and Anti-Exploration with Distributional Random Network Distillation

Pseudocode

Configuration

DRND-specific configuration, defined in config/model_configs/.

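from dataclasses import dataclass, field
from typing import Literal

# ActorNetProbabilistic, CriticNet, DRNDActor and DRNDCritics are defined elsewhere in objectrl.

# [start-bonus-config]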
@dataclass
class DRNDBonusConfig:
    """
    Configuration for the DRND exploration bonus network.
    Implements the randomized target ensemble used for exploration.
    The default values follow Yang et al. (2024).

    Attributes:
        depth (int): Number of hidden layers.
        width (int): Width (number of units) in each hidden layer.
        norm (bool): Whether to apply layer normalization.
        activation (str): Activation function ('relu' or 'crelu'). Other activation functions must be added by the user if needed.
        dim_out (int): Output dimensionality of the bonus network.
        scale_factor (float): Weight alpha on the first bonus term; the second bonus term is weighted by 1 - alpha.
        n_members (int): Size of the target ensemble.
        learning_rate (float): Learning rate for training the predictor network.
    """

    depth: int = 4
    width: int = 256
    norm: bool = True
    activation: Literal["relu", "crelu"] = "relu"
    dim_out: int = 32
    scale_factor: float = 0.9
    n_members: int = 10
    learning_rate: float = 1e-4


# [end-bonus-config]


@dataclass
class DRNDActorConfig:
    """
    Configuration for the actor network used in DRND.

    Attributes:
        arch (type): The neural network architecture to use.
        actor_type (type): The actor class (typically DRNDActor).
        lambda_actor (float): Scaling coefficient for exploration bonus in the actor loss.
    """

    arch: type = ActorNetProbabilistic
    actor_type: type = DRNDActor
    lambda_actor: float = 1.0


@dataclass
class DRNDCriticConfig:
    """
    Configuration for the critic network used in DRND.

    Attributes:
        arch (type): The neural network architecture to use.
        critic_type (type): The critic class (typically DRNDCritics).
        lambda_critic (float): Scaling coefficient for exploration bonus in the critic target.
    """

    arch: type = CriticNet
    critic_type: type = DRNDCritics
    lambda_critic: float = 1.0


@dataclass
class DRNDConfig:
    """
    Full configuration for the DRND algorithm.

    Attributes:
        name (str): Name of the algorithm.
        bonus_conf (DRNDBonusConfig): Configuration for bonus (RND) component.
        target_entropy (float | None): Target entropy for entropy regularization.
        alpha (float): Entropy regularization coefficient.
        loss (str): Type of loss function ('MSELoss').
        policy_delay (int): Number of critic updates per actor update.
        tau (float): Soft update coefficient for Polyak averaging.
        actor (DRNDActorConfig): Configuration for actor.
        critic (DRNDCriticConfig): Configuration for critic.
    """

    name: str = "drnd"
    bonus_conf: DRNDBonusConfig = field(default_factory=DRNDBonusConfig)
    target_entropy: float | None = None
    alpha: float = 1.0
    loss: str = "MSELoss"
    policy_delay: int = 1
    tau: float = 0.005
    actor: DRNDActorConfig = field(default_factory=DRNDActorConfig)
    critic: DRNDCriticConfig = field(default_factory=DRNDCriticConfig)

    def __post_init__(self):
        if isinstance(self.bonus_conf, dict):
            self.bonus_conf = DRNDBonusConfig(**self.bonus_conf)
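The __post_init__ hook above coerces only bonus_conf. A plain dict, for example one loaded from a YAML override, therefore becomes a DRNDBonusConfig. The sketch below reproduces that pattern on trimmed, hypothetical stand-ins (BonusConfigSketch, DRNDConfigSketch) so that it runs without objectrl. It illustrates the pattern and is not the library's config loader.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class BonusConfigSketch:
    """Trimmed, hypothetical stand-in for DRNDBonusConfig."""

    depth: int = 4
    width: int = 256
    activation: Literal["relu", "crelu"] = "relu"
    scale_factor: float = 0.9
    n_members: int = 10


@dataclass
class DRNDConfigSketch:
    """Trimmed, hypothetical stand-in for DRNDConfig."""

    name: str = "drnd"
    bonus_conf: BonusConfigSketch = field(default_factory=BonusConfigSketch)
    tau: float = 0.005

    def __post_init__(self):
        # Same coercion as DRNDConfig: a plain dict (for example parsed from
        # YAML or command-line overrides) becomes a typed bonus config.
        if isinstance(self.bonus_conf, dict):
            self.bonus_conf = BonusConfigSketch(**self.bonus_conf)


# Override two bonus fields; every other field keeps its default.
cfg = DRNDConfigSketch(bonus_conf={"n_members": 5, "scale_factor": 0.5})
print(type(cfg.bonus_conf).__name__, cfg.bonus_conf.n_members, cfg.bonus_conf.width)
# BonusConfigSketch 5 256
```

Fields that are missing from the dict keep their defaults. A misspelled key such as "nmembers" raises a TypeError from the dataclass constructor. DRNDConfig applies no such coercion to actor and critic, so dicts passed for those fields would remain plain dicts.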

UML Diagram

The UML diagram illustrates the relationships between the classes in our DRND implementation.

The diagram shows how the DRNDActor and DRNDCritics classes inherit from the base classes Actor and CriticEnsemble, respectively. The DistributionalRandomNetworkDistillation class inherits from ActorCritic, which in turn inherits from Agent.
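The diagram itself is not reproduced here, so the inheritance it describes is summarized below as bare Python class stubs. Only the class names come from this page. The stubs carry none of the real attributes or methods, and the real classes may have further bases (DRNDBonus, for instance, derives from Module).

```python
# Hypothetical stubs mirroring the class hierarchy described above.
class Agent:
    """Stub for the top-level agent base class."""


class ActorCritic(Agent):
    """Stub for the actor-critic base class."""


class DistributionalRandomNetworkDistillation(ActorCritic):
    """Stub for the DRND agent."""


class Actor:
    """Stub for the actor base class."""


class DRNDActor(Actor):
    """Stub for the DRND actor."""


class CriticEnsemble:
    """Stub for the critic-ensemble base class."""


class DRNDCritics(CriticEnsemble):
    """Stub for the DRND critic ensemble."""


print([c.__name__ for c in DistributionalRandomNetworkDistillation.__mro__])
# ['DistributionalRandomNetworkDistillation', 'ActorCritic', 'Agent', 'object']
```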

For each class, the diagram highlights the attributes and methods that are central to DRND:

The get_bellman_target() method of DRNDCritics computes the Bellman target, including the exploration bonus from the DRND module scaled by lambda_critic.
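The function below is a rough sketch of such a target. It assumes the SAC-style form of the offline SAC-DRND variant in Yang et al. (2024). In that form, the target-critic ensemble is reduced with a minimum, the entropy term is weighted by alpha, and the bonus scaled by lambda_critic is subtracted as an anti-exploration penalty. For an optimistic exploration bonus, the sign would flip. The function name, the array shapes and the discount gamma (which does not appear in DRNDConfig) are assumptions, and NumPy arrays stand in for tensors.

```python
import numpy as np


def drnd_bellman_target(reward, done, q_next, log_prob_next, bonus_next,
                        gamma=0.99, alpha=1.0, lambda_critic=1.0):
    """Sketch of a SAC-style Bellman target with a DRND bonus term.

    reward, done, log_prob_next, bonus_next: arrays of shape (batch,).
    q_next: (n_critics, batch) target-critic values at (s', a'), a' ~ pi(.|s').
    """
    # Reduce the critic ensemble pessimistically (assumed: minimum).
    q_min = q_next.min(axis=0)
    # Soft next-state value with the scaled DRND bonus subtracted.
    v_next = q_min - alpha * log_prob_next - lambda_critic * bonus_next
    # No bootstrapping past terminal transitions.
    return reward + gamma * (1.0 - done) * v_next


y = drnd_bellman_target(
    reward=np.array([1.0, 0.0]),
    done=np.array([0.0, 1.0]),
    q_next=np.array([[2.0, 5.0], [3.0, 4.0]]),
    log_prob_next=np.array([-1.0, 0.0]),
    bonus_next=np.array([0.5, 1.0]),
)
print(y)  # approximately [3.475, 0.0]
```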

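The DRNDBonus class computes the exploration signal from the discrepancy between an ensemble of fixed, randomly initialized target networks and a trained predictor network. Unlike RND, which distills a single fixed target, DRND treats the outputs of the target ensemble as a distribution.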
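Below is a minimal NumPy sketch of the bonus, assuming the two-term form from Yang et al. (2024). Let c_1(x), ..., c_N(x) be the target outputs, mu(x) their mean, and B2(x) the mean of c_i(x)^2. The bonus is then b(x) = alpha * ||f(x) - mu(x)||^2 + (1 - alpha) * sqrt(|f(x)^2 - mu(x)^2| / (B2(x) - mu(x)^2)), where alpha plays the role of scale_factor. Several details are our own choices and not necessarily those of DRNDBonus: summing the second term over output dimensions, the eps guard, and the name drnd_bonus.

```python
import numpy as np


def drnd_bonus(pred, targets, scale_factor=0.9, eps=1e-8):
    """Sketch of the two-term DRND bonus.

    pred:    (batch, dim_out) predictor output f(x).
    targets: (n_members, batch, dim_out) outputs of the fixed random targets.
    Returns one bonus value per batch element.
    """
    mu = targets.mean(axis=0)          # ensemble mean (first moment)
    b2 = (targets ** 2).mean(axis=0)   # ensemble second moment
    # Term 1: RND-style squared distance to the ensemble mean.
    first = ((pred - mu) ** 2).sum(axis=-1)
    # Term 2: ratio to the ensemble variance b2 - mu**2 (clipped at 0 against
    # rounding); eps guards against division by zero.
    var = np.maximum(b2 - mu ** 2, 0.0)
    ratio = np.abs(pred ** 2 - mu ** 2) / (var + eps)
    second = np.sqrt(ratio).sum(axis=-1)
    return scale_factor * first + (1.0 - scale_factor) * second


targets = np.array([[[0.0]], [[2.0]]])        # two members, batch 1, dim_out 1
print(drnd_bonus(np.array([[1.0]]), targets))  # predictor at the mean: [0.]
print(drnd_bonus(np.array([[3.0]]), targets))  # 0.9 * 4 + 0.1 * sqrt(8)
```

When the predictor sits at the ensemble mean, both terms vanish. The second term grows as f(x) moves away from mu(x) relative to the spread of the ensemble, which the paper motivates as a pseudo-count estimate.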

DRNDActor samples actions from the stochastic policy (ActorNetProbabilistic by default) and is trained with an entropy-regularized policy objective in which the exploration bonus is weighted by lambda_actor.
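The actor objective can be sketched in the same way. The sketch assumes the SAC-style form used by SAC-DRND, alpha * log pi(a|s) - min_j Q_j(s, a) + lambda_actor * b(s, a), averaged over the batch, with the bonus again acting as a penalty. The function name and array shapes are hypothetical.

```python
import numpy as np


def drnd_actor_loss(q_values, log_prob, bonus, alpha=1.0, lambda_actor=1.0):
    """Sketch of an entropy-regularized actor loss with a DRND term.

    q_values: (n_critics, batch) critic values Q(s, a) for a ~ pi(.|s).
    log_prob, bonus: (batch,) log pi(a|s) and DRND bonus b(s, a).
    """
    q_min = q_values.min(axis=0)  # assumed pessimistic ensemble reduction
    # Minimizing this raises Q, raises entropy (lower log-prob) and, with the
    # sign used here, lowers the DRND bonus of the sampled actions.
    return float(np.mean(alpha * log_prob - q_min + lambda_actor * bonus))


loss = drnd_actor_loss(
    q_values=np.array([[1.0, 3.0], [2.0, 2.0]]),
    log_prob=np.array([-1.0, -2.0]),
    bonus=np.array([0.5, 0.5]),
)
print(loss)  # -2.5
```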

Classes

class objectrl.models.drnd.DRNDBonus(config: MainConfig, dim_state: int, dim_act: int)

Bases: Module

Distributional Random Network Distillation (DRND) bonus module. Provides an exploration bonus based on disagreement between an ensemble of target networks and a learned predictor network. Based on Yang et al. (2024).
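In DRND, the predictor is trained by regressing it onto one target member drawn uniformly at random. Over many updates it is therefore pulled toward the ensemble mean, because the minimizer of the expected squared error over a uniform mixture of targets is their mean. The toy sketch below shows this with linear maps in place of networks and plain gradient descent in place of the library's optimizer. Drawing one member per batch rather than per sample, the step size, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, d_in, d_out, n = 10, 4, 3, 64

# Hypothetical linear stand-ins: fixed random targets and a trainable predictor.
target_w = rng.normal(size=(n_members, d_in, d_out))
pred_w = np.zeros((d_in, d_out))
x = rng.normal(size=(n, d_in))

lr = 0.01  # illustrative step size, unrelated to the config's learning_rate
for _ in range(3000):
    i = rng.integers(n_members)          # draw one target member for this update
    err = x @ pred_w - x @ target_w[i]   # (n, d_out) residual to that member
    grad = 2.0 * x.T @ err / n           # gradient of the batch-mean squared error
    pred_w -= lr * grad

mean_w = target_w.mean(axis=0)
dist_to_mean = np.linalg.norm(pred_w - mean_w)
dist_to_members = np.linalg.norm(pred_w - target_w, axis=(1, 2))
print(dist_to_mean, dist_to_members.min())
```

After training, the predictor is much closer to the ensemble mean than to any individual member. This is why the first bonus term measures distance to mu(x).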

Parameters:
