Randomized Ensembled Double Q-Learning (REDQ)#

off-policy critic-ensemble

Paper: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Pseudocode#

Configuration#

Specific configuration for the REDQ algorithm (in config/model_configs/).#

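from dataclasses import dataclass, field

# SACActorConfig, SACConfig, CriticNet, and REDQCritic are defined elsewhere in the package.


@dataclass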
class REDQActorConfig(SACActorConfig):
    """Actor configuration for REDQ, inherits SACActorConfig without changes."""

    pass


# [start-critic-config]
@dataclass
class REDQCriticConfig:
    """
    Configuration class for the REDQ critic ensemble.

    Attributes:
        arch (type): Neural network architecture for critics.
        critic_type (type): Critic class type.
        n_members (int): Number of critics in the ensemble.
        reduce (str): Reduction method during training.
        target_reduce (str): Reduction method for target Q-value computation.
    """

    arch: type = CriticNet
    critic_type: type = REDQCritic
    n_members: int = 10
    reduce: str = "mean"
    target_reduce: str = "min"


# [end-critic-config]


@dataclass
class REDQConfig(SACConfig):
    """
    Main configuration class for the REDQ algorithm,
    extending SACConfig with REDQ-specific parameters.

    Attributes:
        name (str): Algorithm name identifier.
        n_in_target (int): Number of critics randomly sampled in target Q-value computation.
        policy_delay (int): Number of critic updates per actor update.
        actor (REDQActorConfig): Actor configuration.
        critic (REDQCriticConfig): Critic configuration.
    """

    name: str = "redq"
    n_in_target: int = 2
    policy_delay: int = 20

    actor: REDQActorConfig = field(default_factory=REDQActorConfig)
    critic: REDQCriticConfig = field(default_factory=REDQCriticConfig)
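
The defaults above can be overridden without touching the class definitions. The following is a minimal sketch, not the library's own API: `CriticConfigSketch` and `REDQConfigSketch` are hypothetical stand-ins that mirror only the scalar fields shown above (the real classes also reference `CriticNet`, `REDQCritic`, and the SAC base configs). It assumes plain dataclass semantics; how the package itself loads or overrides configuration is not stated here.

```python
from dataclasses import dataclass, field, replace


@dataclass
class CriticConfigSketch:
    # Hypothetical stand-in for REDQCriticConfig (scalar fields only).
    n_members: int = 10
    reduce: str = "mean"
    target_reduce: str = "min"


@dataclass
class REDQConfigSketch:
    # Hypothetical stand-in for REDQConfig, without the SACConfig base class.
    name: str = "redq"
    n_in_target: int = 2
    policy_delay: int = 20
    critic: CriticConfigSketch = field(default_factory=CriticConfigSketch)


default_cfg = REDQConfigSketch()

# Build a variant with a smaller ensemble; replace() returns new objects,
# so default_cfg is left unchanged.
small_cfg = replace(default_cfg, critic=replace(default_cfg.critic, n_members=5))

# The sampled sub-ensemble cannot be larger than the ensemble itself.
config_is_consistent = small_cfg.n_in_target <= small_cfg.critic.n_members
```

Using `field(default_factory=...)` rather than a shared default instance means each config object gets its own nested critic config, so changing one experiment's settings cannot leak into another.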

UML Diagram#

The UML diagram illustrates the relationships between the classes in our REDQ implementation.

REDQ uses SACActor as its actor, and the REDQCritic class inherits from SACCritic. The RandomizedEnsembledDoubleQLearning class inherits from ActorCritic, which in turn inherits from Agent.

We illustrate each class's crucial attributes and methods for REDQ. Specifically:

The reduce() method in the REDQCritic class samples a random subset of distinct critics from the ensemble and reduces their Q-values to compute the critic target.
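
The sampling-then-reduction step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code of `REDQCritic.reduce()`: the function name `sample_and_reduce`, its signature, and the list-of-lists layout are hypothetical, and plain Python lists stand in for the tensors a real implementation would use. The defaults follow the configuration above (`n_in_target = 2`, `target_reduce = "min"`).

```python
import random


def sample_and_reduce(q_values, n_in_target=2, reduce="min", rng=None):
    """Reduce a randomly sampled sub-ensemble of critics per batch element.

    q_values: per-critic Q-estimates, laid out as [n_members][batch_size].
    """
    if rng is None:
        rng = random.Random()
    n_members = len(q_values)
    if not 1 <= n_in_target <= n_members:
        raise ValueError("n_in_target must be between 1 and n_members")

    # Draw distinct critic indices (sampling without replacement).
    idx = rng.sample(range(n_members), n_in_target)
    subset = [q_values[i] for i in idx]

    # zip(*subset) groups the selected critics' estimates by batch element.
    if reduce == "min":
        return [min(col) for col in zip(*subset)]
    if reduce == "mean":
        return [sum(col) / len(col) for col in zip(*subset)]
    raise ValueError(f"unknown reduce method: {reduce!r}")


# Hypothetical data: 10 critics, batch of 3, with Q[m][b] = m + b.
q = [[float(m + b) for b in range(3)] for m in range(10)]
target_q = sample_and_reduce(q, n_in_target=2, reduce="min", rng=random.Random(0))
```

Taking the minimum over a small random subset keeps the target pessimistic, as in clipped double Q-learning, while the subset changes from one call to the next.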

Classes#

class objectrl.models.redq.REDQCritic(config: MainConfig, dim_state: int, dim_act: int)

Bases: SACCritic

REDQ critic ensemble implementing Randomized Ensembled Double Q-learning.

Parameters:

config (MainConfig) – Configuration object containing model hyperparameters.
dim_state (int) – Dimensionality of the state space.
dim_act (int) – Dimensionality of the action space.

This class extends the SAC critic ensemble with randomized target Q-value estimation: the target is computed from a randomly sampled sub-ensemble of critics.
