Randomized Ensembled Double Q-Learning (REDQ)#

off-policy critic-ensemble

Paper: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Pseudocode#

Configuration#

Specific configuration for the REDQ algorithm (in config/model_configs/).#
from dataclasses import dataclass, field


@dataclass
class REDQActorConfig(SACActorConfig):
    """Actor configuration for REDQ, inherits SACActorConfig without changes."""

    pass


# [start-critic-config]
@dataclass
class REDQCriticConfig:
    """
    Configuration class for the REDQ critic ensemble.

    Attributes:
        arch (type): Neural network architecture for critics.
        critic_type (type): Critic class type.
        n_members (int): Number of critics in the ensemble.
        reduce (str): Reduction method during training.
        target_reduce (str): Reduction method for target Q-value computation.
    """

    arch: type = CriticNet
    critic_type: type = REDQCritic
    n_members: int = 10
    reduce: str = "mean"
    target_reduce: str = "min"


# [end-critic-config]


@dataclass
class REDQConfig(SACConfig):
    """
    Main configuration class for the REDQ algorithm,
    extending SACConfig with REDQ-specific parameters.

    Attributes:
        name (str): Algorithm name identifier.
        n_in_target (int): Number of critics randomly sampled in target Q-value computation.
        policy_delay (int): Number of critic updates per actor update.
        actor (REDQActorConfig): Actor configuration.
        critic (REDQCriticConfig): Critic configuration.
    """

    name: str = "redq"
    n_in_target: int = 2
    policy_delay: int = 20

    actor: REDQActorConfig = field(default_factory=REDQActorConfig)
    critic: REDQCriticConfig = field(default_factory=REDQCriticConfig)
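
As a usage illustration, the sketch below mirrors the fields listed above with stand-in dataclasses to show how the defaults might be overridden and why the nested configs use default_factory. The Sketch* class names and the check_subset_size helper are hypothetical and are not part of objectrl; the sketch assumes only the standard dataclasses module and leaves out the SACConfig base class and the arch/critic_type fields.

```python
from dataclasses import dataclass, field, replace


@dataclass
class SketchCriticConfig:
    """Stand-in for REDQCriticConfig; only the plain-value fields are mirrored."""

    n_members: int = 10
    reduce: str = "mean"
    target_reduce: str = "min"


@dataclass
class SketchREDQConfig:
    """Stand-in for REDQConfig without the SACConfig base class."""

    name: str = "redq"
    n_in_target: int = 2
    policy_delay: int = 20
    # default_factory builds a fresh nested config for every instance,
    # so changing one instance's critic settings cannot leak into another.
    critic: SketchCriticConfig = field(default_factory=SketchCriticConfig)


def check_subset_size(cfg: SketchREDQConfig) -> None:
    """Hypothetical sanity check: the sampled subset cannot exceed the ensemble."""
    if not 1 <= cfg.n_in_target <= cfg.critic.n_members:
        raise ValueError("n_in_target must lie in [1, critic.n_members]")


default_cfg = SketchREDQConfig()
# replace() returns a modified copy and leaves default_cfg untouched.
small_cfg = replace(default_cfg, critic=SketchCriticConfig(n_members=5), n_in_target=3)
check_subset_size(default_cfg)
check_subset_size(small_cfg)
```

Whether objectrl itself validates n_in_target against n_members is not stated on this page, so a check like the one above is only a suggestion.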


UML Diagram#

UML diagram for the REDQ algorithm.

We use the UML diagram to illustrate the relationships between the classes in our REDQ implementation.

The diagram shows that REDQ uses SACActor as its actor and that the REDQCritic class inherits from SACCritic. The RandomizedEnsembledDoubleQLearning class inherits from ActorCritic, which in turn inherits from Agent.

The diagram lists the attributes and methods of each class that matter most for REDQ. Specifically:

The reduce() method in the REDQCritic class samples a subset of distinct critics from the ensemble and computes the critic target by reducing their Q-values.
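
The sampling-and-reduction step described above can be sketched as follows. This is a minimal, dependency-free illustration that uses plain Python lists in place of torch tensors; reduce_subset and its parameters are hypothetical names, and the actual REDQCritic.reduce() may differ in how it samples and which reductions it supports.

```python
import random
from typing import List, Optional


def reduce_subset(
    q_val_list: List[List[float]],
    n_in_target: int = 2,
    reduce_type: str = "min",
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Reduce the Q-values of a random subset of critics.

    q_val_list[i][b] is critic i's Q-value for batch item b.
    """
    rng = rng or random.Random()
    # random.sample draws without replacement, so the chosen critics are distinct.
    chosen = rng.sample(range(len(q_val_list)), n_in_target)
    # Regroup the chosen critics' outputs by batch item.
    per_item = zip(*(q_val_list[i] for i in chosen))
    if reduce_type == "min":
        return [min(vals) for vals in per_item]
    if reduce_type == "mean":
        return [sum(vals) / len(vals) for vals in per_item]
    raise ValueError(f"unsupported reduce_type: {reduce_type!r}")


# Three critics, batch of two transitions.
q_vals = [[1.0, 5.0], [2.0, 4.0], [3.0, 0.0]]
full_min = reduce_subset(q_vals, n_in_target=3)  # subset equals the whole ensemble
```

Because the minimum is taken over only a subset, the result is never smaller than the minimum over the full ensemble, which is how a small n_in_target tempers the pessimism of the target.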

Classes#

class objectrl.models.redq.REDQCritic(config: MainConfig, dim_state: int, dim_act: int)[source]#

Bases: SACCritic

REDQ critic ensemble implementing Randomized Ensembled Double Q-learning.

Parameters:
  • config (MainConfig) – Configuration object containing model hyperparameters.

  • dim_state (int) – Dimensionality of the state space.

  • dim_act (int) – Dimensionality of the action space.

This class extends the SAC critic ensemble by implementing a randomized target Q-value estimation with sub-ensemble sampling.
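
For reference, the randomized target from the REDQ paper (Chen et al., 2021) can be written as below. The notation is introduced here and is not taken from the objectrl source: N is the ensemble size (n_members), M the subset size (n_in_target), \bar{\phi}_i the target parameters of critic i, \alpha the SAC entropy temperature, and d a termination flag, which is a common implementation addition rather than part of the paper's formula.

```latex
\mathcal{M} \subset \{1, \dots, N\}, \qquad |\mathcal{M}| = M
\quad \text{(sampled uniformly without replacement)}

y = r + \gamma \, (1 - d) \left( \min_{i \in \mathcal{M}} Q_{\bar{\phi}_i}(s', \tilde{a}')
    - \alpha \log \pi_\theta(\tilde{a}' \mid s') \right),
\qquad \tilde{a}' \sim \pi_\theta(\cdot \mid s')
```

In the paper, all N critics are regressed toward the same target y.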

__init__(config: MainConfig, dim_state: int, dim_act: int) None[source]#

Initialize the critic ensemble.

Parameters:
  • config (MainConfig) – Configuration object with model parameters.

  • dim_state (int) – Dimension of the state space.

  • dim_act (int) – Dimension of the action space.

Returns:

None

reduce(q_val_list: Tensor, reduce_type='min') Tensor[source]#

Randomly samples a subset of critics from the ensemble and reduces their Q-values.

Parameters:
  • q_val_list (torch.Tensor) – Tensor of Q-values from each critic in the ensemble.

  • reduce_type (str) – Reduction method. Defaults to 'min'.

Returns:

Reduced Q-values obtained by applying the reduction (the minimum by default) over the sampled critics.

Return type:

torch.Tensor

class objectrl.models.redq.RandomizedEnsembledDoubleQLearning(config: MainConfig, critic_type: type = <class 'objectrl.models.redq.REDQCritic'>, actor_type: type = <class 'objectrl.models.sac.SACActor'>)[source]#

Bases: ActorCritic

REDQ agent combining REDQCritic and SACActor. Chen et al. (2021): Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

_agent_name = 'REDQ'#

__init__(config: MainConfig, critic_type: type = <class 'objectrl.models.redq.REDQCritic'>, actor_type: type = <class 'objectrl.models.sac.SACActor'>) None[source]#

Initializes the REDQ agent.

Parameters:
  • config (MainConfig) – Configuration dataclass instance.

  • critic_type (type) – Critic class type.

  • actor_type (type) – Actor class type.

Returns:

None
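
To make the role of policy_delay in this agent concrete, the sketch below counts updates under one plausible reading of "number of critic updates per actor update": every gradient step updates the critics, and every policy_delay-th step also updates the actor. update_schedule is a hypothetical helper; the actual training loop in objectrl is not shown on this page and may be organized differently.

```python
from typing import List, Tuple


def update_schedule(n_steps: int, policy_delay: int = 20) -> List[Tuple[int, bool]]:
    """Return (step, actor_updated) pairs for n_steps critic updates."""
    schedule = []
    for step in range(1, n_steps + 1):
        # Critics are updated on every step; the actor only on multiples of policy_delay.
        schedule.append((step, step % policy_delay == 0))
    return schedule


schedule = update_schedule(n_steps=60)
actor_steps = [step for step, actor_updated in schedule if actor_updated]
```

With the default policy_delay of 20, the actor in this sketch is updated three times over 60 critic updates.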