Randomized Ensembled Double Q-Learning (REDQ)
off-policy critic-ensemble
Paper: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
Pseudocode
Configuration
from dataclasses import dataclass, field


@dataclass
class REDQActorConfig(SACActorConfig):
    """Actor configuration for REDQ, inherits SACActorConfig without changes."""

    pass


# [start-critic-config]
@dataclass
class REDQCriticConfig:
    """
    Configuration class for the REDQ critic ensemble.

    Attributes:
        arch (type): Neural network architecture for critics.
        critic_type (type): Critic class type.
        n_members (int): Number of critics in the ensemble.
        reduce (str): Reduction method during training.
        target_reduce (str): Reduction method for target Q-value computation.
    """

    arch: type = CriticNet
    critic_type: type = REDQCritic
    n_members: int = 10
    reduce: str = "mean"
    target_reduce: str = "min"
# [end-critic-config]


@dataclass
class REDQConfig(SACConfig):
    """
    Main configuration class for the REDQ algorithm,
    extending SACConfig with REDQ-specific parameters.

    Attributes:
        name (str): Algorithm name identifier.
        n_in_target (int): Number of critics randomly sampled in target Q-value computation.
        policy_delay (int): Number of critic updates per actor update.
        actor (REDQActorConfig): Actor configuration.
        critic (REDQCriticConfig): Critic configuration.
    """

    name: str = "redq"
    n_in_target: int = 2
    policy_delay: int = 20
    actor: REDQActorConfig = field(default_factory=REDQActorConfig)
    critic: REDQCriticConfig = field(default_factory=REDQCriticConfig)
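
To make the roles of n_members, n_in_target, and policy_delay more concrete, here is a minimal, self-contained sketch. It uses a hypothetical stand-in dataclass (ToyREDQSettings) and a hypothetical helper (simulate_schedule) rather than the library's REDQConfig. It also assumes that the delay is applied as a simple modulo check on the critic-update counter, which the configuration alone does not confirm.

```python
from dataclasses import dataclass, replace
import random


@dataclass(frozen=True)
class ToyREDQSettings:
    """Hypothetical stand-in mirroring three REDQ defaults; not the library class."""

    n_members: int = 10     # critics in the ensemble
    n_in_target: int = 2    # critics sampled for each target computation
    policy_delay: int = 20  # critic updates per actor update

    def __post_init__(self) -> None:
        # Distinct critics can only be sampled if the subset fits in the ensemble.
        if not 1 <= self.n_in_target <= self.n_members:
            raise ValueError("n_in_target must be between 1 and n_members")
        if self.policy_delay < 1:
            raise ValueError("policy_delay must be at least 1")


def simulate_schedule(cfg: ToyREDQSettings, n_critic_updates: int, seed: int = 0):
    """Return (sampled critic subsets, steps at which the actor would update)."""
    rng = random.Random(seed)
    subsets, actor_steps = [], []
    for step in range(1, n_critic_updates + 1):
        # A fresh subset of distinct critic indices is drawn for every critic update.
        subsets.append(sorted(rng.sample(range(cfg.n_members), cfg.n_in_target)))
        # Assumption: the actor is updated once every policy_delay critic updates.
        if step % cfg.policy_delay == 0:
            actor_steps.append(step)
    return subsets, actor_steps


cfg = ToyREDQSettings()
subsets, actor_steps = simulate_schedule(cfg, n_critic_updates=60)
faster_actor = replace(cfg, policy_delay=5)  # override one field, keep the rest
```

With the defaults, the sketch draws a new pair of critic indices at every critic update and triggers the actor only at every 20th one. The validation in __post_init__ is an illustrative choice; whether the library performs a similar check is not stated here.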
UML Diagram
UML diagram for the REDQ algorithm.
The UML diagram illustrates the relationships between the classes in our REDQ implementation.
REDQ reuses SACActor as its actor, and REDQCritic inherits from SACCritic. RandomizedEnsembledDoubleQLearning inherits from ActorCritic, which in turn inherits from Agent.
The diagram also shows each class's key attributes and methods for REDQ. Specifically:
- The reduce() method of REDQCritic samples a distinct subset of critics and computes the critic target by reducing over that subset.
Classes
- class objectrl.models.redq.REDQCritic(config: MainConfig, dim_state: int, dim_act: int)
Bases: SACCritic
REDQ critic ensemble implementing Randomized Ensembled Double Q-learning.
- Parameters:
config (MainConfig) – Configuration object containing model hyperparameters.
dim_state (int) – Dimensionality of the state space.
dim_act (int) – Dimensionality of the action space.
This class extends the SAC critic ensemble by implementing a randomized target Q-value estimation with sub-ensemble sampling.
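
For reference, the randomized target from the REDQ paper (Chen et al., 2021) can be written as below. The symbols follow the paper's notation and are not identifiers from this library: N is the ensemble size and M the size of the sampled subset, which presumably correspond to n_members and n_in_target; the bar marks target-network parameters, and alpha is the SAC entropy temperature. Handling of terminal transitions is omitted.

```latex
\mathcal{M} \subset \{1, \dots, N\}, \quad |\mathcal{M}| = M, \qquad
y = r + \gamma \left( \min_{i \in \mathcal{M}} Q_{\bar{\phi}_i}(s', \tilde{a}')
    - \alpha \log \pi_\theta(\tilde{a}' \mid s') \right),
\quad \tilde{a}' \sim \pi_\theta(\cdot \mid s')
```

Every one of the N critics is regressed toward the same target y, while the subset is redrawn for each update.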
- __init__(config: MainConfig, dim_state: int, dim_act: int) -> None
Initialize the critic ensemble.
- Parameters:
config (MainConfig) – Configuration object with model parameters.
dim_state (int) – Dimension of the state space.
dim_act (int) – Dimension of the action space.
- Returns:
None
- reduce(q_val_list: Tensor, reduce_type='min') -> Tensor
Randomly samples a subset of distinct critics from the ensemble and reduces their Q-values.
- Parameters:
q_val_list (torch.Tensor) – Tensor of Q-values, one entry per critic in the ensemble.
reduce_type (str) – Reduction method applied over the sampled critics. Defaults to 'min'.
- Returns:
Q-values reduced over the sampled critics (the minimum when reduce_type is 'min').
- Return type:
torch.Tensor
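
The sample-then-reduce step described above can be sketched as follows. This is a minimal illustration and not the library's implementation: plain nested lists stand in for the torch.Tensor, and sample_and_reduce together with its n_in_target and rng parameters are hypothetical names. The 'mean' branch is included only because the configuration lists it as a reduction method.

```python
import random
from statistics import fmean


def sample_and_reduce(q_vals, n_in_target=2, reduce_type="min", rng=None):
    """Reduce Q-values over a random subset of distinct critics.

    q_vals[i][b] is critic i's Q-value for batch element b.
    Returns the reduced values and the sorted indices of the critics used.
    """
    rng = rng or random.Random()
    n_members = len(q_vals)
    if not 1 <= n_in_target <= n_members:
        raise ValueError("n_in_target must be between 1 and the ensemble size")
    # Draw distinct critic indices (sampling without replacement).
    chosen = rng.sample(range(n_members), n_in_target)
    reducers = {"min": min, "mean": fmean}
    if reduce_type not in reducers:
        raise ValueError(f"unsupported reduce_type: {reduce_type!r}")
    reducer = reducers[reduce_type]
    # zip(*rows) regroups the chosen critics' values per batch element.
    reduced = [reducer(column) for column in zip(*(q_vals[i] for i in chosen))]
    return reduced, sorted(chosen)


# Three critics, batch of two transitions.
q_vals = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0]]
target_q, used = sample_and_reduce(q_vals, n_in_target=2, rng=random.Random(0))
```

Taking the minimum over only a small random subset, rather than over the whole ensemble, is what lets the subset size control how pessimistic the target is.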
- class objectrl.models.redq.RandomizedEnsembledDoubleQLearning(config: MainConfig, critic_type: type = <class 'objectrl.models.redq.REDQCritic'>, actor_type: type = <class 'objectrl.models.sac.SACActor'>)
Bases: ActorCritic
REDQ agent combining REDQCritic and SACActor. Reference: Chen et al. (2021), Randomized Ensembled Double Q-Learning: Learning Fast Without a Model.
- _agent_name = 'REDQ'
- __init__(config: MainConfig, critic_type: type = <class 'objectrl.models.redq.REDQCritic'>, actor_type: type = <class 'objectrl.models.sac.SACActor'>) -> None
Initializes the REDQ agent.
- Parameters:
config (MainConfig) – Configuration dataclass instance.
critic_type (type) – Critic class type.
actor_type (type) – Actor class type.
- Returns:
None