ActorCritic
This module combines an actor and a critic into a single model to enable joint policy and value learning.
Key Points:
- Integrates the policy network (actor) with a value estimator (critic) in a single model.
- Supports synchronized updates and coordinated forward passes.
- Provides separate access to the actor and critic components when needed.
- Serves as a basis for algorithms such as DDPG, TD3, and SAC that rely on an actor-critic architecture.
- Keeps device placement and configuration consistent between the two components.
Usage Notes:
- The ActorCritic class simplifies the training pipeline by encapsulating both networks.
- It enables computing both action selections and value predictions from the same input states.
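To illustrate computing actions and value predictions from the same states, here is a minimal, framework-free sketch. ToyActorCritic and the lambdas standing in for networks are hypothetical and not part of the objectrl API. A real implementation would wrap neural networks rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


@dataclass
class ToyActorCritic:
    """Hypothetical container pairing one policy with an ensemble of Q-functions."""

    actor: Callable[[State], float]                 # state -> action
    critics: List[Callable[[State, float], float]]  # (state, action) -> Q-value

    def act(self, state: State) -> float:
        # Action selection uses only the actor
        return self.actor(state)

    def evaluate(self, state: State, action: float) -> List[float]:
        # One value estimate per ensemble member
        return [q(state, action) for q in self.critics]

    def forward(self, state: State) -> Tuple[float, List[float]]:
        # Choose an action, then score it with every critic on the same state
        action = self.act(state)
        return action, self.evaluate(state, action)


model = ToyActorCritic(
    actor=lambda s: 0.5 * sum(s),
    critics=[lambda s, a: sum(s) + a, lambda s, a: sum(s) - a],
)
action, q_values = model.forward([1.0, 2.0])
print(action, q_values)  # 1.5 [4.5, 1.5]
```

Keeping several critics mirrors the CriticEnsemble type used by this class. TD3 and SAC, for example, train two critics and use the minimum of their estimates to curb overestimation.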
Attention
Keeping actor and critic updates properly synchronized (for example, through the policy_delay ratio of critic updates to actor updates) is crucial for stable training.
Note
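Target networks, if used, should be updated gradually with soft updates (Polyak averaging), θ' ← τθ + (1 − τ)θ', where a small τ makes the target parameters θ' slowly track the online parameters θ.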
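The soft update can be sketched without any framework. The dictionary-of-lists parameter layout and the soft_update name are assumptions for illustration; in a deep-learning library the same rule would be applied tensor by tensor.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def soft_update(target: Params, online: Params, tau: float) -> None:
    """Polyak averaging in place: target <- tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    for name, online_values in online.items():
        target_values = target[name]
        for i, w in enumerate(online_values):
            target_values[i] = tau * w + (1.0 - tau) * target_values[i]


online = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
soft_update(target, online, tau=0.5)
print(target)  # {'w': [0.5, 1.0]}
```

With a small τ (values around 0.005 are common in TD3- and SAC-style setups) the target network changes slowly, which keeps the bootstrapped critic targets stable; τ = 1 reduces to a hard copy.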
Here are the detailed methods and attributes.
- class objectrl.models.basic.ac.ActorCritic(config: MainConfig, critic_type: type[CriticEnsemble], actor_type: type[Actor])
Bases: AgentBase
Actor-Critic agent combining an actor policy and a critic ensemble. This class serves as a foundation for various Actor-Critic algorithms.
- Parameters:
config (MainConfig) – Configuration object containing model hyperparameters.
critic_type (type[CriticEnsemble]) – Type of critic to use.
actor_type (type[Actor]) – Type of actor to use.
- critic
Critic network ensemble instance.
- Type:
CriticEnsemble
- policy_delay
Number of critic updates per actor update.
- Type:
int
- n_iter
Iteration counter for training steps.
- Type:
int
- _agent_name = 'AC'
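The policy_delay and n_iter attributes together suggest a delayed-update schedule in which the critic is trained on every step and the actor only every policy_delay steps, as in TD3. The helper below is a hypothetical sketch of that schedule; whether objectrl starts counting at 0 and where exactly it increments n_iter are assumptions.

```python
from typing import List


def update_schedule(num_steps: int, policy_delay: int) -> List[str]:
    """Hypothetical helper: which networks get a gradient step at each n_iter."""
    if policy_delay < 1:
        raise ValueError("policy_delay must be >= 1")
    log = []
    for n_iter in range(num_steps):
        updated = ["critic"]               # the critic is trained on every step
        if n_iter % policy_delay == 0:     # the actor only every policy_delay steps
            updated.append("actor")
        log.append("+".join(updated))
    return log


print(update_schedule(num_steps=4, policy_delay=2))
# ['critic+actor', 'critic', 'critic+actor', 'critic']
```

With policy_delay = 1 the actor is updated on every step, which matches the usual DDPG and SAC setup; TD3 uses a delay of 2 and also performs its target-network soft updates inside the delayed branch.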
- __init__(config: MainConfig, critic_type: type[CriticEnsemble], actor_type: type[Actor]) -> None
Initializes the ActorCritic agent with actor and critic networks.
- Parameters:
config (MainConfig) – Configuration dataclass instance.
critic_type (type[CriticEnsemble]) – Critic class type.
actor_type (type[Actor]) – Actor class type.
- Returns:
None
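Because __init__ receives the critic and actor classes rather than instances, a subclass or caller can swap in different network types while reusing the construction logic. The sketch below shows that pattern with hypothetical stand-ins (ToyConfig, ToyActor, ToyCriticEnsemble, WideActor); the real MainConfig, Actor, and CriticEnsemble constructors may take different arguments.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for MainConfig."""

    hidden_dim: int = 64
    n_critics: int = 2
    policy_delay: int = 2


class ToyActor:
    def __init__(self, config: ToyConfig) -> None:
        self.hidden_dim = config.hidden_dim


class ToyCriticEnsemble:
    def __init__(self, config: ToyConfig) -> None:
        self.n_members = config.n_critics


class ToyActorCritic:
    """Builds both networks from one shared config; receives classes, not instances."""

    _agent_name = "AC"

    def __init__(
        self,
        config: ToyConfig,
        critic_type: type[ToyCriticEnsemble],
        actor_type: type[ToyActor],
    ) -> None:
        self.config = config
        self.actor = actor_type(config)    # instantiate the injected actor class
        self.critic = critic_type(config)  # instantiate the injected critic class
        self.policy_delay = config.policy_delay
        self.n_iter = 0                    # training-step counter


class WideActor(ToyActor):
    """A variant actor swapped in without touching ToyActorCritic."""

    def __init__(self, config: ToyConfig) -> None:
        super().__init__(config)
        self.hidden_dim *= 2


agent = ToyActorCritic(ToyConfig(), ToyCriticEnsemble, WideActor)
print(agent.actor.hidden_dim, agent.critic.n_members)  # 128 2
```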
- learn(max_iter: int = 1, n_epochs: int = 0) -> None
Perform the learning process for the agent.
- Parameters:
max_iter (int) – Maximum number of iterations for learning.
n_epochs (int) – Number of epochs for training. If 0, random sampling is used.
- Returns:
None
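The parameter description says that n_epochs = 0 makes learn use random sampling. One plausible reading, sketched below with a hypothetical batch_indices helper, is that the default mode draws max_iter random minibatches from the replay buffer, while n_epochs > 0 sweeps the shuffled buffer n_epochs times. The behaviour of the epoch branch, and how it interacts with max_iter, are assumptions rather than documented objectrl behaviour.

```python
import random
from typing import Iterator, List


def batch_indices(
    buffer_size: int,
    batch_size: int,
    max_iter: int = 1,
    n_epochs: int = 0,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield minibatch index lists for one hypothetical learn() call."""
    rng = random.Random(seed)
    if n_epochs == 0:
        # Random sampling: max_iter independent batches, indices drawn with replacement
        for _ in range(max_iter):
            yield [rng.randrange(buffer_size) for _ in range(batch_size)]
    else:
        # Epoch mode (assumed): each epoch visits every stored transition once
        for _ in range(n_epochs):
            order = list(range(buffer_size))
            rng.shuffle(order)
            for start in range(0, buffer_size, batch_size):
                yield order[start:start + batch_size]


random_batches = list(batch_indices(buffer_size=10, batch_size=4, max_iter=3))
epoch_batches = list(batch_indices(buffer_size=10, batch_size=4, n_epochs=2))
print(len(random_batches), [len(b) for b in epoch_batches])  # 3 [4, 4, 2, 4, 4, 2]
```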