ActorCritic

This module combines an actor and a critic into a single model to enable joint policy and value learning.

Key Points:

Integrates the policy network (actor) with a value estimator (critic) for efficient reinforcement learning.
Supports synchronized updates and coordinated forward passes.
Provides access to actor and critic components separately when needed.
Serves as a base for algorithms such as DDPG, TD3, or SAC that rely on an actor-critic architecture.
Manages device placement and configuration consistency between components.

Usage Notes:

The ActorCritic class simplifies the training pipeline by encapsulating both networks.
It lets both action selections and value predictions be computed from the same input states.

Attention

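Keeping actor and critic updates synchronized is crucial for stable training. In methods such as DDPG, TD3, and SAC, the actor is trained against the critic's value estimates, so an actor updated against a poorly fitted critic can destabilize learning.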

Note

Target networks, if used, should be updated with care using soft updates (Polyak averaging).
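
A soft update moves each target parameter a small step toward its online counterpart. The sketch below assumes plain PyTorch modules. The function name `soft_update` and the coefficient `tau` are illustrative and are not taken from this library's API.

```python
import torch
from torch import nn


@torch.no_grad()
def soft_update(online: nn.Module, target: nn.Module, tau: float = 0.005) -> None:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        # lerp_ computes p_target + tau * (p_online - p_target) in place.
        p_target.lerp_(p_online, tau)


online = nn.Linear(4, 2)
target = nn.Linear(4, 2)
target.load_state_dict(online.state_dict())  # start from identical weights

# Simulate a training step on the online network, then track it slowly.
with torch.no_grad():
    for p in online.parameters():
        p.add_(1.0)
soft_update(online, target, tau=0.1)
```

With `tau=0.1`, the target closes one tenth of the gap to the online network per call, so the targets used for value estimation change slowly.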

Here are the detailed methods and attributes.

class objectrl.models.basic.ac.ActorCritic(config: MainConfig, critic_type: type[CriticEnsemble], actor_type: type[Actor])

Bases: Agent

Base Actor-Critic agent combining an actor policy and a critic ensemble. This class serves as a foundation for various Actor-Critic algorithms.

Parameters:

config (MainConfig) – Configuration object containing model hyperparameters.
critic_type (type[CriticEnsemble]) – Type of critic to use.
actor_type (type[Actor]) – Type of actor to use.
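
The constructor receives classes rather than instances, so the agent decides how and when to build its networks. The following is a minimal sketch of that pattern only. Every name in it (`ToyConfig`, `ToyActor`, `ToyCriticEnsemble`, `ToyActorCritic`) is a hypothetical stand-in, and how the real class builds its components from `MainConfig` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for MainConfig."""
    state_dim: int = 3
    action_dim: int = 1
    policy_delay: int = 2


class ToyActor:
    """Hypothetical stand-in for Actor."""

    def __init__(self, config: ToyConfig) -> None:
        self.action_dim = config.action_dim


class ToyCriticEnsemble:
    """Hypothetical stand-in for CriticEnsemble."""

    def __init__(self, config: ToyConfig) -> None:
        # A Q-critic scores (state, action) pairs.
        self.input_dim = config.state_dim + config.action_dim


class ToyActorCritic:
    def __init__(
        self,
        config: ToyConfig,
        critic_type: type[ToyCriticEnsemble],
        actor_type: type[ToyActor],
    ) -> None:
        # Assumption: both components are instantiated from the same config.
        self.critic = critic_type(config)
        self.actor = actor_type(config)
        self.policy_delay = config.policy_delay
        self.n_iter = 0


agent = ToyActorCritic(ToyConfig(), ToyCriticEnsemble, ToyActor)
```

Passing types keeps the base class reusable: a subclass can swap in a different actor or critic class without changing the construction logic.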

critic

Critic network ensemble instance.

Type: CriticEnsemble

actor

Actor network instance.

Type: Actor

policy_delay

Number of critic updates per actor update.

Type: int
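
A delayed-actor schedule of this kind can be sketched with a simple counter. The function `run_updates` below is hypothetical, and gating the actor step on `n_iter % policy_delay` is an assumption about how such a delay is commonly implemented (as in TD3), not a description of this class's internals.

```python
def run_updates(n_steps: int, policy_delay: int) -> tuple[int, int]:
    """Count critic and actor updates under a delayed-actor schedule."""
    n_iter = 0
    critic_updates = 0
    actor_updates = 0
    for _ in range(n_steps):
        critic_updates += 1  # placeholder for one critic gradient step
        n_iter += 1
        if n_iter % policy_delay == 0:
            actor_updates += 1  # placeholder for one actor gradient step
    return critic_updates, actor_updates


print(run_updates(n_steps=10, policy_delay=2))  # (10, 5)
```

With `policy_delay=1` the actor is updated at every step. Larger values give the critic more updates between actor updates.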

n_iter

Iteration counter for training steps.

Type: int

_agent_name = 'AC'

__init__(config: MainConfig, critic_type: type[CriticEnsemble], actor_type: type[Actor]) → None

Initializes the ActorCritic agent with actor and critic networks.

Parameters:

config (MainConfig) – Configuration dataclass instance.
critic_type (type[CriticEnsemble]) – Critic class type.
actor_type (type[Actor]) – Actor class type.

Returns:

None

learn(max_iter: int = 1, n_epochs: int = 0) → None

Perform the learning process for the agent.

Parameters:

max_iter (int) – Maximum number of iterations for learning.
n_epochs (int) – Number of epochs for training. If 0, random sampling is used.

Returns:

None

select_action(state: Tensor, is_training: bool = True) → Tensor

Select an action based on the current state.

Parameters:

state (torch.Tensor) – The current state.
is_training (bool) – Whether the agent is in training mode.

Returns:

The selected action.

Return type:

torch.Tensor
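
A common use of a flag like `is_training` is to switch between exploratory and deterministic actions. The sketch below illustrates that idea with a hypothetical `ToyGaussianActor`. Whether the real method adds noise, samples from a distribution, or does something else is an assumption and is not confirmed by this page.

```python
import torch
from torch import nn


class ToyGaussianActor(nn.Module):
    """Hypothetical actor: linear mean with a fixed standard deviation."""

    def __init__(self, state_dim: int, action_dim: int, std: float = 0.1) -> None:
        super().__init__()
        self.mean = nn.Linear(state_dim, action_dim)
        self.std = std

    @torch.no_grad()
    def select_action(self, state: torch.Tensor, is_training: bool = True) -> torch.Tensor:
        mu = self.mean(state)
        if is_training:
            # Assumption: explore by adding Gaussian noise during training.
            mu = mu + self.std * torch.randn_like(mu)
        # Squash into [-1, 1], a common convention for continuous actions.
        return torch.tanh(mu)


actor = ToyGaussianActor(state_dim=3, action_dim=2)
state = torch.zeros(1, 3)
eval_a = actor.select_action(state, is_training=False)
eval_b = actor.select_action(state, is_training=False)
train_a = actor.select_action(state, is_training=True)
```

In this sketch, evaluation calls return the same action for the same state, while training calls vary from one call to the next.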

reset() → None

Reset the agent.

Parameters: None
Returns: None
