Actor
This module defines the actor network responsible for selecting actions based on the current state.
Key Points:
The Actor models the policy function, mapping states to actions.
Supports stochastic or deterministic action outputs depending on architecture.
Designed to work seamlessly with the critic networks for policy gradient or actor-critic algorithms.
Includes device management and flexible architecture configurations.
Provides methods to sample or compute actions and to evaluate policy log probabilities if applicable.
Usage Notes:
The actor receives states as input and outputs actions compatible with the environment.
Can be extended or customized by modifying the underlying network architecture or sampling strategy.
Important
The actor network must output actions consistent with the environment’s action space. Ensure proper action normalization or bounding if required.
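One common way to bound actions is to squash the raw network output with tanh and rescale it to the environment's limits. The sketch below illustrates that idea only; `scale_action`, `low`, and `high` are hypothetical names chosen for this example and are not part of the objectrl API.

```python
import torch


def scale_action(raw: torch.Tensor, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
    """Squash an unbounded output into [low, high], element-wise (hypothetical helper)."""
    squashed = torch.tanh(raw)  # values in [-1, 1]
    return low + 0.5 * (squashed + 1.0) * (high - low)


# Assumed action bounds for a 2-dimensional action space.
low = torch.tensor([-2.0, 0.0])
high = torch.tensor([2.0, 1.0])

# A batch of two raw (unbounded) network outputs.
raw = torch.tensor([[0.0, 0.0], [10.0, -10.0]])
bounded = scale_action(raw, low, high)
```

A raw output of zero maps to the midpoint of each interval, and large magnitudes saturate at the bounds.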
Attention
For custom policy distributions (e.g., discrete, Gaussian), ensure your act method correctly samples actions.
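As an illustration of what correct sampling can look like for a Gaussian policy, the sketch below draws a reparameterized sample during training and falls back to the mean otherwise. The function name `sample_gaussian` and the dictionary keys `"action"` and `"log_prob"` are assumptions made for this example; the keys the library actually uses are not stated here.

```python
import torch
from torch.distributions import Normal


def sample_gaussian(mean: torch.Tensor, log_std: torch.Tensor, is_training: bool = True) -> dict:
    """Sample from a diagonal Gaussian policy (hypothetical helper)."""
    dist = Normal(mean, log_std.exp())
    # rsample keeps the sample differentiable w.r.t. mean and log_std.
    action = dist.rsample() if is_training else mean
    # Sum per-dimension log densities to get one log probability per state.
    log_prob = dist.log_prob(action).sum(dim=-1)
    return {"action": action, "log_prob": log_prob}


torch.manual_seed(0)
mean = torch.zeros(4, 2)     # batch of 4 states, 2 action dimensions
log_std = torch.zeros(4, 2)  # unit standard deviation

out = sample_gaussian(mean, log_std)
eval_out = sample_gaussian(mean, log_std, is_training=False)
```

If the sampled action is later squashed (for example with tanh), the log probability needs the corresponding change-of-variables correction, which this sketch omits.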
Here are the detailed methods and attributes.
- class objectrl.models.basic.actor.Actor(config: MainConfig, dim_state: int, dim_act: int)
Bases: Module, ABC
Abstract base class for the Actor network in Actor-Critic algorithms.
Handles policy network, optional target network, and optimization.
- config
Configuration object.
- Type:
MainConfig
- device
Device for tensor computations.
- Type:
torch.device
- verbose
Verbosity flag.
- Type:
bool
- has_target
Flag for using a target network.
- Type:
bool
- iter
Training iteration counter.
- Type:
int
- dim_state
Dimension of the observation space.
- Type:
int
- dim_act
Dimension of the action space.
- Type:
int
- _tau
Polyak averaging coefficient for target updates.
- Type:
float
- _gamma
Discount factor for returns.
- Type:
float
- _reset
Flag indicating whether to reset the model at initialization.
- Type:
bool
- model
Main actor network.
- Type:
nn.Module
- target
Target actor network.
- Type:
nn.Module, optional
- optim
Optimizer for the actor parameters.
- Type:
torch.optim.Optimizer
- __init__(config: MainConfig, dim_state: int, dim_act: int) -> None
Initializes the Actor.
- Parameters:
config (MainConfig) – Configuration dataclass instance.
dim_state (int) – Dimension of observation space.
dim_act (int) – Dimension of action space.
- Returns:
None
- reset() -> None
Initializes or resets the main and target policy networks and optimizer. Also sets the model architecture based on the configuration.
- Parameters:
None
- Returns:
None
- init_target() -> None
Copies the main model parameters to the target network.
- Parameters:
None
- Returns:
None
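Copying the main parameters into a target network is typically a one-line state-dict load. The following is a minimal sketch of that pattern with stand-in networks; it does not reproduce the library's own implementation, and freezing the target's gradients is an assumption of this example.

```python
import torch
from torch import nn


def make_net() -> nn.Module:
    """Build a small stand-in policy network (hypothetical architecture)."""
    return nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))


model = make_net()
target = make_net()  # same architecture, different random initialization

# Hard copy: the target starts as an exact replica of the main network.
target.load_state_dict(model.state_dict())

# Assumption: the target is never trained directly, so gradients are disabled.
for p in target.parameters():
    p.requires_grad_(False)
```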
- act(state: Tensor, is_training: bool = True) -> dict
Computes actions given input states.
- Parameters:
state (torch.Tensor) – Input state tensor.
is_training (bool) – Whether the actor is in training mode. Defaults to True.
- Returns:
Dictionary containing action tensor and optionally log probabilities.
- Return type:
dict
- act_target(state: Tensor) -> dict
Computes actions using the target policy network.
- Parameters:
state (torch.Tensor) – Input state tensor.
- Returns:
Dictionary containing action tensor and log probabilities.
- Return type:
dict
- update_target() -> None
Performs a soft update of the target network using Polyak averaging.
- Parameters:
None
- Returns:
None
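Polyak averaging moves each target parameter a small step toward its counterpart in the main network. The sketch below assumes the common convention target = (1 - tau) * target + tau * model; whether `_tau` follows this convention or its complement is not stated here, and `soft_update` is a hypothetical helper name.

```python
import torch
from torch import nn


@torch.no_grad()
def soft_update(model: nn.Module, target: nn.Module, tau: float) -> None:
    """In-place Polyak update: target <- (1 - tau) * target + tau * model."""
    for p, p_targ in zip(model.parameters(), target.parameters()):
        p_targ.mul_(1.0 - tau).add_(p, alpha=tau)


model = nn.Linear(2, 1)
target = nn.Linear(2, 1)

# Use constant parameters so the effect of one update is easy to check.
with torch.no_grad():
    for p in model.parameters():
        p.fill_(1.0)
    for p in target.parameters():
        p.fill_(0.0)

soft_update(model, target, tau=0.1)
```

With these values, every target parameter moves from 0.0 to 0.1 after a single update.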
- abstractmethod loss(*args, **kwargs) -> Tensor
Abstract method to compute the loss for the actor. Should be overridden in subclasses.
- Parameters:
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- Returns:
Computed loss tensor.
- Return type:
torch.Tensor
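To give a sense of what a subclass might compute, the sketch below shows a deterministic policy gradient style loss: the negated mean Q-value of the policy's own actions. It is a standalone illustration and does not subclass Actor; `policy`, `q_net`, and `dpg_loss` are hypothetical stand-ins, and a real subclass may define its loss differently.

```python
import torch
from torch import nn

dim_state, dim_act = 3, 2

# Stand-in policy and critic networks (hypothetical architectures).
policy = nn.Sequential(nn.Linear(dim_state, 16), nn.ReLU(), nn.Linear(16, dim_act), nn.Tanh())
q_net = nn.Sequential(nn.Linear(dim_state + dim_act, 16), nn.ReLU(), nn.Linear(16, 1))


def dpg_loss(state: torch.Tensor) -> torch.Tensor:
    """Return the negated mean Q-value so that minimizing it maximizes Q."""
    action = policy(state)
    q_value = q_net(torch.cat([state, action], dim=-1))
    return -q_value.mean()


state = torch.randn(5, dim_state)
loss_value = dpg_loss(state)
```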
- update(state: Tensor, critics: CriticEnsemble) -> None
Performs a gradient update on the actor network.
- Parameters:
state (torch.Tensor) – Input state batch.
critics (CriticEnsemble) – Critic networks for computing Q-values.
- Returns:
None
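A gradient update of this kind usually follows the standard zero-grad, backward, step cycle, with the optimizer holding only the actor's parameters. The sketch below shows that cycle under stated assumptions: a single linear critic stands in for the critic ensemble, and `update_actor` is a hypothetical function, not the library's method.

```python
import torch
from torch import nn

torch.manual_seed(0)
policy = nn.Linear(3, 2)   # stand-in actor network
q_net = nn.Linear(3 + 2, 1)  # stand-in for the critic ensemble

# Only the actor's parameters are registered with the optimizer.
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)


def update_actor(state: torch.Tensor) -> float:
    """Run one actor update step and return the scalar loss."""
    optim.zero_grad()
    action = torch.tanh(policy(state))
    loss = -q_net(torch.cat([state, action], dim=-1)).mean()
    loss.backward()  # gradients flow through the critic into the actor
    optim.step()     # changes actor parameters only
    return loss.item()


actor_before = [p.detach().clone() for p in policy.parameters()]
critic_before = [p.detach().clone() for p in q_net.parameters()]
loss_scalar = update_actor(torch.randn(8, 3))
```

The backward pass also fills the critic's `.grad` buffers, so a full training loop would clear them before the critic's own update.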