Actor

This module defines the actor network responsible for selecting actions based on the current state.

Key Points:

The Actor models the policy function, mapping states to actions.
It supports stochastic or deterministic action outputs, depending on the architecture.
It is designed to work with the critic networks in policy gradient and actor-critic algorithms.
It includes device management and configurable architectures.
It provides methods to sample or compute actions and, where applicable, to evaluate policy log probabilities.

Usage Notes:

The actor receives states as input and outputs actions compatible with the environment.
It can be extended or customized by modifying the underlying network architecture or sampling strategy.

Important

The actor network must output actions consistent with the environment’s action space. Ensure proper action normalization or bounding if required.
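One common way to bound actions is to squash the raw network output with tanh and rescale it to the environment's limits. The sketch below is framework-free and only illustrates the arithmetic; `squash_to_bounds`, `low`, and `high` are hypothetical names, not part of the library.

```python
import math

def squash_to_bounds(raw_action, low, high):
    """Map unbounded outputs into [low, high], one bound pair per action dimension."""
    bounded = []
    for u, lo, hi in zip(raw_action, low, high):
        squashed = math.tanh(u)  # lies in [-1, 1]
        # Affine rescale from [-1, 1] to [lo, hi].
        bounded.append(lo + 0.5 * (squashed + 1.0) * (hi - lo))
    return bounded

# Hypothetical 3-dimensional action space with per-dimension bounds.
low = [-2.0, -2.0, 0.0]
high = [2.0, 2.0, 1.0]
action = squash_to_bounds([-3.0, 0.0, 10.0], low, high)
```

A raw output of zero lands at the midpoint of its range, and large magnitudes saturate toward the bounds without exceeding them.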

Attention

For custom policy distributions (e.g., discrete, Gaussian), ensure your act method correctly samples actions.
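For a discrete policy, sampling correctly means drawing an index with probability given by the softmax of the logits and reporting the log probability of that same index. The following is a minimal, framework-free sketch under those assumptions; `sample_categorical` is a hypothetical helper, and a real `act` method would operate on tensors instead.

```python
import math
import random

def sample_categorical(logits, rng):
    """Sample an index from softmax(logits); return (index, log-probability, probabilities)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Inverse-CDF sampling: pick the first index whose cumulative mass exceeds u.
    u = rng.random()
    cumulative = 0.0
    action = len(probs) - 1  # fallback in case rounding leaves the sum just below u
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            action = i
            break
    return action, math.log(probs[action]), probs

rng = random.Random(0)  # seeded for reproducibility
action, logprob, probs = sample_categorical([2.0, 0.5, -1.0], rng)
```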

Here are the detailed methods and attributes.

class objectrl.models.basic.actor.Actor(config: MainConfig, dim_state: int, dim_act: int)

Bases: Module, ABC

Abstract base class for the actor network in actor-critic algorithms.

Handles the policy network, an optional target network, and optimization.

config

Configuration object.

Type: MainConfig

device

Device for tensor computations.

Type: torch.device

verbose

Verbosity flag.

Type: bool

has_target

Whether a target network is used.

Type: bool

iter

Training iteration counter.

Type: int

dim_state

Dimension of the observation space.

Type: int

dim_act

Dimension of the action space.

Type: int

_tau

Polyak averaging coefficient for target updates.

Type: float

_gamma

Discount factor for returns.

Type: float

_reset

Whether to reset the model at initialization.

Type: bool

model

Main actor network.

Type: nn.Module

target

Target actor network.

Type: nn.Module, optional

optim

Optimizer for the actor parameters.

Type: torch.optim.Optimizer

__init__(config: MainConfig, dim_state: int, dim_act: int) → None

Initializes the Actor.

Parameters:

config (MainConfig) – Configuration dataclass instance.
dim_state (int) – Dimension of observation space.
dim_act (int) – Dimension of action space.

Returns:

None

reset() → None

Initializes or resets the main and target policy networks and optimizer. Also sets the model architecture based on the configuration.

Parameters: None
Returns: None

init_target() → None

Copies the main model parameters to the target network.

Parameters: None
Returns: None

act(state: Tensor, is_training: bool = True) → dict

Computes actions given input states.

Parameters:

state (torch.Tensor) – Input state tensor.
is_training (bool) – Whether in training mode.

Returns:

Dictionary containing action tensor and optionally log probabilities.

Return type:

dict
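To make the return contract concrete, here is a framework-free sketch of an `act`-style function for a diagonal Gaussian policy. It samples and reports a log probability when `is_training` is true, and returns the mean otherwise. The dictionary keys `"action"` and `"action_logprob"`, the `policy_head` stand-in, and the `rng` argument are assumptions made for illustration; the library's actual keys and behavior may differ.

```python
import math
import random

def policy_head(state):
    """Hypothetical stand-in for the network: returns (mean, log_std) per action dimension."""
    mean = [0.5 * s for s in state]
    log_std = [-1.0 for _ in state]
    return mean, log_std

def act(state, is_training=True, rng=random):
    """Return a dict with an action and, when sampling, its log probability."""
    mean, log_std = policy_head(state)
    if not is_training:
        # Evaluation: act deterministically with the distribution mean.
        return {"action": list(mean)}
    action, logprob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        a = rng.gauss(mu, std)
        action.append(a)
        # Diagonal Gaussian log-density, summed over action dimensions.
        logprob += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return {"action": action, "action_logprob": logprob}

eval_out = act([1.0, -2.0], is_training=False)
train_out = act([1.0, -2.0], is_training=True, rng=random.Random(0))
```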

act_target(state: Tensor) → dict

Computes actions using the target policy network.

Parameters: state (torch.Tensor) – Input state tensor.
Returns: Dictionary containing action tensor and log probabilities.
Return type: dict

update_target() → None

Performs a soft update of the target network using Polyak averaging.

Parameters: None
Returns: None
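Polyak averaging is commonly written as θ_target ← τ·θ + (1 − τ)·θ_target, where a small τ makes the target track the main network slowly. The sketch below shows that arithmetic on plain lists of floats. It assumes this common convention for τ and is not taken from the library source; `soft_update` is a hypothetical name.

```python
def soft_update(target_params, main_params, tau):
    """Return target parameters moved a fraction tau toward the main parameters."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, main_params)]

main = [1.0, -2.0]
target = [0.0, 0.0]

# One soft update with tau = 0.1 moves the target 10% of the way to the main values.
target_after = soft_update(target, main, tau=0.1)

# With tau = 1.0 the update degenerates to a hard copy of the main parameters.
hard_copy = soft_update(target, main, tau=1.0)
```

The `tau = 1.0` case has the same effect as the hard copy described for `init_target`.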

abstractmethod loss(*args, **kwargs) → Tensor

Abstract method to compute the loss for the actor. Should be overridden in subclasses.

Parameters:

*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.

Returns:

Computed loss tensor.

Return type:

torch.Tensor
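Because `loss` is abstract, a concrete actor has to override it before it can be instantiated. The sketch below illustrates the pattern with Python's `abc` module and a DDPG-style objective (the negated mean Q-value). `ActorSketch` and `DeterministicActorSketch` are hypothetical stand-ins, not library classes, and plain floats replace tensors.

```python
from abc import ABC, abstractmethod

class ActorSketch(ABC):
    """Hypothetical stand-in for an abstract actor base class."""

    @abstractmethod
    def loss(self, *args, **kwargs):
        """Subclasses must return the actor loss."""

class DeterministicActorSketch(ActorSketch):
    def loss(self, q_values):
        # Maximising Q is expressed as minimising its negated mean.
        return -sum(q_values) / len(q_values)

actor = DeterministicActorSketch()
value = actor.loss([1.0, 2.0, 3.0])

# The abstract base itself cannot be instantiated.
try:
    ActorSketch()
    base_instantiated = True
except TypeError:
    base_instantiated = False
```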

update(state: Tensor, critics: CriticEnsemble) → None

Performs a gradient update on the actor network.

Parameters:

state (torch.Tensor) – Input state batch.
critics (CriticEnsemble) – Critic networks for computing Q-values.

Returns:

None
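As a rough illustration of what a gradient update on the actor does, the toy sketch below uses a one-parameter policy pi(s) = weight · s and a made-up critic Q(s, a) = −(a − 2s)², so the best policy has weight 2. The gradient is derived by hand in place of autograd, and every name here is hypothetical; the library's `update` works on tensors with an optimizer and a critic ensemble.

```python
def actor_loss(weight, states):
    """loss = -mean(Q(s, pi(s))) for pi(s) = weight * s and toy Q(s, a) = -(a - 2s)^2."""
    return sum((weight * s - 2.0 * s) ** 2 for s in states) / len(states)

def actor_update(weight, states, lr):
    """One gradient-descent step on the actor loss."""
    # d(loss)/d(weight) = mean(2 * (weight*s - 2s) * s), derived by hand.
    grad = sum(2.0 * (weight * s - 2.0 * s) * s for s in states) / len(states)
    return weight - lr * grad

states = [1.0, -1.0, 0.5]   # a small batch of scalar states
weight, n_iter = 0.0, 0     # n_iter mirrors a training iteration counter
initial_loss = actor_loss(weight, states)
for _ in range(100):
    weight = actor_update(weight, states, lr=0.1)
    n_iter += 1
final_loss = actor_loss(weight, states)
```

Repeated steps lower the loss, which is equivalent to raising the critic's value of the actions the policy selects.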
