Tutorial 1: Step-by-Step AC Implementation
This tutorial walks through implementing a custom Actor-Critic (AC) algorithm named MyAC using ObjectRL.
We cover:
Defining the MyAC agent, actor, and critic
Customizing architecture via configuration
Extending or replacing networks and extractors
Registering and running the model
Evaluating results from saved logs
1. Create the MyAC Agent
The MyAC agent inherits from the base ActorCritic class in objectrl/models/basic/ac.py. We define this custom agent and its specific actor and critic by creating a new file:
objectrl/models/myac.py
in which we define:
The agent: MyAC class (subclass of ActorCritic)
The actor: MyACActor (e.g., inheriting from Actor)
The critic: MyACCritic (e.g., inheriting from CriticEnsemble)
from objectrl.models.basic.ac import ActorCritic
from objectrl.models.basic.actor import Actor
from objectrl.models.basic.critic import CriticEnsemble


class MyACActor(Actor):
    pass


class MyACCritic(CriticEnsemble):
    pass


class MyAC(ActorCritic):
    _agent_name = "MyAC"

    def __init__(self, config, critic_type=MyACCritic, actor_type=MyACActor):
        super().__init__(config, critic_type, actor_type)
In this tutorial, we keep the behavior of these classes identical to that of their parents, but they can override or extend the base actor-critic components as needed.
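As a rough illustration of how a subclass could extend its parent, the sketch below replaces the ObjectRL base classes with small stand-ins so that it runs on its own. The `act` method and the `bias` config key are hypothetical and exist only for this example; the real base classes in objectrl/models/basic/ have their own interfaces.

```python
# Stand-ins for the ObjectRL base classes (assumption: the real ones
# are richer; only the class-passing pattern is taken from the tutorial).
class Actor:
    def __init__(self, config):
        self.config = config

    def act(self, state):  # hypothetical method, for illustration only
        return [0.0 for _ in state]


class CriticEnsemble:
    def __init__(self, config):
        self.config = config


class ActorCritic:
    def __init__(self, config, critic_type, actor_type):
        # The agent receives classes, not instances, and builds them itself.
        self.actor = actor_type(config)
        self.critic = critic_type(config)


class MyACActor(Actor):
    def act(self, state):
        # Extend the parent: post-process the action it returns.
        base_action = super().act(state)
        return [a + self.config["bias"] for a in base_action]


class MyACCritic(CriticEnsemble):
    pass


class MyAC(ActorCritic):
    _agent_name = "MyAC"

    def __init__(self, config, critic_type=MyACCritic, actor_type=MyACActor):
        super().__init__(config, critic_type, actor_type)


agent = MyAC({"bias": 0.5})
action = agent.actor.act([1.0, 2.0])
```

Because the agent takes the actor and critic as class arguments, swapping in a different component only requires passing another class to the constructor.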
2. Define Model-Specific Configuration
We define a configuration dataclass for each of the three classes in objectrl/config/model_configs/myac.py.
In this example we’ll define new architectures for actor and critic.
This file contains dataclasses that define:
MyACActorConfig
MyACCriticConfig
MyACConfig (with references to the actor and critic config classes)
from dataclasses import dataclass, field

from objectrl.models.myac import MyACActor, MyACCritic
from objectrl.nets.actor_nets import MyActorNet
from objectrl.nets.critic_nets import MyCriticNet


@dataclass
class MyACActorConfig:
    arch: type = MyActorNet


@dataclass
class MyACCriticConfig:
    arch: type = MyCriticNet


@dataclass
class MyACConfig:
    name: str = "myac"
    actor: MyACActorConfig = field(default_factory=MyACActorConfig)
    critic: MyACCriticConfig = field(default_factory=MyACCriticConfig)
These override and extend the default settings from objectrl/config/model.py and allow model-specific control over architecture and training behavior.
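To make the role of `field(default_factory=...)` concrete, here is a minimal self-contained sketch. The network classes are empty placeholders and `WideActorNet` is a hypothetical alternative architecture, so nothing here depends on ObjectRL itself.

```python
from dataclasses import dataclass, field, replace


class MyActorNet:  # placeholder for the network class from step 3
    pass


class MyCriticNet:  # placeholder for the network class from step 3
    pass


class WideActorNet(MyActorNet):  # hypothetical alternative architecture
    pass


@dataclass
class MyACActorConfig:
    arch: type = MyActorNet


@dataclass
class MyACCriticConfig:
    arch: type = MyCriticNet


@dataclass
class MyACConfig:
    name: str = "myac"
    # default_factory builds a fresh nested config for each instance
    # instead of sharing one object between all MyACConfig instances.
    actor: MyACActorConfig = field(default_factory=MyACActorConfig)
    critic: MyACCriticConfig = field(default_factory=MyACCriticConfig)


default_cfg = MyACConfig()
# replace() copies the config and swaps only the given field.
custom_cfg = replace(default_cfg, actor=MyACActorConfig(arch=WideActorNet))
```

Here `default_cfg` keeps `MyActorNet` while `custom_cfg` uses `WideActorNet`, which is one way to try an alternative architecture without editing the defaults.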
3. Define the Actor and Critic Networks
By default, ObjectRL uses the MLP class from objectrl/utils/net_utils.py as the feature extractor inside both actor and critic networks.
To use custom actor and critic architectures:
Define MyActorNet in objectrl/nets/actor_nets.py
Define MyCriticNet in objectrl/nets/critic_nets.py
# objectrl/nets/actor_nets.py
from objectrl.utils.net_utils import MLP
import torch.nn as nn


class MyActorNet(nn.Module):
    def __init__(self, dim_in, dim_out, depth, width, act, has_norm):
        super().__init__()
        self.net = MLP(dim_in, dim_out, depth, width, act, has_norm)

    def forward(self, x):
        return self.net(x)


# objectrl/nets/critic_nets.py
from objectrl.utils.net_utils import MLP
import torch.nn as nn


class MyCriticNet(nn.Module):
    def __init__(self, dim_in, dim_out, depth, width, act, has_norm):
        super().__init__()
        self.net = MLP(dim_in, dim_out, depth, width, act, has_norm)

    def forward(self, x):
        return self.net(x)
If you want to modify the feature extractor, write a new class (e.g., MyFeatureExtractor) in:
objectrl/utils/net_utils.py
Then use this extractor inside your custom actor/critic networks. The actor and critic network classes are selected through the arch fields of the model-specific config dataclasses described above.
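One possible shape for such an extractor is sketched below. It assumes PyTorch is installed, and the layer layout (Linear, LayerNorm, ReLU) as well as the simplified constructor signature are illustrative choices, not ObjectRL's MLP implementation; the `act` and `has_norm` arguments from the snippets above are left out for brevity.

```python
import torch
import torch.nn as nn


class MyFeatureExtractor(nn.Module):
    """Hypothetical extractor: Linear -> LayerNorm -> ReLU blocks."""

    def __init__(self, dim_in, dim_out, depth=2, width=64):
        super().__init__()
        layers = []
        dim = dim_in
        for _ in range(depth):
            layers += [nn.Linear(dim, width), nn.LayerNorm(width), nn.ReLU()]
            dim = width
        # Final projection to the requested output size.
        layers.append(nn.Linear(dim, dim_out))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class MyActorNet(nn.Module):
    """Actor network that wraps the custom extractor instead of MLP."""

    def __init__(self, dim_in, dim_out, depth=2, width=64):
        super().__init__()
        self.net = MyFeatureExtractor(dim_in, dim_out, depth, width)

    def forward(self, x):
        return self.net(x)


actor_net = MyActorNet(dim_in=17, dim_out=6)
out = actor_net(torch.zeros(4, 17))  # batch of 4 observations
```

In a real integration the constructor would need to accept whatever arguments ObjectRL passes to the class named in arch, so check the base actor and critic code before changing the signature.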
4. Register the Agent
Once MyAC is implemented, you should register it in your model factory function by updating get_model:
case "myac":
    return MyAC(config, critic.critic_type, actor.actor_type)
5. Run Training
You can train the model using ObjectRL’s command line interface:
python objectrl/main.py --model.name "myac" --env.name "cheetah"
For most common settings (e.g., environment, training steps, seed), command-line arguments are an easy and effective way to configure your experiment.
For more structured or complex configurations, you can write a YAML configuration file:
model:
  name: myac
env:
  name: cheetah
training:
  max_steps: 100_000
system:
  seed: 42
This allows you to override default values defined in the dataclass configuration while keeping experiments reproducible and clean.
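Conceptually, overriding works by laying the values from the file on top of the dataclass defaults. The sketch below shows one way this could be done for the training and system sections of the file above; the dataclass names, the default values, and the `apply_overrides` helper are all hypothetical, and ObjectRL's own loader may work differently.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class TrainingConfig:
    max_steps: int = 1_000_000  # placeholder default, not ObjectRL's


@dataclass
class SystemConfig:
    seed: int = 0  # placeholder default, not ObjectRL's


@dataclass
class MainConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)
    system: SystemConfig = field(default_factory=SystemConfig)


def apply_overrides(cfg, overrides):
    """Recursively copy values from a nested dict onto a dataclass instance."""
    valid = {f.name for f in fields(cfg)}
    for key, value in overrides.items():
        if key not in valid:
            raise KeyError(f"Unknown config key: {key}")
        current = getattr(cfg, key)
        if is_dataclass(current) and isinstance(value, dict):
            apply_overrides(current, value)  # descend into nested section
        else:
            setattr(cfg, key, value)
    return cfg


# What a YAML parser would produce for the two sections handled here.
parsed_yaml = {"training": {"max_steps": 100_000}, "system": {"seed": 42}}
cfg = apply_overrides(MainConfig(), parsed_yaml)
```

Keys that are absent from the file keep their defaults, and unknown keys raise an error so that misspelled options do not go unnoticed.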
6. Evaluate and Visualize
All logs and evaluation results are saved in the following directory structure:
_logs/env_name/model_name/seed/timestamp/
This directory contains:
log.log: Logs of training progress
learning-curve.png: Training performance curve
eval-curve.png: Evaluation performance curve
episode_rewards.npy, step_rewards.npy, eval_results.npy: NumPy arrays with rewards and evaluation metrics
To inspect or visualize your experiment results, refer to these logs and images directly.
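A small helper can save some manual browsing. The sketch below locates the most recent run for a given environment, model, and seed, and reports which of the files listed above are present. The function names are hypothetical, and it assumes that timestamp directory names sort chronologically, which this tutorial does not confirm.

```python
from pathlib import Path


def latest_run_dir(log_root, env_name, model_name, seed):
    """Return the most recent timestamp directory for one experiment, or None."""
    seed_dir = Path(log_root) / env_name / model_name / str(seed)
    if not seed_dir.is_dir():
        return None
    # Assumption: timestamp names sort chronologically as strings.
    runs = sorted(p for p in seed_dir.iterdir() if p.is_dir())
    return runs[-1] if runs else None


def list_artifacts(run_dir):
    """Map each expected file name to whether it exists in run_dir."""
    expected = [
        "log.log",
        "learning-curve.png",
        "eval-curve.png",
        "episode_rewards.npy",
        "step_rewards.npy",
        "eval_results.npy",
    ]
    return {name: (Path(run_dir) / name).exists() for name in expected}
```

For example, `latest_run_dir("_logs", "cheetah", "myac", 42)` would point at the newest run of the command from step 5 with seed 42. The `.npy` files in that directory can then be read with `numpy.load`; their array shapes are not described here, so inspect them before plotting.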