Experiments#
Overview#
The experiments/ module defines the logic for running reinforcement learning experiments. It contains reusable base classes and specialized experiment implementations that manage:
Environment creation and initialization
Agent instantiation
Training loops
Evaluation procedures
Logging, saving, and resetting mechanisms
This design allows for flexible experimentation across different environments, training setups, and agent architectures.
Module Structure#
base_experiment.py: Contains the abstract base class Experiment that sets up the environment and agent. It provides a general interface (train, test) but does not implement training or evaluation logic.
control_experiment.py: Defines ControlExperiment, a concrete subclass of Experiment that implements the training loop and the evaluation procedure. It is suitable for continuous control tasks in Gym-like environments.
Base Experiment#
class Experiment:
    """
    A base class for setting up and managing reinforcement learning experiments.

    Attributes:
        config : MainConfig
            The configuration used throughout the experiment.
        n_total_steps : int
            Counter for the total number of steps taken during training.
        env : gym.Env
            The main training environment.
        eval_env : gym.Env
            The evaluation environment used to test agent performance.
        agent : Any
            The reinforcement learning agent initialized based on the provided config.
    """

    def __init__(self, config: "MainConfig") -> None:
        """
        Initializes the Experiment with a given configuration.
        """
        self.config = config
        self.n_total_steps = 0
        self.env = make_env(
            self.config.env.name, self.config.system.seed, self.config.env
        )
        self.eval_env = make_env(
            self.config.env.name,
            self.config.system.seed,
            self.config.env,
            eval_env=True,
            num_envs=(
                self.config.training.eval_episodes
                if self.config.training.parallelize_eval
                else 1
            ),
        )
        # Extract environmental hyperparameters into the general config file
        self.config.env.env = cast(gym.Env, self.env)
        self.agent = get_model(self.config)
        self.agent.logger.log(f"Model: \n{self.agent}")
        self._discrete_action_space = isinstance(
            self.env.action_space, gym.spaces.Discrete
        )
        if self._discrete_action_space and not self.agent.requires_discrete_actions():
            raise NotImplementedError(
                "The chosen agent is only available for continuous action spaces."
            )
        elif not self._discrete_action_space and self.agent.requires_discrete_actions():
            raise NotImplementedError(
                "The chosen agent is only available for discrete action spaces."
            )

    def train(self) -> None:
        """Starts the training process of the reinforcement learning agent.

        Args:
            None
        Returns:
            None
        """
        raise NotImplementedError(
            f"train() not implemented for {self.__class__.__name__}!"
        )

    def test(self) -> None:
        """Evaluates the performance of the trained agent.

        Args:
            None
        Returns:
            None
        """
        raise NotImplementedError(
            f"test() not implemented for {self.__class__.__name__}!"
        )
The Experiment class serves as a blueprint for all experiment types. It initializes the environment and agent using the provided MainConfig object. This base class is extended by other experiment types that implement concrete logic for train() and test() methods.
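The contract described above can be illustrated with a small stand-in. This is only a sketch: SketchExperiment, TrainOnlyExperiment, and error_message are hypothetical names, and the two boolean arguments replace the real environment and agent so that the example runs without the library. It mirrors the action-space compatibility check and the NotImplementedError defaults shown in the listing.

```python
class SketchExperiment:
    """Hypothetical stand-in for the base class (not the library's Experiment)."""

    def __init__(self, discrete_env: bool, agent_requires_discrete: bool) -> None:
        # Same pairing rule as in Experiment.__init__: the environment's action
        # space type and the agent's requirement must agree.
        if discrete_env and not agent_requires_discrete:
            raise NotImplementedError(
                "The chosen agent is only available for continuous action spaces."
            )
        if not discrete_env and agent_requires_discrete:
            raise NotImplementedError(
                "The chosen agent is only available for discrete action spaces."
            )
        self.n_total_steps = 0

    def train(self) -> None:
        raise NotImplementedError(
            f"train() not implemented for {self.__class__.__name__}!"
        )

    def test(self) -> None:
        raise NotImplementedError(
            f"test() not implemented for {self.__class__.__name__}!"
        )


class TrainOnlyExperiment(SketchExperiment):
    """Overrides train() only; test() keeps the base-class default."""

    def train(self) -> None:
        self.n_total_steps += 1  # placeholder for a real training step


def error_message(fn) -> str:
    """Return the NotImplementedError message raised by fn, or '' if none."""
    try:
        fn()
    except NotImplementedError as exc:
        return str(exc)
    return ""


exp = TrainOnlyExperiment(discrete_env=False, agent_requires_discrete=False)
exp.train()
msg_test = error_message(exp.test)
msg_mismatch = error_message(
    lambda: TrainOnlyExperiment(discrete_env=True, agent_requires_discrete=False)
)
print(msg_test)
print(msg_mismatch)
```

A subclass that overrides only train() still fails loudly on test(), and the message names the subclass rather than the base class.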
Control Experiment#
Training#
class ControlExperiment(Experiment):
    """
    The Experiment class for training and evaluating.
    This class defines the core training loop, manages interaction with the environment,
    performs evaluations at regular intervals, and handles model saving and logging.

    Args:
        max_steps : int
            Maximum number of training steps.
        warmup_steps : int
            Number of initial steps using random actions before policy-based action selection.
        device : torch.device
            Device (CPU/GPU) used for tensor computations.
    """

    def __init__(self, config: "MainConfig"):
        super().__init__(config)
        # Retrieve training parameters from the configuration
        self.max_steps: int = self.config.training.max_steps
        self.warmup_steps: int = self.config.training.warmup_steps
        self.device = torch.device(config.system.device)
        self._vectorized_eval = self.config.training.parallelize_eval
        self.verbose = self.config.verbose

    def train(self) -> None:
        """
        Runs the training loop for the agent, managing interactions with the environment,
        learning updates, evaluations, and logging.

        Args:
            None
        Returns:
            None
        """
        time_start = time.time()
        # Dictionary to store rewards and steps for logging
        information_dict = {
            "episode_rewards": torch.zeros(self.max_steps),
            "episode_steps": torch.zeros(self.max_steps),
            "step_rewards": np.empty((2 * self.max_steps), dtype=object),
        }
        # Initialize the environment and state
        state, _ = self.env.reset()
        state = totorch(state, device=self.device)
        r_cum = np.zeros(1)
        episode = 0
        e_step = 0
        # Training loop
        for step in tqdm(
            range(self.max_steps),
            leave=True,
            disable=not self.config.progress,
        ):
            e_step += 1
            # Reset agent periodically if configured
            if (
                step > self.warmup_steps
                and self.config.training.reset_frequency > 0
                and step % self.config.training.reset_frequency == 0
            ):
                self.agent.reset()
            # Evaluate the agent at specified intervals
            if step % self.config.training.eval_frequency == 0:
                self.eval(step)
            # Select an action (random during warmup, policy-based afterward)
            if step < self.warmup_steps:
                action = self.env.action_space.sample()
                action = totorch(np.clip(action, -1.0, 1.0), device=self.device)
                act_dict = {"action": action}
            else:
                act_dict = self.agent.select_action(state)
                action = act_dict["action"].clip(-1.0, 1.0)
            # Take a step in the environment
            next_state, reward, terminated, truncated, info = self.env.step(
                int(action) if self._discrete_action_space else tonumpy(action)
            )
            next_state = totorch(next_state, device=self.device)
            transition_kwargs = {
                **act_dict,
                "state": state,
                "next_state": next_state,
                "reward": reward,
                "terminated": terminated,
                "truncated": truncated,
                "step": step + 1,
            }
            transition = self.agent.generate_transition(**transition_kwargs)
            # Store the transition in replay buffer
            self.agent.store_transition(transition)
            # Log per-step reward
            information_dict["step_rewards"][self.n_total_steps + step] = (
                episode,
                step,
                reward,
            )
            state = next_state  # Update state
            r_cum += reward  # Update cumulative reward
            # Perform learning updates at specified intervals
            if (
                step >= self.warmup_steps
                and (step % self.config.training.learn_frequency) == 0
            ):
                self.agent.learn(
                    max_iter=self.config.training.max_iter,
                    n_epochs=self.config.training.n_epochs,
                )
            # Episode termination
            if terminated or truncated:
                information_dict["episode_rewards"][episode] = r_cum.item()
                information_dict["episode_steps"][episode] = step
                # Save episode summary
                self.agent.logger.episode_summary(episode, step, information_dict)
                # Reset the environment for the next episode
                state, _ = self.env.reset()
                state = totorch(state, device=self.device)
                r_cum = np.zeros(1)
                episode += 1
                e_step = 0
            # Save model and logs at specified intervals
            if step % self.config.logging.save_frequency == 0:
                self.agent.logger.save(information_dict, episode, step)
                self.agent.save()
        # Final evaluation after training
        self.eval(step)
        time_end = time.time()
        self.agent.save()
        self.agent.logger.save(information_dict, episode, step)
        self.agent.logger.log(f"Training time: {time_end - time_start:.2f} seconds")
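ControlExperiment implements a standard training loop for reinforcement learning agents. During the first warmup_steps steps it samples random actions from the environment; afterwards it queries the agent's policy. It evaluates the agent every eval_frequency steps, resets the agent every reset_frequency steps once warmup is over (if reset_frequency is greater than zero), runs learning updates every learn_frequency steps after warmup, and saves the model and logs every save_frequency steps.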
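The step-dependent conditions of the training loop can be isolated in a small pure-Python sketch. Schedule and events_at are hypothetical names introduced for illustration; the conditions themselves are copied from the train() listing, while the numeric values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Hypothetical container for the frequencies read from config.training."""

    warmup_steps: int
    eval_frequency: int
    learn_frequency: int
    reset_frequency: int = 0  # 0 disables periodic agent resets


def events_at(step: int, s: Schedule) -> list[str]:
    """List what the loop does at a given step, in the order the loop does it."""
    events = []
    # Agent reset: strictly after warmup, and only if enabled
    if step > s.warmup_steps and s.reset_frequency > 0 and step % s.reset_frequency == 0:
        events.append("reset")
    # Evaluation also fires at step 0, since 0 % k == 0
    if step % s.eval_frequency == 0:
        events.append("eval")
    # Random actions during warmup, policy actions afterwards
    events.append("random_action" if step < s.warmup_steps else "policy_action")
    # Learning starts at the first post-warmup step (note >=, not >)
    if step >= s.warmup_steps and step % s.learn_frequency == 0:
        events.append("learn")
    return events


schedule = Schedule(warmup_steps=4, eval_frequency=5, learn_frequency=2, reset_frequency=6)
for step in (0, 4, 6, 10):
    print(step, events_at(step, schedule))
```

One detail this makes visible: learning can begin at step == warmup_steps, whereas a reset requires step > warmup_steps.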
Evaluation#
    # ruff: noqa: C901
    @torch.inference_mode()
    def eval(self, n_step: int) -> None:
        """
        Evaluates the agent over multiple episodes in the evaluation environment.

        Args:
            n_step (int): The current training step at which evaluation is performed.
        Returns:
            None
        """
        self.agent.eval()  # Set agent to evaluation mode
        # Save RNG states
        torch_rng_state = torch.get_rng_state()
        if torch.cuda.is_available():
            cuda_rng_state = torch.cuda.get_rng_state_all()
        # Set deterministic seed for eval
        eval_seed = self.config.system.seed + 12345
        torch.manual_seed(eval_seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed(eval_seed)
            torch.cuda.manual_seed_all(eval_seed)
        # Store rewards for evaluation episodes
        results = torch.zeros(self.config.training.eval_episodes)
        if self._vectorized_eval:
            states, _ = self.eval_env.reset()
            states = totorch(states, device=self.device)
            dones = torch.zeros(self.config.training.eval_episodes, dtype=torch.bool)
            while not torch.all(dones):
                actions = self.agent.select_action(states, is_training=False)["action"]
                if self._discrete_action_space:
                    actions = actions.int()
                next_states, rewards, term, trunc, _ = self.eval_env.step(
                    tonumpy(actions)
                )
                done = torch.tensor(term) | torch.tensor(trunc)
                results += torch.tensor(rewards) * (
                    ~done
                )  # only add reward to running environments
                dones |= done
                states = totorch(next_states, device=self.device)
        else:
            # Run multiple evaluation episodes
            for episode in range(self.config.training.eval_episodes):
                state, info = self.eval_env.reset()
                state = totorch(state, device=self.device)
                done = False
                while not done:
                    # Select action using the agent's policy (without exploration)
                    action = self.agent.select_action(state, is_training=False)[
                        "action"
                    ]
                    # Execute action in the environment
                    next_state, reward, term, trunc, info = self.eval_env.step(
                        int(action) if self._discrete_action_space else tonumpy(action)
                    )
                    # Check termination condition
                    done = term or trunc
                    # Update state and record reward
                    state = totorch(next_state, device=self.device)
                    results[episode] += reward
                # If using Sparse MetaWorld env, adjust reward to reflect success
                # mean_reward in save_eval_results will be equal to success rate
                if "success" in info and self.config.env.sparse_rewards:
                    results[episode] = reward + 1.0
        self.agent.logger.save_eval_results(n_step, results)
        if self.verbose:
            tqdm.write(f"{n_step}: {results.mean():.4f} +/- {results.std():.4f}")
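During evaluation, the agent runs config.training.eval_episodes episodes in a separate evaluation environment, either in parallel (when parallelize_eval is set) or one after another. Actions are selected with is_training=False, that is, from the learned policy without exploration noise, and PyTorch is seeded deterministically with config.system.seed + 12345. The per-episode returns are passed to the logger, where they are saved for later analysis. Evaluation runs do not affect the training process.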
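One common way to keep a seeded evaluation from disturbing the training random stream is to save the generator state, seed, evaluate, and then restore the state. The sketch below illustrates that general pattern with Python's standard random module as a stand-in for PyTorch's generator; fixed_seed and evaluate_stub are hypothetical helpers, and the sketch is not meant to describe what the library does beyond the save-and-seed steps shown in the listing.

```python
import random
from contextlib import contextmanager


@contextmanager
def fixed_seed(seed: int):
    """Run a block under a fixed seed, then restore the previous RNG state."""
    state = random.getstate()  # save the "training" stream
    random.seed(seed)          # deterministic seed for the evaluation block
    try:
        yield
    finally:
        random.setstate(state)  # training continues as if nothing happened


def evaluate_stub(n_episodes: int) -> list[float]:
    """Hypothetical evaluation: one random 'return' per episode."""
    return [random.random() for _ in range(n_episodes)]


system_seed = 0  # placeholder for config.system.seed

# Training stream with an evaluation in the middle
random.seed(system_seed)
with fixed_seed(system_seed + 12345):
    first_eval = evaluate_stub(3)
draw_after_eval = random.random()

# Same training stream without any evaluation
random.seed(system_seed)
draw_without_eval = random.random()

# A second evaluation under the same seed
with fixed_seed(system_seed + 12345):
    second_eval = evaluate_stub(3)

print(draw_after_eval == draw_without_eval, first_eval == second_eval)
```

With the state restored, the next training draw is identical whether or not an evaluation ran, and two evaluations under the same seed produce the same values.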
Experiment Management#
Configuration and Integration#
Experiments are initialized using a MainConfig object, typically loaded from a YAML file or data classes. This configuration:
Specifies environment and training parameters
Defines the model architecture
Controls system-level behavior like device usage
Sets logging and evaluation frequencies
The experiment builds the environments and the agent directly from this configuration, so a run is fully described by its configuration and can be tuned by changing configuration values.
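To make the shape of such a configuration concrete, here is a hedged sketch built from nested dataclasses. ConfigSketch and its parts are hypothetical stand-ins, not the real MainConfig: the field names are the ones accessed in the listings on this page, while the default values and the environment name are placeholders. The n_eval_envs helper reproduces the rule Experiment.__init__ uses to size the evaluation environment.

```python
from dataclasses import dataclass, field, replace


@dataclass
class EnvSketch:
    name: str = "Pendulum-v1"  # placeholder environment name
    sparse_rewards: bool = False


@dataclass
class SystemSketch:
    seed: int = 0
    device: str = "cpu"


@dataclass
class TrainingSketch:
    max_steps: int = 1000
    warmup_steps: int = 100
    eval_frequency: int = 250
    eval_episodes: int = 5
    parallelize_eval: bool = False
    learn_frequency: int = 1
    reset_frequency: int = 0  # 0 disables periodic agent resets


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for MainConfig; defaults are illustrative only."""

    env: EnvSketch = field(default_factory=EnvSketch)
    system: SystemSketch = field(default_factory=SystemSketch)
    training: TrainingSketch = field(default_factory=TrainingSketch)
    verbose: bool = False


def n_eval_envs(config: ConfigSketch) -> int:
    """One environment per evaluation episode when evaluation is parallelized."""
    t = config.training
    return t.eval_episodes if t.parallelize_eval else 1


default_cfg = ConfigSketch()
parallel_cfg = replace(
    default_cfg, training=replace(default_cfg.training, parallelize_eval=True)
)
print(n_eval_envs(default_cfg), n_eval_envs(parallel_cfg))
```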
Usage Example#
Here’s a minimal example of how to instantiate and run an experiment:
from objectrl.config.config import MainConfig
from objectrl.experiments.control_experiment import ControlExperiment
# config_dict must be defined beforehand (for example, parsed from a YAML file)
config = MainConfig.from_config(config_dict, model_name="my_model")
experiment = ControlExperiment(config)
experiment.train()
# Optional: run evaluation after training
experiment.eval(config.training.max_steps)
Extending Experiments#
To create a new type of experiment (e.g., curriculum learning, adversarial training), subclass the Experiment base class:
from objectrl.experiments.base_experiment import Experiment

class MyCustomExperiment(Experiment):
    def train(self):
        # Custom training logic
        pass

    def test(self):
        # Optional test logic
        pass
Attention
When extending Experiment, you must implement the train method. Optionally, you can override test, eval, or run for more customized workflows.
This design allows for experimentation with novel agent–environment interactions while reusing shared logic for environment and agent setup.