Experiments

Overview

The experiments/ module defines the logic for running reinforcement learning experiments. It contains reusable base classes and specialized experiment implementations that manage:

Environment creation and initialization
Agent instantiation
Training loops
Evaluation procedures
Logging, saving, and resetting mechanisms

This design allows for flexible experimentation across different environments, training setups, and agent architectures.

Module Structure

base_experiment.py: Contains the base class Experiment, which sets up the training and evaluation environments and the agent. It defines the general interface (train, test), but both methods raise NotImplementedError and must be implemented by subclasses.
control_experiment.py: Defines ControlExperiment, a concrete subclass of Experiment that implements the full training and evaluation lifecycle. It is suitable for continuous control tasks in Gym-like environments.

Base Experiment

Base Experiment initialization.

class Experiment:
    """
    A base class for setting up and managing reinforcement learning experiments.

    Attributes:
        config : MainConfig
            The configuration used throughout the experiment.
        n_total_steps : int
            Counter for the total number of steps taken during training.
        env : gym.Env
            The main training environment.
        eval_env : gym.Env
            The evaluation environment used to test agent performance.
        agent : Any
            The reinforcement learning agent initialized based on the provided config.
    """

    def __init__(self, config: "MainConfig") -> None:
        """
        Initializes the Experiment with a given configuration.
        """
        self.config = config
        self.n_total_steps = 0

        self.env = make_env(
            self.config.env.name, self.config.system.seed, self.config.env
        )
        self.eval_env = make_env(
            self.config.env.name,
            self.config.system.seed,
            self.config.env,
            eval_env=True,
            num_envs=(
                self.config.training.eval_episodes
                if self.config.training.parallelize_eval
                else 1
            ),
        )

        # Store the training environment in the config so that components
        # built from the config (e.g. the agent) can read environment properties
        self.config.env.env = cast(gym.Env, self.env)

        self.agent = get_model(self.config)
        self.agent.logger.log(f"Model: \n{self.agent}")

        self._discrete_action_space = isinstance(
            self.env.action_space, gym.spaces.Discrete
        )

        if self._discrete_action_space and not self.agent.requires_discrete_actions():

            raise NotImplementedError(
                "The chosen agent is only available for continuous action spaces."
            )
        elif not self._discrete_action_space and self.agent.requires_discrete_actions():

            raise NotImplementedError(
                "The chosen agent is only available for discrete action spaces."
            )

    def train(self) -> None:
        """Starts the training process of the reinforcement learning agent.

        Args:
            None
        Returns:
            None
        """
        raise NotImplementedError(
            f"train() not implemented for {self.__class__.__name__}!"
        )

    def test(self) -> None:
        """Evaluates the performance of the trained agent.

        Args:
            None
        Returns:
            None
        """
        raise NotImplementedError(
            f"test() not implemented for {self.__class__.__name__}!"
        )

The Experiment class serves as a blueprint for all experiment types. It creates the training and evaluation environments and the agent from the provided MainConfig object, and raises NotImplementedError if the agent's action-space requirement (discrete or continuous) does not match the environment. Subclasses provide the concrete logic for the train() and test() methods.
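
The compatibility check at the end of __init__ reduces to comparing two booleans. The minimal sketch below isolates that logic in a standalone function; the name check_action_space and the use of plain booleans instead of a Gym action space and an agent object are assumptions made for illustration, not part of ObjectRL.

```python
def check_action_space(env_is_discrete: bool, agent_requires_discrete: bool) -> None:
    """Hypothetical helper mirroring the check in Experiment.__init__.

    Raises NotImplementedError when the agent's action-space requirement
    does not match the environment's action space.
    """
    if env_is_discrete and not agent_requires_discrete:
        # Discrete environment, but the agent only supports continuous actions
        raise NotImplementedError(
            "The chosen agent is only available for continuous action spaces."
        )
    if not env_is_discrete and agent_requires_discrete:
        # Continuous environment, but the agent only supports discrete actions
        raise NotImplementedError(
            "The chosen agent is only available for discrete action spaces."
        )


# Matching combinations pass silently
check_action_space(env_is_discrete=False, agent_requires_discrete=False)
check_action_space(env_is_discrete=True, agent_requires_discrete=True)

# A mismatch raises NotImplementedError
try:
    check_action_space(env_is_discrete=True, agent_requires_discrete=False)
except NotImplementedError as err:
    print(err)  # prints: The chosen agent is only available for continuous action spaces.
```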

Control Experiment

Training

Training loop implementation in ControlExperiment.

class ControlExperiment(Experiment):
    """
    Experiment class for training and evaluating an agent.
    It defines the core training loop, manages interaction with the environment,
    performs evaluations at regular intervals, and handles model saving and logging.

    Attributes:
        max_steps : int
            Maximum number of training steps.
        warmup_steps : int
            Number of initial steps that use random actions before policy-based action selection.
        device : torch.device
            Device (CPU/GPU) used for tensor computations.
    """

    def __init__(self, config: "MainConfig"):
        super().__init__(config)

        # Retrieve training parameters from the configuration
        self.max_steps: int = self.config.training.max_steps
        self.warmup_steps: int = self.config.training.warmup_steps
        self.device = torch.device(config.system.device)
        self._vectorized_eval = self.config.training.parallelize_eval
        self.verbose = self.config.verbose

    def train(self) -> None:
        """
        Runs the training loop for the agent, managing interactions with the environment,
        learning updates, evaluations, and logging.

        Args:
            None
        Returns:
            None
        """
        time_start = time.time()

        # Dictionary to store rewards and steps for logging
        information_dict = {
            "episode_rewards": torch.zeros(self.max_steps),
            "episode_steps": torch.zeros(self.max_steps),
            "step_rewards": np.empty((2 * self.max_steps), dtype=object),
        }

        # Initialize the environment and state
        state, _ = self.env.reset()
        state = totorch(state, device=self.device)
        r_cum = np.zeros(1)
        episode = 0
        e_step = 0

        # Training loop
        for step in tqdm(
            range(self.max_steps),
            leave=True,
            disable=not self.config.progress,
        ):
            e_step += 1

            # Reset agent periodically if configured
            if (
                step > self.warmup_steps
                and self.config.training.reset_frequency > 0
                and step % self.config.training.reset_frequency == 0
            ):
                self.agent.reset()

            # Evaluate the agent at specified intervals
            if step % self.config.training.eval_frequency == 0:
                self.eval(step)

            # Select an action (random during warmup, policy-based afterward)
            if step < self.warmup_steps:
                action = self.env.action_space.sample()
                action = totorch(np.clip(action, -1.0, 1.0), device=self.device)
                act_dict = {"action": action}
            else:
                act_dict = self.agent.select_action(state)
                action = act_dict["action"].clip(-1.0, 1.0)

            # Take a step in the environment
            next_state, reward, terminated, truncated, info = self.env.step(
                int(action) if self._discrete_action_space else tonumpy(action)
            )
            next_state = totorch(next_state, device=self.device)

            transition_kwargs = {
                **act_dict,
                "state": state,
                "next_state": next_state,
                "reward": reward,
                "terminated": terminated,
                "truncated": truncated,
                "step": step + 1,
            }
            transition = self.agent.generate_transition(**transition_kwargs)

            # Store the transition in replay buffer
            self.agent.store_transition(transition)

            # Log per-step reward
            information_dict["step_rewards"][self.n_total_steps + step] = (
                episode,
                step,
                reward,
            )

            state = next_state  # Update state
            r_cum += reward  # Update cumulative reward

            # Perform learning updates at specified intervals
            if (
                step >= self.warmup_steps
                and (step % self.config.training.learn_frequency) == 0
            ):
                self.agent.learn(
                    max_iter=self.config.training.max_iter,
                    n_epochs=self.config.training.n_epochs,
                )

            # Episode termination
            if terminated or truncated:
                information_dict["episode_rewards"][episode] = r_cum.item()
                information_dict["episode_steps"][episode] = step

                # Save episode summary
                self.agent.logger.episode_summary(episode, step, information_dict)

                # Reset the environment for the next episode
                state, _ = self.env.reset()
                state = totorch(state, device=self.device)
                r_cum = np.zeros(1)
                episode += 1
                e_step = 0

            # Save model and logs at specified intervals
            if step % self.config.logging.save_frequency == 0:
                self.agent.logger.save(information_dict, episode, step)
                self.agent.save()

        # Final evaluation after training
        self.eval(step)
        time_end = time.time()
        self.agent.save()
        self.agent.logger.save(information_dict, episode, step)
        self.agent.logger.log(f"Training time: {time_end - time_start:.2f} seconds")
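
The periodic conditions checked inside train() decide which events fire at a given step. The sketch below restates them as a pure function so the schedule can be inspected without an environment or an agent. Schedule and events_at are hypothetical names, and the dataclass only mirrors the config fields read by the loop; it is not ObjectRL's configuration class.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Hypothetical stand-in for the fields of config.training / config.logging used by train()."""
    warmup_steps: int
    eval_frequency: int
    reset_frequency: int  # 0 disables periodic agent resets
    learn_frequency: int
    save_frequency: int


def events_at(step: int, s: Schedule) -> list[str]:
    """Return, in loop order, the events train() would trigger at a 0-based step."""
    events = []
    if step > s.warmup_steps and s.reset_frequency > 0 and step % s.reset_frequency == 0:
        events.append("reset")
    if step % s.eval_frequency == 0:
        events.append("eval")
    # Random actions during warmup, policy actions afterward
    events.append("random_action" if step < s.warmup_steps else "policy_action")
    if step >= s.warmup_steps and step % s.learn_frequency == 0:
        events.append("learn")
    if step % s.save_frequency == 0:
        events.append("save")
    return events


s = Schedule(warmup_steps=100, eval_frequency=50, reset_frequency=0,
             learn_frequency=1, save_frequency=200)
print(events_at(0, s))    # ['eval', 'random_action', 'save']
print(events_at(100, s))  # ['eval', 'policy_action', 'learn']
```

Because every check uses step % frequency == 0, evaluation and saving also fire at step 0, before any learning update has run.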

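ControlExperiment implements a standard training loop for reinforcement learning agents. During the first warmup_steps steps, actions are sampled at random from the action space; afterward, they are selected by the agent's policy. At each step, the loop stores the resulting transition in the agent's replay buffer and, according to the configured frequencies, evaluates the agent (eval_frequency), resets it once the warmup phase is over (reset_frequency, disabled when 0), runs learning updates after warmup (learn_frequency), and saves the model and logs (save_frequency). A final evaluation and save follow the last training step.

Evaluation

Evaluation procedure in ControlExperiment.

    # ruff: noqa: C901
    @torch.inference_mode()
    def eval(self, n_step: int) -> None:
        """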
        Evaluates the agent over multiple episodes in the evaluation environment.

        Args:
            n_step (int): The current training step at which evaluation is performed.
        Returns:
            None
        """
        self.agent.eval()  # Set agent to evaluation mode

        # Save RNG states
        torch_rng_state = torch.get_rng_state()
        if torch.cuda.is_available():
            cuda_rng_state = torch.cuda.get_rng_state_all()

        # Set deterministic seed for eval
        eval_seed = self.config.system.seed + 12345
        torch.manual_seed(eval_seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed(eval_seed)
            torch.cuda.manual_seed_all(eval_seed)

        # Store rewards for evaluation episodes
        results = torch.zeros(self.config.training.eval_episodes)

        if self._vectorized_eval:
            states, _ = self.eval_env.reset()
            states = totorch(states, device=self.device)

            dones = torch.zeros(self.config.training.eval_episodes, dtype=torch.bool)
            while not torch.all(dones):
                actions = self.agent.select_action(states, is_training=False)["action"]

                if self._discrete_action_space:
                    actions = actions.int()

                next_states, rewards, term, trunc, _ = self.eval_env.step(
                    tonumpy(actions)
                )

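                done = torch.tensor(term) | torch.tensor(trunc)
                # Only add rewards for environments whose episode was still running
                # before this step: the final reward of an episode counts, while
                # rewards from finished (auto-reset) environments are ignored
                results += torch.tensor(rewards) * (~dones)
                dones |= done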

                states = totorch(next_states, device=self.device)

        else:
            # Run multiple evaluation episodes
            for episode in range(self.config.training.eval_episodes):
                state, info = self.eval_env.reset()
                state = totorch(state, device=self.device)

                done = False

                while not done:
                    # Select action using the agent's policy (without exploration)
                    action = self.agent.select_action(state, is_training=False)[
                        "action"
                    ]

                    # Execute action in the environment
                    next_state, reward, term, trunc, info = self.eval_env.step(
                        int(action) if self._discrete_action_space else tonumpy(action)
                    )

                    # Check termination condition
                    done = term or trunc
                    # Update state and record reward
                    state = totorch(next_state, device=self.device)
                    results[episode] += reward

                # If using Sparse MetaWorld env, adjust reward to reflect success
                # mean_reward in save_eval_results will be equal to success rate
                if "success" in info and self.config.env.sparse_rewards:
                    results[episode] = reward + 1.0

        self.agent.logger.save_eval_results(n_step, results)
        if self.verbose:
            tqdm.write(f"{n_step}: {results.mean():.4f} +/- {results.std():.4f}")

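During evaluation, the agent runs config.training.eval_episodes episodes in a separate evaluation environment, either in parallel in a vectorized environment (when parallelize_eval is set) or one after another. It acts with its learned policy without exploration (is_training=False) under torch.inference_mode(), after the torch RNG is seeded with a fixed evaluation seed (system.seed + 12345) so that repeated evaluations are comparable. The per-episode returns are passed to the logger's save_eval_results for later analysis; in the sequential branch with sparse MetaWorld rewards, each episode's result is adjusted so that the mean equals the success rate. Because evaluation uses its own environment, it does not change the state of the training environment.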
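
Reward masking in the vectorized branch is easy to get wrong: an environment's reward should count up to and including the step on which its episode ends, and be ignored afterward, because vectorized environments typically reset finished sub-environments automatically. The sketch below simulates that accumulation with plain Python lists instead of a real vector environment (every simulated environment emits a constant reward and keeps running after its episode ends). It also shows a common save, seed, and restore pattern for an evaluation seed, using Python's random module as a stand-in for torch.get_rng_state and torch.manual_seed. The listing above saves the torch RNG state but does not show it being restored, so the restore step here is an assumption about good practice, not a description of ObjectRL. The names vectorized_returns and eval_rng are hypothetical.

```python
import random
from contextlib import contextmanager


def vectorized_returns(episode_lengths: list[int], reward_per_step: float = 1.0) -> list[float]:
    """Accumulate per-environment returns with the 'previously done' mask.

    Environment i emits reward_per_step on every step and its episode ends after
    episode_lengths[i] steps (lengths must be positive); afterward it keeps
    producing rewards, as an auto-reset environment would, which must be masked out.
    """
    n = len(episode_lengths)
    results = [0.0] * n
    dones = [False] * n
    t = 0
    while not all(dones):
        t += 1
        rewards = [reward_per_step] * n  # finished envs still emit rewards
        done = [t == length for length in episode_lengths]
        for i in range(n):
            # Mask with the flags from *before* this step
            if not dones[i]:
                results[i] += rewards[i]
        dones = [d or new for d, new in zip(dones, done)]
    return results


@contextmanager
def eval_rng(seed: int):
    """Temporarily reseed Python's global RNG for evaluation, then restore it."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)


print(vectorized_returns([2, 3]))  # [2.0, 3.0]
```

Restoring the saved state in a finally block means the training random stream continues exactly as if the evaluation had not consumed any random numbers.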

Experiment Management

Configuration and Integration

Experiments are initialized with a MainConfig object, typically built from a YAML file or from Python data classes. This configuration:

Specifies environment and training parameters
Defines the model architecture
Controls system-level behavior like device usage
Sets logging and evaluation frequencies

The experiment builds all required components (training and evaluation environments, agent) from this configuration, so a run is specified by its configuration and seed, and hyperparameters can be tuned without changing code.
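
The listings above read a fixed set of fields from the configuration. As an illustration only, the dataclasses below reproduce that nesting (env, system, training, logging, plus the top-level verbose and progress flags) using the attribute names accessed in the listings. The class names and every default value are placeholders chosen for this sketch; they are not MainConfig's actual definitions or defaults.

```python
from dataclasses import dataclass, field


@dataclass
class EnvConfig:
    name: str = "Pendulum-v1"     # placeholder environment id
    sparse_rewards: bool = False  # read by the evaluation loop


@dataclass
class SystemConfig:
    seed: int = 1
    device: str = "cpu"


@dataclass
class TrainingConfig:
    max_steps: int = 100_000
    warmup_steps: int = 10_000
    learn_frequency: int = 1
    max_iter: int = 1
    n_epochs: int = 1
    reset_frequency: int = 0      # 0 disables periodic agent resets
    eval_frequency: int = 5_000
    eval_episodes: int = 10
    parallelize_eval: bool = False


@dataclass
class LoggingConfig:
    save_frequency: int = 10_000


@dataclass
class SketchConfig:
    """Hypothetical stand-in showing the nesting used by the experiment code."""
    env: EnvConfig = field(default_factory=EnvConfig)
    system: SystemConfig = field(default_factory=SystemConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)
    verbose: bool = False
    progress: bool = True


cfg = SketchConfig(training=TrainingConfig(max_steps=2_000, warmup_steps=200))
print(cfg.training.eval_frequency)  # 5000
```

Using default_factory gives each config instance its own nested objects, so overriding a field in one run cannot leak into another.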

Usage Example

Here’s a minimal example of how to instantiate and run an experiment:

from objectrl.config.config import MainConfig
from objectrl.experiments.control_experiment import ControlExperiment

# config_dict: a dictionary of configuration values (e.g. loaded from a YAML file)
config = MainConfig.from_config(config_dict, model_name="my_model")
experiment = ControlExperiment(config)
experiment.train()

# Optional: run evaluation after training
experiment.eval(config.training.max_steps)

Extending Experiments

To create a new type of experiment (e.g., curriculum learning, adversarial training), subclass the Experiment base class:

from objectrl.experiments.base_experiment import Experiment


class MyCustomExperiment(Experiment):
    def train(self) -> None:
        # Custom training logic goes here
        pass

    def test(self) -> None:
        # Optional test logic
        pass

Attention

When extending Experiment, you must implement the train method. Optionally, you can override test, eval, or run for more customized workflows.

This design allows for experimentation with novel agent–environment interactions while reusing shared logic for environment and agent setup.
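
To try the subclassing contract without installing ObjectRL, the sketch below uses a minimal stand-in for the Experiment base class that keeps only the train()/test() interface. CurriculumExperiment, train_stage, and the dict keys difficulties and steps_per_stage are hypothetical; in real code you would subclass objectrl's Experiment, whose __init__ also builds the environments and the agent from a MainConfig.

```python
class Experiment:
    """Minimal stand-in for the Experiment base class (interface only)."""

    def __init__(self, config: dict) -> None:
        self.config = config
        self.n_total_steps = 0

    def train(self) -> None:
        raise NotImplementedError(f"train() not implemented for {self.__class__.__name__}!")

    def test(self) -> None:
        raise NotImplementedError(f"test() not implemented for {self.__class__.__name__}!")


class CurriculumExperiment(Experiment):
    """Hypothetical experiment that trains through stages of increasing difficulty."""

    def __init__(self, config: dict) -> None:
        super().__init__(config)
        self.completed_stages: list[float] = []

    def train_stage(self, difficulty: float, steps: int) -> None:
        # Placeholder for a real inner training loop at this difficulty level
        self.n_total_steps += steps
        self.completed_stages.append(difficulty)

    def train(self) -> None:
        # Required override: run one stage per configured difficulty level
        for difficulty in self.config["difficulties"]:
            self.train_stage(difficulty, self.config["steps_per_stage"])


exp = CurriculumExperiment({"difficulties": [0.25, 0.5, 1.0], "steps_per_stage": 1000})
exp.train()
print(exp.n_total_steps)  # 3000
```

Because test() is not overridden here, calling it on CurriculumExperiment still raises NotImplementedError, matching the base-class contract.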
