Config

Overview

The configuration system is organized into multiple Python files across the config/ folder and its submodules. Each file serves a distinct role in defining the behavior and structure of your agent. Here’s how the configuration is structured:

  • config.py (in config/): Contains the core configuration dataclasses that define the experiment setup. These include:

    • NoiseConfig – optional Gaussian noise added to observations or actions

    • EnvConfig – parameters specific to the RL environment

    • TrainingConfig – training-related hyperparameters

    • SystemConfig – runtime and system-level settings like device and seeds

    • LoggingConfig – logging, checkpointing, and result-saving options

    • MainConfig – top-level config that combines all above configs and supports loading from external YAML or command-line overrides

    • HarvestConfig – evaluation, visualization, and aggregation of multiple runs

  • model.py (in config/): Defines model-specific components that control the neural network architectures and behavior of the agent. These include:

    • ActorConfig – configuration of the actor (policy) network

    • CriticConfig – configuration of the critic (value function) network

    • ModelConfig – wrapper for selecting and loading the appropriate model-specific configurations

  • utils.py (in config/): Provides advanced utilities for dynamic config composition and serialization. You normally do not need to modify this file unless developing core library extensions.

    • Enhanced serialization: enhanced_asdict and nested_asdict extend dataclasses.asdict to support dynamically added attributes and nested structures.

    • Deep merging: NestedDict enables recursive merging of nested configuration dictionaries using the | operator.

    • Dynamic type conversion: parse_value converts string inputs into appropriate Python types for CLI parsing.

    • Configuration setup: setup_config merges YAML, CLI, and Tyro-generated configs into a unified MainConfig object.

    • CLI argument filtering: filter_model_args separates model-specific arguments from general CLI input.

    • Dynamic dataclass creation: dict_to_dataclass builds dataclass types from nested dictionaries at runtime.

    • Improved introspection: enhanced_repr and create_field_dict support detailed string representations and introspection of dataclasses.

    • Diff tracking: diff_dict highlights configuration differences for debugging or logging.

    • Tyro integration: print_tyro_help formats help messages using Tyro’s CLI interface.

  • model_configs/ folder: Holds model-specific overrides for ActorConfig and CriticConfig, making it easy to define distinct variants such as TD3 and SAC. These overrides are referenced in the from_config() methods of the model classes, and their details are covered in the model-specific pages of the documentation.

    Tip

    For detailed documentation on each model-specific configuration (e.g., SAC, TD3), see the Models section of the documentation and select the model.

  • model_yamls/ folder: Contains YAML files that define the configurations for specific models. These files can be loaded dynamically to override default settings, allowing for flexible experimentation without modifying the codebase.
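
The deep merging and type conversion described for utils.py above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the NestedDict and parse_value bodies below are hypothetical stand-ins that only mimic the described behavior (recursive merging via the | operator, and converting CLI strings to Python values, here assumed to go through ast.literal_eval).

```python
import ast


class NestedDict(dict):
    """Hypothetical stand-in: a dict whose `|` merges nested dicts recursively."""

    def __or__(self, other):
        merged = NestedDict(self)
        for key, value in other.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Both sides hold a mapping: merge them instead of replacing.
                merged[key] = NestedDict(merged[key]) | value
            else:
                # Scalars and mismatched types: the right-hand side wins.
                merged[key] = value
        return merged


def parse_value(text):
    """Hypothetical stand-in: turn a CLI string into a Python value if possible."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # keep plain strings such as "cheetah" unchanged


defaults = NestedDict({"training": {"batch_size": 256, "gamma": 0.99}})
overrides = {"training": {"batch_size": parse_value("512")}}
merged = defaults | overrides
print(merged)
```

A plain dict union would replace the whole "training" entry; the recursive variant keeps gamma and only overrides batch_size.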

This structure allows for:

  • Clear separation of concerns

  • Easy composition of experiments

  • Support for dynamic loading from YAML or command-line overrides

  • Easy extensibility to add new models or new configuration sections

What follows is a detailed breakdown of each configuration class.

config.py File

Noise Configuration

@dataclass
class NoiseConfig:
    """
    Configuration for injecting Gaussian noise into actions and observations of the agent.
    Attributes:
        noisy_act (float): Standard deviation of noise added to actions. Default is 0.0 (no noise).
        noisy_obs (float): Standard deviation of noise added to observations. Default is 0.0 (no noise).
    """

    noisy_act: float = 0.0
    noisy_obs: float = 0.0


Noise configuration allows injecting Gaussian noise into either the agent's observations or actions. This is particularly useful for testing robustness, simulating real-world sensor or actuation errors, or inducing regularization effects during training.
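
As a rough illustration of how these two standard deviations might be applied, the sketch below adds zero-mean Gaussian noise to a list of values. The add_gaussian_noise helper is hypothetical and uses only the standard library; how the library actually applies the noise is not shown in this page.

```python
import random
from dataclasses import dataclass


@dataclass
class NoiseConfig:
    noisy_act: float = 0.0
    noisy_obs: float = 0.0


def add_gaussian_noise(values, std, rng):
    """Hypothetical helper: add zero-mean Gaussian noise with std `std`."""
    if std == 0.0:
        return list(values)  # the default of 0.0 means "no noise"
    return [v + rng.gauss(0.0, std) for v in values]


rng = random.Random(0)
noise = NoiseConfig(noisy_obs=0.1)

obs = [0.5, -1.0, 2.0]
noisy_obs = add_gaussian_noise(obs, noise.noisy_obs, rng)
clean_act = add_gaussian_noise([0.2, 0.3], noise.noisy_act, rng)
print(noisy_obs, clean_act)
```

Here only the observations are perturbed; the actions pass through unchanged because noisy_act keeps its default of 0.0.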

Environment Configuration

@dataclass
class EnvConfig:
    """
    Configuration for setting up the reinforcement learning environment.
    Attributes:
        name (str): Environment name from a predefined set.
        noisy (NoiseConfig | None): Optional noise configuration.
        position_delay (float | None): Optional delay in position updates.
        control_cost_weight (float | None): Optional weight for control cost penalty.
        sparse_rewards (bool): Whether to use sparse rewards.
    """

    name: (
        Literal[
            "ant",
            "cartpole",
            "cheetah",
            "hopper",
            "humanoid",
            "reacher",
            "swimmer",
            "walker2d",
            "dmc-quadruped-run",
            "dmc-humanoid-run",
            "dmc-cheetah-run",
            "dmc-hopper-hop",
            "dmc-walker-run",
            "metaworld-window-close",
            "metaworld-window-open",
            "metaworld-drawer-close",
            "metaworld-drawer-open",
            "metaworld-reach",
            "metaworld-button-press-topdown",
            "metaworld-door-open",
        ]
        | str
    ) = "cheetah"
    noisy: NoiseConfig | None = None
    position_delay: float | None = None
    control_cost_weight: float | None = None
    sparse_rewards: bool = False


Environment configuration defines the environment-related settings used during training and evaluation. The name field specifies the environment, either by a short name (e.g., "cheetah", "hopper") or by the original environment name (e.g., "HalfCheetah-v5", "Hopper-v5"). Optional fields such as noisy, position_delay, and control_cost_weight allow fine-grained manipulation of the environment dynamics, and sparse_rewards switches the environment to sparse rewards. This flexibility is useful for simulating real-world uncertainties or testing the agent's robustness under perturbed conditions.
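
A minimal sketch of constructing a default and a perturbed environment configuration is given below. The dataclasses are trimmed stand-ins for the ones listed above (the name field is simplified to a plain str), and the specific values are arbitrary examples rather than recommended settings.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class NoiseConfig:
    noisy_act: float = 0.0
    noisy_obs: float = 0.0


@dataclass
class EnvConfig:  # trimmed stand-in; `name` simplified to a plain str
    name: str = "cheetah"
    noisy: Optional[NoiseConfig] = None
    position_delay: Optional[float] = None
    control_cost_weight: Optional[float] = None
    sparse_rewards: bool = False


# Defaults: short name, no perturbations.
default_env = EnvConfig()

# A perturbed variant addressed by its original environment name.
perturbed_env = EnvConfig(
    name="Hopper-v5",
    noisy=NoiseConfig(noisy_obs=0.05),
    control_cost_weight=0.1,
    sparse_rewards=True,
)
print(asdict(perturbed_env))
```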

Training Configuration

@dataclass
class TrainingConfig:
    """
    Configuration for training hyperparameters.
    Attributes include learning rate, batch size, discount factor, buffer size,
    evaluation frequency, and more.
    """

    learning_rate: float = 3e-4
    batch_size: int = 256
    gamma: float = 0.99
    max_steps: int = 1_000_000
    warmup_steps: int = 10_000
    buffer_size: int = 1_000_000

    ### Training frequency settings
    reset_frequency: int = 0
    learn_frequency: int = 1
    max_iter: int = 1
    n_epochs: int = 0

    ### Evaluation settings
    eval_episodes: int = 10
    eval_frequency: int = 20_000
    # Run evaluations in parallel or sequentially
    parallelize_eval: bool = False

    optimizer: str = "Adam"


Training configuration controls the learning process of the agent. It includes essential hyperparameters such as learning rate, batch size, and number of training steps. These parameters directly influence the stability, speed, and performance of the algorithm. With this configuration, you can easily tune experiments to meet different research goals or computational constraints. When parallelize_eval is enabled, eval_episodes environments are initialized and evaluated in parallel rather than sequentially. If memory is not a bottleneck, this flag can speed up evaluation considerably as the number of evaluation episodes grows.
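
The trade-off behind parallelize_eval can be illustrated with a toy sketch. It assumes a hypothetical run_episode function and uses a thread pool purely for illustration; the library's own evaluation code is not shown in this page and may well rely on a different mechanism, such as vectorized environments.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed):
    """Toy stand-in for one evaluation episode; returns a pseudo-random return."""
    rng = random.Random(seed)
    return sum(rng.uniform(0.0, 1.0) for _ in range(100))


def evaluate(eval_episodes, parallelize_eval):
    """Average return over `eval_episodes` episodes."""
    seeds = range(eval_episodes)
    if parallelize_eval:
        # All episodes are alive at once: faster, but more memory.
        with ThreadPoolExecutor(max_workers=eval_episodes) as pool:
            returns = list(pool.map(run_episode, seeds))
    else:
        # One episode at a time: slower, but only one environment in memory.
        returns = [run_episode(seed) for seed in seeds]
    return sum(returns) / len(returns)


sequential = evaluate(eval_episodes=10, parallelize_eval=False)
parallel = evaluate(eval_episodes=10, parallelize_eval=True)
print(sequential, parallel)
```

Both paths produce the same average because each toy episode is seeded independently; only the wall-clock time and memory footprint differ.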

System Configuration

@dataclass
class SystemConfig:
    """
    Configuration for system-level execution.

    Attributes:
        num_threads (int): Number of threads (-1 for auto).
        seed (int): Random seed.
        random_seed (bool): If True, the config samples a random seed and overwrites `seed`.
        device (str): Runtime device ("cpu" or "cuda").
        storing_device (str): Device used for storing models/data ("cpu" or "cuda"). Store on the CPU if memory
            is a constraint; otherwise prefer the GPU.
    """

    num_threads: int = -1
    seed: int = 1
    # Initialize with a random seed
    random_seed: bool = False
    device: Literal["cpu", "cuda"] = "cuda"
    storing_device: Literal["cpu", "cuda"] = "cuda"

    def __post_init__(self):
        if self.random_seed:
            self.seed = numpy.random.randint(2**32)


System configuration manages low-level runtime behavior and hardware settings. This includes control over the number of threads, random seed for reproducibility, and device selection (e.g., CPU or CUDA). These settings are especially useful for debugging, benchmarking, or deploying agents across heterogeneous hardware setups.
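
The interaction between seed and random_seed can be sketched as follows. This is a trimmed stand-in that keeps only the two seed fields and, as an assumption made to stay dependency-free, draws the seed with the standard library's random.randrange instead of numpy.random.randint.

```python
import random
from dataclasses import dataclass


@dataclass
class SystemConfig:  # trimmed stand-in with only the seed-related fields
    seed: int = 1
    random_seed: bool = False

    def __post_init__(self):
        if self.random_seed:
            # Stdlib replacement for numpy.random.randint(2**32).
            self.seed = random.randrange(2**32)


fixed = SystemConfig()
sampled = SystemConfig(seed=7, random_seed=True)  # seed=7 is overwritten

# A fixed seed gives identical random streams across runs.
first = random.Random(fixed.seed).random()
second = random.Random(fixed.seed).random()
print(fixed.seed, sampled.seed, first == second)
```

Note that an explicitly passed seed is discarded whenever random_seed is True, so the sampled value should be logged if the run needs to be reproduced later.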

Logging Configuration

@dataclass
class LoggingConfig:
    """
    Configuration for logging experiment outputs.

    Attributes:
        result_path (str): Path to save experiment results.
        save_frequency (int): Save logs every N steps.
        save_params (bool): Whether to save model parameters at the end.
    """

    result_path: str = "../_logs"
    save_frequency: int = 20_000
    save_params: bool = False

    def __post_init__(self):
        """Convert string paths to Path objects."""
        self.result_path = Path(self.result_path)


Logging configuration handles all aspects of experiment output and result storage. You can specify the path for saving logs, frequency of saving, and whether to persist the final model parameters. These options support systematic experiment tracking and simplify post-hoc analysis and reproducibility.
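
A short sketch of how these options might be consumed is shown below. The run directory layout and the rule "save whenever the step is a multiple of save_frequency" are assumptions made for illustration, not behavior confirmed by this page.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LoggingConfig:
    result_path: str = "../_logs"
    save_frequency: int = 20_000
    save_params: bool = False

    def __post_init__(self):
        # Strings are converted so callers can use the `/` path operator.
        self.result_path = Path(self.result_path)


log = LoggingConfig(result_path="runs/exp1", save_frequency=50_000)

# Hypothetical directory layout for one run.
run_dir = log.result_path / "cheetah" / "seed_1"

# Assumed rule: save at every multiple of save_frequency up to max_steps.
max_steps = 200_000
save_steps = list(range(log.save_frequency, max_steps + 1, log.save_frequency))
print(run_dir, save_steps)
```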

Main Configuration

@dataclass
class MainConfig:
    """
    Main configuration combining environment, training, system, logging, and model settings.
    This class allows for central management and construction of experiment
    configurations and supports loading from dictionary or YAML files.
    """

    # Provide additional output
    verbose: bool = False
    # Show a progress bar
    progress: bool = False
    # An optional config path
    config: Path | None = None

    # Environmental config
    env: EnvConfig = field(default_factory=EnvConfig)
    # Training related configuration
    training: TrainingConfig = field(default_factory=TrainingConfig)
    # Model related configuration. These cannot be changed via the CLI
    model: tyro.conf.Suppress[ModelConfig] = field(default_factory=ModelConfig)
    # model: ModelConfig = field(default_factory=ModelConfig)

    system: SystemConfig = field(default_factory=SystemConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "MainConfig":
        """
        Build a MainConfig from a configuration dictionary (e.g., one parsed from a YAML file).

        Args:
            config (dict[str, Any]): Dictionary with optional keys:
                - 'env'
                - 'training'
                - 'system'
                - 'logging'
                - 'model' (required)
        Returns:
            MainConfig: A fully initialized configuration object.
        """

        config = copy.deepcopy(config)

        env_conf = config.pop("env", {})
        training_conf = config.pop("training", {})
        system_conf = config.pop("system", {})
        logging_conf = config.pop("logging", {})

        env = EnvConfig(**env_conf) if env_conf else EnvConfig()
        training = (
            TrainingConfig(**training_conf) if training_conf else TrainingConfig()
        )
        system = SystemConfig(**system_conf) if system_conf else SystemConfig()
        logging = LoggingConfig(**logging_conf) if logging_conf else LoggingConfig()

        model_conf = config.pop("model", {})
        assert model_conf, "Need to specify a model"
        model_name = model_conf["name"]
        assert model_name in model_configs.keys(), f"{model_name} is not available"

        if "actor" in model_configs[model_name].__annotations__:
            actor_conf = model_conf.pop("actor", {})
            actor = ActorConfig.from_config(actor_conf, model_name)
            model_conf["actor"] = actor
        if "critic" in model_configs[model_name].__annotations__:
            critic_conf = model_conf.pop("critic", {})
            critic = CriticConfig.from_config(critic_conf, model_name)
            model_conf["critic"] = critic

        model = model_configs[model_name](**model_conf)

        return cls(
            env=env,
            training=training,
            system=system,
            logging=logging,
            model=model,
            **config,
        )


Main configuration serves as the central entry point, composing sub-configurations such as EnvConfig, TrainingConfig, SystemConfig, LoggingConfig, and ModelConfig. It supports overriding values from external YAML files via from_config. This design ensures clarity, reproducibility, and ease of experiment management.
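
The section-popping pattern used by from_config can be sketched in a reduced form. The classes below are trimmed stand-ins, and the model section (which the real method requires) is deliberately left out to keep the example self-contained; the dictionary stands in for the contents of a parsed YAML file.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class EnvConfig:
    name: str = "cheetah"
    sparse_rewards: bool = False


@dataclass
class TrainingConfig:
    learning_rate: float = 3e-4
    batch_size: int = 256


@dataclass
class MainConfig:  # trimmed stand-in without system, logging, and model
    verbose: bool = False
    env: EnvConfig = field(default_factory=EnvConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)

    @classmethod
    def from_config(cls, config):
        config = copy.deepcopy(config)  # leave the caller's dict untouched
        env = EnvConfig(**config.pop("env", {}))
        training = TrainingConfig(**config.pop("training", {}))
        # Whatever is left (e.g. "verbose") is passed as top-level fields.
        return cls(env=env, training=training, **config)


raw = {"verbose": True, "env": {"name": "hopper"}, "training": {"batch_size": 512}}
cfg = MainConfig.from_config(raw)
print(cfg)
```

Sections that are missing from the dictionary fall back to their dataclass defaults, and the deep copy ensures that popping keys does not mutate the input.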

Harvest Configuration

@dataclass
class HarvestConfig:
    """
    Configuration for evaluation and visualization of experiments.

    Attributes:
        verbose (bool): Whether to provide verbose output.
        logs_path (str): Path to log files.
        result_path (str): Path to save results.
        env_names (list[str]): List of environment names to evaluate on.
        model_names (list[str]): List of model names to evaluate.
        seeds (list[int]): Random seeds to evaluate across.
        smoothing_window (int): Window size for reward smoothing.
        height (int): Plot height.
        width (int): Plot width.
        dpi (int): Plot DPI.
        y_axis (str): Label for the y-axis in plots.
    """

    # Provide additional output
    verbose: bool = True
    # Path to logs
    logs_path: str = "../_logs"
    # path to save results
    result_path: str = "../_results"
    # envs
    env_names: list[
        Literal[
            "ant",
            "cartpole",
            "cheetah",
            "hopper",
            "humanoid",
            "reacher",
            "swimmer",
            "walker2d",
            "dmc-quadruped-run",
            "dmc-humanoid-run",
            "dmc-cheetah-run",
            "dmc-hopper-hop",
            "dmc-walker-run",
            "metaworld-window-close",
            "metaworld-window-open",
            "metaworld-drawer-close",
            "metaworld-drawer-open",
            "metaworld-reach",
            "metaworld-button-press-topdown",
            "metaworld-door-open",
        ]
    ] = field(default_factory=lambda: ["cheetah"])

    # models
    models = Literal[tuple(model_configs.keys())]
    model_names: list[models] = field(default_factory=lambda: ["ddpg"])
    del models

    # seeds
    seeds: list[int] = field(default_factory=lambda: list(range(1, 11)))

    # smoothing window >= 1 if 1 then no smoothing
    # smoothing_window should be odd
    smoothing_window: int = 1

    # plotting
    height: int = 5
    width: int = 10
    dpi: int = 200
    # label for the y axis, e.g., "Return" or "Success Rate"
    y_axis: str = "Return"

    def __post_init__(self):
        """Convert log and result paths to Path objects."""

        self.logs_path = Path(self.logs_path)
        self.result_path = Path(self.result_path)


Harvest configuration defines evaluation and result aggregation parameters. It supports running evaluations across multiple models and seeds and controls how results are saved and visualized.
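
To make the aggregation settings more concrete, the sketch below enumerates the runs implied by env_names, model_names, and seeds, and applies one possible smoothing scheme. The smooth function is hypothetical: it assumes a centered moving average whose window shrinks at the edges, which may differ from what the library does.

```python
from itertools import product


def smooth(values, window):
    """Centered moving average; window=1 returns the values unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be odd and >= 1")
    half = window // 2
    out = []
    for i in range(len(values)):
        # The window is clipped at both ends of the series.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Defaults from HarvestConfig: one environment, one model, seeds 1..10.
env_names = ["cheetah"]
model_names = ["ddpg"]
seeds = list(range(1, 11))
runs = list(product(env_names, model_names, seeds))

returns = [0.0, 3.0, 6.0, 3.0, 0.0]
smoothed = smooth(returns, window=3)
print(len(runs), smoothed)
```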

model.py File

Actor Configuration

@enhanced_repr
@dataclass
class ActorConfig:
    """
    Configuration for the actor network in RL models.

    Attributes:
        depth (int): Number of hidden layers.
        width (int): Number of units per hidden layer.
        norm (bool): Enable normalization layers.
        activation (str): Activation function ('relu' or 'crelu'). Users can add other activation functions if needed.
        has_target (bool): Whether to use a target actor network.
        n_actors (int): Number of parallel actor networks.
        reset (bool): Whether to reset the actor.
        n_heads (int): Number of heads in a multi-head actor.
    """

    depth: int = 3
    width: int = 256
    # Normalization layers in the actor network (disabled by default)
    norm: bool = False
    # Activation function for the actor network
    activation: Literal["relu", "crelu"] = "relu"
    has_target: bool = False
    n_actors: int = 1
    reset: bool = False
    n_heads: int = 1
    max_grad_norm: float = 0.0  # Optional maximum gradient norm for clipping

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "ActorConfig":
        """
        Create an ActorConfig from a custom config dict and default model values.

        Args:
            config (dict): Custom configuration parameters.
            model_name (str): The name of the model whose defaults to load.
        Returns:
            ActorConfig: Initialized with both default and overridden settings.
        """
        config = create_field_dict(actor_configs[model_name]) | config

        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}

        instance = cls(**known_attr)

        for k, v in extra_attr.items():
            setattr(instance, k, v)
        return instance

    def to_dict(self) -> dict:
        """
        Convert the ActorConfig to a dictionary.

        Args:
            None
        Returns:
            dict: Dictionary representation of the config.
        """
        return asdict(self)


Actor configuration defines the architecture and behavior of the policy network. This level of control is essential for ablation studies and investigating architectural impacts on learning dynamics.
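
The way from_config separates declared fields from extra, model-specific attributes can be sketched as follows. This is a simplified stand-in: it takes the model defaults as a plain dictionary instead of looking them up by model_name, and the default values and the log_std_min attribute are hypothetical.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ActorConfig:  # trimmed stand-in with two declared fields
    depth: int = 3
    width: int = 256

    @classmethod
    def from_config(cls, config, defaults):
        config = defaults | config  # user values win over model defaults
        known = {f.name for f in fields(cls)}
        instance = cls(**{k: v for k, v in config.items() if k in known})
        for key, value in config.items():
            if key not in known:
                # Undeclared keys become dynamically added attributes.
                setattr(instance, key, value)
        return instance


model_defaults = {"width": 512, "log_std_min": -20.0}  # hypothetical values
actor = ActorConfig.from_config({"depth": 2}, model_defaults)
print(actor.depth, actor.width, actor.log_std_min)
print(asdict(actor))
```

Note that dataclasses.asdict only sees declared fields, so the dynamically added attribute is missing from its output. This is the gap that helpers such as enhanced_asdict in utils.py are described as covering.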

Critic Configuration

@enhanced_repr
@dataclass
class CriticConfig:
    """
    Configuration for the critic network in RL models.

    Attributes:
        depth (int): Number of hidden layers.
        width (int): Number of units per hidden layer.
        norm (bool): Enable normalization layers.
        activation (str): Activation function ('relu' or 'crelu'). Users can add other activation functions if needed.
        n_members (int): Number of critic networks.
        reduce (str): Method for reducing outputs of an ensemble of critics.
        target_reduce (str): Method for reducing outputs of an ensemble of target critics.
        has_target (bool): Whether to use a target critic network.
        reset (bool): Whether to reset the critic.
    """

    depth: int = 3
    width: int = 256
    norm: bool = False
    activation: Literal["relu", "crelu"] = "relu"
    n_members: int = 2
    reduce: str = "min"
    target_reduce: str = "min"
    has_target: bool = True
    reset: bool = False
    max_grad_norm: float = 0.0  # Optional maximum gradient norm for clipping

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "CriticConfig":
        """
        Construct a CriticConfig from a user-defined config and base model name.

        Args:
            config (dict): Configuration overrides.
            model_name (str): Name of the model to fetch default critic settings from.

        Returns:
            CriticConfig: Populated configuration object.
        """
        config = create_field_dict(critic_configs[model_name]) | config

        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}

        instance = cls(**known_attr)

        for k, v in extra_attr.items():
            setattr(instance, k, v)

        return instance

    def to_dict(self):
        """
        Convert this CriticConfig to a dictionary.
        """
        return asdict(self)


Critic configuration defines the structure of the value function estimator(s). This helps in adapting the value estimation to different algorithms (e.g., TD3 vs. SAC).
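
As an illustration of what the reduce and target_reduce settings control, the sketch below collapses the outputs of an ensemble of critics into a single value. The reduce_ensemble function is hypothetical, and "mean" is included only as an assumed alternative; this page confirms "min" as the default but does not list other supported values.

```python
def reduce_ensemble(q_values, reduce="min"):
    """Collapse the Q-value estimates of several critics into one number."""
    if reduce == "min":
        # Pessimistic estimate, as used in clipped double Q-learning.
        return min(q_values)
    if reduce == "mean":
        # Assumed alternative: average over the ensemble members.
        return sum(q_values) / len(q_values)
    raise ValueError(f"unknown reduce method: {reduce}")


# Estimates from n_members = 2 critics for the same state-action pair.
q_values = [1.5, 2.5]
pessimistic = reduce_ensemble(q_values, reduce="min")
average = reduce_ensemble(q_values, reduce="mean")
print(pessimistic, average)
```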

Model Configuration

@dataclass
class ModelConfig:
    """
    Base configuration for a model.

    Attributes:
        name (str): Identifier for the model type.
    """

    name: str = "abstract"

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "ModelConfig":
        """
        Construct a ModelConfig from a user config and model definition.

        Args:
            config (dict): Configuration overrides.
            model_name (str): Model name to use as the default template.
        Returns:
            ModelConfig: A fully populated configuration object.
        """
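        # Merge the model's default field values with the user overrides,
        # mirroring ActorConfig.from_config and CriticConfig.from_config.
        config = create_field_dict(model_configs[model_name]) | config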

        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}

        instance = cls(**known_attr)

        for k, v in extra_attr.items():
            setattr(instance, k, v)

        return instance

    def to_dict(self) -> dict:
        """
        Convert this ModelConfig to a dictionary.

        Args:
            None
        Returns:
            dict: Dictionary representation of the config.
        """
        return asdict(self)


Model configuration acts as a lightweight entry class that dynamically delegates to the specific actor and critic configurations based on the selected model name. This abstraction allows for easy extension when introducing new algorithmic variants while preserving a consistent interface for configuration loading.
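
The name-based dispatch can be sketched with a small registry. The TD3Config and SACConfig classes and their fields below are hypothetical placeholders, not the library's actual model configurations; only the lookup pattern mirrors what MainConfig.from_config does with model_configs.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    name: str = "abstract"


@dataclass
class TD3Config(ModelConfig):  # hypothetical model-specific config
    name: str = "td3"
    policy_delay: int = 2


@dataclass
class SACConfig(ModelConfig):  # hypothetical model-specific config
    name: str = "sac"
    init_alpha: float = 1.0


# Registry mapping model names to their configuration classes.
model_configs = {"td3": TD3Config, "sac": SACConfig}


def build_model_config(model_conf):
    """Pick the config class by name and instantiate it with the overrides."""
    model_conf = dict(model_conf)  # shallow copy; the input stays untouched
    name = model_conf["name"]
    if name not in model_configs:
        raise KeyError(f"{name} is not available")
    return model_configs[name](**model_conf)


cfg = build_model_config({"name": "td3", "policy_delay": 3})
print(cfg)
```

Adding a new algorithm then amounts to defining one more dataclass and registering it under its name.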