Config#
Overview#
The configuration system is organized into multiple Python files across the config/ folder and its submodules. Each file serves a distinct role in defining the behavior and structure of your agent. Here’s how the configuration is structured:
config.py (in config/): Contains the core configuration dataclasses that define the experiment setup. These include:
NoiseConfig – optional Gaussian noise added to observations or actions
EnvConfig – parameters specific to the RL environment
TrainingConfig – training-related hyperparameters
SystemConfig – runtime and system-level settings like device and seeds
LoggingConfig – logging, checkpointing, and result-saving options
MainConfig – top-level config that combines all above configs and supports loading from external YAML or command-line overrides
HarvestConfig – evaluation, visualization, and aggregation of multiple runs
model.py (in config/): Defines model-specific components that control the neural network architectures and behavior of the agent. These include:
ActorConfig – configuration of the actor (policy) network
CriticConfig – configuration of the critic (value function) network
ModelConfig – wrapper for selecting and loading the appropriate model-specific configurations
utils.py (in config/): Provides advanced utilities for dynamic config composition and serialization. You normally do not need to modify this file unless you are developing core library extensions.
Enhanced serialization: enhanced_asdict and nested_asdict extend dataclasses.asdict to support dynamically added attributes and nested structures.
Deep merging: NestedDict enables recursive merging of nested configuration dictionaries using the | operator.
Dynamic type conversion: parse_value converts string inputs into appropriate Python types for CLI parsing.
Configuration setup: setup_config merges YAML, CLI, and Tyro-generated configs into a unified MainConfig object.
CLI argument filtering: filter_model_args separates model-specific arguments from general CLI input.
Dynamic dataclass creation: dict_to_dataclass builds dataclass types from nested dictionaries at runtime.
Improved introspection: enhanced_repr and create_field_dict support detailed string representations and introspection of dataclasses.
Diff tracking: diff_dict highlights configuration differences for debugging or logging.
Tyro integration: print_tyro_help formats help messages using Tyro’s CLI interface.
model_configs/ folder: Holds model-specific overrides for ActorConfig and CriticConfig, making it easy to define distinct variants like TD3, SAC, etc. These are referenced in the from_config() methods of the model classes, and their details are covered in the model-specific pages of the documentation.
Tip
For detailed documentation on each model-specific configuration (e.g., SAC, TD3), see the Models section of the documentation and select the model.
model_yamls/ folder: Contains YAML files that define the configurations for specific models. These files can be loaded dynamically to override default settings, allowing for flexible experimentation without modifying the codebase.
This structure allows for:
Clear separation of concerns
Easy composition of experiments
Support for dynamic loading from YAML or command-line overrides
Easy extensibility to add new models or new configuration sections
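The recursive merging that NestedDict is described as providing can be illustrated with a minimal sketch. DeepDict below is a hypothetical stand-in written for this example, not the library's class, and the actual implementation may handle edge cases differently.

```python
from typing import Any


class DeepDict(dict):
    """Hypothetical dict subclass whose `|` merges nested dicts recursively."""

    def __or__(self, other: dict[str, Any]) -> "DeepDict":
        merged = DeepDict(self)
        for key, value in other.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Both sides hold a dict: recurse instead of overwriting.
                merged[key] = DeepDict(merged[key]) | value
            else:
                # Scalars and mismatched types: the right-hand side wins.
                merged[key] = value
        return merged


# Defaults (e.g. from a YAML file) merged with overrides (e.g. from the CLI).
defaults = DeepDict({"training": {"learning_rate": 3e-4, "batch_size": 256}})
overrides = {"training": {"batch_size": 512}, "system": {"seed": 7}}
merged = defaults | overrides
print(merged)
```

A plain dict merge would replace the whole "training" entry; the recursive version keeps learning_rate and changes only batch_size.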
What follows is a detailed breakdown of each configuration class.
config.py File#
Noise Configuration#
@dataclass
class NoiseConfig:
    """
    Configuration for injecting Gaussian noise into actions and observations of the agent.

    Attributes:
        noisy_act (float): Standard deviation of noise added to actions. Default is 0.0 (no noise).
        noisy_obs (float): Standard deviation of noise added to observations. Default is 0.0 (no noise).
    """

    noisy_act: float = 0.0
    noisy_obs: float = 0.0
Noise configuration allows injecting Gaussian noise into either the agent's observations or actions. This is particularly useful for testing robustness, simulating real-world sensor or actuation errors, or inducing regularization effects during training.
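As a rough illustration of what a standard deviation such as noisy_obs controls, the sketch below adds zero-mean Gaussian noise to a list of values. The function add_gaussian_noise is hypothetical and uses only the standard library; how the library itself applies the noise is not shown in this page.

```python
import random


def add_gaussian_noise(values: list[float], std: float, rng: random.Random) -> list[float]:
    """Return a copy of `values` with zero-mean Gaussian noise of the given std."""
    if std == 0.0:
        # A standard deviation of 0.0 means no noise, matching the defaults above.
        return list(values)
    return [v + rng.gauss(0.0, std) for v in values]


rng = random.Random(0)  # seeded so the example is repeatable
observation = [0.1, -0.4, 1.2]
clean = add_gaussian_noise(observation, std=0.0, rng=rng)
noisy = add_gaussian_noise(observation, std=0.05, rng=rng)
print(clean, noisy)
```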
Environment Configuration#
@dataclass
class EnvConfig:
    """
    Configuration for setting up the reinforcement learning environment.

    Attributes:
        name (str): Environment name, either one of the predefined short names or another environment ID.
        noisy (NoiseConfig | None): Optional noise configuration.
        position_delay (float | None): Optional delay in position updates.
        control_cost_weight (float | None): Optional weight for control cost penalty.
        sparse_rewards (bool): Whether to use sparse rewards.
    """

    name: (
        Literal[
            "ant",
            "cartpole",
            "cheetah",
            "hopper",
            "humanoid",
            "reacher",
            "swimmer",
            "walker2d",
            "dmc-quadruped-run",
            "dmc-humanoid-run",
            "dmc-cheetah-run",
            "dmc-hopper-hop",
            "dmc-walker-run",
            "metaworld-window-close",
            "metaworld-window-open",
            "metaworld-drawer-close",
            "metaworld-drawer-open",
            "metaworld-reach",
            "metaworld-button-press-topdown",
            "metaworld-door-open",
        ]
        | str
    ) = "cheetah"
    noisy: NoiseConfig | None = None
    position_delay: float | None = None
    control_cost_weight: float | None = None
    sparse_rewards: bool = False
Environment configuration defines the environment-related settings used during training and evaluation. The name field specifies the environment, either by a short name (e.g., "cheetah", "hopper") or by the original environment name (e.g., "HalfCheetah-v5", "Hopper-v5"). Optional fields like noisy, position_delay, and control_cost_weight allow fine-grained manipulation of the environment dynamics. This flexibility is useful for simulating real-world uncertainties or testing the agent's robustness under perturbed conditions.
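One way to picture the two naming styles is a lookup that maps short names to full environment IDs and passes everything else through unchanged. The table and the function resolve_env_name are hypothetical; only the two pairs mentioned in the text are listed, and the library's own resolution logic may differ.

```python
# Hypothetical alias table; only the two pairs mentioned in the text are listed.
ENV_ALIASES = {
    "cheetah": "HalfCheetah-v5",
    "hopper": "Hopper-v5",
}


def resolve_env_name(name: str) -> str:
    """Map a short alias to a full environment ID; pass other names through."""
    return ENV_ALIASES.get(name, name)


print(resolve_env_name("cheetah"))
print(resolve_env_name("Hopper-v5"))
```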
Training Configuration#
@dataclass
class TrainingConfig:
    """
    Configuration for training hyperparameters.

    Attributes include learning rate, batch size, discount factor, buffer size,
    evaluation frequency, and more.
    """

    learning_rate: float = 3e-4
    batch_size: int = 256
    gamma: float = 0.99
    max_steps: int = 1_000_000
    warmup_steps: int = 10_000
    buffer_size: int = 1_000_000
    # Training frequency settings
    reset_frequency: int = 0
    learn_frequency: int = 1
    max_iter: int = 1
    n_epochs: int = 0
    # Evaluation settings
    eval_episodes: int = 10
    eval_frequency: int = 20_000
    # Run evaluations in parallel or sequentially
    parallelize_eval: bool = False
    optimizer: str = "Adam"
Training configuration controls the learning process of the agent. It includes essential hyperparameters such as learning rate, batch size, and number of training steps. These parameters directly influence the stability, speed, and performance of the algorithm. With this configuration, you can easily tune experiments to meet different research goals or computational constraints. When parallelize_eval is enabled, eval_episodes environments are created and evaluated in parallel instead of sequentially. If memory is not a bottleneck, this flag provides strong speed improvements as the number of evaluation episodes grows.
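To get a feel for how the default values interact, the sketch below counts evaluation points under the assumption that an evaluation runs every eval_frequency steps, starting at step eval_frequency. That trigger rule and the helper evaluation_steps are assumptions made for this example; the training loop itself is not shown in this page.

```python
def evaluation_steps(max_steps: int, eval_frequency: int) -> list[int]:
    """List the steps at which an evaluation would run under the stated assumption."""
    return list(range(eval_frequency, max_steps + 1, eval_frequency))


# Default values from TrainingConfig.
steps = evaluation_steps(max_steps=1_000_000, eval_frequency=20_000)
total_eval_episodes = len(steps) * 10  # eval_episodes = 10 by default
print(len(steps), total_eval_episodes)
```

Under this assumption the defaults yield 50 evaluation points and 500 evaluation episodes per run, which is where parallel evaluation can start to matter.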
System Configuration#
@dataclass
class SystemConfig:
    """
    Configuration for system-level execution.

    Attributes:
        num_threads (int): Number of threads (-1 for auto).
        seed (int): Random seed.
        random_seed (bool): If True, sample a random seed that replaces `seed`.
        device (str): Runtime device ("cpu" or "cuda").
        storing_device (str): Device used for storing models/data ("cpu" or "cuda").
            Store on the CPU if memory is a constraint; otherwise prefer the GPU.
    """

    num_threads: int = -1
    seed: int = 1
    # Initialize with a random seed
    random_seed: bool = False
    device: Literal["cpu", "cuda"] = "cuda"
    storing_device: Literal["cpu", "cuda"] = "cuda"

    def __post_init__(self):
        if self.random_seed:
            self.seed = numpy.random.randint(2**32)
System configuration manages low-level runtime behavior and hardware settings. This includes control over the number of threads, random seed for reproducibility, and device selection (e.g., CPU or CUDA). These settings are especially useful for debugging, benchmarking, or deploying agents across heterogeneous hardware setups.
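The interaction between seed and random_seed can be sketched with a small stand-in dataclass. SeedSketch is hypothetical and swaps numpy.random.randint for the standard library's random.randrange so the example has no third-party dependency.

```python
import random
from dataclasses import dataclass


@dataclass
class SeedSketch:
    """Hypothetical stand-in for the seed fields of SystemConfig."""

    seed: int = 1
    random_seed: bool = False

    def __post_init__(self) -> None:
        if self.random_seed:
            # Draw a seed in [0, 2**32), overriding any value passed as `seed`.
            self.seed = random.randrange(2**32)


fixed = SeedSketch(seed=42)
sampled = SeedSketch(seed=42, random_seed=True)
print(fixed.seed, sampled.seed)
```

Note that with random_seed enabled, an explicitly passed seed is discarded, so the sampled value should be logged if the run needs to be reproduced later.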
Logging Configuration#
@dataclass
class LoggingConfig:
    """
    Configuration for logging experiment outputs.

    Attributes:
        result_path (str): Path to save experiment results.
        save_frequency (int): Save logs every N steps.
        save_params (bool): Whether to save model parameters at the end.
    """

    result_path: str = "../_logs"
    save_frequency: int = 20_000
    save_params: bool = False

    def __post_init__(self):
        """Convert string paths to Path objects."""
        self.result_path = Path(self.result_path)
Logging configuration handles all aspects of experiment output and result storage. You can specify the path for saving logs, frequency of saving, and whether to persist the final model parameters. These options support systematic experiment tracking and simplify post-hoc analysis and reproducibility.
Main Configuration#
@dataclass
class MainConfig:
    """
    Main configuration combining environment, training, system, logging, and model settings.

    This class allows for central management and construction of experiment
    configurations and supports loading from dictionary or YAML files.
    """

    # Provide additional output
    verbose: bool = False
    # Show a progress bar
    progress: bool = False
    # An optional config path
    config: Path | None = None
    # Environmental config
    env: EnvConfig = field(default_factory=EnvConfig)
    # Training related configuration
    training: TrainingConfig = field(default_factory=TrainingConfig)
    # Model related configuration. These cannot be changed via the CLI
    model: tyro.conf.Suppress[ModelConfig] = field(default_factory=ModelConfig)
    # model: ModelConfig = field(default_factory=ModelConfig)
    system: SystemConfig = field(default_factory=SystemConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "MainConfig":
        """
        Build a MainConfig from a configuration dictionary, e.g. one parsed from a YAML file.

        Args:
            config (dict[str, Any]): Dictionary with the keys:
                - 'env' (optional)
                - 'training' (optional)
                - 'system' (optional)
                - 'logging' (optional)
                - 'model' (required)

        Returns:
            MainConfig: A fully initialized configuration object.
        """
        config = copy.deepcopy(config)
        env_conf = config.pop("env", {})
        training_conf = config.pop("training", {})
        system_conf = config.pop("system", {})
        logging_conf = config.pop("logging", {})
        env = EnvConfig(**env_conf) if env_conf else EnvConfig()
        training = (
            TrainingConfig(**training_conf) if training_conf else TrainingConfig()
        )
        system = SystemConfig(**system_conf) if system_conf else SystemConfig()
        logging = LoggingConfig(**logging_conf) if logging_conf else LoggingConfig()

        model_conf = config.pop("model", {})
        assert model_conf, "Need to specify a model"
        model_name = model_conf["name"]
        assert model_name in model_configs.keys(), f"{model_name} is not available"
        # Build actor/critic configs only if the selected model declares them
        if "actor" in model_configs[model_name].__annotations__:
            actor_conf = model_conf.pop("actor", {})
            actor = ActorConfig.from_config(actor_conf, model_name)
            model_conf["actor"] = actor
        if "critic" in model_configs[model_name].__annotations__:
            critic_conf = model_conf.pop("critic", {})
            critic = CriticConfig.from_config(critic_conf, model_name)
            model_conf["critic"] = critic
        model = model_configs[model_name](**model_conf)

        # Any remaining top-level keys (e.g. verbose) are passed through as fields
        return cls(
            env=env,
            training=training,
            system=system,
            logging=logging,
            model=model,
            **config,
        )
Main configuration serves as the central entry point, composing sub-configurations such as EnvConfig, TrainingConfig, SystemConfig, LoggingConfig, and ModelConfig. It supports overriding values from external YAML files via from_config. This design ensures clarity, reproducibility, and ease of experiment management.
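The pattern used by from_config (copy the input, pop each section, build the sub-config, forward the rest) can be tried in isolation with a reduced sketch. The classes EnvSketch, TrainingSketch, and MainSketch are hypothetical stand-ins with only a few fields; the model handling of the real class is left out.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EnvSketch:
    name: str = "cheetah"
    sparse_rewards: bool = False


@dataclass
class TrainingSketch:
    learning_rate: float = 3e-4
    batch_size: int = 256


@dataclass
class MainSketch:
    verbose: bool = False
    env: EnvSketch = field(default_factory=EnvSketch)
    training: TrainingSketch = field(default_factory=TrainingSketch)

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "MainSketch":
        config = copy.deepcopy(config)  # leave the caller's dict untouched
        env = EnvSketch(**config.pop("env", {}))
        training = TrainingSketch(**config.pop("training", {}))
        # Whatever is left (e.g. "verbose") is forwarded as top-level fields.
        return cls(env=env, training=training, **config)


# A dictionary as it might come out of a parsed YAML file.
raw = {"verbose": True, "env": {"name": "hopper"}, "training": {"batch_size": 512}}
cfg = MainSketch.from_config(raw)
print(cfg)
```

Fields that the dictionary does not mention keep their dataclass defaults, so a YAML file only needs to list the values that differ from them.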
Harvest Configuration#
@dataclass
class HarvestConfig:
    """
    Configuration for evaluation and visualization of experiments.

    Attributes:
        verbose (bool): Whether to provide verbose output.
        logs_path (str): Path to log files.
        result_path (str): Path to save results.
        env_names (list[str]): List of environment names to evaluate on.
        model_names (list[str]): List of model names to evaluate.
        seeds (list[int]): Random seeds to evaluate across.
        smoothing_window (int): Window size for reward smoothing.
        height (int): Plot height.
        width (int): Plot width.
        dpi (int): Plot DPI.
        y_axis (str): Label for the y-axis in plots.
    """

    # Provide additional output
    verbose: bool = True
    # Path to logs
    logs_path: str = "../_logs"
    # Path to save results
    result_path: str = "../_results"
    # Environments
    env_names: list[
        Literal[
            "ant",
            "cartpole",
            "cheetah",
            "hopper",
            "humanoid",
            "reacher",
            "swimmer",
            "walker2d",
            "dmc-quadruped-run",
            "dmc-humanoid-run",
            "dmc-cheetah-run",
            "dmc-hopper-hop",
            "dmc-walker-run",
            "metaworld-window-close",
            "metaworld-window-open",
            "metaworld-drawer-close",
            "metaworld-drawer-open",
            "metaworld-reach",
            "metaworld-button-press-topdown",
            "metaworld-door-open",
        ]
    ] = field(default_factory=lambda: ["cheetah"])
    # Models
    models = Literal[tuple(model_configs.keys())]
    model_names: list[models] = field(default_factory=lambda: ["ddpg"])
    del models
    # Seeds
    seeds: list[int] = field(default_factory=lambda: list(range(1, 11)))
    # Smoothing window, must be >= 1; a value of 1 disables smoothing.
    # The window should be odd.
    smoothing_window: int = 1
    # Plotting
    height: int = 5
    width: int = 10
    dpi: int = 200
    # Label for the y axis, e.g., "Return" or "Success Rate"
    y_axis: str = "Return"

    def __post_init__(self):
        """Convert log and result paths to Path objects."""
        self.logs_path = Path(self.logs_path)
        self.result_path = Path(self.result_path)
Harvest configuration defines evaluation and result aggregation parameters. It supports running evaluations across multiple models and seeds and controls how results are saved and visualized.
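The requirement that smoothing_window be odd, with 1 meaning no smoothing, fits a centered moving average. The sketch below shows one such filter; the function smooth and its edge handling (shrinking the window near the ends) are assumptions, and the smoothing applied by the library may be implemented differently.

```python
def smooth(values: list[float], window: int) -> list[float]:
    """Centered moving average; a window of 1 returns the input unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Shrink the window at the edges so the output keeps the input length.
        chunk = values[max(0, i - half) : i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


returns = [0.0, 3.0, 6.0, 3.0, 0.0]
print(smooth(returns, 1))
print(smooth(returns, 3))
```

An odd window keeps the average centered on the current point, with the same number of neighbors on each side.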
model.py File#
Actor Configuration#
@enhanced_repr
@dataclass
class ActorConfig:
    """
    Configuration for the actor network in RL models.

    Attributes:
        depth (int): Number of hidden layers.
        width (int): Number of units per hidden layer.
        norm (bool): Enable normalization layers.
        activation (str): Activation function ('relu' or 'crelu'). User should add other activation functions if needed.
        has_target (bool): Whether to use a target actor network.
        n_actors (int): Number of parallel actor networks.
        reset (bool): Whether to reset the actor.
        n_heads (int): Number of heads in a multi-head actor.
        max_grad_norm (float): Optional maximum gradient norm for clipping.
    """

    depth: int = 3
    width: int = 256
    # Whether to use normalization layers in the actor network (off by default)
    norm: bool = False
    # Activation function for the actor network
    activation: Literal["relu", "crelu"] = "relu"
    has_target: bool = False
    n_actors: int = 1
    reset: bool = False
    n_heads: int = 1
    max_grad_norm: float = 0.0  # Optional maximum gradient norm for clipping

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "ActorConfig":
        """
        Create an ActorConfig from a custom config dict and default model values.

        Args:
            config (dict): Custom configuration parameters.
            model_name (str): The name of the model whose defaults to load.

        Returns:
            ActorConfig: Initialized with both default and overridden settings.
        """
        # Model defaults first, user overrides second
        config = create_field_dict(actor_configs[model_name]) | config
        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}
        instance = cls(**known_attr)
        # Attach model-specific keys that are not declared dataclass fields
        for k, v in extra_attr.items():
            setattr(instance, k, v)
        return instance

    def to_dict(self) -> dict:
        """
        Convert the ActorConfig to a dictionary.

        Returns:
            dict: Dictionary representation of the config.
        """
        return asdict(self)
Actor configuration defines the architecture and behavior of the policy network. This level of control is essential for ablation studies and investigating architectural impacts on learning dynamics.
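The split between declared fields and extra keys in from_config can be reproduced with a reduced sketch. ActorSketch, the build helper, and the extra key init_temperature are all hypothetical and exist only for this example.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ActorSketch:
    """Hypothetical two-field stand-in for ActorConfig."""

    depth: int = 3
    width: int = 256


def build(defaults: dict, overrides: dict) -> ActorSketch:
    config = defaults | overrides  # overrides win over model defaults
    known = {f.name for f in fields(ActorSketch)}
    instance = ActorSketch(**{k: v for k, v in config.items() if k in known})
    for key, value in config.items():
        if key not in known:
            # Unknown keys become plain attributes rather than raising an error.
            setattr(instance, key, value)
    return instance


actor = build({"depth": 2, "init_temperature": 1.0}, {"width": 512})
plain = asdict(actor)  # the standard asdict only sees declared fields
print(actor.init_temperature, plain)
```

The standard dataclasses.asdict omits attributes attached this way, which is consistent with the overview's description of enhanced_asdict as supporting dynamically added attributes.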
Critic Configuration#
@enhanced_repr
@dataclass
class CriticConfig:
    """
    Configuration for the critic network in RL models.

    Attributes:
        depth (int): Number of hidden layers.
        width (int): Number of units per hidden layer.
        norm (bool): Enable normalization layers.
        activation (str): Activation function ('relu' or 'crelu'). User should add other activation functions if needed.
        n_members (int): Number of critic networks.
        reduce (str): Method for reducing outputs of an ensemble of critics.
        target_reduce (str): Method for reducing outputs of an ensemble of target critics.
        has_target (bool): Whether to use a target critic network.
        reset (bool): Whether to reset the critic.
        max_grad_norm (float): Optional maximum gradient norm for clipping.
    """

    depth: int = 3
    width: int = 256
    norm: bool = False
    activation: Literal["relu", "crelu"] = "relu"
    n_members: int = 2
    reduce: str = "min"
    target_reduce: str = "min"
    has_target: bool = True
    reset: bool = False
    max_grad_norm: float = 0.0  # Optional maximum gradient norm for clipping

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "CriticConfig":
        """
        Construct a CriticConfig from a user-defined config and base model name.

        Args:
            config (dict): Configuration overrides.
            model_name (str): Name of the model to fetch default critic settings from.

        Returns:
            CriticConfig: Populated configuration object.
        """
        # Model defaults first, user overrides second
        config = create_field_dict(critic_configs[model_name]) | config
        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}
        instance = cls(**known_attr)
        # Attach model-specific keys that are not declared dataclass fields
        for k, v in extra_attr.items():
            setattr(instance, k, v)
        return instance

    def to_dict(self) -> dict:
        """
        Convert this CriticConfig to a dictionary.

        Returns:
            dict: Dictionary representation of the config.
        """
        return asdict(self)
Critic configuration defines the structure of the value function estimator(s). This helps in adapting the value estimation to different algorithms (e.g., TD3 vs. SAC).
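The role of reduce can be sketched as collapsing the estimates of n_members critics into a single value. Only "min" appears as a default in the configuration; the "mean" entry, the REDUCERS table, and the function reduce_ensemble are assumptions added for illustration.

```python
from statistics import fmean

# Hypothetical reducer table; only "min" appears as a default in the config above.
REDUCERS = {"min": min, "mean": fmean}


def reduce_ensemble(q_values: list[float], reduce: str = "min") -> float:
    """Collapse the Q-value estimates of several critics into one scalar."""
    return REDUCERS[reduce](q_values)


q_estimates = [1.7, 1.2]  # one estimate per critic, n_members = 2
pessimistic = reduce_ensemble(q_estimates, "min")
averaged = reduce_ensemble(q_estimates, "mean")
print(pessimistic, averaged)
```

Taking the minimum over two critics is the pessimistic choice commonly associated with TD3- and SAC-style methods, which matches the defaults n_members = 2 and reduce = "min".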
Model Configuration#
@dataclass
class ModelConfig:
    """
    Base configuration for a model.

    Attributes:
        name (str): Identifier for the model type.
    """

    name: str = "abstract"

    @classmethod
    def from_config(cls, config: dict, model_name: str) -> "ModelConfig":
        """
        Construct a ModelConfig from a user config and model definition.

        Args:
            config (dict): Configuration overrides.
            model_name (str): Model name to use as the default template.

        Returns:
            ModelConfig: A fully populated configuration object.
        """
        config = model_configs[model_name] | config
        known_names = {field.name for field in fields(cls)}
        known_attr = {k: v for k, v in config.items() if k in known_names}
        extra_attr = {k: v for k, v in config.items() if k not in known_names}
        instance = cls(**known_attr)
        for k, v in extra_attr.items():
            setattr(instance, k, v)
        return instance

    def to_dict(self) -> dict:
        """
        Convert this ModelConfig to a dictionary.

        Returns:
            dict: Dictionary representation of the config.
        """
        return asdict(self)
Model configuration acts as a lightweight entry class that dynamically delegates to the specific actor and critic configurations based on the selected model name. This abstraction allows for easy extension when introducing new algorithmic variants while preserving a consistent interface for configuration loading.