Bayesian Layers

Bayesian Layers#

This module provides implementations of Bayesian neural network layers designed for uncertainty modeling in deep learning architectures. These layers support different variational inference techniques and activation moment propagation.

Detailed Descriptions#

BayesianLinear#

Abstract base class for Bayesian neural network layers.

Defines core attributes like weight_mu, weight_rho, bias_mu, bias_rho.
Supports optional bias, prior distributions for weights.
Allows softplus transformation for standard deviation parameters.
Includes MAP mode and KL divergence computation.

BBBLinear#

Implements a Bayesian layer using Bayes by Backprop:

Samples weights and biases during forward pass from learned posterior.
In MAP mode, uses only mean parameters without sampling.
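
The two modes above can be sketched as follows. This is a minimal illustration, not the library's implementation: `bbb_forward`, `map_mode`, and the tensor values are hypothetical stand-ins, and the sketch assumes the rho parameters are mapped to standard deviations with softplus.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_features, out_features = 4, 3
# Hypothetical stand-ins for the layer's learned parameters.
weight_mu = torch.zeros(out_features, in_features)
weight_rho = torch.full((out_features, in_features), -3.0)
bias_mu = torch.zeros(out_features)
bias_rho = torch.full((out_features,), -3.0)


def bbb_forward(x: torch.Tensor, map_mode: bool = False) -> torch.Tensor:
    if map_mode:
        # MAP mode: use the posterior means only, no sampling.
        return F.linear(x, weight_mu, bias_mu)
    # Assumption: rho is turned into a positive std via softplus.
    weight_sigma = F.softplus(weight_rho)
    bias_sigma = F.softplus(bias_rho)
    # Reparameterization: w = mu + sigma * eps with eps ~ N(0, I).
    weight = weight_mu + weight_sigma * torch.randn_like(weight_mu)
    bias = bias_mu + bias_sigma * torch.randn_like(bias_mu)
    return F.linear(x, weight, bias)


x = torch.randn(8, in_features)
sampled = bbb_forward(x)                 # differs from call to call
map_out = bbb_forward(x, map_mode=True)  # deterministic
```

Because a fresh weight matrix is drawn on every call, two sampled forward passes on the same input generally differ, while the MAP output does not.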

LRLinear#

Implements a Bayesian layer using the Local Reparameterization Trick:

Samples output activations instead of weights, which is cheaper and gives lower-variance gradient estimates than direct weight sampling.
Propagates mean and variance through layers.
Supports MAP mode.
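
A rough sketch of the idea, assuming a fully factorized Gaussian posterior over weights and biases. The function name `lrt_forward` and the parameter values are illustrative only and are not taken from the library.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_features, out_features = 4, 3
# Hypothetical posterior parameters (standard deviations given directly).
weight_mu = torch.randn(out_features, in_features)
weight_sigma = torch.full((out_features, in_features), 0.1)
bias_mu = torch.zeros(out_features)
bias_sigma = torch.full((out_features,), 0.1)


def lrt_forward(x: torch.Tensor) -> torch.Tensor:
    # Mean and variance of the pre-activation implied by the weight posterior.
    act_mu = F.linear(x, weight_mu, bias_mu)
    act_var = F.linear(x.pow(2), weight_sigma.pow(2), bias_sigma.pow(2))
    # Sample the activation directly: one noise draw per example and output unit.
    eps = torch.randn_like(act_mu)
    return act_mu + act_var.sqrt() * eps


x = torch.randn(8, in_features)
out = lrt_forward(x)
```

The noise tensor has the shape of the output rather than of the weight matrix, which is where the savings over weight sampling come from.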

CLTLinear#

Implements a Bayesian layer using Central Limit Theorem (CLT) approximations:

Supports ReLU and CReLU activations.
Propagates mean and variance analytically through the network.
Supports input/output layer distinctions and MAP mode.
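
As an illustration of analytic propagation through the linear part of such a layer, the sketch below computes the output mean and variance for independent Gaussian inputs, weights, and biases. The helper `linear_moments` and its argument names are assumptions made for this example; the library's internals may differ.

```python
import torch
import torch.nn.functional as F


def linear_moments(mu_h, var_h, w_mu, w_var, b_mu, b_var):
    """Mean and variance of h @ W.T + b for independent Gaussian h, W, and b."""
    out_mu = F.linear(mu_h, w_mu, b_mu)
    if var_h is None:
        # Deterministic input (e.g. the first layer): only the weights are random.
        out_var = F.linear(mu_h.pow(2), w_var, b_var)
    else:
        # Var(w * h) = var_w * (var_h + mu_h^2) + mu_w^2 * var_h for independent w, h.
        out_var = F.linear(var_h + mu_h.pow(2), w_var, b_var) + F.linear(var_h, w_mu.pow(2))
    return out_mu, out_var


mu_h = torch.tensor([[1.0, 2.0]])
var_h = torch.tensor([[0.5, 0.25]])
w_mu = torch.tensor([[1.0, -1.0]])
w_var = torch.tensor([[0.1, 0.2]])
b_mu = torch.zeros(1)
b_var = torch.zeros(1)

out_mu, out_var = linear_moments(mu_h, var_h, w_mu, w_var, b_mu, b_var)
```

Passing `var_h=None` covers the case of a deterministic input, which is one plausible reason for distinguishing an input layer from hidden layers.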

CLTLinearDet#

Deterministic version of CLTLinear:

Disables uncertainty modeling by removing learned standard deviations.
Overrides methods to raise errors for standard deviation and KL divergence calls.
Supports MAP mode and variance propagation accordingly.

Usage Example#

import torch
from objectrl.nets.layers.bayesian_layers import BBBLinear

layer = BBBLinear(in_features=128, out_features=64, bias=True)
x = torch.randn(32, 128)
output = layer(x)
print(output.shape)  # torch.Size([32, 64])

Notes#

The map() method in BayesianLinear switches between MAP (deterministic) and sampling modes.
The KL divergence calculation sums over all parameters for regularization.
CLTLinear uses moment matching and normal CDF/PDF to approximate nonlinear activations.
The deterministic variant CLTLinearDet raises errors if variance or KL methods are called.

Classes#

class objectrl.nets.layers.bayesian_layers.BayesianLinear(in_features: int, out_features: int, bias: bool = True, prior_mean: float | Tensor | None = None, prior_std: float | Tensor | None = None, use_softplus: bool = False, manual_reset: bool = False, device=None, dtype=None)[source]#

Bases: ABC, Module

Abstract base class for Bayesian neural network layers.

use_softplus#

Whether to apply softplus to std dev parameters.

Type:: bool

_manual_reset#

If True, keep the random state

Type:: bool

weight_mu#

Mean of the weight distribution.

Type:: nn.Parameter

weight_rho#

Rho (transformed std) of the weight distribution.

Type:: nn.Parameter

bias_mu#

Mean of the bias distribution (if bias=True).

Type:: nn.Parameter | None

bias_rho#

Rho of the bias distribution (if bias=True).

Type:: nn.Parameter | None

prior_mean#

Mean of the prior distribution.

Type:: torch.Tensor | None

prior_std#

Standard deviation of the prior distribution.

Type:: torch.Tensor | None

_map: bool = False#

__init__(in_features: int, out_features: int, bias: bool = True, prior_mean: float | Tensor | None = None, prior_std: float | Tensor | None = None, use_softplus: bool = False, manual_reset: bool = False, device=None, dtype=None) → None[source]#

Parameters:

in_features (int) – Size of input features.
out_features (int) – Size of output features.
bias (bool) – Whether to include a bias term.
prior_mean (float or torch.Tensor, optional) – Prior mean.
prior_std (float or torch.Tensor, optional) – Prior std deviation.
use_softplus (bool) – If True, apply softplus to std parameters.
manual_reset (bool) – If True, keep the random state
device (torch.device, optional) – Device to use.
dtype (torch.dtype, optional) – Data type to use.

Returns:

None

in_features: int#

out_features: int#

reset_parameters() → None[source]#

Parameters:: None
Returns:: None

reset_randomness() → None[source]#

set_manual_reset(manual_reset: bool = True) → None[source]#

get_manual_reset() → bool[source]#

map(on: bool = True)[source]#

Switches maximum a posteriori (MAP) mode on or off.

Parameters:: on (bool) – If True, sets MAP mode on.
Returns:: None

update_prior(prior_mean: Tensor, prior_std: Tensor) → None[source]#

static inv_softplus(x: Tensor) → Tensor[source]#

Inverse of the softplus function.

Parameters:: x (torch.Tensor) – Input tensor.
Returns:: Inverse softplus tensor.
Return type:: torch.Tensor

static softplus(x: Tensor) → Tensor[source]#

Softplus activation function.

Parameters:: x (torch.Tensor) – Input tensor.
Returns:: Softplus tensor.
Return type:: torch.Tensor
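
For reference, the two functions are mathematical inverses of each other. The scalar sketch below uses only the standard library and numerically stable forms; it illustrates the formulas and is not the library's tensor implementation.

```python
import math


def softplus(x: float) -> float:
    """Softplus, log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inv_softplus(y: float) -> float:
    """Inverse of softplus for y > 0, log(exp(y) - 1), in a stable form."""
    # log(exp(y) - 1) = y + log(1 - exp(-y))
    return y + math.log(-math.expm1(-y))


rho = inv_softplus(0.1)  # rho value that corresponds to a std of 0.1
std = softplus(rho)      # recovers 0.1 up to floating-point error
```

This round trip is what makes it possible to initialise a rho parameter from a desired standard deviation when softplus is used as the transformation.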

mean() → tuple[Tensor, Tensor | None][source]#

Returns:: Mean of the weight distribution and optionally bias distribution.
Return type:: tuple

std() → tuple[Tensor, Tensor | None][source]#

Returns:: Standard deviation of the weight distribution and optionally bias distribution.
Return type:: tuple

var() → tuple[Tensor, Tensor | None][source]#

Returns:: Variance of the weight distribution and optionally bias distribution.
Return type:: tuple

abstractmethod forward(input: Tensor) → Tensor[source]#

Parameters:: input (torch.Tensor) – Input tensor.
Returns:: Output tensor after applying the layer.
Return type:: torch.Tensor

KL() → tuple[Tensor, int][source]#

Computes the KL divergence between posterior and prior.

Parameters:: None
Returns:: KL divergence and number of parameters.
Return type:: Tuple[torch.Tensor, int]
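
For Gaussian posteriors and priors, the KL divergence has a closed form. The sketch below sums it over all elements and also returns the element count, mirroring the documented return type; the function `gaussian_kl` and its arguments are hypothetical and only assume element-wise independent Gaussians.

```python
import torch


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """Summed element-wise KL(N(mu_q, std_q^2) || N(mu_p, std_p^2)) and element count."""
    kl = (
        torch.log(std_p / std_q)
        + (std_q.pow(2) + (mu_q - mu_p).pow(2)) / (2 * std_p.pow(2))
        - 0.5
    )
    return kl.sum(), mu_q.numel()


# Hypothetical posterior that coincides with a standard normal prior.
weight_mu = torch.zeros(3, 4)
weight_std = torch.ones(3, 4)
prior_mean = torch.zeros(())
prior_std = torch.ones(())

kl, n_params = gaussian_kl(weight_mu, weight_std, prior_mean, prior_std)
```

Returning the parameter count alongside the sum lets a caller normalise the KL term, for example per parameter, before adding it to a loss.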

class objectrl.nets.layers.bayesian_layers.BBBLinear(in_features: int, out_features: int, bias: bool = True, prior_mean: float | Tensor | None = None, prior_std: float | Tensor | None = None, use_softplus: bool = False, manual_reset: bool = False, device=None, dtype=None)[source]#

Bases: BayesianLinear

Implements a Bayesian layer following Bayes by Backprop (Blundell et al., 2015). Samples weights and biases during the forward pass from the learned posterior distribution. In MAP mode, only the means are used.

Parameters:

in_features (int) – Number of input features.
out_features (int) – Number of output features.
bias (bool) – Whether to include a bias term.
prior_mean (float | torch.Tensor | None) – Prior mean for weights.
prior_std (float | torch.Tensor | None) – Prior standard deviation for weights.
use_softplus (bool) – Whether to apply softplus activation to std parameters.
manual_reset (bool) – If True, keep the random state
device (torch.device, optional) – Device to use for the layer.
dtype (torch.dtype, optional) – Data type for the layer parameters.

in_features#

Number of input features.

Type:: int

out_features#

Number of output features.

Type:: int

use_softplus#

Whether to apply softplus to std dev parameters.

Type:: bool

_manual_reset#

If True, keep the random state

Type:: bool

weight_mu#

Mean of the weight distribution.

Type:: nn.Parameter

weight_rho#

Rho (transformed std) of the weight distribution.

Type:: nn.Parameter

bias_mu#

Mean of the bias distribution (if bias=True).

Type:: nn.Parameter | None

bias_rho#

Rho of the bias distribution (if bias=True).

Type:: nn.Parameter | None

prior_mean#

Mean of the prior distribution.

Type:: torch.Tensor | None

prior_std#

Standard deviation of the prior distribution.

Type:: torch.Tensor | None

forward(input: Tensor) → Tensor[source]#

Forward pass of the layer.

Parameters:: input (torch.Tensor) – Input tensor.
Returns:: Output tensor after applying the layer.
Return type:: torch.Tensor

class objectrl.nets.layers.bayesian_layers.LRLinear(in_features: int, out_features: int, bias: bool = True, prior_mean: float | Tensor | None = None, prior_std: float | Tensor | None = None, use_softplus: bool = False, manual_reset: bool = False, device=None, dtype=None)[source]#

Bases: BayesianLinear

Implements a Bayesian layer using the local reparameterization trick (Kingma et al., 2015). Instead of sampling weights, it samples output activations using the propagated mean and variance. This is more efficient and less noisy than direct weight sampling.

Parameters:

in_features (int) – Number of input features.
out_features (int) – Number of output features.
bias (bool) – Whether to include a bias term.
prior_mean (float | torch.Tensor | None) – Prior mean for weights.
prior_std (float | torch.Tensor | None) – Prior standard deviation for weights.
use_softplus (bool) – Whether to apply softplus activation to std parameters.
manual_reset (bool) – If True, keep the random state
device (torch.device, optional) – Device to use for the layer.
dtype (torch.dtype, optional) – Data type for the layer parameters.

in_features#

Number of input features.

Type:: int

out_features#

Number of output features.

Type:: int

use_softplus#

Whether to apply softplus to std dev parameters.

Type:: bool

weight_mu#

Mean of the weight distribution.

Type:: nn.Parameter

weight_rho#

Rho (transformed std) of the weight distribution.

Type:: nn.Parameter

bias_mu#

Mean of the bias distribution (if bias=True).

Type:: nn.Parameter | None

bias_rho#

Rho of the bias distribution (if bias=True).

Type:: nn.Parameter | None

prior_mean#

Mean of the prior distribution.

Type:: torch.Tensor | None

prior_std#

Standard deviation of the prior distribution.

Type:: torch.Tensor | None

forward(input: Tensor) → Tensor[source]#

Forward pass of the layer using local reparameterization trick.

Parameters:: input (torch.Tensor) – Input tensor.
Returns:: Output tensor after applying the layer.
Return type:: torch.Tensor

class objectrl.nets.layers.bayesian_layers.CLTLinear(*args, act: Literal['relu', 'crelu'] = 'relu', is_input: bool = False, is_output: bool = False, **kwargs)[source]#

Bases: BayesianLinear

Implements a Bayesian layer using central limit theorem approximations (Wu et al., 2019; Haussmann, 2021). Supports ReLU and CReLU activations. During the forward pass, it propagates mean and variance analytically instead of sampling.

Parameters:

in_features (int) – Number of input features.
out_features (int) – Number of output features.
bias (bool) – Whether to include a bias term.
prior_mean (float | torch.Tensor | None) – Prior mean for weights.
prior_std (float | torch.Tensor | None) – Prior standard deviation for weights.
use_softplus (bool) – Whether to apply softplus activation to std parameters.
device (torch.device, optional) – Device to use for the layer.
dtype (torch.dtype, optional) – Data type for the layer parameters.

act#

Activation type (‘relu’ or ‘crelu’).

Type:: str

is_input#

Whether this is the input layer.

Type:: bool

is_output#

Whether this is the output layer.

Type:: bool

__init__(*args, act: Literal['relu', 'crelu'] = 'relu', is_input: bool = False, is_output: bool = False, **kwargs) → None[source]#

Initializes the CLTLinear layer.

Parameters:

act (Literal["relu", "crelu"]) – Activation function to use (‘relu’ or ‘crelu’).
is_input (bool) – Whether this is the input layer.
is_output (bool) – Whether this is the output layer.

Returns:

None

reset_randomness() → None[source]#

static normal_cdf(x, mu: float | Tensor = 0.0, sigma: float | Tensor = 1.0) → Tensor[source]#

Computes the cumulative distribution function (CDF) of a normal distribution.

Parameters:

x (torch.Tensor) – Input tensor.
mu (float or torch.Tensor) – Mean of the normal distribution.
sigma (float or torch.Tensor) – Standard deviation of the normal distribution.

Returns:

CDF values for the input tensor.

Return type:

torch.Tensor

static normal_pdf(x, mu: float | Tensor = 0.0, sigma: float | Tensor = 1.0) → Tensor[source]#

Computes the probability density function (PDF) of a normal distribution.

Parameters:

x (torch.Tensor) – Input tensor.
mu (float or torch.Tensor) – Mean of the normal distribution.
sigma (float or torch.Tensor) – Standard deviation of the normal distribution.

Returns:

PDF values for the input tensor.

Return type:

torch.Tensor
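
For orientation, the standard formulas behind these two helpers can be written for scalars with the standard library alone. This is a sketch of the mathematics, not of the library's tensor code.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of N(mu, sigma^2) at x, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """PDF of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


half = normal_cdf(0.0)  # the standard normal is symmetric around 0
peak = normal_pdf(0.0)  # equals 1 / sqrt(2 * pi)
```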

static relu_moments(mu: Tensor, sigma: Tensor) → tuple[Tensor, Tensor][source]#

Computes the mean and variance of the ReLU activation function.

Parameters:

mu (torch.Tensor) – Mean of the input tensor.
sigma (torch.Tensor) – Standard deviation of the input tensor.

Returns:

Mean and variance of the ReLU activation.

Return type:

tuple
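
The moments of a ReLU applied to a Gaussian input have a well-known closed form in terms of the normal CDF and PDF. The sketch below is one way to write it in PyTorch; it follows the textbook formulas and is not guaranteed to match the library's implementation detail for detail.

```python
import math

import torch


def relu_moments(mu: torch.Tensor, sigma: torch.Tensor):
    """Mean and variance of max(0, x) for x ~ N(mu, sigma^2), element-wise."""
    alpha = mu / sigma
    cdf = 0.5 * (1.0 + torch.erf(alpha / math.sqrt(2.0)))
    pdf = torch.exp(-0.5 * alpha.pow(2)) / math.sqrt(2.0 * math.pi)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu.pow(2) + sigma.pow(2)) * cdf + mu * sigma * pdf
    # Clamp guards against tiny negative values caused by rounding.
    var = (second_moment - mean.pow(2)).clamp_min(0.0)
    return mean, var


mean, var = relu_moments(torch.tensor([0.0]), torch.tensor([1.0]))
```

For a standard normal input, the mean is 1/sqrt(2π) and the variance is 1/2 - 1/(2π).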

static neg_relu_moments(mu: Tensor, sigma: Tensor) → tuple[Tensor, Tensor][source]#

Computes the mean and variance of the negative ReLU activation function.

Parameters:

mu (torch.Tensor) – Mean of the input tensor.
sigma (torch.Tensor) – Standard deviation of the input tensor.

Returns:

Mean and variance of the negative ReLU activation.

Return type:

tuple

static crelu_moments(mu: Tensor, sigma: Tensor) → tuple[Tensor, Tensor][source]#

Computes the mean and variance of the CReLU activation function.

Parameters:

mu (torch.Tensor) – Mean of the input tensor.
sigma (torch.Tensor) – Standard deviation of the input tensor.

Returns:

Mean and variance of the CReLU activation.

Return type:

tuple
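
A possible sketch, assuming the common definition of CReLU as the concatenation of relu(x) and relu(-x) along the feature dimension. Under that assumption the moments of the negative branch are the ReLU moments evaluated at the negated mean. The output layout used by the library may differ.

```python
import math

import torch


def relu_moments(mu: torch.Tensor, sigma: torch.Tensor):
    """Mean and variance of max(0, x) for x ~ N(mu, sigma^2), element-wise."""
    alpha = mu / sigma
    cdf = 0.5 * (1.0 + torch.erf(alpha / math.sqrt(2.0)))
    pdf = torch.exp(-0.5 * alpha.pow(2)) / math.sqrt(2.0 * math.pi)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu.pow(2) + sigma.pow(2)) * cdf + mu * sigma * pdf
    return mean, (second_moment - mean.pow(2)).clamp_min(0.0)


def crelu_moments(mu: torch.Tensor, sigma: torch.Tensor):
    """Moments of [relu(x), relu(-x)] concatenated on the last dimension (assumed layout)."""
    pos_mean, pos_var = relu_moments(mu, sigma)
    # relu(-x) with x ~ N(mu, sigma^2) is relu(y) with y ~ N(-mu, sigma^2).
    neg_mean, neg_var = relu_moments(-mu, sigma)
    return torch.cat([pos_mean, neg_mean], dim=-1), torch.cat([pos_var, neg_var], dim=-1)


mu = torch.tensor([[1.0, -1.0]])
sigma = torch.ones(1, 2)
mean, var = crelu_moments(mu, sigma)  # feature dimension doubles: (1, 2) -> (1, 4)
```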

forward(mu_h: Tensor, var_h: Tensor | None = None) → tuple[Tensor, Tensor | None][source]#

Parameters:

mu_h (torch.Tensor) – Input mean tensor.
var_h (torch.Tensor, optional) – Input variance tensor.

Returns:

Output mean tensor and optionally output variance tensor.

Return type:

tuple

class objectrl.nets.layers.bayesian_layers.CLTLinearDet(*args, act: Literal['relu', 'crelu'] = 'relu', is_input: bool = False, is_output: bool = False, **kwargs)[source]#

Bases: CLTLinear

Deterministic version of CLTLinear. Disables uncertainty modeling by removing the learned standard deviation.

Parameters:

in_features (int) – Number of input features.
out_features (int) – Number of output features.
bias (bool) – Whether to include a bias term.
prior_mean (float | torch.Tensor | None) – Prior mean for weights.
prior_std (float | torch.Tensor | None) – Prior standard deviation for weights.
use_softplus (bool) – Whether to apply softplus activation to std parameters.
device (torch.device, optional) – Device to use for the layer.
dtype (torch.dtype, optional) – Data type for the layer parameters.

in_features#

Number of input features.

Type:: int

out_features#

Number of output features.

Type:: int

use_softplus#

Whether to apply softplus to std dev parameters.

Type:: bool

weight_mu#

Mean of the weight distribution.

Type:: nn.Parameter

weight_rho#

Rho (transformed std) of the weight distribution.

Type:: nn.Parameter

bias_mu#

Mean of the bias distribution (if bias=True).

Type:: nn.Parameter | None

bias_rho#

Rho of the bias distribution (if bias=True).

Type:: nn.Parameter | None

prior_mean#

Mean of the prior distribution.

Type:: torch.Tensor | None

prior_std#

Standard deviation of the prior distribution.

Type:: torch.Tensor | None

__init__(*args, **kwargs)#

std() → tuple[Tensor, Tensor | None][source]#

Returns:: Standard deviation of the weight distribution and None for bias.
Return type:: tuple

KL() → Tensor[source]#

Computes the KL divergence for this layer.

Parameters:: None
Returns:: KL divergence of the layer.
Return type:: torch.Tensor

forward(mu_h: Tensor, var_h: Tensor | None = None) → tuple[Tensor, Tensor | None][source]#

Parameters:

mu_h (torch.Tensor) – Input mean tensor.
var_h (torch.Tensor, optional) – Input variance tensor.

Returns:

Output mean tensor and None for output variance.

Return type:

tuple
