Evolvable Multi-layer Perceptron (MLP)

Parameters

class agilerl.modules.mlp.EvolvableMLP(*args: Any, **kwargs: Any)

The Evolvable Multi-layer Perceptron class. Consists of a sequence of fully connected linear layers with an optional activation function between each layer. Supports layer normalization, noisy linear layers, and vanishing the weights of the output layer (see output_vanish). Allows the following types of architecture mutations during training:

Adding or removing hidden layers
Adding or removing nodes from hidden layers
Changing the activation function between layers (e.g. ReLU to GELU)
Changing the activation function for the output layer (e.g. ReLU to GELU)

Parameters:

num_inputs (int) – Input layer dimension
num_outputs (int) – Output layer dimension
hidden_size (list[int]) – Hidden layer(s) size
activation (str, optional) – Activation layer, defaults to ‘ReLU’
output_activation (str, optional) – Output activation layer, defaults to None
min_hidden_layers (int, optional) – Minimum number of hidden layers the network will shrink down to, defaults to 1
max_hidden_layers (int, optional) – Maximum number of hidden layers the network will expand to, defaults to 3
min_mlp_nodes (int, optional) – Minimum number of nodes a layer can have within the network, defaults to 64
max_mlp_nodes (int, optional) – Maximum number of nodes a layer can have within the network, defaults to 500
layer_norm (bool, optional) – Normalization between layers, defaults to True
output_layernorm (bool, optional) – Normalization for the output layer, defaults to False
output_vanish (bool, optional) – Vanish output by multiplying by 0.1, defaults to True
init_layers (bool, optional) – Initialise network layers, defaults to True
noise_std (float, optional) – Noise standard deviation, defaults to 0.5
noisy (bool, optional) – Add noise to network, defaults to False
new_gelu (bool, optional) – Use new GELU activation function, defaults to False
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to ‘cpu’
name (str, optional) – Name of the network, defaults to ‘mlp’
random_seed (int, optional) – Random seed to use for the network, defaults to None
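The size-related parameters above constrain one another: the number of entries in hidden_size is expected to sit between min_hidden_layers and max_hidden_layers, and each entry between min_mlp_nodes and max_mlp_nodes. The following is a minimal sketch of a pre-flight check on constructor keyword arguments. The helper check_mlp_kwargs and the dictionary DOCUMENTED_LIMITS are hypothetical names introduced for illustration, and whether AgileRL itself validates these bounds at construction time is an assumption not confirmed by this page.

```python
from typing import Any

# Size limits copied from the documented defaults above.
DOCUMENTED_LIMITS: dict[str, int] = {
    "min_hidden_layers": 1,
    "max_hidden_layers": 3,
    "min_mlp_nodes": 64,
    "max_mlp_nodes": 500,
}


def check_mlp_kwargs(kwargs: dict[str, Any]) -> list[str]:
    """Hypothetical helper (not part of AgileRL): list any size-limit violations."""
    cfg = {**DOCUMENTED_LIMITS, **kwargs}
    hidden = cfg["hidden_size"]
    problems = []
    if not cfg["min_hidden_layers"] <= len(hidden) <= cfg["max_hidden_layers"]:
        problems.append(f"{len(hidden)} hidden layers is outside the allowed depth range")
    for depth, nodes in enumerate(hidden):
        if not cfg["min_mlp_nodes"] <= nodes <= cfg["max_mlp_nodes"]:
            problems.append(f"layer {depth} has {nodes} nodes, outside the allowed width range")
    return problems


mlp_kwargs = {"num_inputs": 8, "num_outputs": 2, "hidden_size": [64, 64]}
ok_problems = check_mlp_kwargs(mlp_kwargs)
bad_problems = check_mlp_kwargs({**mlp_kwargs, "hidden_size": [32, 64, 64, 64]})

# With agilerl installed, the checked kwargs could then be passed on:
# from agilerl.modules.mlp import EvolvableMLP
# net = EvolvableMLP(**mlp_kwargs)
```

Here the first configuration passes, while the second is flagged twice: once for having four hidden layers and once for the 32-node layer.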

property activation: str

Return activation function.

Returns: Activation function
Return type: str

add_layer() → dict[str, int] | None

Add a hidden layer to the neural network. Falls back on add_node() if max_hidden_layers has been reached.

Returns: Dictionary containing the hidden layer and number of new nodes.
Return type: dict[str, int] | None
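The fallback described for add_layer() can be sketched on a plain list of layer widths. This is an illustration under stated assumptions, not the library's implementation: the function add_layer_sketch and its fallback_nodes argument are hypothetical, the new layer is assumed to copy the width of the last hidden layer, the fallback is a fixed-size stand-in for add_node(), and the dictionary keys are assumed to mirror the add_node() parameter names.

```python
from typing import Optional


def add_layer_sketch(
    hidden_size: list[int],
    max_hidden_layers: int = 3,
    fallback_nodes: int = 32,
) -> tuple[list[int], Optional[dict[str, int]]]:
    """Return the mutated layer sizes plus the fallback's info dict, if any."""
    if len(hidden_size) < max_hidden_layers:
        # Assumption: the new layer copies the width of the current last layer.
        return hidden_size + [hidden_size[-1]], None
    # Depth limit reached: widen the last hidden layer instead
    # (a fixed-size stand-in for the add_node() fallback).
    widened = hidden_size[:-1] + [hidden_size[-1] + fallback_nodes]
    return widened, {"hidden_layer": len(hidden_size) - 1, "numb_new_nodes": fallback_nodes}


deeper, deeper_info = add_layer_sketch([64, 64])      # room to grow deeper
wider, wider_info = add_layer_sketch([64, 64, 64])    # already at the depth limit
```

In this sketch a dictionary is produced only when the fallback is taken, which is one plausible reading of the dict[str, int] | None return annotation.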

add_node(hidden_layer: int | None = None, numb_new_nodes: int | None = None) → dict[str, int]

Add nodes to a hidden layer of the neural network.

Parameters:

hidden_layer (int, optional) – Depth of hidden layer to add nodes to, defaults to None
numb_new_nodes (int, optional) – Number of nodes to add to hidden layer, defaults to None

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
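As a rough sketch of how the two optional arguments of add_node() might interact with max_mlp_nodes, the example below operates on a plain list of layer widths. Everything here is an assumption made for illustration: the function add_node_sketch is hypothetical, a None argument is taken to mean a random choice, the candidate increments 16, 32, and 64 are invented for the example, and clipping the increment at the width limit is a design choice of this sketch rather than documented library behaviour.

```python
import random
from typing import Optional


def add_node_sketch(
    hidden_size: list[int],
    hidden_layer: Optional[int] = None,
    numb_new_nodes: Optional[int] = None,
    max_mlp_nodes: int = 500,
    rng: Optional[random.Random] = None,
) -> tuple[list[int], dict[str, int]]:
    """Widen one hidden layer without exceeding max_mlp_nodes."""
    rng = rng or random.Random()
    if hidden_layer is None:
        # Assumption: an unspecified layer is picked uniformly at random.
        hidden_layer = rng.randrange(len(hidden_size))
    if numb_new_nodes is None:
        # Assumption: an unspecified increment is drawn from a small fixed set.
        numb_new_nodes = rng.choice([16, 32, 64])
    headroom = max_mlp_nodes - hidden_size[hidden_layer]
    added = max(0, min(numb_new_nodes, headroom))  # clip at the width limit
    new_sizes = list(hidden_size)
    new_sizes[hidden_layer] += added
    return new_sizes, {"hidden_layer": hidden_layer, "numb_new_nodes": added}


explicit_sizes, explicit_info = add_node_sketch([64, 128], hidden_layer=1, numb_new_nodes=32)
capped_sizes, capped_info = add_node_sketch([64, 490], hidden_layer=1, numb_new_nodes=32)
random_sizes, random_info = add_node_sketch([64, 128], rng=random.Random(0))
```

Returning the chosen layer and increment makes it possible to replay the same mutation on another network, which is presumably why the documented method returns a dictionary.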

change_activation(activation: str, output: bool = False) → None

Set the activation function for the network.

Parameters:

activation (str) – Activation function to use.
output (bool, optional) – Flag indicating whether to set the output activation function, defaults to False

forward(x: ndarray | Tensor) → Tensor

Return output of neural network.

Parameters: x (torch.Tensor or np.ndarray) – Neural network input
Returns: Neural network output
Return type: torch.Tensor

get_output_dense() → Module

Return output layer of neural network.

Returns: Output layer of neural network
Return type: torch.nn.Module

init_weights_gaussian(std_coeff: float = 4, output_coeff: float = 4) → None

Initialise the weights of the neural network using a Gaussian distribution.

Parameters:

std_coeff (float, optional) – Standard deviation coefficient, defaults to 4
output_coeff (float, optional) – Output layer standard deviation coefficient, defaults to 4

property net_config: dict[str, Any]

Return model configuration in dictionary.

Returns: Model configuration
Return type: dict[str, Any]

recreate_network() → None

Recreates the neural network while preserving the parameters of the old network.
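One common way to preserve parameters when a layer changes shape is to copy the overlapping block of the old weight matrix into the new one. The sketch below shows that idea on nested lists. It is not taken from AgileRL: the function preserve_weights is hypothetical, and filling new entries with a constant is a simplification (a real network would normally initialise them afresh).

```python
def preserve_weights(
    old: list[list[float]], rows: int, cols: int, fill: float = 0.0
) -> list[list[float]]:
    """Build a rows x cols matrix, copying the overlapping top-left block of `old`."""
    new = [[fill] * cols for _ in range(rows)]
    for r in range(min(rows, len(old))):
        for c in range(min(cols, len(old[r]))):
            new[r][c] = old[r][c]
    return new


old_weights = [[1.0, 2.0], [3.0, 4.0]]
grown = preserve_weights(old_weights, rows=3, cols=2)   # one output node added
shrunk = preserve_weights(old_weights, rows=1, cols=2)  # one output node removed
```

Growing keeps every old weight and pads the new row, while shrinking keeps only the rows that still exist.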

remove_layer() → dict[str, int] | None

Remove a hidden layer from the neural network. Falls back on add_node() if min_hidden_layers has been reached.

Returns: Dictionary containing the hidden layer and number of new nodes.
Return type: dict[str, int] | None
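The behaviour described for remove_layer() can be traced by applying it repeatedly to a list of layer widths. This is a sketch under stated assumptions rather than the library's code: remove_layer_sketch and fallback_nodes are hypothetical, the last hidden layer is assumed to be the one dropped, and the fallback is a fixed-size stand-in for add_node().

```python
from typing import Optional


def remove_layer_sketch(
    hidden_size: list[int],
    min_hidden_layers: int = 1,
    fallback_nodes: int = 32,
) -> tuple[list[int], Optional[dict[str, int]]]:
    """Drop the last hidden layer, or widen it once the depth floor is reached."""
    if len(hidden_size) > min_hidden_layers:
        # Assumption: the last hidden layer is the one removed.
        return hidden_size[:-1], None
    # Depth floor reached: fall back to adding nodes, as the description states.
    widened = hidden_size[:-1] + [hidden_size[-1] + fallback_nodes]
    return widened, {"hidden_layer": len(hidden_size) - 1, "numb_new_nodes": fallback_nodes}


sizes = [64, 64, 64]
history = [list(sizes)]
for _ in range(3):
    sizes, last_info = remove_layer_sketch(sizes)
    history.append(list(sizes))
```

The trace shrinks from three layers to one, after which the third call widens the remaining layer instead of removing it.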

remove_node(hidden_layer: int | None = None, numb_new_nodes: int | None = None) → dict[str, int]

Remove nodes from a hidden layer of the neural network.

Parameters:

hidden_layer (int, optional) – Depth of hidden layer to remove nodes from, defaults to None
numb_new_nodes (int, optional) – Number of nodes to remove from hidden layer, defaults to None

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
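Note that for remove_node() the numb_new_nodes argument counts nodes to remove, despite its name. A minimal sketch of how the min_mlp_nodes floor might be respected follows. The function remove_node_sketch is hypothetical, the random defaults and candidate decrements are invented for the example, and clipping the removal at the floor is an assumption of this sketch rather than confirmed library behaviour.

```python
import random
from typing import Optional


def remove_node_sketch(
    hidden_size: list[int],
    hidden_layer: Optional[int] = None,
    numb_new_nodes: Optional[int] = None,
    min_mlp_nodes: int = 64,
    rng: Optional[random.Random] = None,
) -> tuple[list[int], dict[str, int]]:
    """Narrow one hidden layer without dropping below min_mlp_nodes."""
    rng = rng or random.Random()
    if hidden_layer is None:
        # Assumption: an unspecified layer is picked uniformly at random.
        hidden_layer = rng.randrange(len(hidden_size))
    if numb_new_nodes is None:
        # Assumption: an unspecified decrement is drawn from a small fixed set.
        numb_new_nodes = rng.choice([16, 32, 64])
    removable = hidden_size[hidden_layer] - min_mlp_nodes
    removed = max(0, min(numb_new_nodes, removable))  # clip at the width floor
    new_sizes = list(hidden_size)
    new_sizes[hidden_layer] -= removed
    return new_sizes, {"hidden_layer": hidden_layer, "numb_new_nodes": removed}


clipped_sizes, clipped_info = remove_node_sketch([128, 80], hidden_layer=1, numb_new_nodes=32)
floor_sizes, floor_info = remove_node_sketch([64], hidden_layer=0, numb_new_nodes=16)
```

In the first call only 16 of the requested 32 nodes can be removed before the layer hits the floor; in the second the layer is already at the floor, so nothing changes.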