Evolvable Multi-layer Perceptron (MLP)

Parameters

class agilerl.modules.mlp.EvolvableMLP(*args: Any, **kwargs: Any)

The Evolvable Multi-layer Perceptron class. It consists of a sequence of fully connected linear layers, with an optional activation function between consecutive layers, and supports layer normalization, noisy linear layers, and vanishing the values of the weights in the output layer. The following types of architecture mutation are allowed during training:

  • Adding or removing hidden layers

  • Adding or removing nodes from hidden layers

  • Changing the activation function between layers (e.g. ReLU to GELU)

  • Changing the activation function for the output layer (e.g. ReLU to GELU)
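To picture the layer sequence described above, the pure-Python sketch below lists the layers an MLP of a given shape would contain. It does not import AgileRL; the helper name `layer_plan` and the placement of layer normalization (after each hidden linear layer, before its activation) are assumptions made for illustration.

```python
from typing import List, Optional


def layer_plan(
    num_inputs: int,
    num_outputs: int,
    hidden_size: List[int],
    activation: str = "ReLU",
    output_activation: Optional[str] = None,
    layer_norm: bool = True,
) -> List[str]:
    """Describe the layer sequence of a fully connected MLP (hypothetical helper)."""
    plan: List[str] = []
    in_features = num_inputs
    for width in hidden_size:
        plan.append(f"Linear({in_features}, {width})")
        if layer_norm:
            # Assumption: normalization sits between the linear layer and its activation.
            plan.append(f"LayerNorm({width})")
        plan.append(activation)
        in_features = width
    # The output layer maps the last hidden width to num_outputs.
    plan.append(f"Linear({in_features}, {num_outputs})")
    if output_activation is not None:
        plan.append(output_activation)
    return plan


print(layer_plan(4, 2, [64, 64]))
```

Each mutation in the list above edits one input to this plan: the length of `hidden_size`, one of its entries, `activation`, or `output_activation`.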

Parameters:
  • num_inputs (int) – Input layer dimension

  • num_outputs (int) – Output layer dimension

  • hidden_size (list[int]) – Hidden layer(s) size

  • activation (str, optional) – Activation function applied between hidden layers, defaults to ‘ReLU’

  • output_activation (str, optional) – Activation function applied to the output layer, defaults to None

  • min_hidden_layers (int, optional) – Minimum number of hidden layers the network will shrink down to, defaults to 1

  • max_hidden_layers (int, optional) – Maximum number of hidden layers the network will expand to, defaults to 3

  • min_mlp_nodes (int, optional) – Minimum number of nodes a layer can have within the network, defaults to 64

  • max_mlp_nodes (int, optional) – Maximum number of nodes a layer can have within the network, defaults to 500

  • layer_norm (bool, optional) – Apply layer normalization between layers, defaults to True

  • output_layernorm (bool, optional) – Apply layer normalization to the output layer, defaults to False

  • output_vanish (bool, optional) – Vanish the output layer's weights by multiplying them by 0.1, defaults to True

  • init_layers (bool, optional) – Initialise network layers, defaults to True

  • noise_std (float, optional) – Standard deviation of the noise in noisy linear layers, defaults to 0.5

  • noisy (bool, optional) – Use noisy linear layers, defaults to False

  • new_gelu (bool, optional) – Use new GELU activation function, defaults to False

  • device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to ‘cpu’

  • name (str, optional) – Name of the network, defaults to ‘mlp’

  • random_seed (int, optional) – Random seed to use for the network, defaults to None
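As a worked example of how `num_inputs`, `hidden_size` and `num_outputs` determine the size of the network, the sketch below counts trainable parameters. It assumes plain linear layers (a weight matrix plus a bias vector) and, when `layer_norm` is enabled, one affine layer normalization with a scale and a shift per hidden unit; noisy layers are not modelled, and the helper name `count_parameters` is hypothetical.

```python
from typing import List


def count_parameters(
    num_inputs: int,
    num_outputs: int,
    hidden_size: List[int],
    layer_norm: bool = True,
) -> int:
    """Count weights and biases in an MLP (sketch; noisy layers are ignored)."""
    widths = [num_inputs, *hidden_size, num_outputs]
    total = 0
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    if layer_norm:
        # Assumed: one layer normalization with a scale and a shift per hidden unit.
        total += sum(2 * width for width in hidden_size)
    return total


print(count_parameters(4, 2, [64, 64]))  # 320 + 4160 + 130 + 256 = 4866
```

Growing a hidden layer from 64 to 128 nodes roughly doubles the middle term, which is why `max_mlp_nodes` and `max_hidden_layers` effectively bound the size of the network.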

property activation: str

Return activation function.

Returns:

Activation function

Return type:

str

add_layer() → dict[str, int] | None

Add a hidden layer to the neural network. Falls back on add_node() if max_hidden_layers has been reached.

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
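The fallback behaviour of add_layer() can be sketched on a plain list of hidden-layer widths. This is not AgileRL's implementation: the helper functions, the choice to append a layer with the same width as the current last layer, the fixed 16-node growth in the fallback, and the returned dictionary keys are all hypothetical. Only the fallback to add_node() at max_hidden_layers comes from the description above.

```python
from typing import Dict, List, Optional


def add_node(hidden_size: List[int], hidden_layer: int = 0, numb_new_nodes: int = 16,
             max_mlp_nodes: int = 500) -> Dict[str, int]:
    """Hypothetical add_node: widen one layer, capped at max_mlp_nodes."""
    hidden_size[hidden_layer] = min(hidden_size[hidden_layer] + numb_new_nodes, max_mlp_nodes)
    return {"hidden_layer": hidden_layer, "numb_new_nodes": numb_new_nodes}


def add_layer(hidden_size: List[int], max_hidden_layers: int = 3) -> Optional[Dict[str, int]]:
    """Hypothetical add_layer: append a layer, or fall back on add_node at the limit."""
    if len(hidden_size) < max_hidden_layers:
        hidden_size.append(hidden_size[-1])  # assumption: copy the last layer's width
        return None
    return add_node(hidden_size)


sizes = [64, 64]
print(add_layer(sizes), sizes)  # room left: a third layer is added
print(add_layer(sizes), sizes)  # at max_hidden_layers: falls back on add_node
```

Returning None when a layer is added and a dictionary only on the fallback path is one reading of the `dict[str, int] | None` signature.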

add_node(hidden_layer: int | None = None, numb_new_nodes: int | None = None) → dict[str, int]

Add nodes to a hidden layer of the neural network.

Parameters:
  • hidden_layer (int, optional) – Depth of hidden layer to add nodes to, defaults to None

  • numb_new_nodes (int, optional) – Number of nodes to add to hidden layer, defaults to None

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
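A minimal sketch of add_node() on a list of widths, assuming that a None argument means "pick at random" and that widths are capped at max_mlp_nodes. The random-choice behaviour, the candidate node counts (16, 32, 64) and the returned keys are assumptions rather than documented behaviour; a seeded `random.Random` stands in for the network's random_seed.

```python
import random
from typing import Dict, List, Optional


def add_node(
    hidden_size: List[int],
    hidden_layer: Optional[int] = None,
    numb_new_nodes: Optional[int] = None,
    max_mlp_nodes: int = 500,
    rng: Optional[random.Random] = None,
) -> Dict[str, int]:
    """Hypothetical add_node operating on a list of hidden-layer widths."""
    rng = rng or random.Random()
    if hidden_layer is None:
        hidden_layer = rng.randrange(len(hidden_size))  # assumed: random layer
    if numb_new_nodes is None:
        numb_new_nodes = rng.choice([16, 32, 64])  # assumed candidate sizes
    # Never grow a layer beyond max_mlp_nodes.
    hidden_size[hidden_layer] = min(hidden_size[hidden_layer] + numb_new_nodes, max_mlp_nodes)
    return {"hidden_layer": hidden_layer, "numb_new_nodes": numb_new_nodes}


sizes = [64, 490]
print(add_node(sizes, hidden_layer=1, numb_new_nodes=32), sizes)  # layer 1 is capped at 500
```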

change_activation(activation: str, output: bool = False) → None

Set the activation function for the network.

Parameters:
  • activation (str) – Activation function to use.

  • output (bool, optional) – Flag indicating whether to set the output activation function, defaults to False
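The sketch below shows how an activation can be swapped by name for either the hidden layers or the output layer, mirroring the `output` flag of change_activation(). The `ACTIVATIONS` table and the `SimpleMLPSpec` class are hypothetical stand-ins, not AgileRL objects; GELU is computed with the exact erf-based formula.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical registry mapping activation names to scalar functions.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "ReLU": lambda x: max(0.0, x),
    "Tanh": math.tanh,
    "Sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "GELU": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}


@dataclass
class SimpleMLPSpec:
    activation: str = "ReLU"
    output_activation: Optional[str] = None

    def change_activation(self, activation: str, output: bool = False) -> None:
        """Set the hidden (default) or output activation by name."""
        if activation not in ACTIVATIONS:
            raise ValueError(f"Unknown activation: {activation}")
        if output:
            self.output_activation = activation
        else:
            self.activation = activation


spec = SimpleMLPSpec()
spec.change_activation("GELU")               # hidden layers: ReLU -> GELU
spec.change_activation("Tanh", output=True)  # output layer: None -> Tanh
print(spec)
```

Validating the name before assigning it keeps the spec unchanged if an unknown activation is requested.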

forward(x: ndarray | Tensor) → Tensor

Return the output of the neural network.

Parameters:

x (torch.Tensor or np.ndarray) – Neural network input

Returns:

Neural network output

Return type:

torch.Tensor
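To make the data flow of forward() concrete, here is a dependency-free forward pass through a one-hidden-layer MLP with hand-picked weights. It stands in for the PyTorch computation: the nested-list weights, the ReLU hidden activation, and the absence of layer normalization and output activation are simplifications chosen for illustration, and the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """Compute weight @ x + bias, with weight shaped (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def forward(x: Sequence[float], w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]) -> List[float]:
    """Input -> Linear -> ReLU -> Linear (no output activation)."""
    x = [float(a) for a in x]  # accept any sequence of numbers
    hidden = relu(linear(x, w1, b1))
    return linear(hidden, w2, b2)


w1 = [[1.0, -1.0], [0.5, 0.5]]  # 2 inputs -> 2 hidden units
b1 = [0.0, 0.0]
w2 = [[1.0, 2.0]]               # 2 hidden units -> 1 output
b2 = [0.5]
print(forward([3, 1], w1, b1, w2, b2))  # [6.5]
```

For the input [3, 1] the hidden pre-activations are [2.0, 2.0], ReLU leaves them unchanged, and the output is 1·2 + 2·2 + 0.5 = 6.5.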

get_output_dense() → Module

Return the output layer of the neural network.

Returns:

Output layer of neural network

Return type:

torch.nn.Module

init_weights_gaussian(std_coeff: float = 4, output_coeff: float = 4) → None

Initialise the weights of the neural network using a Gaussian distribution.

Parameters:
  • std_coeff (float, optional) – Standard deviation coefficient, defaults to 4

  • output_coeff (float, optional) – Output layer standard deviation coefficient, defaults to 4

property net_config: dict[str, Any]

Return the model configuration as a dictionary.

Returns:

Model configuration

Return type:

dict[str, Any]
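A common use of a configuration dictionary is rebuilding an equivalent object from it. The sketch below shows that round trip with a hypothetical `MLPConfig` dataclass whose `net_config` property returns its fields. Whether the keys of EvolvableMLP.net_config can be passed straight back to the EvolvableMLP constructor is not stated here, so treat the pattern as illustrative only.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MLPConfig:
    """Hypothetical stand-in holding a few EvolvableMLP-style settings."""
    num_inputs: int
    num_outputs: int
    hidden_size: List[int] = field(default_factory=lambda: [64])
    activation: str = "ReLU"
    output_activation: Optional[str] = None

    @property
    def net_config(self) -> Dict[str, Any]:
        # asdict() copies nested lists, so editing the result leaves self untouched.
        return asdict(self)


original = MLPConfig(num_inputs=4, num_outputs=2, hidden_size=[64, 128])
config = original.net_config
config["hidden_size"].append(64)  # edit the copy, not the original
clone = MLPConfig(**config)
print(original.hidden_size, clone.hidden_size)
```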

recreate_network() → None

Recreates the neural network while preserving the parameters of the old network.
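One way to preserve parameters when a layer changes width is to copy the overlapping block of the old weight matrix into a new matrix of the target shape. The sketch below does this with nested lists; the helper name `preserve_weights` and the constant fill value for new entries are assumptions, and AgileRL's own preservation logic may differ.

```python
from typing import List

Matrix = List[List[float]]


def preserve_weights(old: Matrix, out_features: int, in_features: int, fill: float = 0.0) -> Matrix:
    """Build an (out_features x in_features) matrix, reusing old values where shapes overlap."""
    new = [[fill] * in_features for _ in range(out_features)]
    for i in range(min(out_features, len(old))):
        for j in range(min(in_features, len(old[i]))):
            new[i][j] = old[i][j]
    return new


old = [[1.0, 2.0], [3.0, 4.0]]
print(preserve_weights(old, 3, 2))  # grow: one extra output row
print(preserve_weights(old, 1, 2))  # shrink: keep the first row only
```

Widening a hidden layer changes both that layer's output dimension and the next layer's input dimension, so a helper like this would be applied to both weight matrices. In practice the new entries would normally be randomly initialised rather than filled with a constant, so that new units do not start out identical.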

remove_layer() → dict[str, int] | None

Remove a hidden layer from the neural network. Falls back on add_node() if min_hidden_layers has been reached.

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
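remove_layer() can be sketched the same way on a list of widths. The description above says it falls back on add_node() once min_hidden_layers is reached, and the sketch follows that. Removing the deepest hidden layer, the fixed 16-node growth in the fallback, and the returned keys are assumptions, and both helpers are hypothetical.

```python
from typing import Dict, List, Optional


def add_node(hidden_size: List[int], hidden_layer: int = 0, numb_new_nodes: int = 16,
             max_mlp_nodes: int = 500) -> Dict[str, int]:
    """Hypothetical add_node: widen one layer, capped at max_mlp_nodes."""
    hidden_size[hidden_layer] = min(hidden_size[hidden_layer] + numb_new_nodes, max_mlp_nodes)
    return {"hidden_layer": hidden_layer, "numb_new_nodes": numb_new_nodes}


def remove_layer(hidden_size: List[int], min_hidden_layers: int = 1) -> Optional[Dict[str, int]]:
    """Hypothetical remove_layer: drop the last layer, or fall back on add_node."""
    if len(hidden_size) > min_hidden_layers:
        hidden_size.pop()  # assumption: the deepest hidden layer is removed
        return None
    return add_node(hidden_size)


sizes = [128, 64]
print(remove_layer(sizes), sizes)  # layer removed
print(remove_layer(sizes), sizes)  # at the minimum: falls back on add_node
```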

remove_node(hidden_layer: int | None = None, numb_new_nodes: int | None = None) → dict[str, int]

Remove nodes from a hidden layer of the neural network.

Parameters:
  • hidden_layer (int, optional) – Depth of hidden layer to remove nodes from, defaults to None

  • numb_new_nodes (int, optional) – Number of nodes to remove from hidden layer, defaults to None

Returns:

Dictionary containing the hidden layer and number of new nodes.

Return type:

dict[str, int]
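A matching sketch for remove_node(), assuming that widths are floored at min_mlp_nodes. Both arguments are passed explicitly here; the helper is hypothetical, and the returned dictionary reuses the `numb_new_nodes` parameter name to hold the number of nodes requested for removal, which is an assumption about the keys.

```python
from typing import Dict, List


def remove_node(
    hidden_size: List[int],
    hidden_layer: int,
    numb_new_nodes: int,
    min_mlp_nodes: int = 64,
) -> Dict[str, int]:
    """Hypothetical remove_node: shrink one layer, never below min_mlp_nodes."""
    hidden_size[hidden_layer] = max(hidden_size[hidden_layer] - numb_new_nodes, min_mlp_nodes)
    return {"hidden_layer": hidden_layer, "numb_new_nodes": numb_new_nodes}


sizes = [128, 80]
remove_node(sizes, hidden_layer=0, numb_new_nodes=32)  # 128 -> 96
remove_node(sizes, hidden_layer=1, numb_new_nodes=32)  # 80 -> 64 (floored)
print(sizes)
```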