New Feature in DeepTrainer – Per Layer Granularity for Activation Functions

After implementing a set of new activation functions in DeepTrainer (see my previous post) I had to come to the conclusion that most modern activation functions require different treatment than the almighty Hyperbolic Tangent. With TanH I could conveniently use the same activation function for all layers and neurons in the network, and it worked like charm for all training algorithms. However, activation functions like ReLU require at least a different activation for the output layer, otherwise the training will be very likely to go wrong.

So I decided to implement separately selectable activation functions for every layer in the network. This implementation is already checked in to the repository on Git.

From the API’s point of view this means that you will have to provide a list of activation functions when initializing the network, similarly to the list of neuron count in each layer:

var parameters = new NeuralNetworkParameters(
    filename,                               // Path to CSV file containing Input & Target data
    new List<uint> { 1, 32, 64, 32, 5 },    // Neuron count in each layer. First is number of inputs, last is number of outputs
    new List<ActivationType> {              // List of activation functions in each layer.
        ActivationType.Linear,              // First represents the input layer. No activation needed here. Always Linear.
        ActivationType.PReLU,               // Activation functions of hidden layers
        ActivationType.PReLU,               // Activation functions of hidden layers
        ActivationType.PReLU,               // Activation functions of hidden layers
        ActivationType.HyperbolicTangent}); // Activation function of the output layer

network = new NeuralNetwork(parameters);    // Initializing the network with parameters

 

At the moment the implementation utilizes polymorphism through an interface and virtual functions which is translated to vtable lookups by the compiler. This is not an ideal solution performance-wise, so I am still working on a way to cache these activation functions which could improve the performance.

The UI for the WPF application has changed a bit, now it allows you to select the activation from combo boxes for each layer:

SetupWithActivation

At this moment using PReLU activation for the hidden layers with a Hyperbolic Tangent output layer activation seems to work very well, both with backpropagation and resilient propagation, the convergence of the network seems much faster than before.

Here is a video of training a network with 1 input, 5 outputs, and three hidden layers with 16, 32 and 64 neurons. The hidden layers use PReLU, and the output layer uses TanH activation.

PReLU_TanH_1_16_32_64_5

And this video below shows training a network with similar parameters, but using Hyperbolic Tangent as activation for all hidden layers and the output layer too. The training pattern is the same. (The regression of the first output/target relationship displayed in these videos are samples from a parabola function.)

TanH_TanH_1_16_32_64_5.gif

As soon as I have further updates or performance measurements I am going to update this post.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s