Counting No. of Parameters in Deep Learning Models by Hand

5 simple examples to count parameters in FFNN, RNN and CNN models

Raimi Karim · Jan 21

Counting the number of trainable parameters of deep learning models is often considered too trivial to bother with, because your code can already do it for you.

But I’d like to keep my notes here for us to refer to once in a while.

Here are the models that we'll run through:

1. Feed-Forward Neural Network (FFNN)
2. Recurrent Neural Network (RNN)
3. Convolutional Neural Network (CNN)

In parallel, I will build the models with the Keras API for easy prototyping and clean code, so let's quickly import the relevant objects here:

from keras.layers import Input, Dense, SimpleRNN, LSTM, GRU, Conv2D
from keras.layers import Bidirectional
from keras.models import Model

After building the model, call model.count_params() to verify how many parameters are trainable.

1. FFNNs

i, input size
h, size of hidden layer
o, output size

For one hidden layer,

num_params = connections between layers + biases in every layer
           = (i×h + h×o) + (h+o)

Example 1.1: Input size 3, hidden layer size 5, output size 2

Fig. 1.1: FFNN with input size 3, hidden layer size 5, output size 2. The graphics reflect the no. of units.

i = 3
h = 5
o = 2

num_params = connections between layers + biases in every layer
           = (3×5 + 5×2) + (5+2)
           = 32

input  = Input((None, 3))
dense  = Dense(5)(input)
output = Dense(2)(dense)
model  = Model(input, output)

Example 1.2: Input size 50, hidden layers size [100,1,100], output size 50

Fig. 1.2: FFNN with 3 hidden layers. The graphics do not reflect the no. of units.

i = 50
h = 100, 1, 100
o = 50

num_params = connections between layers + biases in every layer
           = (50×100 + 100×1 + 1×100 + 100×50) + (100+1+100+50)
           = 10,451

input  = Input((None, 50))
dense  = Dense(100)(input)
dense  = Dense(1)(dense)
dense  = Dense(100)(dense)
output = Dense(50)(dense)
model  = Model(input, output)
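If you want to double-check these counts without Keras, the formula above translates directly into a few lines of plain Python. This is just a sketch; the helper name ffnn_params is mine, not from any library:

```python
def ffnn_params(layer_sizes):
    """Trainable parameters of a fully-connected net.

    layer_sizes lists every layer from input to output,
    e.g. [3, 5, 2] for Example 1.1. Weights connect each
    pair of consecutive layers; every non-input layer
    also has one bias per unit.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

print(ffnn_params([3, 5, 2]))              # Example 1.1 → 32
print(ffnn_params([50, 100, 1, 100, 50]))  # Example 1.2 → 10451
```

This should agree with what model.count_params() reports for the models above.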

2. RNNs

g, no. of gates (RNN has 1 gate, GRU has 3, LSTM has 4)
h, size of hidden units
i, dimension/size of input

The weights in each gate are actually those of an FFNN with input size (h+i) and output size h. So each gate has h(h+i) + h parameters.

num_params = g × [h(h+i) + h]

Example 2.1: LSTM with 2 hidden units and input dimension 3

Fig. 2.1: An LSTM cell. Taken from here.

g = 4 (LSTM has 4 gates)
h = 2
i = 3

num_params = g × [h(h+i) + h]
           = 4 × [2(2+3) + 2]
           = 48

input = Input((None, 3))
lstm  = LSTM(2)(input)
model = Model(input, lstm)

Example 2.2: Stacked Bidirectional GRU with 5 hidden units and input size 8 (whose outputs are concatenated) + LSTM with 50 hidden units

Fig. 2.2: A stacked RNN consisting of BiGRU and LSTM layers. The graphics do not reflect the no. of units.

Bidirectional GRU with 5 hidden units and input size 8:

g = 3 (GRU has 3 gates)
h = 5
i = 8

num_params_layer1 = 2 × g × [h(h+i) + h]  (first term is 2 because of bidirectionality)
                  = 2 × 3 × [5(5+8) + 5]
                  = 420

LSTM with 50 hidden units:

g = 4 (LSTM has 4 gates)
h = 50
i = 5+5 (the outputs from the bidirectional GRU are concatenated; the output size of the GRU is 5, same as the no. of hidden units)

num_params_layer2 = g × [h(h+i) + h]
                  = 4 × [50(50+10) + 50]
                  = 12,200

total_params = 420 + 12,200 = 12,620

input  = Input((None, 8))
layer1 = Bidirectional(GRU(5, return_sequences=True))(input)
layer2 = LSTM(50)(layer1)
model  = Model(input, layer2)

merge_mode is concatenation by default.
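The per-gate reasoning above also fits in one line of plain Python. Again a sketch rather than a library function; rnn_params is an illustrative name of my own:

```python
def rnn_params(g, h, i):
    """g gates, h hidden units, input size i.

    Each gate is an FFNN with input size (h + i) and output
    size h, i.e. h*(h+i) weights plus h biases, and the cell
    has g such gates.
    """
    return g * (h * (h + i) + h)

print(rnn_params(g=4, h=2, i=3))       # Example 2.1 (LSTM) → 48

# Example 2.2: bidirectional GRU layer, then LSTM layer.
bigru = 2 * rnn_params(g=3, h=5, i=8)  # factor 2 for bidirectionality → 420
lstm = rnn_params(g=4, h=50, i=10)     # input is the concatenated 5+5 → 12200
print(bigru + lstm)                    # → 12620
```

Note that this counts a plain RNN/GRU/LSTM as Keras implements it; variants with peephole connections or without biases would differ.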

3. CNNs

For one layer,

i, no. of input maps (or channels)
f, filter size (just the length)
o, no. of output maps (or channels; this is also defined by how many filters are used)

One filter is applied to every input map.

num_params = weights + biases
           = [i × (f×f) × o] + o
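As with the other architectures, this formula is easy to check in plain Python. A sketch, with conv2d_params as my own illustrative name:

```python
def conv2d_params(i, f, o):
    """i input channels, f×f filter, o output channels.

    Each of the o filters spans all i input channels, so it
    has i * f * f weights; there is one bias per output channel.
    """
    return i * f * f * o + o

print(conv2d_params(i=1, f=2, o=3))  # Example 3.1 → 15
print(conv2d_params(i=3, f=2, o=1))  # Example 3.2 → 13
print(conv2d_params(i=2, f=2, o=3))  # Example 3.3 → 27
```

Notice that the count is independent of the image's height and width, which is why the Keras examples below can use Input((None, None, channels)).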

Example 3.1: Greyscale image with 2×2 filter, output 3 channels

Fig. 3.1: Convolution of a greyscale image with 2×2 filter to output 3 channels.

i = 1 (greyscale has only 1 channel)
f = 2
o = 3

num_params = [i × (f×f) × o] + o
           = [1 × (2×2) × 3] + 3
           = 15

input  = Input((None, None, 1))
conv2d = Conv2D(kernel_size=2, filters=3)(input)
model  = Model(input, conv2d)

Example 3.2: RGB image with 2×2 filter, output of 1 channel

There is 1 filter for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 1 feature map.

Fig. 3.2: Convolution of an RGB image with 2×2 filter to output 1 channel.

i = 3 (RGB image has 3 channels)
f = 2
o = 1

num_params = [i × (f×f) × o] + o
           = [3 × (2×2) × 1] + 1
           = 13

input  = Input((None, None, 3))
conv2d = Conv2D(kernel_size=2, filters=1)(input)
model  = Model(input, conv2d)

Example 3.3: Image with 2 channels, with 2×2 filter, and output of 3 channels

There are 3 filters (purple, yellow, cyan) for each input feature map. The resulting convolutions are added element-wise, and a bias term is added to each element. This gives an output with 3 feature maps.

Fig. 3.3: Convolution of a 2-channel image with 2×2 filter to output 3 channels.

i = 2
f = 2
o = 3

num_params = [i × (f×f) × o] + o
           = [2 × (2×2) × 3] + 3
           = 27

input  = Input((None, None, 2))
conv2d = Conv2D(kernel_size=2, filters=3)(input)
model  = Model(input, conv2d)

That's all for now! Do leave comments below if you have any feedback!

Related Articles on Deep Learning
- Animated RNN, LSTM and GRU
- Step-by-Step Tutorial on Linear Regression with Stochastic Gradient Descent
- 10 Gradient Descent Optimisation Algorithms + Cheat Sheet
- Attn: Illustrated Attention

Follow me on Twitter or LinkedIn for digested articles and demos on AI and Deep Learning. You may also reach out to me via raimi.bkarim@gmail.com.
