Automatic Differentiation, Explained

The answer lies in a process known as automatic differentiation.

Let me illustrate it to you using the cost function from the previous series, but tweaked so that it’s in scalar form.

Image 1: The cost function in scalar formIn addition, because automatic differentiation can only calculate the partial derivative of an expression on a certain point, we have to assign initial values to each of the variables.

Let us say: y=5; w=2; x=1; and b=1.

Let’s find the derivative of the function!Before we can begin deriving the expression, it must be converted into a computational graph.

A computational graph simply turns each operation into a node and connects them through lines, called edges.

The computation graph for our example function is shown below.

Figure 2: Computation graph for the scalar cost functionFirst, let us calculate the values of each node, propagating from the bottom (the input variables) to the top (the output function).

This is what we get:Figure 3: Values of each nodeNext, we need to calculate the partial derivatives of each connection between operations, represented by the edges.

These are the calculations of the partials of each edge:Figure 4: Values of each edgeNotice how the partial of the max(0, x) piece-wise function is also a piece-wise function.

The function converts all negative values to zero, and keeps all positive values as they are.

The partial of the function (or graphically, its slope), should be clear on its graph:Figure 5: max(0, x) function.

As seen, there is a slope of 1 when x>0, and a slope of 0 when x<0.

The slope is undefined when x=0.

Now we can move on to calculating the partials!.Let’s find the partial of with respect to the weights.

As seen in Figure 6, there is just one line that connects the result to the weights.

Figure 6: The path that connects the function to the weightsNow we simply multiply up the edges:Figure 7: PartialAnd that’s our partial!This whole process can be completed automatically, and allows computers to compute the partial derivative of a value of a function accurately and quickly.

It is this process that allows AI to be as efficient as it is today.

If you like this article, or have any questions or suggestions, feel free to leave a comment below!.. More details

Leave a Reply