The Million-Dollar Neural Network, Part II: Machine Learning Intuition

Learn How to Build a Neural Network & Enter to Win the $1.65M CMS AI Health Outcomes Challenge In This 3-Part Series

Kenneth Colón · May 28

Author’s note: In Part I, we discussed the $1.65M CMS Artificial Intelligence Health Outcomes challenge, what it takes to excel at data science, and the biological basis for neural networks. If you haven’t already, give that a read (~6 min.) before continuing here.

This particular part of the series got pretty long.

As such, I’ll be splitting it into Part 2 (this part) and Part 2A.

Then in part 3, we’ll get into actually building our neural net as originally planned.

So, now that you have a firm understanding of the biological counterpart of artificial neural networks (ANNs), it’s time to get into ANNs themselves.

Here’s a quick overview of the topics we’ll be covering:

- From Biology to Computer Science: Neurons to Nodes
- Neural Net Structure: Input, Hidden, and Output Layers
- Assigning and Updating Weights: How Neural Networks Learn
- Converting Inputs to Outputs: Activation Functions

Now without any further ado, let’s get into it.

From Biology to Computer Science: Neurons to Nodes

As we discussed in Part I, biological neurons receive a host of inputs from other neurons (via the dendrites).

These inputs are summed (in the soma) and if the summed inputs meet or exceed the specified threshold for that neuron, the signal is transmitted (via an action potential) down the length of the axon.

This output is then passed on across the synapse (via a chemical signal) from the axon terminals of the presynaptic neuron to the dendrites of the postsynaptic neuron.

Much like their biological counterparts, artificial neurons (or nodes, as they’re often called) receive a host of inputs from other neurons in the network, and pass on an output.

Neural Net Structure: Input, Hidden, and Output Layers

In an artificial neural network, nodes (typically drawn as circles) belong to one of three different layers: the input layer, the hidden layer, or the output layer, all connected by synapses (typically drawn as arrows).

The input layer, as you might guess, receives its inputs from a row of data in your dataset.

Each input in a given set of inputs comes from a different data element, all from the same row.

For example, let’s take a look at the following dataset.

Imagine for a moment that you and I together run a large, international corporation.

Now imagine that we’re facing an issue of high employee turnover.

What we’d like to do is try to predict which of our various employees are most likely to leave, so that we can design effective programs around increasing employee retention.

Imagine that we’ve trained our neural network already using our employee data, and we’re now using the following data to make some predictions.

The inputs to our neural net would be name, position, office, age, start date, and salary.

So a given set of inputs would all come from the same row, and could look something like this:

Sonya Frost; Software Engineer; Edinburgh; 23; 2008/12/13; $103,600

Now it’s important to note that no computations (or activation functions) are applied in the input layer: no summing of inputs, no threshold to meet or exceed. Its purpose is merely to pass on information to the hidden layer nodes.
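To make that concrete, here’s a minimal sketch (in Python) of how one row of our dataset might be turned into the numeric values the input layer actually receives. The encoding choices and category lists below are my own illustrative assumptions, not part of any real dataset:

```python
from datetime import date

# One row from our hypothetical employee dataset
row = {
    "name": "Sonya Frost",
    "position": "Software Engineer",
    "office": "Edinburgh",
    "age": 23,
    "start_date": date(2008, 12, 13),
    "salary": 103_600,
}

# Neural networks operate on numbers, so each field must become a
# numeric feature before it reaches the input layer. These encodings
# are purely illustrative (one-hot encoding is more common for
# categoricals in practice), and "name" is dropped since a mere
# identifier carries no predictive signal.
positions = ["Sales Assistant", "Software Engineer"]  # hypothetical category list
offices = ["Edinburgh", "Sydney", "Tokyo"]            # hypothetical category list

input_vector = [
    positions.index(row["position"]),                    # category -> integer code
    offices.index(row["office"]),                        # category -> integer code
    row["age"] / 100,                                    # rough scaling toward [0, 1]
    (date(2019, 5, 28) - row["start_date"]).days / 365,  # tenure in years
    row["salary"] / 200_000,                             # rough scaling toward [0, 1]
]
print(input_vector)  # the values the input layer will pass along
```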

Hidden layer nodes receive their inputs from the input layer, perform a computation, and pass on an output to the output nodes.

The output nodes perform a final computation and spit out an answer.

This final output is in response to data from that same row; in our case, it might represent the likelihood that Sonya would quit her job, say a 0.6 on a scale of 0 to 1.

This final output can be continuous, binary, or categorical; the type of output you need will depend on your particular use case/problem.

In our case, a probability score would be most helpful, so we would have used a sigmoid function in our output layer.

If you’re not sure what this means at this stage, don’t worry — we’ll get to activation functions in just a sec :).

The process essentially repeats itself with the next row of data.

To again use our example dataset, this would be:

Doris Wilder; Sales Assistant; Sydney; 23; 2010/09/20; $85,600

In the case of more complex neural networks, otherwise known as deep neural networks, there are multiple hidden layers, with the nodes of each hidden layer receiving inputs from the nodes of the previous hidden layer, until the final hidden layer sends its outputs to the output layer.

Think about it like this: to harken back to our example in Part I, let’s say the input layer receiving data from your dataset equates to your eyes receiving a visual stimulus, i.e. strong sunlight.

The final output of the output layer equates to the muscles that control your eyelids firing, causing you to close your eyes.

The hidden layers are all the complex steps in between that make it seem like magic.

Assigning and Updating Weights: How Neural Networks Learn

So remember in Part I how we said understanding action potentials would be helpful in understanding activation functions? Well, here’s where that finally comes in handy.

The synapses, or connections between the nodes, are assigned random strengths, otherwise known as weights.

Neural networks learn by updating these weights, increasing the strength of connections between some nodes, weakening and even eliminating (or pruning) the connections between others.

This is much like how the human brain works: strengthening some connections, pruning others.

So what happens inside the individual nodes? To go back to biological neurons for a second: in the soma (cell body) of biological neurons, all the inputs are summed together.

If the sum exceeds the given threshold for that particular neuron, an action potential is fired.

In artificial neurons (nodes), all inputs to the node are multiplied by the weight of their respective connections and summed together, producing the weighted sum.
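As a quick sketch of what a single node computes, here’s the weighted sum in Python; the input values are hypothetical, and the weights are randomly initialized to mirror the random connection strengths described above:

```python
import numpy as np

# Hypothetical inputs arriving at one node from three other nodes
inputs = np.array([0.23, 0.55, 0.81])

# Random initial weights for the three incoming connections,
# mirroring the random synapse strengths assigned at the start
rng = np.random.default_rng(42)
weights = rng.uniform(-1, 1, size=3)

# Each input is multiplied by its connection's weight, then summed
weighted_sum = np.dot(inputs, weights)
print(weighted_sum)  # this value is what the activation function receives
```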

Converting Inputs to Outputs: Activation Functions

So, once we have our weighted sum, an activation function is applied to convert these inputs into an output; this determines what kind of signal is passed on (or not) to the next node.

Activation functions introduce non-linearity and enable neural networks to analyze complex data like images, audio and video.
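One way to see why that non-linearity matters: without an activation function between layers, stacking layers buys you nothing, because a chain of linear steps collapses into a single linear step. Here’s a small illustrative check using made-up random numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # a made-up 3-feature input
W1 = rng.normal(size=(4, 3))  # weights for a first layer
W2 = rng.normal(size=(2, 4))  # weights for a second layer

# Two layers with NO activation function in between...
two_layers = W2 @ (W1 @ x)
# ...are exactly equivalent to one single linear layer:
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# Insert a non-linearity (here ReLU) and the collapse no longer happens,
# which is what lets deep networks model complex patterns:
with_relu = W2 @ np.maximum(0, W1 @ x)
print(np.allclose(with_relu, one_layer))   # False (in general)
```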

There are many types of activation functions, some of which are better suited for use in the hidden layers (e.g. ReLU), while others are better suited for the output layer (e.g. sigmoid):

- Binary Step → produces a 1 or 0 output; great when you need a simple yes/no prediction
- Sigmoid → very useful in the output layer for predicting probabilities, since every probability lies between 0 and 1
- Tanh (aka hyperbolic tangent) and its variations (LeCun’s tanh, hard tanh) → essentially a scaled sigmoid function that ranges from -1 to 1
- ReLU (rectified linear unit) and its variations (Leaky ReLU, PReLU, ELU, etc.) → still one of the most popular functions used in neural networks today, and it makes for easy, efficient computation. The Leaky ReLU variation solves the common issue of “dead neurons” with the traditional ReLU.
- SoftMax → a type of sigmoid function best for use in the output layer of classification problems with multiple classes (e.g. the input is an image of a Pokémon, and the output is the probability that that particular Pokémon is a fire, grass, or water type)
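Here’s a compact sketch of each of these functions in Python (NumPy), just to show how simple they are under the hood; the sample scores at the end are made up:

```python
import numpy as np

def binary_step(z):
    # 1 if the weighted sum meets the threshold (here 0), else 0
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    # squashes any value into (0, 1): handy for probabilities
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # essentially a scaled sigmoid, ranging from -1 to 1
    return np.tanh(z)

def relu(z):
    # passes positive values through unchanged, zeroes out negatives
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # a small slope for negative values avoids ReLU's "dead neurons"
    return np.where(z >= 0, z, alpha * z)

def softmax(z):
    # turns a vector of scores into probabilities that sum to 1
    exps = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exps / exps.sum()

scores = np.array([-2.0, 0.5, 3.0])  # made-up weighted sums for 3 classes
print(softmax(scores))  # e.g. probabilities for fire / grass / water type
```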

Let’s Recap

So back in Part I, we talked about the biological basis of artificial neural networks: the human brain.

Importantly, when neurons receive inputs from other neurons, these inputs are summed in the soma (cell body).

If the sum of these inputs meets or exceeds the given threshold for that particular neuron, an action potential is triggered and the signal is propagated down the axon and across the synapse to the next neurons.

Here in Part II, we’ve discussed how this translates to a machine learning context.

Nodes, or the individual neurons within an artificial neural network, connect to one another via synapses.

The neural network is made up of three separate layers: the input layer (which passes on data from your dataset), the hidden layer(s) (where the real magic of neural networks happens), and the output layer (where a final computation, or activation function, is applied before spitting out a final answer).

The initial inputs are data elements from your dataset, all from the same row or observation (as per our previous example, start date, salary, and office location).

And the final output, again, is for that same row (as per our previous example, probability that employee would leave the organization).

After the data elements for that row/observation are run through the neural network, the process is repeated again with the next row/observation.

Synapses have different weights or strengths, which are updated (either strengthened or weakened) when the neural network learns.

The inputs to a given node are multiplied by their weights and summed to give a weighted sum.

So how do these nodes know what kind of signal to pass along? The node takes the weighted sum of its inputs and applies an activation function.

There are several kinds of activation functions, some better suited for use in the hidden layer(s), like the ReLU function, and others better suited for use in the output layer, like the sigmoid function.

The activation function used in the output layer largely depends on what kind of answer you need.

The sigmoid function, for example, gives you a nice probability on a scale from 0 to 1.

If you need to assign probabilities to multiple classes, then you might be better off with the SoftMax function.
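To tie the whole recap together, here’s a minimal, hypothetical forward pass through a tiny network: one hidden layer using ReLU, and a sigmoid output for our quit-probability example. The input values are made up, and since the weights are random and untrained, the prediction is meaningless until the network has actually learned (which is where Part 2A’s cost functions and backpropagation come in):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(7)

# Hypothetical encoded inputs for one employee (one row of data)
x = np.array([1.0, 0.0, 0.23, 10.5, 0.52])

# Randomly initialized weights: 5 inputs -> 4 hidden nodes -> 1 output
W_hidden = rng.uniform(-1, 1, size=(4, 5))
W_output = rng.uniform(-1, 1, size=(1, 4))

# Hidden layer: weighted sums of the inputs, then ReLU
hidden = np.maximum(0, W_hidden @ x)

# Output layer: weighted sum of the hidden outputs, then sigmoid
p_quit = sigmoid(W_output @ hidden)[0]
print(f"Predicted probability of leaving: {p_quit:.2f}")
```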

Now, unfortunately this Part 2 got pretty long, so we’ll be finishing up our machine learning intuition in a Part 2A, where we’ll cover the importance of hidden layers, minimizing the cost function, gradient descent, and backpropagation.

Then in Part 3, we’ll finally get into actually building our neural network!

Interested in using machine learning to make a real impact in healthcare? Learn how we’re solving the problem of inaccurate doctor data and check out our website at orderlyhealth.com.

Connect with me on LinkedIn!
