IBM Draws Inspiration from the Human Brain to Build Better Neural Networks

Jesus Rodriguez, Apr 8

Connectionism is the school of cognitive science that looks to build artificial intelligence (AI) systems inspired by the human brain.

The connectionist school of thought has been behind the recent emergence of deep learning and deep neural networks as one of the most active technology trends in the market.

Despite the recent technological progress, neural network architectures are only practical for highly specialized tasks and bear little resemblance to the way humans build knowledge over time.

Recently, a group of researchers from IBM published a new paper proposing a learning method that draws inspiration from neuroscientific patterns to improve the learning processes in deep neural networks.

The human brain remains the biggest inspiration to the entire field of AI.

The neurobiological mechanisms and cognitive patterns that help humans acquire knowledge remain largely unknown, but the field of neuroscience has made steady progress in this area over the last decade.

Today, we know that knowledge is formed by connections between different groups of neurons and that those connections play a role in other cognitive patterns such as memory, intuition, planning and many others.

While conceptually simple, those patterns have so far been impossible to recreate effectively in neural networks.

Part of the challenge is rooted in the mismatch between neural network architectures and one of the fundamental learning patterns of the human brain.

Hebb’s Rule vs. Backpropagation

In his 1949 book The Organization of Behavior, Canadian neuropsychologist Donald Hebb introduced a new theory that came to be known as Hebb’s rule, or cell assembly theory.

The official postulate of Hebb’s rule is as follows: “Let us assume that the persistence or repetition of a reverberatory activity (or “trace”) tends to induce lasting cellular changes that add to its stability. … When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

The neuroscience community often uses a simpler version of Hebb’s rule: “Cells that fire together wire together”.

Hebb’s rule is the foundation of another important brain pattern known as synaptic plasticity, which essentially states that synapses strengthen or weaken over time based on their activity.

In other words, neural connections that are often active tend to become stronger over time than those which aren’t.

According to Hebbian theory, stronger synaptic connections constitute the basis of long-term memory and other learning mechanisms.

Hebb’s rule tells us that some synaptic connections strengthen over time while others don’t.
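
As a concrete illustration, here is a minimal sketch of a plain Hebbian update in NumPy; the sizes, learning rate and random activity patterns are illustrative assumptions rather than details from IBM’s paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n_pre, n_post = 8, 4                              # pre- and postsynaptic cells
    W = rng.normal(scale=0.1, size=(n_post, n_pre))   # synaptic strengths
    learning_rate = 0.01                              # illustrative value

    for _ in range(100):
        pre = rng.random(n_pre)                       # presynaptic activity
        post = W @ pre                                # postsynaptic activity
        # "Cells that fire together wire together": strengthen synapses
        # between co-active pre- and postsynaptic cells. The update is
        # purely local -- it uses only the activity on either side of
        # the synapse, with no error signal arriving from other layers.
        W += learning_rate * np.outer(post, pre)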

This principle is in complete contradiction with one of the core components of deep neural network architectures.

The backpropagation algorithm is typically used to calculate the gradients with which the weights of a neural network are updated in order to improve the learning model.

In order to perform effective updates to a given neuron, backpropagation requires knowledge not only of the target neuron and its connections but also of the higher layers of the architecture, which are not directly knowable from the activities of that specific neuron.

In that sense, backpropagation relies on a top-down knowledge distribution model which doesn’t correlate with the way the brain works.

Backpropagation also relies on large volumes of labeled data to build the initial composition of the network, which contrasts with the unsupervised, mostly observational model used by the brain in any learning activity.
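
The contrast is easy to see in code. The following minimal PyTorch sketch (shapes and names are illustrative, not taken from the paper) shows that the gradient used to update a lower-layer weight matrix only becomes available after the error signal has been propagated back through the layers above it.

    import torch

    x = torch.randn(1, 4)
    W1 = torch.randn(4, 8, requires_grad=True)   # lower-layer weights
    W2 = torch.randn(8, 2, requires_grad=True)   # higher-layer weights

    y = torch.relu(x @ W1) @ W2                  # forward pass
    loss = (y ** 2).sum()                        # dummy loss
    loss.backward()                              # top-down error propagation

    # W1.grad cannot be computed from x and W1's own activity alone:
    # the chain rule routes the error signal back through W2, so the
    # update of a lower-layer synapse depends on the higher layers.
    print(W1.grad.shape)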

If the human brain used a backpropagation-like algorithm, we wouldn’t have a notion of memory as we know it, because, in order to form a connection, the brain would have to predict events in the future.

Bridging the mathematical model of the backpropagation algorithm with the biological nature of Hebb’s rule has been a recent area of research in deep neural networks.

This was the fundamental inspiration to IBM’s work in the space.

A Hebb’s Rule for Neural Networks

The main principle of the IBM research was to design a deep neural network architecture that includes synaptic-like mechanisms following principles similar to Hebb’s rule.

More specifically, IBM takes as fundamental the idea, coming from psychologists, that changes in the efficacy of synapses are central to learning, and that the most important aspect of biological learning is that the coordinated activity of a presynaptic cell i and a postsynaptic cell j produces changes in the efficacy of the synapse Wij between them.

Using those ideas, IBM proposes a biologically-inspired neural network architecture in which the weights of the neurons in the lower layers are inferred from the activity of their connections.

This learning model allows a neural network to be trained initially using unsupervised techniques, with the training then completed using a supervised model.

In IBM’s biologically-inspired neural network, the change of synapse strength during the learning process is proportional to the activity of the presynaptic cell and to a function of the activity of the postsynaptic cell.
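
Schematically, with x_i the activity of presynaptic cell i, y_j the activity of postsynaptic cell j, and g some function of the postsynaptic activity, that statement reads as follows (a generic rendering of the idea, not necessarily the exact update used in the paper):

    ΔW_ij ∝ x_i · g(y_j)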

Initially, the model is trained using that idea until the weights of the lower layers are calculated.

At that point, those layers are used as input to a fully connected perceptron model, which is then trained with stochastic gradient descent (SGD) to calculate the weights of the higher layers.

Putting the previous mathematical model in the context of a neural network, we get a pipeline divided into three main stages.

Given an image input, the unsupervised training model generates a vector of training currents I = {I1, I2, …, Iu}.

The activations produced during this phase are used to update the weights of the lower layers using a synaptic-like mechanism.

At that point, the training of the neural network is completed using traditional supervised learning and SGD-based optimizations.
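
Under loose assumptions, the overall flow can be sketched as below: an unsupervised first stage that updates the lower-layer weights with a purely local, Hebbian-like rule and never touches the labels, followed by a supervised stage that trains a top-layer classifier with SGD on the frozen hidden activations. The toy data, the specific local rule, the normalization and all hyperparameters are illustrative stand-ins, not the exact procedure from IBM’s paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 200 samples of 64-dimensional "images" with 3 random classes.
    X = rng.random((200, 64))
    labels = rng.integers(0, 3, size=200)
    Y = np.eye(3)[labels]                                # one-hot targets

    n_hidden = 32
    W1 = rng.normal(scale=0.1, size=(n_hidden, 64))      # lower-layer weights

    # Stage 1: unsupervised, local, Hebbian-like updates (labels never used).
    for _ in range(5):
        for x in X:                                      # online: one example at a time
            h = W1 @ x                                   # hidden "currents"
            W1 += 1e-3 * np.outer(h, x)                  # strengthen co-active connections
            W1 /= np.linalg.norm(W1, axis=1, keepdims=True)  # keep weights bounded

    # Stage 2: freeze W1 and train a top-layer classifier with SGD (labels used).
    H = np.maximum(X @ W1.T, 0)                          # fixed hidden activations
    W2 = np.zeros((n_hidden, 3))
    for _ in range(50):
        logits = H @ W2
        logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        W2 -= 0.1 * H.T @ (probs - Y) / len(X)           # gradient touches only the top layer

    # The toy labels are random, so accuracy is near chance; the point is the flow.
    print("toy accuracy:", (np.argmax(H @ W2, axis=1) == labels).mean())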

IBM’s biologically-inspired neural network presents two key advantages over traditional models:

1) The first part of the training is completely unsupervised and doesn’t require large volumes of labeled data.

2) The weights of the lower hidden layers are inferred based only on local activity, without requiring expensive backpropagation techniques.

One major drawback of the proposed model is performance.

First, it is an online algorithm, so training examples are presented one at a time, unlike SGD, where training examples can be presented in minibatches.

Second, for any training example one has to wait until the set of hidden units reaches a steady state.

To evaluate the performance of the new model, IBM created two neural networks.

The first one is trained using the two-stage procedure: the proposed “biological” training of the first layer followed by the standard gradient descent training of the classifier in the top layer.

The second one is trained end-to-end with the backpropagation algorithm on a supervised task.

Both neural networks were trained on the MNIST and CIFAR-10 datasets.

In both cases, the biologically-inspired model achieved levels of performance comparable to traditional backpropagation neural networks while using a fraction of the training dataset.

In the case of MNIST, the network that is trained end-to-end with the backpropagation algorithm demonstrates the well-known benchmarks: training error = 0%, test error = 1.5%.

The network trained in a “biological” way reaches a training error of 0.4%; thus, it never fits the training data perfectly.

At the same time, its error on the held-out test set is 1.46%, the same as the error of the network that is trained end-to-end.

This is surprising because the biologically-inspired model learned the weights of the first layer without knowing what task those weights would be used for, unlike the network that is trained end-to-end.

Together with the research paper, IBM published an implementation of their biological neural network on GitHub.

The current implementation is constrained to very specific image analysis scenarios but many of the core principles can be extrapolated to other deep learning models.

The combination of unsupervised and supervised learning with synaptic-like weight calculations is one of the most innovative aspects of the IBM model.

We don’t really know which cognitive patterns of the brain can be applicable to deep neural networks but the exploration of these ideas is likely to be at the forefront of deep learning research for the next few years.
