Evolution of Graph Neural Networks for Recommender Systems
Introduction to the three progressions of Graph Neural Networks in the context of Recommender Systems.
Snehal Reddy Koukuntla · May 26
Note: Throughout this article, I am going to frame recommender systems in the context of fashion.
With the AI craze picking up in the industry, and huge amounts of public user data at our disposal, every company wants to lure customers by showing or recommending them whatever would seem most beguiling, be it a product or a service.
What started off as basic collaborative filtering and market basket analysis, and progressed to latent factor models using matrix factorization techniques such as SVD and PMF, has now branched out into something much more sophisticated.
Let's analyse a particular situation.
Say we have collected data on modern, trendy fashion sets for a range of people, and we now have to recommend each of them a complete fashion set (top-wear, bottom-wear, watch, shoes etc.) or some items to complete their existing fashion set.
The first idea is to feed latent representations of the fashion items into a Bidirectional LSTM.
But on further thought, we realise that the inherent sequential structure of LSTMs becomes a problem, since there is no fixed order of items.
The complex relationships between different items in the set, such as those between the jacket and the shoes, or the shoes and the socks, cannot be learnt properly.
To overcome this problem, the first thought is to form some sort of graph representation, where edges represent the relationships between items represented as nodes.
This is where GNNs (Graph Neural Networks) come in.
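As a toy illustration, here is a minimal sketch of that representation in Python; the items and edges are made up for the example:

```python
# A fashion set as a graph: items become nodes,
# pairwise relationships become edges.
fashion_set = ["jacket", "shoes", "socks", "watch"]

# Undirected edges between items whose compatibility we want to model.
edges = [("jacket", "shoes"), ("shoes", "socks"), ("jacket", "watch")]

# Build an adjacency list: each node maps to its neighbours.
graph = {item: [] for item in fashion_set}
for a, b in edges:
    graph[a].append(b)
    graph[b].append(a)

print(graph["shoes"])  # ['jacket', 'socks']
```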
Let's proceed with this representation and improve it for our purpose.
Graph Neural Networks
The power of GNNs in modeling the dependencies between nodes is truly a breakthrough, not only in recommender systems but also in social networks.
So let's analyse the basic structure of a graph neural network, which emerged in the first decade of the 21st century.
Basically, each node takes the form of a recurrent unit, and each node carries its own feature vector.
Each edge type has a neural network representation.
At any timestep, a given node receives messages from its neighbouring recurrent units, just as in a linear RNN, and aggregates them.
We continue the learning using the Almeida-Pineda algorithm, which works by running the propagation to convergence and then computing gradients based on the converged solution.
With this, we get an embedding of each node independently.
Then we concatenate all of them to get a representation of the graph.
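A minimal NumPy sketch of this propagation-to-convergence and readout might look as follows; it assumes a single shared edge transformation and a tanh update, whereas the original model learns one transformation per edge type:

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, dim = 4, 8
H = rng.normal(size=(num_nodes, dim))        # node states (one row per node)
W_edge = rng.normal(size=(dim, dim)) * 0.1   # illustrative shared edge weights
A = np.array([[0, 1, 1, 0],                  # adjacency matrix of the graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def propagate(H, steps=10):
    """Run message passing until (approximate) convergence, as the
    Almeida-Pineda-trained GNN does: each node aggregates messages
    from its neighbours and updates its recurrent state."""
    for _ in range(steps):
        messages = A @ (H @ W_edge)   # sum of transformed neighbour states
        H = np.tanh(messages)         # recurrent update (a contraction map)
    return H

H_final = propagate(H)
graph_embedding = H_final.reshape(-1)  # concatenate node embeddings
print(graph_embedding.shape)           # (32,)
```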
It becomes clear that the effect of one node on another decays exponentially with the distance between them.
So we can say it has trouble propagating information across long ranges in a graph.
Hence came the Gated Graph Neural Network (GGNN) to overcome this problem.
Gated Graph Neural Network
As the name suggests, the major change is that we use GRUs (Gated Recurrent Units): we let the recurrence run for a fixed number of timesteps and backpropagate through time, instead of using the classic Almeida-Pineda algorithm.
Node annotations are first copied into the first components of the hidden state and the rest are padded with zeros.
Messages then flow between the nodes of the graph via incoming and outgoing edges, with parameters dependent on the edge type and direction.
Then, using the hidden state and the annotations, GRU-style updates take place.
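Here is a minimal NumPy sketch of one such propagation step; for brevity it assumes a single message matrix, whereas GGNN keeps one per edge type and direction:

```python
import numpy as np

rng = np.random.default_rng(1)

num_nodes, dim, ann_dim = 4, 8, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Node annotations are copied into the first components of the
# hidden state; the rest is zero-padded.
annotations = rng.normal(size=(num_nodes, ann_dim))
H = np.concatenate([annotations, np.zeros((num_nodes, dim - ann_dim))], axis=1)

# Illustrative parameters: one message matrix plus GRU gate weights.
W_msg = rng.normal(size=(dim, dim)) * 0.1
Wz, Uz = rng.normal(size=(dim, dim)) * 0.1, rng.normal(size=(dim, dim)) * 0.1
Wr, Ur = rng.normal(size=(dim, dim)) * 0.1, rng.normal(size=(dim, dim)) * 0.1
Wh, Uh = rng.normal(size=(dim, dim)) * 0.1, rng.normal(size=(dim, dim)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(H):
    """One GGNN propagation step: aggregate neighbour messages,
    then apply a GRU-style gated update to every node."""
    m = A @ (H @ W_msg)                       # incoming messages
    z = sigmoid(m @ Wz + H @ Uz)              # update gate
    r = sigmoid(m @ Wr + H @ Ur)              # reset gate
    h_tilde = np.tanh(m @ Wh + (r * H) @ Uh)  # candidate state
    return (1 - z) * H + z * h_tilde

for _ in range(5):   # unroll a fixed number of timesteps (BPTT in training)
    H = ggnn_step(H)
```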
But in GGNN the nodes still interact by communicating their state information along the edges in a fixed way, which makes it difficult to model flexible and complex node interactions.
Hence Node-wise Graph Neural Networks (NGNN) came up.
Node-wise Graph Neural Networks
NGNN aims to model edge-wise interactions with node-wise parameters.
Each node in NGNN corresponds to a class of clothing items.
Edges between different classes are initialised keeping in mind the interaction between those classes.
The edge from a to b has a weight proportional to the probability of finding a given b in a fashion set.
For each item we extract features, which are used to update the initial states of the corresponding class nodes.
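A small sketch of how such co-occurrence-based edge weights could be estimated, using made-up outfit data:

```python
from collections import Counter

# Illustrative outfit data: each outfit is a set of item categories.
outfits = [
    {"jacket", "shoes", "watch"},
    {"jacket", "shoes", "socks"},
    {"shoes", "socks"},
]

count = Counter()     # how often each category appears
co_count = Counter()  # how often an ordered pair co-occurs

for outfit in outfits:
    for a in outfit:
        count[a] += 1
        for b in outfit:
            if a != b:
                co_count[(a, b)] += 1

def edge_weight(a, b):
    """Weight of the edge from a to b: an estimate of P(a | b),
    i.e. how likely a appears given b is in the fashion set."""
    return co_count[(a, b)] / count[b]

print(edge_weight("socks", "shoes"))  # 2/3: socks appear in 2 of the 3 shoe outfits
```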
To overcome the limitation of GGNN, we maintain two different weight matrices, one for input and one for output, for each node separately, unlike GGNN where there is a common weight matrix for all edges.
These input and output weights are used to compose the transformation function of any edge in the graph.
Apart from this, the propagation is the same as in GGNN.
After T propagation steps, we obtain the final states of the nodes, which are also the final node embeddings.
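A minimal sketch of this node-wise propagation is given below; it assumes the transformation on the edge from node j to node k is composed from j's output matrix and k's input matrix, and it uses a plain tanh update in place of the full GRU update:

```python
import numpy as np

rng = np.random.default_rng(2)

num_nodes, dim, T = 4, 8, 3
H = rng.normal(size=(num_nodes, dim))

# Node-wise parameters: every node gets its own input and output
# matrices, instead of GGNN's single shared edge matrix.
W_out = rng.normal(size=(num_nodes, dim, dim)) * 0.1
W_in = rng.normal(size=(num_nodes, dim, dim)) * 0.1

# Directed edge weights, e.g. initialised from co-occurrence
# probabilities as above (illustrative values here).
A = rng.uniform(size=(num_nodes, num_nodes))
np.fill_diagonal(A, 0.0)

def ngnn_step(H):
    """One NGNN propagation step: the transformation on edge (j -> k)
    is composed from node j's output matrix and node k's input matrix."""
    H_new = np.zeros_like(H)
    for k in range(num_nodes):
        msg = np.zeros(dim)
        for j in range(num_nodes):
            if j != k:
                msg += A[j, k] * (H[j] @ W_out[j] @ W_in[k])
        # A GRU update would go here as in GGNN; tanh keeps the sketch short.
        H_new[k] = np.tanh(msg)
    return H_new

for _ in range(T):
    H = ngnn_step(H)  # the final rows of H are the node embeddings
```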
On top of this, an attention mechanism is added to figure out the compatibility of the different items in the set.
For example, a pair of socks is not compatible with a fashion set containing flip flops.
Since a node receives state information from other nodes, this attention mechanism is easily incorporated.
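As a rough sketch, an attention-style readout over the node embeddings could look like this; the dot-product scoring with a learned context vector is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(3)

num_nodes, dim = 4, 8
H = rng.normal(size=(num_nodes, dim))  # final node embeddings from propagation

# Illustrative attention parameters: a learned context vector scores
# how much each item should contribute to the compatibility estimate.
context = rng.normal(size=dim)
w_score = rng.normal(size=dim)

scores = H @ context                           # one relevance score per node
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights

graph_repr = alpha @ H                         # attention-weighted readout
compatibility = 1 / (1 + np.exp(-(graph_repr @ w_score)))  # in (0, 1)
print(round(float(compatibility), 3))
```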
Figure: A simple graph. The areas of the circles and the widths of the links are proportional to the number of fashion items in the corresponding categories and the co-occurrence frequencies between categories.
Some amazing papers and videos I've referenced:
- Graph Neural Networks: A Review of Methods and Applications
- Gated Graph Sequence Neural Networks (Microsoft Research paper)
- Gated Graph Sequence Neural Networks: video explanation by the author
- Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks