The Data Fabric for Machine Learning Part 1-b – Deep Learning on Graphs

Disclaimer: This is not the second part of the past article on the subject; it’s a continuation of the first part, putting the emphasis on deep learning.

   We are in the process of defining a new way of doing machine learning, focusing on a new paradigm, the data fabric.

In the past article I gave my new definition of machine learning: Machine learning is the automatic process of discovering hidden insights in the data fabric by using algorithms that are able to find those insights without being specifically programmed for that, to create models that solve a particular (or multiple) problem(s).

The premise for understanding this is that we have created a data fabric.

For me the best tool out there for doing that is Anzo, as I mentioned in other articles.

 You can build something called “The Enterprise Knowledge Graph” with Anzo, and of course create your data fabric.

But now I want to focus on a topic inside machine learning: deep learning.

In another article I gave a definition of deep learning: Deep learning is a specific subfield of machine learning, a new take on learning representations from data which puts an emphasis on learning successive “layers” [neural nets] of increasingly meaningful representations.

Here we’ll talk about a combination of deep learning and graph theory, and see how it can help move our research forward.

General: Set the basis for doing deep learning on the data fabric.

Specifics: If we can construct a data fabric that supports all the data in the company, the automatic process of discovering insights through learning increasingly meaningful representations from data using neural nets (deep learning) can run inside the data fabric.

Normally we create neural nets using tensors, but remember that we can also define a tensor with a matrix, and graphs can be defined through matrices.
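As a quick illustration (a sketch of mine, not code from the original project), here is a tiny graph seen as an adjacency matrix, i.e., as a rank-2 tensor:

```python
import numpy as np

# A directed graph with 3 nodes and edges 0->1, 1->2, 2->0.
# Its adjacency matrix is a plain 3x3 matrix...
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# ...which is exactly a rank-2 tensor, the basic currency of neural nets.
print(A.ndim)  # 2
```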

In the documentation of the Spektral library they state that a graph is generally represented by three matrices: an adjacency matrix, a matrix of node attributes, and a matrix of edge attributes. I won’t go into details here, but if you want a more complete overview of deep learning on graphs, check out Tobias Skovgaard Jepsen’s article: How to do Deep Learning on Graphs with Graph Convolutional Networks Part 1: A High-Level Introduction to Graph Convolutional Networks (towardsdatascience.com).

The important part here is the concept of Graph Neural Networks (GNNs).

Graph Neural Networks (GNN)

The idea of a GNN is simple: to encode the structural information of the graph, each node v_i can be represented by a low-dimensional state vector s_i, 1 ≤ i ≤ N (remember that vectors can be thought of as rank-1 tensors, and tensors can be represented with matrices).
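To make the idea concrete, here is a minimal numpy sketch (an illustration of the concept, not Spektral code): each row of S is the state vector s_i of node v_i, and one propagation step updates every node’s state from its neighbors’ states.

```python
import numpy as np

N, d = 3, 4                  # N nodes, d-dimensional state vectors
A = np.array([[0, 1, 0],     # adjacency matrix of the graph
              [0, 0, 1],
              [1, 0, 0]])
S = np.random.randn(N, d)    # row i is the state vector s_i of node v_i
W = np.random.randn(d, d)    # learnable weight matrix

# One (very simplified) propagation step: each node aggregates its
# neighbors' states -- the core intuition behind graph neural networks.
S_new = np.tanh(A @ S @ W)
print(S_new.shape)           # (3, 4)
```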

The tasks for learning a deep model on graphs can be broadly categorized into two domains: node-focused tasks and graph-focused tasks.

Spektral

The author defined Spektral as a framework for relational representation learning, built in Python and based on the Keras API.

Installation

We will be using MatrixDS as the tool for running our code.

Remember that in Anzo you will be able to take this code and run it there too.

The first thing you need to do is fork the MatrixDS project: MatrixDS | The Data Project Workbench (community.platform.matrixds.com), a place to build, share and manage data projects at any scale. Click on the Fork button, and you will have the library installed and everything working :).

If you are running this outside, remember that the framework is tested for Ubuntu 16.04 and 18.04; you should first install its system dependencies, and then install the library itself with pip install spektral.

Data representation

In Spektral, some layers and functions are implemented to work on a single graph, while others consider sets (i.e., datasets or batches) of graphs.

The framework distinguishes between three main modes of operation: single, batch, and mixed. For example, if we run the loading code, we will be loading the data in single mode: our adjacency matrix, our node attributes, and our edge attributes each come back as a separate matrix.

Semi-supervised classification with Graph Attention layers (GAT)

Disclaimer: I’m assuming that you know Keras from here.
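Here is a minimal sketch of that single-mode loading step. The loader name and return values are assumptions based on Spektral’s 0.x citation-dataset API, not necessarily the exact code in the project (the citation graphs, for instance, come without edge attributes):

```python
from spektral.datasets import citation

# Load the Cora citation graph in single mode (one big graph):
#   A: adjacency matrix (N x N), X: node attributes (N x F),
#   y: node labels (N x n_classes); the masks select train/val/test nodes.
A, X, y, train_mask, val_mask, test_mask = citation.load_data('cora')

print(A.shape, X.shape, y.shape)
```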

For more detail and code, view the MatrixDS project: MatrixDS | The Data Project Workbench (community.platform.matrixds.com).

A GAT is a novel neural network architecture that operates on graph-structured data, leveraging masked self-attentional layers.

In Spektral, the GraphAttention layer computes a convolution similar to layers.GraphConv, but uses the attention mechanism to weight the adjacency matrix instead of using the normalized Laplacian.

The way they work is by stacking layers in which nodes are able to attend over their neighborhoods’ features, which enables (implicitly) specifying different weights for different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront.

The model we will use is simple enough; a sketch is shown below. By the way, the model is quite big, so if you don’t have that much power, lower the number of epochs and play with it.
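Here is a sketch of such a model, modeled on Spektral’s own GAT example for citation networks of that era; treat the exact layer arguments as assumptions rather than the article’s verbatim code:

```python
from keras.layers import Input, Dropout
from keras.models import Model
from keras.optimizers import Adam
from keras.regularizers import l2
from spektral.layers import GraphAttention

N = X.shape[0]           # number of nodes
F = X.shape[1]           # number of node features
n_classes = y.shape[1]   # number of classes

# Two GraphAttention layers: 8 concatenated attention heads, then one
# averaged softmax head that outputs class probabilities per node.
X_in = Input(shape=(F, ))
A_in = Input(shape=(N, ))

dropout_1 = Dropout(0.6)(X_in)
gat_1 = GraphAttention(8, attn_heads=8, attn_heads_reduction='concat',
                       dropout_rate=0.6, activation='elu',
                       kernel_regularizer=l2(5e-4))([dropout_1, A_in])
dropout_2 = Dropout(0.6)(gat_1)
gat_2 = GraphAttention(n_classes, attn_heads=1, attn_heads_reduction='average',
                       dropout_rate=0.6, activation='softmax',
                       kernel_regularizer=l2(5e-4))([dropout_2, A_in])

model = Model(inputs=[X_in, A_in], outputs=gat_2)
model.compile(optimizer=Adam(lr=5e-3), loss='categorical_crossentropy',
              weighted_metrics=['acc'])
model.summary()
```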

Remember that it is very easy to upgrade your MatrixDS account.

Then we train it (this may take some hours if you don’t have enough power), get the best model back, and evaluate it; these steps are sketched below.

See more in the MatrixDS project: MatrixDS | The Data Project Workbench (community.platform.matrixds.com).
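A sketch of the training, checkpointing, and evaluation steps, again following the pattern of Spektral’s citation example rather than the project’s exact code (depending on the version, A may need self-loops added and converting to a dense array first):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

epochs = 300  # lower this if you don't have much power

# Single mode: the whole graph is one sample, so batch_size=N and no shuffling.
model.fit([X, A], y,
          sample_weight=train_mask,
          epochs=epochs,
          batch_size=N,
          validation_data=([X, A], y, val_mask),
          shuffle=False,
          callbacks=[EarlyStopping(monitor='val_weighted_acc', patience=100),
                     ModelCheckpoint('best_model.h5',
                                     monitor='val_weighted_acc',
                                     save_best_only=True)])

# Get the best model back...
model.load_weights('best_model.h5')

# ...and evaluate it on the held-out test nodes.
loss, acc = model.evaluate([X, A], y, sample_weight=test_mask, batch_size=N)
print('Test loss: {:.4f} | Test accuracy: {:.4f}'.format(loss, acc))
```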

If you remember from the last part, when we have a data fabric, an insight can be thought of as a dent in it. And if you are following this tutorial on the MatrixDS platform, you realized that the data we are using is not a simple CSV: we provided the library with an adjacency matrix, node attributes, and edge attributes, stored in a series of files. So this data lives in a graph.

And what we did was load that data into the library.

Actually, you can transform your data from and to NetworkX, numpy, and SDF formats in the library.
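For instance, going between NetworkX graphs and numpy adjacency matrices looks like this (a sketch using NetworkX’s own converters; Spektral ships analogous helpers in its utils module, whose exact names I won’t assume here):

```python
import networkx as nx

G = nx.karate_club_graph()   # a classic toy graph

# NetworkX graph -> numpy adjacency matrix, and back again.
A = nx.to_numpy_array(G)
G2 = nx.from_numpy_array(A)

print(A.shape, G2.number_of_nodes())
```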

This means that if we are storing our data in a data fabric, we have our knowledge graph, so we already have a lot of these features; what we have to do is find a way of connecting it with the library.

That’s the tricky part right now.

And then we can start finding insights in our data fabrics through the process of running deep learning algorithms on the graphs inside of them.

The interesting part here is that there could be ways of running these algorithms in the graph itself, and for that we need to be able to build models with data stored inherently in the graph’s structure. There’s a very interesting approach for that with Neo4j by Lauren Shin: Graphs and ML: Multiple Linear Regression (towardsdatascience.com). But it’s also a work in progress.

I imagine the process to be something like this: the neural net could live inside the data fabric, and the algorithms would run with the resources inside it.

One important thing I’m not even mentioning here is the concept of non-Euclidean data, but I’ll go there later.

It’s possible to run deep learning algorithms on the data fabric by deploying graph neural net models for the graph data we have, if we can connect the knowledge graph with the Spektral (or another) library.

Besides standard graph inference tasks such as node or graph classification, graph-based deep learning methods have also been applied to a wide range of disciplines, such as modeling social influence, recommendation systems, chemistry, physics, disease or drug prediction, natural language processing (NLP), computer vision, traffic forecasting, program induction and solving graph-based NP problems.

See https://arxiv.org/pdf/1812.04202.pdf.

The applications are endless; this is the beginning of a new exciting era.

Stay tuned for more 🙂

Bio: Favio Vazquez is a physicist and computer engineer working on Data Science and Computational Cosmology.

He has a passion for science, philosophy, programming, and music.

He is the creator of Ciencia y Datos, a Data Science publication in Spanish.

He loves new challenges, working with a good team and having interesting problems to solve.

He is part of the Apache Spark collaboration, helping in MLlib, Core and the Documentation.

He loves applying his knowledge and expertise in science, data analysis, visualization, and automatic learning to help the world become a better place.

Original. Reposted with permission.

