SQL vs. Graql: Modelling and Querying of Biomedical Data

Writing 151 SQL lines in 4 lines

Tomas Sabat, Dec 3

Using SQL to query relational databases is easy. When working with complex data in SQL, however, the challenges can be listed as follows:

Complex to write queries: analysing hierarchical data with many relationships leads to SQL statements containing many JOINs, which easily become too difficult to understand.
Slow query speed: queries with a large number of JOINs are expensive to compute and lead to high response times.
Complicated data models: domains with complex data require non-intuitive data models with (join) tables that create unnecessary complexity and reduce data quality.

The above challenges are especially pertinent when working with biomedical data (see more here). This is why using a query language such as Graql can reduce the modelling and querying complexity by orders of magnitude (more on that below).

In this article, I want to look at how Graql compares to SQL when working with biomedical data. At the end, I’ll show you how we can write a 151-line SQL query in just 4 lines in Graql.

An Introduction to Graql

If you’re unfamiliar with Graql, I first want to give a brief high-level overview of its modelling constructs (see here for an in-depth introduction).

Just as SQL is the standard query language in relational databases, Graql is Grakn’s query language. It’s a declarative language that allows us to model, query and reason over our data.

The example below shows how genes encode proteins and are associated with diseases.

Figure: Simple Grakn model. Green nodes are entities, purple diamonds are relationships and red boxes are attributes.

Finally, rules are logical patterns that we encode in our model, which allow us to reason over our existing data to create new instances of entities, relationships and attributes. In other words:

If:
Genes encode proteins;
And genes are associated with diseases.
Then:
Create protein-disease associations.

This rule would be part of our schema, which we could then visualise as follows, where the dotted line represents the inferred relationship:

Figure: The dotted line attached to the diamond shape represents the inferred relationship.

SQL and Graql: Data Modelling

With this basic understanding of Graql (and assuming you understand SQL), let’s compare how we would represent a more complicated model in both Graql and SQL, and find out which one is more intuitive.

For this comparison, I want to look specifically at disease networks.

In SQL we have to explicitly state all the paths that the query can take, whereas in Grakn we just reason over our data, using rule-based and type-based inferencing.

Figure: Visualisation of the underlying logic that finds us drugs associated with asthma.

When it comes to handling complex data, SQL struggles: it’s complex to write queries, it leads to slow query speeds, and it forces us to create complicated data models. Fundamentally, these challenges are inevitable, as the relational data model was not created to work with this type of data. Using a much more expressive query language that reasons over our data abstracts away this complexity, so that we can focus on the higher-level question.

If you and your development team face these types of challenges while working with SQL, using Graql will be inevitable.
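
To make the point about explicit paths concrete, here is a minimal sketch of the gene, protein and disease rule from the introduction, written as plain SQL and run through Python's built-in sqlite3 module. It is not the 151-line query mentioned above. The table names, column names and placeholder rows (gene-A, protein-X, disease-1 and so on) are assumptions made up for illustration and carry no biological meaning.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with a hypothetical schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

        -- One join table per relationship type (assumed layout).
        CREATE TABLE gene_protein_encoding (
            gene_id    INTEGER REFERENCES gene(id),
            protein_id INTEGER REFERENCES protein(id)
        );
        CREATE TABLE gene_disease_association (
            gene_id    INTEGER REFERENCES gene(id),
            disease_id INTEGER REFERENCES disease(id)
        );

        -- Placeholder data, not real biomedical facts.
        INSERT INTO gene    VALUES (1, 'gene-A'), (2, 'gene-B');
        INSERT INTO protein VALUES (1, 'protein-X'), (2, 'protein-Y');
        INSERT INTO disease VALUES (1, 'disease-1'), (2, 'disease-2');
        INSERT INTO gene_protein_encoding    VALUES (1, 1), (2, 2);
        INSERT INTO gene_disease_association VALUES (1, 1), (1, 2);
    """)
    return conn


def infer_protein_disease(conn):
    """Apply the rule by hand: if a gene encodes a protein and the same gene
    is associated with a disease, report a protein-disease association.
    Every hop of the path has to be spelled out as a JOIN."""
    query = """
        SELECT DISTINCT p.name, d.name
        FROM gene_protein_encoding AS enc
        JOIN gene_disease_association AS assoc ON assoc.gene_id = enc.gene_id
        JOIN protein AS p ON p.id = enc.protein_id
        JOIN disease AS d ON d.id = assoc.disease_id
        ORDER BY p.name, d.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for protein, disease in infer_protein_disease(build_example_db()):
        print(protein, "is associated with", disease)
```

Even this one-hop rule needs two join tables and three JOINs. Each additional relationship type or alternative path through the data adds further JOINs that have to be written out by hand, which is the modelling and querying overhead the article attributes to SQL.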
