Evolving Data Models with JanusGraph

So let’s try to evolve our schema with a few quick traversals.

Let’s start by creating a single Performance for each Concert.

We’ll just find each Concert’s firstDate property (since we were only provided with a single date in our Baltimore Symphony dataset), and use its value as the Performance’s performanceDate.

We also connect the Performance to both the Concert and the Work.

g.V().
  hasLabel('Work').
  as('w').
  in('INCLUDES').
  hasLabel('Concert').
  as('c').
  map(addV('Performance').
      as('p').
      property('performanceDate', values('firstDate')).
      addE('PERFORMED').
      from('w').
      select('p').
      addE('INCLUDES').
      from('c')).
  iterate()

Now, connect the conductor and soloist Artists to each Performance and remove their connections from each Work:

g.V().
  hasLabel('Performance').
  as('p').
  in('PERFORMED').
  outE('CONDUCTOR').
  as('OLD').
  inV().
  as('cond').
  addE('CONDUCTOR').
  from('p').
  select('OLD').
  drop().
  iterate()

g.V().
  hasLabel('Performance').
  as('p').
  in('PERFORMED').
  outE('SOLOIST').
  as('OLD').
  inV().
  as('soloist').
  addE('SOLOIST').
  from('p').
  select('OLD').
  drop().
  iterate()

Finally, we connect the Orchestra to each individual Performance.

g.V().
  hasLabel('Performance').
  as('p').
  in('PERFORMED').
  in('INCLUDES').
  out('ORCHESTRA').
  addE('ORCHESTRA').
  from('p').
  iterate()

For this model, we’re also keeping the existing connection between the Orchestra and the Concert.

This will make certain query patterns more concise, and the relationship between the Concert and the primary performance group (the Orchestra) more explicit.

The short story, of course, is that there’s no single “right” answer: it all depends on what you’re trying to do with your data and what questions you’re trying to answer.
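To see the three rewiring steps without a running graph, here is a minimal sketch in plain Python rather than Gremlin. It is only a toy model under stated assumptions: the `Edge` tuple, the `evolve` function, the vertex ids (`c1`, `w1`, and so on) and the date are all invented for illustration and are not part of the Baltimore Symphony dataset or of any JanusGraph API.

```python
from collections import namedtuple

# Toy model (hypothetical): vertex labels, vertex properties, and a list of
# directed, labelled edges stand in for the graph.
Edge = namedtuple("Edge", "src label dst")


def evolve(labels, props, edges):
    """Return (labels, props, edges) after adding Performance vertices."""
    labels = dict(labels)
    props = {v: dict(p) for v, p in props.items()}
    out = []
    perfs_by_work, perfs_by_concert = {}, {}

    # Step 1: one Performance per Concert -INCLUDES-> Work edge, dated with
    # the Concert's firstDate and linked to both the Work and the Concert.
    count = 0
    for e in edges:
        if (e.label == "INCLUDES" and labels[e.src] == "Concert"
                and labels[e.dst] == "Work"):
            count += 1
            pid = f"perf{count}"  # invented id scheme
            labels[pid] = "Performance"
            props[pid] = {"performanceDate": props[e.src]["firstDate"]}
            out.append(Edge(e.dst, "PERFORMED", pid))
            out.append(Edge(e.src, "INCLUDES", pid))
            perfs_by_work.setdefault(e.dst, []).append(pid)
            perfs_by_concert.setdefault(e.src, []).append(pid)

    for e in edges:
        moved = perfs_by_work.get(e.src, [])
        if e.label in ("CONDUCTOR", "SOLOIST") and moved:
            # Step 2: re-attach to the Performance(s); the old Work edge
            # is dropped by not copying it into the result.
            out.extend(Edge(p, e.label, e.dst) for p in moved)
            continue
        out.append(e)
        if e.label == "ORCHESTRA":
            # Step 3: copy the Concert's Orchestra onto its Performances,
            # keeping the original Concert edge as well.
            out.extend(Edge(p, "ORCHESTRA", e.dst)
                       for p in perfs_by_concert.get(e.src, []))
    return labels, props, out


# Placeholder data: one concert with one work, a conductor, and a soloist.
labels = {"c1": "Concert", "w1": "Work", "a1": "Artist",
          "a2": "Artist", "o1": "Orchestra"}
props = {"c1": {"firstDate": "2000-01-01"}}
edges = [
    Edge("c1", "INCLUDES", "w1"),
    Edge("c1", "ORCHESTRA", "o1"),
    Edge("w1", "COMPOSER", "a1"),
    Edge("w1", "CONDUCTOR", "a1"),
    Edge("w1", "SOLOIST", "a2"),
]
new_labels, new_props, new_edges = evolve(labels, props, edges)
for e in new_edges:
    print(f"{e.src} -{e.label}-> {e.dst}")
```

The sketch returns a new edge list instead of mutating in place, which makes the before and after states easy to compare. The Gremlin traversals above do the same rewiring directly against the live graph.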

Our Performances should now have Conductor, Orchestra and Soloist vertices attached by their respective labels:

g.V().
  hasLabel('Performance').
  outE().
  inV().
  path().
  by(label)
==>[Performance,CONDUCTOR,Artist]
==>[Performance,ORCHESTRA,Orchestra]
==>[Performance,SOLOIST,Artist]
==>[Performance,CONDUCTOR,Artist]
==>[Performance,ORCHESTRA,Orchestra]
==>[Performance,CONDUCTOR,Artist]
==>[Performance,ORCHESTRA,Orchestra]

Our Works, on the other hand, should only be linked to a composing Artist and specific Performances of the Work:

g.V().
  hasLabel('Work').
  outE().
  inV().
  path().
  by(label)
==>[Work,COMPOSER,Artist]
==>[Work,PERFORMED,Performance]
==>[Work,COMPOSER,Artist]
==>[Work,PERFORMED,Performance]
==>[Work,COMPOSER,Artist]
==>[Work,PERFORMED,Performance]

We can also make a few confirmations with some simple assert statements:

// 3 Performances were created
// Each has connections to Conductor, Soloist, and Orchestra
assert 3 == g.V().hasLabel('Performance').count().next()
assert 3 == g.V().hasLabel('Performance').out('CONDUCTOR').hasLabel('Artist').count().next()
assert 1 == g.V().hasLabel('Performance').out('SOLOIST').hasLabel('Artist').count().next()
assert 3 == g.V().hasLabel('Performance').out('ORCHESTRA').hasLabel('Orchestra').count().next()

// Conductor, Soloist, Orchestra are NOT directly connected to Works
assert 0 == g.V().hasLabel('Work').outE('CONDUCTOR').count().next()
assert 0 == g.V().hasLabel('Work').outE('SOLOIST').count().next()
assert 0 == g.V().hasLabel('Work').outE('ORCHESTRA').count().next()

Perfect.

Our final graph, data and all, should look like this:

The diagram may be a bit crowded, but our model allows for concise access to all of our data. (The INCLUDES edges between Concert and Performance have been excluded for readability.)

We can now easily find composers who have conducted their own works, as well as retrieve the details of the performance.

g.V().
  hasLabel('Artist').
  as('a').
  in('COMPOSER').
  out('PERFORMED').
  out('CONDUCTOR').
  where(eq('a')).
  values('lastName')
==>Salonen

// Or more verbosely to view the path
g.V().
  hasLabel('Artist').
  as('a').
  inE('COMPOSER').
  outV().
  outE('PERFORMED').
  inV().
  outE('CONDUCTOR').
  inV().
  where(eq('a')).
  path().
  by('lastName').
  by(label).
  by('title').
  by(label).
  by('performanceDate').
  by(label).
  by('lastName')
==>[Salonen,COMPOSER,Cello Concerto,PERFORMED,3/9/2017,CONDUCTOR,Salonen]

We can also home in even more closely on what Esa-Pekka Salonen has been doing. For example, what orchestras has he conducted?

g.V().
  has('Artist', 'lastName', 'Salonen').
  in('CONDUCTOR').
  out('ORCHESTRA').
  values('name')
==>New York Philharmonic
==>Chicago Symphony Orchestra

Well, that concludes this look into data modeling with JanusGraph.

We’ve seen that it’s easy to incrementally improve the schema as we go — and in doing so take full advantage of the unique flexibility that a graph data system provides.

Footnotes

This is a distinction that lies at the heart of the music royalty and performance rights system.

That system requires a much longer discussion, but suffice it to say that if we want to use our graph to understand and manage detailed music performance data, we need to have this distinction as a central part of our graph.
