Create a Data Marvel: Develop a Full-Stack Application with Spring and Neo4j — Part 2

Data is extremely messy, and I will admit that these steps took the longest to wade through before we finally came up with import statements that worked and produced clean, unified data matching our data model (image below for review).

Marvel comics data model in Neo4j

Data Import

Now that we had created a sensible data model, we needed to import the data from the API into Neo4j to see whether the model actually worked or needed adjusting. This is where Neo4j's flexible data modeling is really valuable: it allows us to create and tweak the model, as well as refactor the data to match any model changes.

For our project, the data resided in an API, and we needed to call the endpoints to retrieve segments and insert them into our Neo4j model. For this API import, we used the apoc.load.json procedure, which reads the JSON-formatted responses from the API and applies the Cypher statements you write to specify how the data should be inserted or updated in Neo4j. Because APOC is an extension library, you can either download the GitHub project or install it within the Neo4j Desktop application, then use it just as you would any other Cypher procedure or function.

We start with the API endpoint URL we need for characters and include a few URL parameters Marvel needs to retrieve the data. The nameStartsWith= URL parameter allows us to insert the starting letter from our previous statement loop and retrieve characters whose names start with that letter.

Now, we need to sift through the object and insert the data into Neo4j. The Marvel API result gives us some high-level details (number retrieved, call status, etc.) and nests the character data under a results[] section, so the next line unwinds that object and navigates the nested structure to reach the subsection that holds the character data. Once we have gotten here, we pass that object (using WITH) to the WHERE criteria to check whether each character has any comics. A character without any comics will not have any relationships to other entities, and therefore is not meaningful data for this project. We only cared about the characters who have comics because we wanted to focus on the relationships between entities.

The next section of code actually inserts the data into Neo4j using Cypher statements. If you run a couple of quick queries like the ones below, you can verify that data was inserted, that the values look good, and that the translation worked.

MATCH (n:Character) RETURN n LIMIT 25;
MATCH (n:Character) RETURN count(n);

OK, everything looks good, and we now have plenty of characters to work with!

What I Learned

After quite a bit of research and query testing, the query explained above is what we used to get the first round of character data from a finicky API hosted by Marvel into our local instance of Neo4j as a graph data model. Below are my key takeaways from this part of the data import process.

It took a lot of time to maneuver through the Marvel Developer Portal restrictions on their API and find the best approach to gathering as much data as possible within those bounds.

My Cypher improved a LOT from seeing the data go from one source to another and finding where I hadn't expressed the Cypher correctly.

You need a practical example to apply what you have learned. Even finding something small and simple (which this was not) will allow you to experiment with a data set hands-on and gain a deeper understanding.

Next Steps

In the next post, we will cover the remaining steps to import the rest of the Marvel data set and show the fully populated database.
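
As a recap of the character import described under Data Import, here is a minimal sketch of what such a statement could look like for a single starting letter. It is not the exact query from this project. The $letter and $suffix parameters are assumptions introduced for illustration ($suffix stands for the authentication URL parameters Marvel requires), and the orderBy and limit URL parameters, as well as the property names set on the Character node, are assumed from the shape of the Marvel API response rather than taken from this post.

```cypher
// Assumption: $letter is one starting letter (e.g. 'A') and $suffix holds the
// authentication URL parameters ("&ts=...&apikey=...&hash=...") for the Marvel API.
CALL apoc.load.json(
  'https://gateway.marvel.com/v1/public/characters?nameStartsWith=' + $letter +
  '&orderBy=name&limit=100' + $suffix)
YIELD value
// The character records are nested under data.results in the response
UNWIND value.data.results AS result
// Keep only characters that appear in at least one comic
WITH result
WHERE result.comics.available > 0
// MERGE avoids creating duplicate nodes if the statement is run again
MERGE (c:Character {id: result.id})
ON CREATE SET c.name = result.name,
              c.description = result.description
RETURN count(c) AS charactersLoaded;
```

Running a statement like this once per letter of the alphabet would correspond to the letter loop mentioned above. Using MERGE rather than CREATE is a deliberate choice in this sketch, since a finicky API often means re-running the same call.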
