Data Science and Law | What One Lawyer Learned From a 50-Hour Data Science Bootcamp

Based on my criteria, Data Science Dojo’s data science bootcamp fit the bill: it’s a reasonably priced 5-day, 50-hour onsite program that didn’t have any prerequisites (though there were about 10 hours of pre-class prep). And the class covered broad ground: in the span of a week I learned both coding tools such as basic R, MS Azure, Hadoop, and Hive and concepts such as data mining and visualization, predictive modeling, ensemble methods like bagging and boosting, random forests, the importance of cross-validation, the difference between training and test data, A/B testing basics, building a recommendation system, and handling real-time and streaming data (we hacked a quick IoT solution using Azure tools, though truth be told, I was pretty much lost by then). Below are some of my takeaways on big data, especially as it relates to the legal profession, and on what it’s like for a lawyer to learn a new skill at an advanced age.

Lesson 1: The mechanics of building a predictive model aren’t particularly difficult; understanding which features to include and how to approach the problem is, and that’s where domain knowledge is important.

One of the underlying themes of the class was that data science (itself a buzzword) is merely a collection of skills; intuition and domain knowledge matter as much as coding a predictive model. Yet oddly, when data science is discussed in the legal profession, we downplay the importance of legal expertise and its value in creating effective models.

Lesson 2: Predictive models are iterative, and constant questioning is a good thing.

Although most lawyers will argue a legal principle ad nauseam, when it comes to data, we’re surprisingly passive. For the past two years, Clio has released a Trends Report that produced interesting, albeit counterintuitive, results. Yet the results are reported as is, with no questions about the methodologies used, what the data means, or how it was gathered. That’s not true data science: it’s groupthink.

Lesson 3: Big Legal Data Isn’t All That Big

Our instructor shared with us the Five V’s (Volume, Velocity, Variety, Veracity, and Value), which are used to evaluate whether data rises to the level of big data. For volume, we’re talking about huge amounts of data (not terabytes, but exabytes and beyond), too large to be stored and processed on traditional machines. For example, on Facebook, 10 billion messages are exchanged each day. It’s hard to imagine many sources of legal data that approach that volume.

Our instructor’s point was that we shouldn’t make a data problem into a big data problem unless absolutely necessary. So I wonder whether lawyers are using the term “big data” for small data, or treating ordinary data problems as big data problems.

Lesson 4: Kaggle Competitions Are Way Cool

I hadn’t known much about Kaggle before my class.
