Great Books for Data Science

Book description: A former Wall Street…
weaponsofmathdestructionbook.com

2. The Information by James Gleick

This book begins at the beginning: the emergence of information as a transferable entity.

The Information tracks every step in the development of data from tribal drums to the written word to today’s massive data centers.

Personally, I had not deeply questioned the concept of information before reading this book, but it forces the question on you, to the point of existential crisis.

After reading this book, I had to learn more about information theory, which led me to The Idea Factory, Claude Shannon, and the true basis of the modern world.

But that is for another list…

The Information: A History, a Theory, a Flood
The Information is so ambitious, illuminating and sexily theoretical that it will amount to aspirational reading for…
around.com

3. Everybody Lies by Seth Stephens-Davidowitz

To be honest, when I saw the title of this book I found it a little off-putting.

It was clearly created to be eye-catching, like a self-help book for the masses.

From the first page, though, it offered the most fascinating insights into the possibilities of good data science I’ve ever seen.

Seth Stephens-Davidowitz uses examples from his time as a data scientist at Google to show how search information alone can be used to get at deeper subconscious truths about our collective society.

The point isn’t that people lie as individuals; it’s that, in aggregate, our beliefs about ourselves turn out to be highly inflated when compared with our online behavior.

If the goal of data science is to find ways to capture, model and then make use of systemic consistencies (like physicists and engineers have up to this point), then it is vital to have an accurate representation of the environment, which is really what this book is about.

Underneath overly idealized models, individual biases and expectations, there is the reality of the system, hiding in the data.

About Everybody Lies – Seth Stephens-Davidowitz
However, we no longer need to rely on what people tell us. New data from the internet-the traces of information that…
sethsd.com

4. Dataclysm by Christian Rudder

While I was an undergrad, OkCupid and Tinder were two of the companies I thought would be most exciting to work for.

The data sets of these online dating giants may be rivaled only by Facebook in terms of the fundamental sociological information they contain.

Not only that, but there is incredible potential to use data and feedback systems to improve outcomes for customers, gaining insights into the patterns of human behavior along the way.

This book is basically those insights.

Written by Christian Rudder, co-founder and head of analytics at OkCupid, Dataclysm is about how data has been used, what we have learned and what happens next.

One of the best things about this book is the use of visualizations.

Creating representations of information that are intuitive and easy to digest is a huge (and often overlooked) side of data science.

This book inspires by both showing and telling what you can do with data.

Dataclysm by Christian Rudder | PenguinRandomHouse.com: Books
An NPR Best Book of 2014. A Globe & Mail Best Book of 2014. A Brain Pickings Best Science Book of 2014. A Bloomberg Best Book…
www.penguinrandomhouse.com

5. Big Data by Viktor Mayer-Schönberger and Kenneth Neil Cukier

Big data is a buzzword these days.

The authors of this book are well aware of the hype, but rather than write this off as another fad, they work to explain the legitimate opportunities that new information technologies have created.

Big data has become a cliche because people and businesses use the term so loosely that it’s easy to forget to ground the concept of ‘data’ that is somehow ‘bigger’ in real-world explanations and meaningful definitions.

This book goes beyond background, existing applications and definitions to explain the incredible potential contained in massive data sets, and how data scientists will someday be developing systems and automation that could drive businesses, social policy and entire nations.

Big Data
Welcome to "big data" – the idea that we can do with a vast amount of data things that we simply couldn't when we had…
www.big-data-book.com

6. Algorithms to Live By by Brian Christian and Tom Griffiths

Algorithms to Live By is a book that anyone could benefit from.

Not only are the technical concepts incredibly powerful, but the book focuses on the accessibility and practicality of each algorithm when navigating daily life.

People with a CS background probably have the most to gain from this book.

While courses on algorithms, data structures and discrete mathematics teach how to use the tools of a software engineer, this book makes CS all the more accessible by explaining where the tools come from, why they are incredibly useful, and how to begin applying them to daily life.

While the book doesn’t teach you how to write a merge sort in C, it will give you real-world examples of how libraries and search engines continually sort their data sets, the compromises made to ensure fast searches, the trade-offs of space and time complexity, and an intuitive explanation for the hard theoretical limits to optimal searching and sorting.

This is all covered in one chapter of this pretty incredible book.

Every data scientist should have a thorough understanding of algorithms, data structures and discrete mathematics, but whether you have the technical knowledge or not, this book will stretch your thinking.
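For readers curious about the merge sort mentioned above, here is a minimal sketch. It is not taken from the book, it is written in Python rather than C, and the function name `merge_sort` is just an illustrative choice. It shows the space-for-time trade-off in miniature: the sort runs in O(n log n) time, but it allocates extra lists while merging.

```python
def merge_sort(items):
    """Return a new sorted list; the input is left unchanged."""
    if len(items) <= 1:
        return list(items)

    # Split in half and sort each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # One half is exhausted; the rest of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Using `<=` when comparing keeps equal items in their original order, which is why merge sort is described as a stable sort.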

Algorithms to Live By: The Computer Science of Human Decisions
A fascinating exploration of how computer algorithms can be applied to our everyday lives, helping to solve common…
algorithmstoliveby.com

7. The Signal and the Noise by Nate Silver

The Signal and the Noise is probably one of the most popular statistics books around.

The name that Nate Silver chooses comes from an analogy that is often used in data science: identifying the relevant ‘signal’ that is correlated with the solution of a given problem from within a ‘noisy’ data set or system.

The world is full of distractions, and many of the things that end up affecting our decision making divert our attention away from indicators that are more closely correlated with our objectives.

As an example, Silver used to work in sabermetrics, creating useful statistics relating to baseball strategy.

When baseball scouts go out looking for new players, they tend to rely on intuition: the skills of players that seem immediately evident in their relative size, speed or ‘skill’.

For the past century teams have relied on scouts to help them field winning teams, based on the assumption that the intuition of experienced scouts is the key to finding the best players.

Over the past few decades, professional statisticians and sabermetricians have been displacing whole scouting departments.

It turns out that the scouts, and by extension the baseball teams, were being led off course by ‘noise’ (the intuition of the scouts based on a limited set of observations).

Teams that rely on statistical judgements when fielding their teams have proven far better at achieving their objective — winning games.
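One way to see why a handful of observations counts as ‘noise’ is to look at how the uncertainty in an observed batting average shrinks as the sample grows. The sketch below is my own illustration, not something from the book; the .300 "true" average, the sample sizes, and the helper names are all assumptions chosen for the example.

```python
import math
import random


def standard_error(true_rate, n_observations):
    """Standard error of an observed success rate over n independent trials."""
    return math.sqrt(true_rate * (1 - true_rate) / n_observations)


def observed_average(true_rate, n_at_bats, rng):
    """Simulate n at-bats and return the fraction that were hits."""
    hits = sum(1 for _ in range(n_at_bats) if rng.random() < true_rate)
    return hits / n_at_bats


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    true_rate = 0.300       # hypothetical "true" skill of one player
    for n in (20, 500, 5000):
        seen = observed_average(true_rate, n, rng)
        print(f"{n:>5} at-bats: observed {seen:.3f}, "
              f"standard error {standard_error(true_rate, n):.3f}")
```

With only 20 at-bats the standard error is roughly 0.10, so a .300 hitter can easily look like a .200 or .400 hitter. With thousands of at-bats the error drops below 0.01 and the signal becomes visible.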

This is just the tip of the iceberg.

The Signal and the Noise flows seamlessly through political strategy, social dynamics, gambling, Monte Carlo methods, and the basics of game theory (like many books, it focuses on the tragedy of the commons, the ultimatum game, and the iterated prisoner’s dilemma to outline the ideas behind game theory).

The insights presented and the underlying mindsets (being a fox) taught in The Signal and the Noise are indispensable.
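To make the iterated prisoner’s dilemma mentioned above a little more concrete, here is a small sketch. It is not from the book: the payoff values are the conventional textbook ones (an assumption), and the strategy and function names are hypothetical.

```python
# Conventional textbook payoffs (an assumption, not taken from the book).
# Key: (my_move, their_move) -> my points. "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect on every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return the total score of each player."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
    print(play(always_defect, always_defect))
```

Over ten rounds with these payoffs, the defector beats tit-for-tat head to head (14 points to 9), yet two cooperating players earn 30 each while two defectors earn only 10 each. That gap between individual and collective outcomes is the tension the game is used to illustrate.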

The Signal and the Noise by Nate Silver | PenguinRandomHouse.com: Books
One of Wall Street Journal's Best Ten Works of Nonfiction in 2012 "Not so different in spirit from the way public…
www.penguinrandomhouse.com

8. Complex Adaptive Systems by John Miller and Scott Page

Complex Adaptive Systems is probably the most important book in this list.

It isn’t the easiest book or the most exciting, but the perspective it assumes is incredibly powerful.

A complex adaptive system is one in which knowing the behaviors of, or constraints on, the component parts does not allow you to make actionable predictions about the behavior of the whole.

In complex adaptive systems there is an emergent complexity that makes the whole greater than the sum of its parts.
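A toy illustration of this kind of emergence (my own, not an example from the book) is the elementary cellular automaton known as Rule 30. Each cell follows a trivial local rule, yet the overall pattern is hard to anticipate from the rule alone. It is not adaptive, so treat it only as an analogy; the function names and grid size are arbitrary choices.

```python
def rule30_step(cells):
    """Advance one generation; cells beyond the edges are treated as 0."""
    padded = [0] + list(cells) + [0]
    # Rule 30: new cell = left XOR (center OR right).
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def run(width=31, generations=15):
    """Start from a single live cell in the middle and record each row."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(generations):
        row = rule30_step(row)
        history.append(row)
    return history


if __name__ == "__main__":
    for row in run():
        print("".join("#" if cell else "." for cell in row))
```

Every cell only ever looks at itself and its two neighbours, but the printed triangle quickly becomes irregular, which is the point: knowing the rule for the parts does not make the whole easy to predict.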

There are infinite examples of systems that fit these criteria: individual cells are complex adaptive systems, individual people are complex adaptive systems, individual cities are complex adaptive systems and of course the whole biosphere is a complex adaptive system.

Being able to identify complex adaptive systems as a ubiquitous form in the Universe is one thing, but this book is really about modeling and dissecting these systems to get at their inner structure.

Whether we are aware of it or not, data scientists are on the front lines when it comes to dealing with complex adaptive systems, so the ability to recognize when this perspective is appropriate, and then to use the tools to model and dissect these systems, will only become more important as the problems we deal with get increasingly complex.

The concepts in this book come thick and fast, so it’s worth looking at another source to strengthen understanding.

I recommend Signals and Boundaries, which approaches these ideas from yet another perspective.

Complex Adaptive Systems
This book provides the first clear, comprehensive, and accessible account of complex adaptive social systems, by two of…
press.princeton.edu

9. Antifragile/Incerto by Nassim Nicholas Taleb

I chose Antifragile because it was the book that I found most impactful, but each of the 4 books in Nassim Nicholas Taleb’s Incerto series is transformative in its own right.

They loosely reference one another, but you would still be well served as a data scientist if you chose to only read Antifragile.

The Incerto series is essentially a tirade against how modern systems deal with uncertainty and the weakness of the assumptions that underlie behaviors and systems.

It comes from a place of bold experience, careful observation and tons of discipline.

Taleb lays out a complete worldview in this series, with motives, background, justifications, counterexamples and anecdotes, all in support of a central thesis.

It is the work of a perfectionist, totally baffled by the weakness and stupidity built into the modern world.

I could probably write 4 articles about these books alone, but Antifragile was the one that really resonated for me in terms of data science.

Antifragility is when systems become more robust and resilient as they are exposed to greater disorder.

Given that the book is called Antifragile (a concept Taleb seems to have conjured out of thin air in order to explain his worldview), I will leave the more detailed definitions to the book.

While nature demonstrates antifragility at every level, fragility is almost an inbuilt assumption of modern systems.

I studied Physics in school, and what most intrigued me was chaos theory or nonlinear dynamics.

When you get deeply into studying the physical world, it seems that chaos underlies all emergent order.

How do intricate ordered systems like the human body develop under these circumstances? The answer is that the successful propagation of any ordered system is a function of its antifragility.

Once the modern world finally catches up to Nassim Nicholas Taleb, antifragility and the other concepts in the Incerto series (e.g. the black swan power-law distribution, recognizing randomness or noise) will be HUGE for data science and engineering.

Better to get a head start.

Incerto
Bed of Procrustes is a standalone book in Nassim Nicholas Taleb's landmark Incerto series, an investigation of opacity…
www.penguinrandomhouse.com

10. Superforecasting by Philip E. Tetlock and Dan Gardner

One of the most important applications for data science and statistics is in forecasting.

In a fast-moving world, people, businesses and society as a whole will live or die by their ability to accurately predict future events and respond appropriately.

Superforecasting is about the skills and habits of people who can consistently make incredibly accurate predictions.

It presents these strategies in the context of a famous experiment where teams competed to predict future events.

Tetlock’s team was made up of volunteers from around the US and Canada — people with no special qualifications, just time and interest in the subject.

In spite of their lack of credentials, Tetlock was able to outperform top analytics firms and the CIA using only these crowd-sourced predictions.

Through a mix of psychological insights and character studies, Tetlock reveals the traits and practices that make anyone capable of predicting future events with great precision and accuracy.

When I began reading this book I was a little disappointed by the limited scope of the cases chosen (mostly from his study on volunteers), but the book ends up doing an excellent job of exposing and explaining the strategies that make someone a superior forecaster.

While the formal statistical methods are kept to a minimum, the mindset and heuristics that allow someone to make actionable predictions are vital to any data scientist.
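This review doesn’t say how forecast accuracy is measured. One common yardstick for probability forecasts is the Brier score, which in its usual binary form is the mean squared difference between the forecast probability and what actually happened (0 is perfect; always answering 50% scores 0.25). The sketch below is only an illustration: the function names are hypothetical and the forecasts are made-up numbers, not data from the study.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally sized, non-empty sequences")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def crowd_forecast(individual_forecasts):
    """Average several forecasters' probabilities, question by question."""
    n_forecasters = len(individual_forecasts)
    return [sum(ps) / n_forecasters for ps in zip(*individual_forecasts)]


if __name__ == "__main__":
    # Made-up example: three forecasters, four yes/no questions.
    forecasts = [
        [0.9, 0.2, 0.6, 0.3],
        [0.7, 0.4, 0.8, 0.1],
        [0.6, 0.1, 0.4, 0.5],
    ]
    outcomes = [1, 0, 1, 0]  # what actually happened

    for i, person in enumerate(forecasts, start=1):
        print(f"forecaster {i}: {brier_score(person, outcomes):.3f}")
    crowd = crowd_forecast(forecasts)
    print(f"crowd average: {brier_score(crowd, outcomes):.3f}")
```

Lower is better. A simple average of the individual forecasts is one basic way to crowd-source a prediction; nothing here should be read as the exact aggregation method used in the experiment.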

Superforecasting by Philip E. Tetlock, Dan Gardner | PenguinRandomHouse.com: Books
A New York Times Editors' Choice. A Washington Post Bestseller. A Hudson Booksellers Best Business Interest Book of…
www.penguinrandomhouse.com
