The Infinity Stones of Data Science

This is going to surprise you, but.

its relative.

If we are interested in descriptive analysis — that is, no prediction, along the lines of a straightforward data analysis — the more intimately we are familiar with the data the better.

The end is the means in this case, and so the quality of describing, visualizing, and sharing the data, a la a data analyst, is highly correlated with exploration intimacy.

When it comes to predictive analytics, and machine learning undertakings, there are differing opinions on how much exploratory data exploration is helpful.

There are also differing opinions on the level of exploratory analysis of datasets which are not being used for training (i.


validation and testing sets).

This aside, in order to ensure maximum power over your data space is achieved, be sure to guard against the potential pitfalls of poor exploratory data analysis or shoddy visualizations, such as the correlation fallacy, Simpsons paradox, and the ecological fallacy.

When performed properly, exploratory data analysis will provide an understanding of your data in a way which allows successful data science to follow.

  Time Stone The Time Stone grants its owner the power to re-wind or fast-forward time.

If you have studied algorithm complexity, you know that the choice of algorithm can severely impact the time it takes to complete a given computation task, even with the same data, and it is for this reason that algorithm and method selection is our equivalent of being able to fast-forward time.

This will apply to both the outright selection of algorithms, as well as the setting of hyperparameters which also have an impact on run time.

Neural network architectures can be incredibly complex, but a pair of equivalently simple neural nets can have vastly different convergence times when using vastly different learning rates.

You know about the bias-variance tradeoff, but there is also a space-time tradeoff, as well as a complexity-speed tradeoff that can be made.

One logistic regression model might not perform quite as well as a random forest of thousands of trees, yet that hit to performance may be worth it to you in exchange for speed, to say nothing of the increase in explainability the logistic regression model may provide over the random forest (if thats your thing).

This isnt to say that you must choose a faster (or less complex, or less compute intensive, or more explainable) algorithm, but you need to keep in mind that it is one of the tradeoffs you are making, and one of the best ways we have of controlling the flow of time.

At least, it is one of the best was we have of controlling the flow of time without the real Time Stone.

  Power Stone The Power Stone bestows upon its holder a lot of energy—the sort of energy that you could use to destroy an entire planet.

That sounds like a lot of energy.

Where do we find this sort of energy in the data science world?.Computational power!Computational power (or “compute”) is the collective computational resources we have to throw at a particular problem.

Unlimited compute was once thought to be the be all and end all of computing, and for good reason.

Consider how little compute there was one, two, or three decades ago, in comparison to today.

Imagine scientists sitting around and thinking about problems they could solve, if only they had more than a handful of MHz worth of compute at their disposal.

The sky would be the limit!Of course, thats not exactly how things have turned out.

Sure, we have a lot more compute at our disposal now than we ever have in the past in the form of supercomputers, the cloud, publicly available APIs backed by heaping amounts of compute, and even our own notebooks and smartphones, comparatively.

All sorts of problems we could never have imagined would have enough compute to solve are now tractable, and thats a great development.

We need to keep in mind, however, that “clever” is a great counterbalance to compute, and lots of advancement in data science and its supportive technologies have been made possible by brain over brawn.

Ideally, a perfect balance of brain and brawn could be used to attack every problem out there, with clever being used to setup a perfect algorithmic approach, and the necessary compute being available to back it up.

Perhaps this is an area data science will one day prove to help itself.

Until then, rest assured that there is compute available for even the less than perfect approaches to problem solving.

  Soul Stone It’s unclear what the Soul Stone’s powers are in the film universe.

In the comics, the gem allows the holder to capture and control others’ souls.

“Capture and control others souls” sounds ominous, and more than just a little bit devious.

But if we take a more positive spin on the concept of the Soul Stone, we could force an equivalence between it and the power of prediction.

We are training models to control the innermost essence of unlabeled data — its soul — by making informed predictions as to what it truly holds.

Thats not a stretch at all.

Right?The Soul Stone, then, is analogous to the power of prediction, which is to say it lies at the absolute core of data science.

What are data scientists trying to accomplish?.They are trying to answer interesting questions with available data in order to make predictions which align as closely as possible with reality.

That prediction piece seems pretty crucial.

And, given that it is so crucial, it should be evident that the outcomes of data science should be handled with the utmost care.

Not to oversell it, but the soul of our work is the value it can create (whether that be to business, to charities, to government, to society in general), and taking this lightly, or just treating the predictive outcomes as another of the steps in the data science process, is ill advised.

Data science is a fight for the soul, coupled with the upcoming battle for the mind.

  Mind Stone The Mind Stone allows the user to control the minds of others.

Which of the Infinity Stones allow for the control of others minds?.Thats the Mind Stone, of course, and in the world of data science nothing better helps control the minds of others than a well-crafted data presentation, including a compelling story and effective visualizations.

The modeling is complete.

The predictions have been made.

The insights are.


Its now time to inform the project stakeholders of the outcomes.

But the non-data scientists among us dont have the same interests in, or understandings of, data and the data science process, so we need to be effective in presenting our findings to them in a way they will appreciate.

This isnt some form of condescension or “they just dont get it”; this is an acknowledgement of the reality that everyone has different skills, interests, and roles, and those of the individuals who could most benefit from our work dont necessarily share those of the data scientist.

So tailor your presentation accordingly.

Remember, if your insights dont translate to being useful, then your work isnt complete.

Its up to you to convince others of the value of your work.

Once convinced, they can take action, and change through action is the real pay-off of any data science project.

  Infinity Stone descriptions came from this article.

  Related: var disqus_shortname = kdnuggets; (function() { var dsq = document.

createElement(script); dsq.

type = text/javascript; dsq.

async = true; dsq.

src = https://kdnuggets.



js; (document.

getElementsByTagName(head)[0] || document.


appendChild(dsq); })();.

. More details

Leave a Reply