Understanding data

Probably not.

We only did this because we could.

A real analyst, on the other hand, excels at the science of looking at data quickly and the art of looking where the interesting nuggets lie.

If they’re good at their craft, they’re worth their weight in gold.

What is a distribution?

If these 27 items are everything we care about, then this sample histogram I’ve just made also happens to be the population distribution.

That’s pretty much what a distribution is: it’s the histogram you’d get if you applied hist() to the whole population (all the information you care about), not just the sample (the data you happen to have on hand).
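To make the idea concrete, here is a minimal sketch of what a histogram function does under the hood: it sorts values into bins and counts them. The 27 weights below are invented stand-ins, not the data behind the histogram mentioned above, and the `histogram` helper and the 100-gram bin width are assumptions chosen for illustration.

```python
from collections import Counter

# Made-up weights in grams for 27 hypothetical items (illustrative only).
weights = [
    50, 85, 120, 150, 150, 200, 200, 225, 250,
    250, 300, 300, 340, 350, 400, 400, 425, 450,
    454, 500, 500, 500, 567, 680, 750, 800, 907,
]

def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

bins = histogram(weights, bin_width=100)

# Print a plain-text histogram: one '#' per item in the bin.
for start, count in bins.items():
    print(f"{start:>4}-{start + 99:<4} {'#' * count}")
```

If these 27 values were everything we cared about, the printed shape would be the population distribution. If they were only part of a bigger collection, the same shape would be a sample histogram.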

There are a few footnotes, such as the scale on the y-axis, but we’ll leave those for another blog post. Please don’t hurt me, mathematicians!

If our population is all packaged foods ever, the distribution would be shaped like the histogram of all their weights.

That distribution exists only in our imaginations as a theoretical idea — some packaged food products are lost to the mists of time.

We can’t make that dataset even if we wanted to, so the best we can do is make guesses about it using a good sample.
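A rough sketch of that guessing game, under loud assumptions: since the real population can’t be collected, the code below simulates a stand-in population (the log-normal shape, its parameters, and the sample size of 200 are all arbitrary choices for illustration) and then compares the share of items per weight bin in the full population with the share in a random sample.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Simulated stand-in for a population we could never fully collect.
# The log-normal shape and its parameters are assumptions, not real food data.
population = [random.lognormvariate(5.5, 0.6) for _ in range(100_000)]

# The data we "happen to have on hand": a random sample of 200 items.
sample = random.sample(population, 200)

def relative_frequencies(values, bin_width, n_bins):
    """Share of values per bin; the last bin also collects everything above it."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v // bin_width), n_bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]

pop_freq = relative_frequencies(population, bin_width=100, n_bins=8)
sample_freq = relative_frequencies(sample, bin_width=100, n_bins=8)

# Side-by-side comparison of population and sample shares per bin.
for i, (p, s) in enumerate(zip(pop_freq, sample_freq)):
    label = f"{i * 100}+" if i == 7 else f"{i * 100}-{(i + 1) * 100}"
    print(f"{label:>8} g  population {p:6.1%}  sample {s:6.1%}")
```

Using shares rather than raw counts puts both histograms on the same y-axis scale, so their shapes can be compared directly even though one has 100,000 items and the other only 200.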

What is data science?

There’s a variety of opinions, but the definition I favor is this one: “Data science is the discipline of making data useful.” Its three subfields involve mining large amounts of information for inspiration (analytics), reasoning carefully about incomplete data to make decisions wisely (statistics), and using patterns in data to automate tasks (ML/AI).

All of data science boils down to this: knowledge is power.

The universe is full of information waiting to be harvested and put to good use.

While our brains are amazing at navigating our realities, they’re not so good at storing and processing some types of very useful information.

That’s why humanity turned first to clay tablets, then to paper, and eventually to silicon for help.

We developed software for looking at information quickly, and these days the people who know how to use it call themselves data scientists or data analysts.

The real heroes are those who build the tools that allow these practitioners to get a grip on information better and faster.

By the way, even the internet is an analytics tool — we just rarely think of it that way because even children can do that kind of data analysis.

Memory upgrades for all

Everything we perceive is stored somewhere, at least temporarily.

There’s nothing magical about data except that it’s written down more reliably than brains manage.

Some information is useful, some is misleading, the rest is in the middle.

The same goes for data.

We’re all data analysts and always have been.

We take our amazing biological capabilities for granted and exaggerate the difference between our innate information processing and the machine-assisted variety.

The difference is durability, speed, and scale… but the same rules of common sense apply in both.

Why do those rules go out the window at the first sign of an equation?

I’m glad we celebrate information as fuel for progress, but worshipping data as something mystical makes no sense to me.

We’re all data analysts and always have been.

Let’s empower everyone to see themselves that way!
