Part 1: Statistics
Ajay Rathore
Apr 20
Photo by Stephen Dawson on Unsplash

While starting my journey in Data Science, I came across one question: statistics vs. data science.
To settle this hot topic, we need to define both properly, along with the problems they grew out of and why we need them.
I will be writing a series of articles on this topic.
We kick off the journey with this part.
Goals of this part:
- Origin of statistics
- Statistics from an application point of view
- What is statistics?

Let's take off from here. Try to relate this statement to yourself.
“A child’s survival depends upon being able to predict reliably in certain situations.”
From just one experience (what can a hot glass of milk do to your palm?), you predicted, assumed or calculated what heat can do to your skin.
Now you take precautions to keep your distance from such situations.
That is just one example.
Curiosity and the survival tendency are the bigger picture.
It turns out that every action we perform has its origin in one of these two.
Think about how or why, at some point in your life, you have asked: How does a PC work? How does mail work? How does the weather change? What is climate? Do I have to jump into this pool? Do I need to eat food? Why am I even reading this blog?
Then try to relate these two terms to the origins of those questions and to the actions you took in response.
As a consequence of this survival tendency or curiosity, whichever the origin is, we start to develop a model of the world around us from the perspective of that specific origin.
This modelling is done by gathering data from our own experiences and data passed on to us by others from theirs.
Developing a model means developing an example, a copy, a system or a prototype of an original event or instance.
A model of the world means an example of the world, a copy of the world in our brain.
Everyone’s memory is just a store of models of this world, perceived in different situations (origins).

Why the model? The reason is that we didn’t create the world around us, and there isn’t any documentation from its developer’s side either.
So it is really tough to understand, debug and fix this world to suit our needs, or to answer questions about it: discover new possibilities, test situations, find hacks and solutions.
Instead, we create a model that has the same features as the world, or as many of them as we can manage, and then we set up a fact: this other world behaves the same as the first one in the following respects (the assumptions involved).
Now, what’s the point of this new world? Well, here’s the catch: since we made this new world ourselves, we know exactly how things work in it.
We know each and every section of this new code base: how it works, its internal dependencies, and so on.
So now we can answer precisely any question about any event happening in this new world.
It could be anything: what the input of the event is, what the output will be for that input, and everything else there could be.
It’s our creation.
And we have already set up the fact that this new world is equivalent to the real world, so we can answer the same questions about the real world too.
All we have to do is put the question to our new world, where we are already God, and say that the same answer applies to the real world.
Here is the thing: the assumption that the real world and the new world are the same is what sometimes gives us a taste of failure.
It’s very simple: the more accurately the model projects the real world, the more accurate the results we will be able to get.
And this is what science is doing: debugging the large code base of this universe, section by section,
and modelling a new world in which you do not have to wait for God’s approval to answer your questions.
Debugging is more of a coder’s word. In the language of science, exploring and modelling the sections in a systematic way is called research.
I hope the origin is clear here.
From now on, whenever we have any doubt, we will fall back to this background and start again from here.
You already know what research is and why we need it. Let’s get started with how to perform research. We will go through each step with an example.
Do you remember a time when a question came to your mind and you answered it just for the sake of your curiosity or survival instinct? Think about it, take your time.
If you remember such an event, you should be able to see the following things happening during its life cycle.
You have been seeing apples fall from a tree for the last 6 months.
Then, at a particular moment during the 7th month, something kicked your brain out of its normal routine and you realized: wait a sec, why does it look like everything falls back to the ground?
You threw a pebble. What? It also came back to the ground. You tried one more shot and tossed a coin; again it came back to the ground.
You: “So it is true that things fall back to earth.”
Nice work.
In a more general way, these four steps can be viewed as:
- knowledge gained from our senses (eyes)
- observation (a moment of final realization, when you were looking at things slightly differently from your normal routine during the knowledge gain)
- more data gained and initial verification of the observation
- arriving at an initial conclusion

These four things have very intermixed timelines, so we consider them a single step and call it the initial observation.
Now, from this initial observation, we generate theories,
like “Earth applies a force on objects near it”, “An object’s mass is imaginary”, “The sky pushes back”, etc.
You pick one and start with it.
At first this theory is supported only by your own belief in it (yes, I have seen this, I have felt this) and by the logical facts and principles already in your memory that agree with it.
Establishing this theory is just a way to carry out the next steps.
In the coming steps we will try to generate more support and evidence for the theory, so that it can be believed by others and not only by ourselves.
While doing so, we may need to modify this version of the theory and then continue the process again, so the cycle repeats.
The next step is generating a hypothesis from this theory.
The main focus of this step is to rewrite the theory in a form that can be empirically evaluated, and then to identify the variables involved,
like the mass of the ball, the acceleration of the ball, etc.
The next step is to collect more data to test the theory. We already know the variables, so we know exactly what to look for and which values to record.
After collecting data in the form of variables, our next step is to analyze the data.
Analyzing data means checking its consistency and possible deviations or errors.
If the collected data is consistent with the hypothesis, we say the hypothesis holds for this data set.
We then add this result to our theory as additional evidence to support it.
This testing and analysis of data is done with various methods, such as plotting the data, fitting a model, detecting outliers and many others.
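
To make the "fit a model, detect outliers" step a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration and not part of the original example: the hypothesis is written as d = 0.5 * g * t^2 (fall distance d after time t, with one constant g), the measurements are synthetic numbers made up for this sketch with one deliberately bad reading, and the helper names fit_g and find_outliers are hypothetical. The outlier rule used is the median-absolute-deviation (modified z-score) rule, which is only one of many possible choices.

```python
from statistics import median

# Synthetic, made-up measurements for illustration only (not real data):
# fall time in seconds and fall distance in metres for a dropped ball.
G_ASSUMED = 9.8  # m/s^2, used only to generate the synthetic distances
times = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
noise = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015]
distances = [0.5 * G_ASSUMED * t**2 + e for t, e in zip(times, noise)]
distances[3] += 1.0  # one deliberately bad reading, so there is an outlier


def fit_g(ts, ds):
    """Least-squares estimate of g in the model d = 0.5 * g * t^2."""
    xs = [0.5 * t**2 for t in ts]
    return sum(x * d for x, d in zip(xs, ds)) / sum(x * x for x in xs)


def find_outliers(ts, ds, g, threshold=3.5):
    """Indices whose residual is far from the median residual (MAD rule)."""
    residuals = [d - 0.5 * g * t**2 for t, d in zip(ts, ds)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # residuals are (almost) identical, nothing to flag
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - med) / mad > threshold]


# Step 1: fit the model to all the data.
g_first = fit_g(times, distances)

# Step 2: look for readings that deviate strongly from the fitted model.
outliers = find_outliers(times, distances, g_first)

# Step 3: refit without the flagged readings.
kept = [(t, d) for i, (t, d) in enumerate(zip(times, distances))
        if i not in outliers]
g_clean = fit_g([t for t, _ in kept], [d for _, d in kept])

print(f"first fit: g = {g_first:.2f}")
print(f"flagged readings (indices): {outliers}")
print(f"refit without them: g = {g_clean:.2f}")
```

If the refit value stays stable and the remaining residuals are small, we would count the data as consistent with the hypothesis. In a real study you would also want to understand why a reading was flagged before dropping it.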
Here comes the answer to why we developed statistics. The answer is now easy to digest: we developed statistics to handle the last step, analyzing the data.
When we dig deeper into this last step, we will find that there is more to it than just the analysis of data.
From there, the definition of statistics evolved into: the branch of mathematics that deals with the collection, organization, analysis and interpretation of numerical data.