Probability and Statistics explained in the context of deep learning

Photo by Josh Appel on Unsplash

This article is intended for beginners in deep learning who wish to learn about probability and statistics, and as a reference for practitioners.

In my previous article, I wrote about the concepts of linear algebra for deep learning in a top-down approach (link for the article). If you are not yet familiar with linear algebra, please read that first. The same top-down approach is used here: the use cases are described first, and then the concepts. All the example code uses Python and numpy. Formulas are provided as images for reuse.

Table of contents:
1. Introduction
2. Foundations of probability
3. Measures of central tendency and variability
4. Discrete probability distributions, binomial distribution
5. Continuous probability distributions, uniform and normal distributions
6. Model accuracy measurement tools
7. Random process and Markov chains
8. Probabilistic programming
9. External resources

Introduction:

Probability is the science of quantifying uncertain things. Most machine learning and deep learning systems use large amounts of data to learn patterns in that data. Whenever a system relies on data rather than on logic alone, uncertainty arises, and whenever uncertainty arises, probability becomes relevant.

By introducing probability to a deep learning system, we introduce common sense to the system. Otherwise the system would be very brittle and would not be useful. In deep learning, several models such as Bayesian models, probabilistic graphical models, and hidden Markov models are used. They depend entirely on probability concepts.

Real-world data is chaotic. Since deep learning systems use real-world data, they require a tool to handle that chaos.

It is often more practical to use a simple but uncertain system than a complex but certain and brittle one.

The versions of probability and statistics presented here are highly simplified versions of the actual subjects.

In
the above neural network, the input vector x is a random variable, the output ‘prediction’ is a random variable, and the weights of the neural network are also random variables (because they are initialized randomly from a probability distribution).

Probability distribution: A probability distribution describes how likely the random variable is to take on each of the values in the sample space. In the neural network, the weights are initialized from a probability distribution. The output vector y is produced by a softmax, so it is also a probability distribution: it gives the probability of X taking each of the digit values. (In general, softmax provides the probabilities of categorical values.)

In this example, the probability distribution y is discrete (it has 10 discrete values), whereas in other cases it may be continuous (the sample space is then also continuous). In a discrete distribution, the probability distribution is given by a probability mass function (pmf), denoted by P(X = x).

Example: the probability of drinking water after eating is very high.

Marginal probability: the probability of a subset of random variables taken from a superset of them. Example: the probability of a person having long hair is the sum of the probability of being a man with long hair and the probability of being a woman with long hair. (Here the long-hair random variable is held fixed and the gender random variable is summed over.)

Bayes’ theorem: It describes the probability of an event based on prior knowledge of other events related to that event. Bayes’ theorem exploits the concept of belief in probability.

numpy docs

    np.cov(a)  # covariance matrix of a (each row is treated as one variable)

Probability distributions:

As I mentioned in the beginning, several components of neural networks are random variables. The values of the random variables are drawn from a probability distribution. In many cases, we use only certain types of probability distributions. Some of them are:

Binomial distribution: A binomial random variable is the number of successes in n trials of a random
experiment.

A random variable x is said to follow a binomial distribution when each of the n trials has only two outcomes (success and failure), the trials are independent, and the probability of success is the same in every trial. Naturally, the binomial distribution is for discrete random variables.

numpy docs.

    import numpy as np

    n = 10    # number of trials
    p = 0.5   # probability of success
    s = 1000  # size: number of samples to draw
    np.random.binomial(n, p, s)  # array of s success counts, each between 0 and n

Continuous distributions: These are defined for continuous random variables. In a continuous distribution, we describe the distribution using a probability density function (pdf), denoted by p(x). The integral of a pdf over the whole sample space is equal to 1. If you are not comfortable with integral or differential calculus, look here.

Uniform Distribution: It is the simplest form of continuous distribution, with every element of the sample space being equally likely.

Here is an excellent visual explanation of Markov chains.

This is a Markov chain that describes the weather condition. The values represent the probability of transition from one state to another.

Markov chains are used for simple systems like next-word prediction, language generation, sound generation, and many other systems. The extension of Markov chains known as hidden Markov models is used in speech recognition systems. I have stopped the discussion of random processes here and plan an extensive article on them, because the topic is too long to cover in this one.

Probabilistic programming:

A new programming paradigm known as probabilistic programming has evolved. These languages or libraries help to model Bayesian-style machine learning. It is an exciting research field, supported by both the AI community and the software engineering community. These languages readily support probabilistic functions and models such as Gaussian models, Markov models, etc. One such library for writing probabilistic programs, known as Pyro, was created by Uber last year; it supports Python with PyTorch (a deep learning library) as the backend.

If you liked this article about probability and statistics for deep learning, leave
claps for the article.

But for beginners, I would also suggest several other awesome external resources to reinforce their knowledge in the interesting field of probability. (Though the knowledge that you gained through this article is enough for proceeding in deep learning.)

External Resources:
Awesome free course on deep learning and machine learning: fast.ai
Intuitive explanation of calculus: 3blue1brown
Best book on deep learning: the deep learning book
Random process: Sheldon M..
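
As a recap of the weather Markov chain discussed in the random process section, here is a minimal numpy sketch. The two states and the transition probabilities are hypothetical values chosen for illustration, not the ones from the article's figure, and the helper names `simulate` and `stationary` are made up for this example. The sketch samples a short sequence of states and then approximates the long-run share of time spent in each state.

```python
import numpy as np

# Hypothetical weather states and transition probabilities (illustrative only).
# Row i holds P(next state | current state i), so every row must sum to 1.
states = ["sunny", "rainy"]
P = np.array([
    [0.9, 0.1],  # sunny -> sunny, sunny -> rainy
    [0.5, 0.5],  # rainy -> sunny, rainy -> rainy
])

def simulate(P, start, steps, rng):
    """Sample a path of state indices from the chain with transition matrix P."""
    path = [start]
    for _ in range(steps):
        # The next state depends only on the current one (the Markov property).
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

def stationary(P, iters=1000):
    """Approximate the long-run state distribution by repeated multiplication."""
    pi = np.full(len(P), 1.0 / len(P))  # start from a uniform guess
    for _ in range(iters):
        pi = pi @ P
    return pi

rng = np.random.default_rng(0)
path = simulate(P, start=0, steps=10, rng=rng)
print([states[i] for i in path])

pi = stationary(P)
print(dict(zip(states, pi.round(3))))
```

With these particular numbers the long-run distribution settles at about 5/6 sunny and 1/6 rainy, regardless of the starting state. Repeated multiplication is used here only because it is easy to read; an eigenvector computation would be another way to get the same distribution.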