For those interested in optimizing portfolios, look at OptimalPortfolio.

I must agree, the name shrinkage is quite a strange one, but in essence what shrinkage estimators do is 'shrink' an estimate with high variance towards an estimate with high bias. In other words, a shrinkage estimator is a weighted sum of an estimator with high variance and an estimator with high bias, with a single weight controlling the balance between the two. Although it sounds easy, the difficulty comes in deciding which estimators to use and how to weight them optimally.

Bias vs Variance

First, we shall begin by understanding the trade-off between bias and variance. This trade-off in statistical estimation is very similar to the bias-variance trade-off encountered in machine learning. Some estimators have more bias than others and some have more variance than others. A way to illustrate this is the following. Consider the sample covariance of a set of n observations x_1, ..., x_n with sample mean \bar{x}. The sample covariance is an unbiased estimator of the population covariance, with the following form:

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$

However, this estimator only works well when the amount of data is large. When the dataset is small, the variance of the estimator is large. Now consider another estimator with lower variance at the cost of some bias, for example the Maximum Likelihood Estimator (MLE). Under the assumption of a normal distribution, the MLE will be

$$\hat{\Sigma}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$

Obviously, since the distribution is a simple Gaussian, the difference between the two estimates is not significant: they differ only by the factor (n - 1)/n. Nevertheless, it allows us to illustrate the usefulness of shrinkage estimators.
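To make the comparison concrete, here is a minimal Python sketch of the two covariance estimators discussed above, together with the weighted combination that shrinkage describes. It is only an illustration under stated assumptions: the function names, the simulated data, the scaled-identity target and the fixed weight of 0.5 are all hypothetical choices made for this example, and none of them is taken from OptimalPortfolio or claimed to be an optimal choice.

```python
import numpy as np


def sample_covariance(X):
    """Unbiased sample covariance: divides by n - 1. Rows of X are observations."""
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    return centered.T @ centered / (n - 1)


def mle_covariance(X):
    """Gaussian maximum likelihood covariance: divides by n, so it is biased."""
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    return centered.T @ centered / n


def shrink(high_variance, high_bias, weight):
    """Weighted combination of two estimates; weight is the share given to high_bias."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * high_bias + (1.0 - weight) * high_variance


# Small simulated dataset (assumption): 10 observations of 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

S = sample_covariance(X)
S_mle = mle_covariance(X)

# Hypothetical high-bias target: identity scaled by the average sample variance.
target = np.eye(3) * np.trace(S) / 3

# The weight 0.5 is arbitrary; choosing it well is the hard part of shrinkage.
shrunk = shrink(S, target, 0.5)

print("largest gap between the two estimates:", np.abs(S - S_mle).max())
print("shrunk estimate:\n", shrunk)
```

With a weight of 0 the function returns the high-variance estimate unchanged, and with a weight of 1 it returns the target, so the weight directly sets where the result sits on the bias-variance trade-off.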