# Using Markov Chain Monte Carlo method for project estimation

In particular, we are interested in finding the number of story points we can complete in one iteration with 95% confidence.We will start by defining the likelihood function for the sample of size one:Let’s select the priors for μ and σ.For σ we choose non-informative prior:For μ we choose the conjugate prior which is a Normal distribution with a L2 regularization hyperparameter λ:The posterior probability in this case is proportional to:The joint probability distribution, that will let us calculate percentiles, is therefore:So we can calculate all percentiles by marginalizing over the parameters μ and σ..The answer can be derived analytically, but in our case I want to solve it numerically using MCMC Hamiltonian sampling method.The idea of this method is to get a sample of joined distribution p(x,μ,σ2|x(i))..After that we can calculate percentiles of x..This equivalent to marginalizing by the parameters of the posterior distribution.As I said earlier, the answer can be derived analytically, but we will use a method that can be used for the distributions, for which it is hard or impossible to get a sample..This method is called Markov chain Monte Carlo..The idea of this method is to do a random walk in the variable space, but try to visit more probable areas more frequently, so that in the resulting sample the histogram follows the probability distribution..Some of the unlikely values in this case have to be rejected.A specific flavor of MCMC is the Hamiltonian sampling method..It is somewhat similar to the Gradient Descent algorithm, except that instead of trying to converge to the minimum of the cost function, the step in MCMC is large enough, so that it explores also the less likely regions in the variable space, but tends to visit in the high probable regions more often.For this we need to take a log of the joint distribution function (ignoring the constant):There are many libraries to do Hamiltonian sampling..There are even probabilistic programming languages optimized for this, for example, Stan..But here we will use TensorFlow Probability, a probabilistic library created by Googleimport numpy as npimport pandas as pdimport tensorflow as tfimport tensorflow_probability as tfplamb = 2e-1def log_likelihood(x, mu, sigma2): 'The (negative) log likelihood function for one sample' return (tf.log(x)-mu)**2/2.0/sigma2def get_unnormalized_log_probability(data): def joined_probability(x, mu, sigma2): result = -(2.0+len(data))/2*tf.log(sigma2) – lamb * mu**2/2.0 -log_likelihood(x, mu, sigma2) #sigma2 for datum in data: result -= log_likelihood(float(datum), mu, sigma2) return result return joined_probabilityWe want to make sure our functions are working correctly, and the best way to do that is to write a unit testimport unittestimport mathclass TestUnnormalizedLogProbability(unittest.TestCase): def test_get(self): tf.reset_default_graph() data=np.array([1.0,1.0]) x = tf.constant(value=1.0) mu = tf.constant(value=0.0) sigma2 = tf.exp(1.0) probability_function = get_unnormalized_log_probability(data) probability = probability_function(x, mu, sigma2) init = tf.global_variables_initializer() with tf.Session() as sess: sess.run(init) self.assertTrue(abs(-2-probability.eval())<1e-5) unittest.main(argv=[''], verbosity=0, exit=False);———————————————————————-Ran 1 test in 0.088sOKWe will use the same data as we used earlierdata=np.array([14, 12, 7, 14, 13])# Create state to hold updated `step_size`.step_size = tf.get_variable( name='step_size', initializer=1e-1, use_resource=True, # For TFE compatibility. trainable=False)# Initialize the HMC transition kernel.hmc = tfp.mcmc.HamiltonianMonteCarlo( target_log_prob_fn=get_unnormalized_log_probability(data), num_leapfrog_steps=3, step_size=step_size, step_size_update_fn=tfp.mcmc.make_simple_step_size_update_policy(), seed=1398)# Run the chain (with burn-in).samples, kernel_results = tfp.mcmc.sample_chain( num_results=int(1e5), #num_burnin_steps=int(1e1), current_state=[10.0,2.0,0.2], kernel=hmc)# Initialize all constructed variables.init_op = tf.global_variables_initializer()with tf.Session() as sess: init_op.run() samples_, kernel_results_ = sess.run([samples, kernel_results])all_samples = np.array(samples_)all_samples.shape(3, 100000)Our results has three rows for x, μ and σ respectively..We can take mean of these three variablesall_samples.mean(axis=1)array([14.040146 , 2.4572225 , 0.21323058], dtype=float32)Unexpectedly, the mean of x is higher than we thought it would be (12)..Let's plot the histogram to see the probability distribution:%matplotlib inlineimport matplotlib.pyplot as pltimport seaborn as snsfig, ax = plt.subplots(figsize=(11.7, 8.27))sns.distplot(all_samples[0,:], hist=False);We see that the maximum is around 12, as we would get using maximum posterior distribution.The curve has an irregular shape, but that is somewhat expected for multivariate MCMC results.. More details