Predicting the task duration based on a range

We discussed that we can fit the estimates (both for the Agile and the Waterfall projects) to a log-normal distribution, which guarantees positive support. Since we model estimates using the log-normal distribution, our new variables y, l, h will be the logarithms of the actual number of days, the low estimate, and the high estimate, respectively. Both λ parameters act as regularization parameters, so we will have to tune them:

```python
# Set the hyperparameters
alpha = tf.constant(name='alpha', value=1.0)
beta = tf.constant(name='beta', value=1.0)
lambda1 = tf.constant(name='lambda1', value=1e-4)
lambda2 = tf.constant(name='lambda2', value=1e-4)

def loss(l, h, y):
    # Per-observation loss: the variance is allowed to grow with the
    # width of the estimate range (h - l), controlled by zeta
    return tf.log(1 + zeta**2*(h - l)) \
        + rho**2/2/(1 + zeta**2*(h - l))**2 * (y - theta_l*l - theta_h*h)**2

cumulative_loss = tf.reduce_sum(list(np.apply_along_axis(
    lambda x: loss(*x), axis=1, arr=log_data)))

# Negative log-posterior: data term plus the priors on rho, theta and zeta
cost = cumulative_loss - (N + 1 - 2*alpha)/2*tf.log(rho**2) + beta*rho**2 \
    + rho**2*lambda1/2*(theta_h**2 + theta_l**2) \
    + rho**2*lambda2/2*zeta**2
```
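The snippet above relies on theta_l, theta_h, rho, zeta, log_data and N, which are presumably defined earlier in the article. For reference, here is a minimal sketch of what that setup might look like; the data values are hypothetical, and the initial values of the variables are chosen to match the Epoch 0 printout below:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API, as used throughout this post

# Hypothetical data: one row per completed task, with columns
# (low estimate, high estimate, actual duration), all in days
raw_data = np.array([[10.0, 15.0, 12.0],
                     [5.0, 9.0, 11.0],
                     [20.0, 30.0, 28.0]])
log_data = np.log(raw_data)  # l, h, y in log space
N = log_data.shape[0]

# Trainable parameters of the model
theta_l = tf.Variable(0.5, name='theta_l')
theta_h = tf.Variable(0.5, name='theta_h')
rho = tf.Variable(0.01, name='rho')
zeta = tf.Variable(0.01, name='zeta')
```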
We minimize the cost with the Adam optimizer:

```python
learning_rate = 1e-4
optimizer = tf.train.AdamOptimizer(learning_rate)
train_op = optimizer.minimize(cost)
```

```python
import math

init = tf.global_variables_initializer()
n_epochs = int(1e5)

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 1e4 == 0:
            print("Epoch", epoch, "Cost =", cost.eval())
            print(f'Parameters: {theta_l.eval()}, {theta_h.eval()}, '
                  f'{rho.eval()}, {zeta.eval()}')
        sess.run(train_op)
    best_theta_l = theta_l.eval()
    best_theta_h = theta_h.eval()
    best_sigma = 1/math.sqrt(rho.eval())
```

```
Epoch 0 Cost = 55.26268
Parameters: 0.5, 0.5, 0.009999999776482582, 0.009999999776482582
Epoch 10000 Cost = 6.5892615
Parameters: 0.24855799973011017, 0.6630115509033203, 0.6332486271858215, 1.1534561276317736e-35
Epoch 20000 Cost = 1.39517
Parameters: 0.2485545128583908, 0.6630078554153442, 1.3754394054412842, 1.1534561276317736e-35
Epoch 30000 Cost = 1.3396643
Parameters: 0.24855604767799377, 0.6630094647407532, 1.4745615720748901, 1.1534561276317736e-35
Epoch 40000 Cost = 1.3396641
Parameters: 0.24855272471904755, 0.6630063056945801, 1.4745622873306274, 1.1534561276317736e-35
Epoch 50000 Cost = 1.3396646
Parameters: 0.2485586702823639, 0.6630119681358337, 1.4745632410049438, 1.1534561276317736e-35
Epoch 60000 Cost = 1.3396648
Parameters: 0.2485581487417221, 0.6630115509033203, 1.4745649099349976, 1.1534561276317736e-35
Epoch 70000 Cost = 1.3396643
Parameters: 0.2485586702823639, 0.6630122065544128, 1.4745644330978394, 1.1534561276317736e-35
Epoch 80000 Cost = 1.3396643
Parameters: 0.24855820834636688, 0.6630116701126099, 1.4745631217956543, 1.1534561276317736e-35
Epoch 90000 Cost = 1.3396646
Parameters: 0.248562291264534, 0.663015604019165, 1.474563717842102, 1.1534561276317736e-35
```

What is interesting here is that ζ is effectively zero (it underflows to about 10⁻³⁵), so the width of the estimate range turned out not to affect the variance. This also means that we can just use a log-normal distribution around the mean specified by the learned parameters θl and θh. Plugging the learned parameters and a task estimated at 10 to 15 days into the formula, we see:

```python
mu = best_theta_l*math.log(10) + best_theta_h*math.log(15)
most_likely_prediction = math.exp(mu)
most_likely_prediction
```

```
10.67385532327305
```

We can also get the 95% confidence estimate by plugging the values directly into the log-normal distribution:

```python
from scipy.stats import lognorm

distribution = lognorm(s=best_sigma, scale=most_likely_prediction, loc=0)
print(f'95% confidence: {distribution.ppf(0.95)}')
```

```
95% confidence: 41.3614192940211
```

As we see, if we want 95% confidence, we have to give an estimate of 41 days, instead of 11 days for 50% confidence.
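The 50% figure is just the median of the fitted log-normal. As a quick sanity check, for scipy's lognorm the median equals the scale parameter, so querying ppf(0.5) should recover the most likely prediction computed above:

```python
# The median of a log-normal is exp(mu), i.e. the scale parameter,
# so this prints the same ~10.67 days as most_likely_prediction
print(f'50% confidence: {distribution.ppf(0.5)}')
```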
