Friday, March 22, 2013

Estimating Part II

So what is an estimate, really? Maybe a stupid question, but I think it is important that all parties involved, from developers and testers to the most senior stakeholders, agree on the meaning of the estimates provided, and that includes agreement on their definition. So my definition of an estimate in the context of Agile and Scrum is the following: an estimate for a deliverable is a range or set of possible values at which that deliverable could be delivered according to the previously agreed-upon definition of done, with a corresponding probability for each value in the range. The values can be expressed in story points, man-days or dollars.
So basically we have a range or set of values with their corresponding probabilities. In statistics, that is the definition of a probability distribution. Assuming that you arrive at your estimates using planning poker, I am going to treat an estimate here as a set of discrete values with their corresponding probabilities.
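This view of an estimate as a discrete probability distribution can be sketched in a few lines. The story-point values and probabilities below are hypothetical, chosen only to illustrate the idea:

```python
# A single story's estimate modeled as a discrete probability distribution.
# The values and probabilities are hypothetical illustrations, not data
# from any real estimating session.
points = [3, 5, 8]        # possible story-point outcomes
probs = [0.2, 0.5, 0.3]   # the team's probability for each outcome

assert abs(sum(probs) - 1.0) < 1e-9  # probabilities must sum to 1

# Mean and standard deviation of the discrete distribution.
mean = sum(v * p for v, p in zip(points, probs))
variance = sum(p * (v - mean) ** 2 for v, p in zip(points, probs))
sigma = variance ** 0.5

print(f"estimate: {mean:.2f} story points, sigma = {sigma:.2f}")
```

The mean and standard deviation of this per-story distribution are exactly the μ and σ used in the analysis that follows.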
When it comes to estimates, we want accuracy, precision and confidence. Accuracy is difficult to predict beforehand; after all, it is just an estimate. But there are some assumptions that, if true, lead to more accurate estimates, and when false will lead to inaccurate estimates. These are:
1.  The user stories and epics are independent, i.e. they can be implemented, tested and deliver value independently of one another
2.  The team performing the estimates is overall knowledgeable and experienced enough in the technology, the business domain and estimating techniques. That does not mean they all need to be experts in all three areas, but working with a team consisting solely of recently graduated juniors is going to make this a much more difficult exercise than working with team members who have more than 10 years of experience together on similar projects with similar technology.

On precision and confidence we can say something by doing some statistical analysis on the estimates provided by the team.

Say we have a team of 5 and we are playing planning poker to estimate 10 user stories and epics for a new project. Some user stories are very detailed and small and will lead to a lower estimate; other epics are described only at a high level and will be subject to change as the project moves along. These will receive higher estimates. The results of the estimating session are shown in the table below, together with mean estimates (μ) and standard deviations (σ) for the individual stories as well as for the project as a whole:
The last column is the 2-sigma value, which is approximately the half-width of the 95% confidence interval if we assume that the estimate for a single story can be modeled as a normal distribution. For a single user story that is obviously not the case, but according to the central limit theorem, if we add up (or more precisely, convolve) the estimates for each of the user stories, then the probability distribution for the overall project will approach a normal distribution: the more user stories, the closer the approach. In any case we may take the overall standard deviation of 46.58 story points as more valid for the entire project than the standard deviations of the individual user stories. So we have an estimate for the entire project of 87.2 ± 46.6 story points, with 95% confidence.
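The aggregation step can be sketched as follows. The planning-poker votes below are made-up numbers for three stories (not the values from the table above); the key point is that, assuming the stories are independent, the per-story variances add while the standard deviations do not:

```python
import statistics

# Hypothetical planning-poker votes: each row holds the five team members'
# votes for one story. These are illustrative values, not the article's table.
votes = {
    "A": [1, 1, 2, 2, 1],
    "B": [5, 8, 8, 13, 8],
    "C": [13, 20, 13, 20, 13],
}

project_mean = 0.0
project_var = 0.0
for story, v in votes.items():
    mu = statistics.mean(v)
    sigma = statistics.pstdev(v)  # population standard deviation of the votes
    print(f"story {story}: mu = {mu:.2f}, sigma = {sigma:.2f}")
    project_mean += mu
    # For independent stories, convolving the distributions means the
    # variances add; standard deviations do not add directly.
    project_var += sigma ** 2

project_sigma = project_var ** 0.5
print(f"project: {project_mean:.1f} +/- {2 * project_sigma:.1f} (2-sigma, ~95%)")
```

Note that the project sigma is smaller than the sum of the individual sigmas, which is exactly the cancellation effect discussed next.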

These 46.6 story points are 53.4% of the mean, so our precision with 95% confidence is ±53.4%. For user story A the 2-sigma value is 1.10, or 78.6% of the mean, so our precision with 95% confidence for this single user story is ±78.6%. So by convolving estimates for several user stories, we get a more precise estimate for the overall project than for a single small user story. And that isn't so strange if you think about it: if you estimate a single user story, you may over- or underestimate it, and that's that. But when estimating several user stories and epics, some will be overestimated and others will be underestimated, and these errors will partly cancel each other out. The more user stories and epics you have, the more opportunity there is for estimating errors to cancel.
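The cancellation effect can be made concrete with a simple calculation. For n independent stories that each have the same mean m and standard deviation s, the total mean grows as n·m but the total sigma only as √n·s, so the relative 2-sigma error shrinks as 1/√n (the per-story numbers here are illustrative):

```python
# Why aggregation improves relative precision: the total mean grows
# linearly in the number of stories, but the total sigma only grows
# with the square root, so the relative error shrinks as 1 / sqrt(n).
m, s = 8.0, 4.0  # hypothetical per-story mean and standard deviation

for n in (1, 4, 16, 64):
    total_mean = n * m
    total_sigma = (n ** 0.5) * s       # variances add for independent stories
    rel = 2 * total_sigma / total_mean * 100  # 2-sigma as a % of the mean
    print(f"{n:3d} stories: precision of +/-{rel:.1f}% at ~95% confidence")
```

With these numbers, a single story carries a ±100% band while 64 stories together come down to ±12.5%, which is the central limit theorem doing its work.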

So what does this all mean? Well, a first conclusion is that you are more likely to come up with a precise estimate for a large project than for a small one. This is a consequence of the central limit theorem. If I look back on my own projects, I have indeed had cost overruns of between 1 and 12% on larger projects, but much larger ones on some of the small projects. All this from taking average estimates as my budget, plus a risk premium that didn't even come close to the 78% or 53% mentioned above. OK, lesson learned.

A second conclusion is that you need to think about the confidence level you can live with for your project estimates. If you want 95% confidence, you need to submit a budget of μ + 2σ; if you want 99%, it will be μ + 3σ. If you can live with 70%, then μ + σ. What this means is that you need to decide on the percentage of projects with cost overruns that you can live with. That number determines your confidence level.
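A sketch of this budgeting rule, using the project figures from the text (mean 87.2, sigma 46.58 story points). One nuance worth noting: a budget overrun is a one-sided question, so under the normal assumption the chance of staying under μ + kσ is somewhat higher than the two-sided k-sigma interval suggests:

```python
import math

# Project figures taken from the text above.
mu, sigma = 87.2, 46.58

def normal_cdf(x, mu, sigma):
    """One-sided probability that a normal variable falls below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

for k in (1, 2, 3):
    budget = mu + k * sigma
    conf = normal_cdf(budget, mu, sigma)
    print(f"budget mu + {k} sigma = {budget:.1f} points: "
          f"~{conf * 100:.1f}% chance of no overrun")
```

So a budget of μ + 2σ keeps roughly 97–98% of projects under budget on this one-sided reading, comfortably above the 95% figure used above.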
