Sunday, August 31, 2014

The Freeze Period

Today I’d like to talk about a subject on which I have already had some heated discussions in several different settings: the goal and usefulness of a freeze period.

Usually the intention of a freeze period is to freeze the code base in one of the test environments for a set period before release to production, say one or two weeks, with the aim of avoiding regression bugs after the production release. The argument is that running for a period of time in a lower environment without regression issues is a reliable indication that there will be no regression issues after the production release either. I have several observations with respect to this line of reasoning:

  • In order for this to be valid, you need to be sure that the infrastructure of your lower environment is pretty much if not exactly the same as your production environment. Less performant hardware, for instance, may make it difficult to deduce whether you have a performance issue with your application. Less physical memory in your lower environment will lead to disk swapping earlier than in your production environment, so you may conclude you have a performance issue when in fact there is none. The same goes for network equipment, processor speed and so on. If you use load balancing in production, you need to use it in your lower environment as well.
  • If your application manages data, then the data in the lower environment must be sufficiently close to the production data, both in quantity and in quality, to detect application issues that are triggered by unexpected but erroneously allowed data. If your lower environment is missing data configurations that exist in production, at some point one of them will trigger a production incident.
  • And then there is the matter of usage and load. These too need to resemble the actual production situation as closely as possible for the freeze period to give rise to any meaningful conclusions.
  • How long is your freeze period going to be? If you want to avoid all incidents that would happen once a year, you would have to organize a full parallel production run on the exact same infrastructure and dataset for an entire year. This obviously has some cost. If you do that for only one week, you will detect all issues that would occur on a weekly basis, half of the issues that would happen once every two weeks, and less than a quarter of the issues that would happen once a month. Of the yearly incidents you would detect less than two percent, and you don’t know which two percent, the highest impact or the lowest. Conversely, if your code changes would give rise to an incident once a year, your chance of detecting it during a one-week parallel production run is less than two percent (see the sketch after this list).
  • So suppose you have almost the same infrastructure except for server memory, a third of your production data, and you emulate 150% of the production load for several hours per day for a week using 2 standard use cases, while in fact over 10 exist. No issues are detected. How confident are you that there will be no issues in production?
  • Another problem is what to do when an issue is in fact detected. If you take the freeze to be an actual hard code freeze, then this raises the question of whether to fix the issue or not. And if you do, are you then going to break the freeze and re-deploy to your lower environment, possibly missing a KPI? Do you then start counting your freeze period from zero again? And if you do, are you then going to postpone your production release, for which go-to-market activity has already started, missing another KPI? Problems, problems, problems. Usually this situation is resolved by a senior manager grudgingly giving permission to “break the freeze” without postponing the actual release date.
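
To put some numbers behind the point about the length of the freeze period, here is a minimal sketch. The model is an assumption chosen to reproduce the percentages above: each class of incident recurs once every fixed interval, and the parallel run starts at a random point within that cycle.

```python
# Chance of catching a recurring incident during a limited parallel run.
# Model (an assumption): each class of incident recurs once every
# `interval_weeks` weeks, and the run starts at a random point in that cycle.
def detection_chance(run_weeks: float, interval_weeks: float) -> float:
    return min(run_weeks / interval_weeks, 1.0)

for interval in (1.0, 2.0, 4.3, 52.0):  # weekly, biweekly, monthly, yearly
    chance = detection_chance(1.0, interval)
    print(f"incident every {interval:>4} weeks: "
          f"{chance:.0%} detection chance in a one-week run")
# -> 100%, 50%, ~23% (less than a quarter), ~2% (less than two percent)
```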

I’ve encountered situations where this break-the-freeze dynamic led to some bizarre consequences. For instance, when an issue was discovered during a freeze period in one of my assignments, the first thing the team did was to check whether it occurred in production as well. If it did, it was not regarded as a regression: since the bug existed previously, it did not need to be fixed and no break of the freeze period was warranted. This meant that once a bug made it to production, as long as no one complained too loudly about it, it would never be fixed. And if a bug did need to be fixed, instead of being happy that the bug was found and fixed before it got into production, there was disappointment about having broken the freeze.

So does a freeze period make any sense at all? That depends on what you expect from it. A scope freeze makes sense (but is already achieved if you enforce a fixed sprint backlog), a code freeze much less so, if at all. Basically the evaluation that needs to be done on any defect is the same no matter when it is detected:

  • Can we still fix this in time for the release?
  • Can we still verify it in time for the release?
  • Which additional tests do we need to re-execute as a result of this fix, and can that be done in time for the release?
  • And if the answer to any of these questions is no, can it wait until the next release or is it a showstopper?


It is clear that you can fix and verify bigger issues 6 weeks before the release than you can 2 days before the release. But then again, your highest-risk backlog items should always be done first, so the bigger issues should come out early in your project and not right before the production release. If big issues do surface late, you have problems in the area of backlog prioritization and risk assessment, and a mandatory freeze period is not going to address those. A full parallel production run may be very necessary in some cases, like in industrial settings or when replacing your ERP system. But this is not necessarily the same as a freeze, as you will want to verify your fixes. My conclusion is that a freeze period is theoretically a nice idea, but there are so many gaps in its concept and concessions to make in its implementation that its practical usefulness is close to zero.


Thursday, August 21, 2014

Governance in Agile Projects

Project governance is a reality in many large software development organizations. Whether you like it or not, there are many valid reasons for software development needing to comply with governance covering a variety of topics, depending on the context in which the project is executed. Good governance covers and mitigates the risks that projects may pose to the wider organization without imposing too much of a cost on the project in terms of speed of delivery and budget. These risks can be legal, financial, security-related, reputational, operational, or health and safety related, and this list does not pretend to be complete.

For instance there will be a slew of security and legal requirements that need to be fulfilled if you are developing a personal banking site for a financial institution. A system to manage and consult medical information will be subject to privacy laws. In an industrial setting there will be environmental and safety requirements that may have an impact on IT systems. So how to deal with governance requirements in an Agile project, where you just want to be as productive as you can without being hindered by apparently unnecessary rules that just seem to complicate things and slow you down?

Well, first of all, too much of a good thing is not a good thing, and too much governance, like too much government, tends to have a range of unintended consequences. So when deciding on the specifics of governance that your projects need to comply with, it is important that for each of the rules and standards you intend to implement it is perfectly clear which problem you are trying to solve or which risk you are intending to mitigate. Too often you hear statements like ‘All our projects need to comply with rules so and so because that is just what we decided to do many projects ago based on our experience with these projects and it is since then the way we do things around here’.

Aha, is that so? Well, I’ve got a few questions then:

  • The projects you refer to, what kind of projects were they?
  • Do you still do the same kind of projects in terms of technology, functionality, organization, scale and budget?
  • Are the risks and issues you identified back then still relevant today?
  • Are the mitigation options still valid?
  • What about new technology that has appeared since then: does it not allow some of these risks to be addressed differently?
  • Are the people still the same, or have most if not all in the meantime been replaced?
  • For external customer facing projects, are the market expectations still the same, particularly in terms of quality and speed to market?
  • Is your market position still the same?
  • Have new regulations come into effect, or existing ones changed or abandoned?

The answers to these questions will likely lead you to revise existing governance frameworks and specific rules. Rules whose motivation and relevance are clear are much easier to comply with.

Once you have a pragmatic and reasonable set of rules for your governance framework, you must make sure it is in fact adhered to. The best way to do this is to explicitly include it in your Definition of Done, and if some rules apply to certain user stories only, then they need to be reflected in the individual acceptance criteria for those user stories. It is of course important that the Product Owner puts sufficient emphasis on this when accepting new functionality as Done. It is an area where it can be very tempting to skip a few rules in order to be quicker to market, but this is a risky strategy. In the type of organizations where governance becomes necessary, there will be steering committees and program boards and release management, and it is up to them to make sure that the team members, the Product Owner and the Scrum Master are aware of the need for this, while at the same time avoiding becoming just an administrative hurdle to clear on the way to production. The latter will occur when the motivation for certain rules is unclear, which leads to rubber-stamping of governance compliance.


Monday, May 20, 2013

Agile Metrics

Metrics play an important role in project management. They are the primary way to monitor and communicate the status and progress of a project to senior stakeholders. I already mentioned several metrics in previous blogs, but I’d like to sum them all up together here.

Velocity
The first metric, one that was defined in Agile and has no counterpart in traditional project management, is the velocity. This is defined as the number of story points per sprint that the Team can deliver according to the Definition of Done. The velocity is important because it tells you when all scope on the backlog will be done, in other words when you will run out of scope. If the velocity drops from one sprint to the next, there should be an explanation for that. It may be that some team members fell sick. Maybe there were a few national holidays. Maybe the user stories that were put on the sprint backlog were more complicated than thought. There can be a wide variety of reasons why the velocity varies between one sprint and the next or why it deviates from the average so far. If those reasons are one-offs, you need to see if there is a way to make up for the loss to keep the project on track, or have the Product Owner come to accept the drop in scope. If the reasons are structural, you need to make the Product Owner and senior stakeholders aware that there is an issue and that expectations must be adjusted.

Burn Rate
The burn rate is the budgetary counterpart of the velocity. It is defined as the number of man-days spent during a single sprint by everybody who books on the project. If your project has several scrum teams, you may want to split out the burn rate for each of the teams. If there are people booking on the project who support the teams, like business analysts and product owners, anyone who is not part of a scrum team, then these costs should be evenly distributed among the teams so as to get a burn rate per team that indeed includes all costs incurred to have that team deliver software. Where the velocity tells you when you will run out of scope, the burn rate tells you when you will run out of budget. If you have a fixed deadline, then the velocity tells you what will be delivered by that deadline and the burn rate tells you what you will have spent.
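
To illustrate how velocity and burn rate combine into a forecast, here is a minimal sketch; all numbers in it are invented for illustration:

```python
# Forecasting when the project runs out of scope and out of budget.
remaining_story_points = 240   # story points left on the backlog
velocity = 30                  # story points delivered per sprint
remaining_budget = 400         # man-days of budget left
burn_rate = 45                 # man-days booked per sprint, all roles included

sprints_until_scope_done = remaining_story_points / velocity   # 8.0 sprints
sprints_until_budget_gone = remaining_budget / burn_rate       # ~8.9 sprints

print(f"Scope done after {sprints_until_scope_done:.1f} sprints; "
      f"budget lasts {sprints_until_budget_gone:.1f} sprints.")
if sprints_until_scope_done <= sprints_until_budget_gone:
    print("You run out of scope before you run out of budget: on track.")
else:
    print("You run out of budget first: adjust scope or expectations.")
```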


Defect Detection Rate
The defect detection rate is the number of defects detected per sprint. Assuming that developers produce defects at a more or less constant rate, it is correlated with the velocity: the more story points are delivered, the more defects should be found and fixed as well. Teams tend to be pretty consistent in the quality of the software they deliver, so a drop in velocity combined with a rise in the defect detection rate should trigger the alarm. Something’s cooking and you need to find out what it is. My personal opinion is that a lower defect detection rate isn’t necessarily better than a higher one. Every defect found in one of the development and test environments is one defect fewer that makes it into production. From that perspective, you could defend the statement: the more defects found, the better.

Defect Closure Rate
This is the number of defects fixed and closed per sprint. It should be equal to the defect detection rate. If it’s not, the number of open defects will rise as the project moves along, leaving the largest part of the bug fixing for the end of the project. This brings me to the last metric.

Gap Between Total and Closed Defects
This is the difference between the total number of defects and the number of closed defects at any given time. This number should be as low as possible. A low number indicates that the quality of the software delivered so far is good. That implies that there will be few if any surprises once UAT and release preparation start. And that in turn implies that the velocity and burn rate you have measured are indeed reliable indicators for forecasting the remainder of your project. I consider this the most important metric of all, for if it’s low, it means I can indeed rely on the other indicators.
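
To make the three defect metrics concrete, here is a minimal sketch that computes the detection rate, the closure rate and the gap from per-sprint counts; the sample counts are invented:

```python
# Tracking the defect detection rate, the closure rate and the gap
# between total and closed defects, sprint by sprint.
detected_per_sprint = [12, 15, 11, 14, 18]  # defect detection rate
closed_per_sprint = [12, 14, 11, 13, 14]    # defect closure rate

total_detected = 0
total_closed = 0
for sprint, (found, fixed) in enumerate(
        zip(detected_per_sprint, closed_per_sprint), start=1):
    total_detected += found
    total_closed += fixed
    gap = total_detected - total_closed  # open defects carried forward
    print(f"Sprint {sprint}: detected {found}, closed {fixed}, gap {gap}")
# A gap that grows sprint over sprint means bug fixing is being pushed
# to the end of the project.
```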

A healthy project has a stable velocity and burn rate, combined with a stable and sufficiently high defect detection rate and a low gap between total and closed defects. The velocity and burn rate will ideally indicate that you will run out of scope before you run out of budget, and that you run out of both before the requested delivery date.


It is not possible to give absolute numbers here for any of these metrics that would indicate for a random project whether or not it is in good shape. For instance, you can’t say that a project with x developers and y days per sprint should produce no more than z defects per sprint. Such statements are nonsensical. The actual values of these metrics will depend on the technology you build your systems on, the developers and testers you have, the tools and practices they use, the existing technical debt if there is any, the functional and business context the project is executed in, and many, many more factors. What matters is that you determine the actual numbers that result from the execution of your project in its current context and that you know how to interpret them, so you can act accordingly.