Today I’d like to talk about a subject on which I have already had some heated discussions in several different settings: the goal and usefulness of a freeze period.
Usually the intention of a freeze period is to freeze the code base in one of the test environments for a fixed period before the release to production, say one or two weeks, with the aim of avoiding regression bugs after the production release. The argument is that running for a period of time in a lower environment without regression issues is a reliable indication that there will be no regression issues after the production release either. I have several observations with respect to this line of reasoning:
- In order for this to be valid, you need to be sure that the infrastructure of your lower environment is pretty much, if not exactly, the same as your production environment. Less performant hardware, for instance, may make it difficult to tell whether you have a performance issue with your application. Less physical memory in your lower environment will lead to disk swapping earlier than in your production environment, so you may conclude you have a performance issue when in fact there is none. The same goes for network equipment, processor speed etc. If you use load balancing in production, you need to use it in your lower environment as well.
- If your application manages data, then the data in the lower environment must be sufficiently close to the production data, both in quantity and in quality, in order to detect application issues that are triggered by unexpected but erroneously allowed data. If data configurations that exist in production are missing from your lower environment, at some point this will trigger a production incident.
- And then there is the matter of usage and load. These too need to resemble the actual production situation as closely as possible for the freeze period to give rise to any meaningful conclusions.
- How long is your freeze period going to be? If you want to avoid all incidents that would happen once a year, you would have to organize a full parallel production run on the exact same infrastructure and dataset for an entire year. This obviously has some cost. If you do that only for one week, you will detect all issues that would occur on a weekly basis, half of the issues that would happen once every two weeks and less than a quarter of the issues that would happen once a month. Of the yearly incidents you would detect less than two percent, and you don’t know which two percent: the highest impact or the lowest impact. Conversely, if your code changes would give rise to an incident once a year, your chance of detecting it during a one-week parallel production run is less than two percent.
- So suppose you have almost the same infrastructure except for server memory, a third of your production data, and you emulate 150% of the production load for several hours per day for a week using 2 standard use cases, when in fact over 10 exist. No issues are detected. How confident are you that there will be no issues in production?
- Another problem is what to do when an issue is in fact detected. If you take the freeze to be an actual hard code freeze, then this raises the question of whether to fix the issue or not. And if you do, are you then going to break the freeze and re-deploy to your lower environment, possibly missing a KPI? And do you then start to count from zero again for your freeze period? And if you do, are you then going to postpone your production release, for which go-to-market activity has already started, missing another KPI? Problems, problems, problems. Usually this situation is resolved by a senior manager grudgingly giving permission to “break the freeze” without postponing the actual release date.
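
The detection figures in the point about freeze length above can be reproduced with a few lines of Python. This is only a sketch under a deliberately simple assumption: an issue recurs at a fixed interval and the freeze window starts at a random moment within that interval, so the chance of seeing the issue at least once is the window length divided by the interval, capped at 100%. The function name `detection_probability` and the 365-day year are choices made for this illustration.

```python
def detection_probability(freeze_days: float, period_days: float) -> float:
    """Chance that an issue recurring every `period_days` shows up at least
    once during a freeze of `freeze_days`, assuming the freeze starts at a
    random point in the issue's cycle (a simplifying assumption)."""
    if freeze_days <= 0 or period_days <= 0:
        raise ValueError("freeze_days and period_days must be positive")
    return min(1.0, freeze_days / period_days)


# Recurrence intervals in days; a month is approximated as 365 / 12 days.
periods = {
    "weekly": 7,
    "every two weeks": 14,
    "monthly": 365 / 12,
    "yearly": 365,
}

for name, period in periods.items():
    print(f"{name:>16}: {detection_probability(7, period):.1%}")
# Prints:
#           weekly: 100.0%
#  every two weeks: 50.0%
#          monthly: 23.0%
#           yearly: 1.9%
```

Calling `detection_probability(14, 365)` shows that even doubling the freeze to two weeks still leaves the chance of catching a yearly incident under four percent.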
I’ve encountered situations where this dynamic led to some bizarre consequences. For instance, when an issue was discovered during a freeze period in one of my assignments, the first thing the team did was to check if it occurred in production as well. If yes, it was not regarded as a regression. Since the bug existed previously, it did not need to be fixed and no break of the freeze period was warranted. This meant that once a bug made it to production, as long as no one complained too loudly about it, it would never be fixed. And if a bug did need to be fixed, instead of being happy that the bug was found and fixed before it got into production, there was disappointment about having broken the freeze.
So does a freeze period make any sense at all? That depends on what you expect from it. A scope freeze makes sense (but is already achieved if you enforce a fixed sprint backlog), a code freeze much less so, if at all. Basically the evaluation that needs to be done on any defect is the same no matter when it is detected:
- Can we still fix this in time for the release?
- Can we still verify it in time for the release?
- Which additional tests do we need to re-execute as a result of this fix, and can that be done in time for the release?
- And if the answer to any of these questions is no, can it wait until the next release or is it a showstopper?
It is clear that you can fix and verify bigger issues 6 weeks before the release than you can 2 days before the release. But then again your highest risk backlog items should always be done first, so the bigger issues should come out early in your project and not right before the production release. If big issues do surface right before the release, you have problems in the area of backlog prioritization and risk assessment. A mandatory freeze period is not going to address that. A full parallel production run may well be necessary in some cases, like in industrial settings or when replacing your ERP system. But this is not necessarily the same as a freeze, as you will want to deploy and verify your fixes during the run. My conclusion is that a freeze period is theoretically a nice idea, but there are so many gaps in its concept and concessions to make in its implementation that its practical usefulness is close to zero.