Sunday, February 8, 2015

An Architectural Nightmare

Once upon a time, in a Company far away, there worked an ambitious and talented young software architect, who, for lack of a better idea, we shall call Luke. Now Luke absolutely loved his job. There was no greater satisfaction for him than designing an architectural framework that could accommodate any requirement that a Product Owner could possibly need. And his achievements did not go unnoticed in the Company. So it was one afternoon that he was called into his boss’s office and was told of his new Mission: defining an architecture for a groundbreaking Application that would change the industry in which the Company operated, and as a side effect would bring world peace. Or something equally noteworthy.
Naturally, Luke dived into this Mission with his known zeal and enthusiasm. He had long talks with representatives from marketing and sales about the requirements. He discussed these with some of the senior developers in the Company, a group that would eventually, together with a few Testers, become the Team. One of the marketing representatives was eventually appointed Product Owner, and in the course of a few weeks, Luke, who had superior pattern recognition skills, noted that all requirements Product Owner had come up with could technically be reduced to Triangles. Triangles of many different sizes and types, but Triangles nonetheless. And so Luke decided that the basic building block of the application was going to be the Triangle, and that a Framework was to be built that could accommodate any type of Triangle, either by extension, or by rearranging the already delivered Triangles. With this strategy, he achieved scalability, extensibility and maintainability in one master stroke. Upon discussing his plans with the Team, Lead Developer observed that this Framework would be costly and time-consuming to build up front, and that it could only deal with requirements that in some way could be reduced to Triangles. But his arguments were refuted when Product Owner stated that no requirements other than Triangles were considered or even deemed necessary for the success of the project. As far as the budget was concerned, building the Framework up front did not have a negative impact on the business case; in fact, the time spent on this would be won back later in the project when new requirements would be implemented faster than without this Framework.
And so, after the Company investment board had agreed to the plans and the budget, work began on building the Framework. The first version took six months to build, and a prototype of the Application was made with it that could indeed accommodate the first requirements listed by Product Owner to satisfaction. This convinced Luke and the Team that the chosen strategy was the right one. Over the course of the next year, many different Triangles were added to the Application, all within the context of the Framework. Work now progressed at breakneck speed, and the commercial rollout of the Application was an astounding success. Yes, life was great. Luke had printed the Framework Diagram on glossy A2 paper and hung it above his bed in a luxury frame. His beautiful wife looked up to him in devout admiration and his young children were more obedient than ever.
After a well-deserved vacation, paid for with the substantial bonus he had received from the Company after a glowing performance review, Luke drove to his office thinking about the great things he intended to do with the Framework of Triangles. It might come in handy for other projects the Company would execute. But the first order of business would be a backlog grooming meeting where Product Owner would start explaining and outlining some new requirement that had come up during Luke’s holiday. Confident this too would be reduced to one or more Triangles, he entered the meeting room, where the Team and Product Owner were already present. The new requirement was lying on a central table still hidden under a silk drape. Salutations were exchanged and once everybody sat down the meeting started. Product Owner held a short presentation about the latest user feedback, market opportunities and usage figures, and then proceeded to pull away the drape from the new requirement. Before the eyes of everyone lay an immaculate, bright, shining, perfectly round Circle.
It took a few seconds before Luke realized what he was looking at. This was not a Square that could be diagonally cut to make it two Triangles, or any other Polygon that could be reduced to a set of Triangles. They had dealt with those before, although Product Owner didn’t seem to understand why the estimates for these always came out quite high, and why so much discussion was always needed on the correct way to split these Polygon requirements up. But a Circle was a different beast altogether. There was just no way to divide one into Triangles without losing material at the outer edge of the Circle.
It was Lead Developer who broke the silence. “Listen”, he said to Product Owner, “Just how important is it that this Circle is implemented as a perfectly round Circle as opposed to an approximation like say an Octagon or Dodecagon”?
“What do you mean”, answered Product Owner, “The requirement is clearly a Circle. So what I want is a Circle, not an Octagon or Dodecagon. I would have asked for those if I wanted them, but the user feedback indicates a strong need and market opportunity for Circles, so I want a perfect Circle. That’s the whole point of this requirement. How many story points do you think it will take”?
“Well”, said Lead Developer, “I guess that depends on the architectural approach we take”, and he directed his eyes to Luke, who was feeling more uncomfortable by the minute. The Team and Product Owner looked to him with great expectation.
After 30 seconds of deafening silence, Luke got himself together and said “You see, Product Owner, our architectural Framework doesn’t support Circles. It was made for Triangles, and we can cut up Squares and Rectangles and other Polygons into Triangles without having to throw away anything, but with Circles, or in fact with any Shape that has curved edges, this will not work”.
Product Owner looked confused. “I don’t understand”, he said, “When we started this we explicitly stated that we needed a flexible architecture to accommodate all sorts of requirements, and we have done so with Squares and Rectangles and Pentagons, so how come a Circle won’t fit”?
“Well”, said Luke, “The Framework is flexible in the sense that we can accommodate all sorts of Triangles, and this is what we in fact agreed on early on. The fact that we managed to cut up Squares and the like into Triangles doesn’t mean that the Framework natively supports these shapes, just that we found a workaround that is invisible to the end user. Functionally, yes, they’re Squares and Rectangles and whatever, but in the backend it’s all Triangles and nothing else. I mean, we have talked about this extensively during several Grooming and Planning meetings where we discussed these non-Triangle requirements, don’t you remember”?
Product Owner was becoming somewhat irritated by now. This was not something he could report to his own boss, the VP of Business Development for the Company, without consequences. The Circle was a vital part of the product strategy for the next 12 months. Just two weeks earlier, the Competitor had shown a preview of its own Application and the Circle was to be central to their strategy. So quick delivery of the Circle was instrumental to staying ahead of the Competitor. He estimated the Company would have no more than 4 sprints to beat the Competitor to market with the Circle, and now this supposedly stellar Architect was telling him something as simple as a Circle couldn’t be done? What were they paying him for?
“No”, he said, “I don’t remember and I don’t understand. After all, I’m not a technical person, that’s your job. I mean, if you can accommodate Squares and Rectangles and Pentagons, and you’re even proposing Dodecagons now, how hard can it be to accommodate a Circle? I can’t believe the difference is that big”.
“Let me answer that”, interrupted Lead Developer, “You see, as long as the Shape has only straight edges and corners, we can cut it into Triangles. The more corners it has, the more cuts are needed to do that. That’s why implementing a Pentagon is more expensive than implementing a Square, which in turn is more expensive than implementing a Triangle, which doesn’t need to be cut at all. But a Circle can be thought of as having an infinite number of corners and infinitely small edges, and thus needing an infinite number of cuts to get an infinite number of Triangles, and thus the cost would be infinite as well”.
“That is ridiculous”, fumed Product Owner, “The way I see it, a Circle has no corners at all, and only one edge, so the cost of implementing it should be even cheaper than a Triangle as per your own reasoning. Are you trying to pull my leg”?
“No, no, not at all of course, I’m dead serious. In the context of the current Framework, implementing a perfect Circle will have an infinite cost, which is another way of saying it’s impossible. The point is that the single edge of the Circle is curved and not straight. That’s what makes it so expensive”.
“Sure, but it is only one edge and there are no corners. That should compensate for the curvy nature of the single edge, shouldn’t it”?
This conversation was getting nowhere, Luke realized. He took the floor once again. “Look, we could make a big enough Triangle so that the Circle will fit inside it. But there are a few drawbacks. First, there will be empty spaces between the outer edge of the Circle and the inner corners of the Triangle, so this particular Triangle won’t be as strong. Depending on where in the Application we put this ‘Circle inside a Triangle’, this lack of strength will cause more or less instability in the Application; to find out, we would need to do a detailed impact analysis. Second, since it is weaker than other Triangles, we can’t build on top of it. That means we can add one or maybe two ‘Circles inside Triangles’ to the Application, but that’s about it. It will reduce extensibility and scalability”.
“I have another suggestion”, said Lead Developer, “Why don’t we just build the Circle next to the Triangle Framework? In that way the stability of the Application isn’t compromised and we can continue building on top of it. The only drawback is that the Circle would be a sort of stand-alone feature, not fully integrated into the application”.
“That’s unacceptable”, said Product Owner, “One of the acceptance criteria is that it works seamlessly with one of the Rectangles and two Pentagons we added to the system 6 months ago. A stand-alone Circle would be very user-unfriendly”.
This was pondered for a moment. “Well”, said Lead Developer, “Then we could build a second Framework based on Circles, and re-implement the two Pentagons and Rectangle in this Framework as well, although they won’t be 100% the same as the Triangle based ones. But this too will be costly, and we would need to maintain two Frameworks instead of one. And we would need to find a way to hide all this from the end user”.
Luke started to get a severe headache. His beautiful and elegantly designed Framework was being butchered and compromised right in front of him, by the very people who had admired and approved of the sophistication of the design. Although to be honest, he had never fully trusted Lead Developer in this. And was it just his imagination or did he notice an ever so slight smirk on Lead Developer’s face? Anyway, it seemed to him that there were several options, none of them good:
1.       The reduced requirement option. Settle for a Dodecagon instead of a Circle. This would be relatively cheap, and would not compromise the application in any way. But it would reduce the business benefit for the Big Important Customer. Maybe the number of corners and edges could be raised to reduce the loss of business benefit, but that would also raise the cost of implementation.
2.       The quick and dirty option. Implementing the Circle inside a Triangle. This would also be cheap, even cheaper than option 1, but it would lower the business benefit as it would come across as rather strange to the end user. It would also compromise the application integrity, making this option not only cheap but also dirty.
3.       The second Framework option. This would be expensive and time consuming, but it would be a lot better from an application integrity point of view. It basically boiled down to duplicating some parts of the Application in a different Framework.
4.       The total refactoring option. Build a new Framework that can accommodate any kind of Shape, and then migrate the existing Application to this Framework. This would be the best option from an architectural point of view but also the most expensive and time-consuming.
After Luke had listed these four options to Product Owner and the Team, the mood of the meeting turned rather sour. Product Owner didn’t look happy at all. None of the options were acceptable to him. He didn’t want to lose the advantage over the Competitor just because of some incomprehensible technicality. Luke himself kept wondering how this situation had come to be. The meeting ended with the agreement to inform the Project Steering committee and senior stakeholders. Product Owner parted with a cool handshake to Luke. Then Luke reluctantly went up to the office of his boss to inform him of the news. But his boss was already aware. His greeting was “We will replace you with another Architect” …
Soaked in sweat and breathing heavily, Luke woke up, at first not knowing what had happened or where he was. He was still shaking when he climbed out of bed and went into the bathroom for a glass of water. Looking into the mirror, he realized it had all just been a horrible nightmare. In fact, the Team had delivered a first version of the application in 3 Sprints implementing just 2 Triangles. In Sprint 4, they had implemented a Square by adjusting the architecture such that it could adapt to any Shape. Emerging Architecture at its best. A Circle would not be any problem at all should Product Owner require it, just a little refactoring.

Sunday, August 31, 2014

The Freeze Period

Today I’d like to talk about a subject on which I have already had some heated discussions in several different settings: The goal and usefulness of a freeze period.

Usually the intention of a freeze period is to freeze the code base in one of the test environments before release to production for a determined period, say one or two weeks, with the aim of avoiding regression bugs after the production release. The argument is that running a period of time in a lower environment without regression issues is a reliable indication that there will be no regression issues after production release either. I have several observations with respect to this line of reasoning:

  • In order for this to be valid, you need to be sure that the infrastructure of your lower environment is pretty much if not exactly the same as your production environment. Less performant hardware for instance may make it difficult to deduce if you have a performance issue with your application. Less physical memory in your lower environment will lead to disk swapping earlier than in your production environment, so you may conclude you have a performance issue when in fact there is none. Same with network equipment, processor speed etc. If you use load balancing in production, you need to use it in your lower environment as well.
  • If your application manages data, then the data in the lower environment must be sufficiently close to the production data both in quantity and in quality in order to detect application issues that are triggered by unexpected but erroneously allowed data. If you miss production data configurations in your lower environment, at some point this will trigger a production incident.
  • And then there is the matter of usage and load. These too need to resemble the actual production situation as closely as possible for the freeze period to give rise to any meaningful conclusions.
  • How long is your freeze period going to be? If you want to avoid all incidents that would happen once a year, you would have to organize a full parallel production run on the exact same infrastructure and dataset for an entire year. This obviously has some cost. If you do that only for one week, you will detect all issues that would occur on a weekly basis, half of the issues that would happen once every two weeks and less than a quarter of the issues that would happen once a month. Of the yearly incidents you would detect less than two percent, and you don’t know which two percent, the highest impact or the lowest impact. Conversely, if your code changes would give rise to an incident once a year, your chance of detecting it during a one week parallel production run is less than two percent.
  • So suppose you have almost the same infrastructure except for server memory, a third of your production data, and you emulate 150% of the production load for several hours per day for a week using 2 standard use cases, when in fact over 10 exist. No issues are detected. How confident are you there will be no issues in production?
  • Another problem is what to do when an issue is in fact detected. If you take the freeze to be an actual hard code freeze, then this raises the question of whether to fix the issue or not. And if you do, are you then going to break the freeze and re-deploy to your lower environment, possibly missing a KPI? And do you then start to count from zero again for your freeze period? And if you do, are you then going to postpone your production release for which go-to-market activity has already started, missing another KPI? Problems, problems, problems. Usually this situation is resolved by a senior manager grudgingly giving permission to “break the freeze” without postponing the actual release date.
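
To make the arithmetic behind the freeze length point concrete, here is a minimal Python sketch. It assumes a deliberately simple model that I have not spelled out above: an incident that recurs once per period has a chance of (freeze length / period) of showing up during the freeze, capped at 100%. The function names and the `PERIODS_IN_WEEKS` table are made up for this illustration.

```python
# Assumption: an incident that recurs once per `period` is equally likely
# to strike at any moment, so a freeze of length t catches it with chance t / period.

PERIODS_IN_WEEKS = {
    "weekly": 1.0,
    "every two weeks": 2.0,
    "monthly": 52.0 / 12.0,  # roughly 4.33 weeks
    "yearly": 52.0,
}


def detection_chance(freeze_weeks: float, period_weeks: float) -> float:
    """Chance that an incident with the given recurrence period shows up during the freeze."""
    if freeze_weeks <= 0 or period_weeks <= 0:
        raise ValueError("freeze length and period must be positive")
    return min(1.0, freeze_weeks / period_weeks)


def detection_table(freeze_weeks: float = 1.0) -> dict:
    """Detection chance per incident frequency for a freeze of the given length."""
    return {
        name: detection_chance(freeze_weeks, period)
        for name, period in PERIODS_IN_WEEKS.items()
    }


if __name__ == "__main__":
    for name, chance in detection_table(1.0).items():
        print(f"{name:>16}: {chance:.1%}")
```

Under this assumption a one week freeze gives 100.0% for weekly incidents, 50.0% for incidents every two weeks, 23.1% for monthly ones and 1.9% for yearly ones. Doubling the freeze to two weeks still leaves the yearly incidents below four percent.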

I’ve encountered situations where this dynamic led to some bizarre consequences. For instance, when an issue was discovered during a freeze period in one of my assignments, the first thing the team did was to check if it occurred in production as well. If yes, it was not regarded as a regression. Since the bug existed previously, it did not need to be fixed and no break of the freeze period was warranted. This meant that once a bug made it to production, as long as no one complained too loudly about it, it would never be fixed. And if a bug did need to be fixed, instead of being happy that the bug was found and fixed before it got into production, there was disappointment about having broken the freeze.

So does a freeze period make any sense at all? That depends on what you expect from it. A scope freeze makes sense (but is already achieved if you enforce a fixed sprint backlog), a code freeze much less so, if at all. Basically the evaluation that needs to be done on any defect is the same no matter when it is detected:

  • Can we still fix this in time for the release?
  • Can we still verify it in time for the release?
  • Which additional tests do we need to re-execute as a result of this fix, and can that be done in time for the release?
  • And if the answer to any of these questions is no, can it wait until the next release or is it a showstopper?
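
The same evaluation can be written down as a small decision function. This is only a sketch of the four questions above, in Python; the `Defect` fields, the `Decision` names and the `triage` function are hypothetical and not taken from any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    FIX_NOW = "fix, verify and retest before this release"
    DEFER = "move to the next release"
    SHOWSTOPPER = "showstopper: the release cannot go out as planned"


@dataclass
class Defect:
    # Each field is the answer to one of the four questions (hypothetical names).
    can_fix_in_time: bool
    can_verify_in_time: bool
    can_retest_in_time: bool
    can_wait_for_next_release: bool


def triage(defect: Defect) -> Decision:
    """Apply the same evaluation regardless of when the defect was detected."""
    if (
        defect.can_fix_in_time
        and defect.can_verify_in_time
        and defect.can_retest_in_time
    ):
        return Decision.FIX_NOW
    if defect.can_wait_for_next_release:
        return Decision.DEFER
    return Decision.SHOWSTOPPER


if __name__ == "__main__":
    late_bug = Defect(
        can_fix_in_time=True,
        can_verify_in_time=False,
        can_retest_in_time=False,
        can_wait_for_next_release=True,
    )
    print(triage(late_bug).value)
```

Note that nothing in this function asks whether a freeze is in force. The date of detection only matters through the answers to the first three questions.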

It is clear that you can fix and verify bigger issues 6 weeks before the release than you can 2 days before the release. But then again your highest risk backlog items should always be done first, so the bigger issues should come out early in your project and not right before the production release. If they do surface only right before the release, you have problems in the area of backlog prioritization and risk assessment. A mandatory freeze period is not going to address that. A full parallel production run may be very necessary in some cases, like in industrial settings or when replacing your ERP system. But this is not necessarily the same as a freeze, as you will want to verify your fixes. My conclusion is that a freeze period is theoretically a nice idea but there are so many gaps in its concept and concessions to make in its implementation that its practical usefulness is close to zero.

Thursday, August 21, 2014

Governance in Agile Projects

Project governance is a reality in many large software development organizations. Whether you like it or not, there are many valid reasons for software development needing to comply with governance covering a variety of topics, depending on the context in which the project is executed. Good governance covers and mitigates risks that projects may pose to the wider organization without imposing too much of a cost on the project in terms of speed of delivery and budget. These risks can be legal, financial, security-related, reputational, operational or related to health and safety, and this list does not pretend to be complete.

For instance there will be a slew of security and legal requirements that need to be fulfilled if you are developing a personal banking site for a financial institution. A system to manage and consult medical information will be subject to privacy laws. In an industrial setting there will be environmental and safety requirements that may have an impact on IT systems. So how to deal with governance requirements in an Agile project, where you just want to be as productive as you can without being hindered by apparently unnecessary rules that just seem to complicate things and slow you down?

Well, first of all, too much of a good thing is not a good thing, and too much governance, like too much government, tends to have a range of unintended consequences. So when deciding on the specifics of governance that your projects need to comply with, it is important that for each of the rules and standards you intend to implement it is perfectly clear which problem you are trying to solve or which risk you are intending to mitigate. Too often you hear statements like ‘All our projects need to comply with rules so and so because that is just what we decided to do many projects ago based on our experience with these projects and it is since then the way we do things around here’.

Aha, is that so? Well, I’ve got a few questions then:

  • The projects you refer to, what kind of projects were they?
  • Do you still do the same kind of projects technology wise, functionally, organizationally, scale wise, budget wise?
  • Are the risks and issues you identified back then still relevant today?
  • Are the mitigation options still valid?
  • What about new technology since then, does it not allow some of these risks to be handled differently?
  • Are the people still the same, or have most if not all in the meantime been replaced?
  • For external customer facing projects, are the market expectations still the same, particularly in terms of quality and speed to market?
  • Is your market position still the same?
  • Have new regulations come into effect, or existing ones changed or abandoned?

The answers to these questions will likely lead you to revise existing governance frameworks and specific rules. Rules for which the motivation and relevance are clear are much easier to comply with.

Once you have a pragmatic and reasonable set of rules for your governance framework, you must make sure it is in fact adhered to. The best way to do this is to explicitly include it in your definition of done, and if some rules apply to certain user stories only, then they need to be reflected in the individual acceptance criteria for these user stories. It is of course important that the Product Owner puts sufficient emphasis on this when accepting new functionality as Done. It is an area where it can be very tempting to skip a few rules in order to get to market quicker, but this is a risky strategy. In the type of organizations where governance becomes necessary, there will be steering committees and program boards and release management, and it is up to them to make sure that team members, the Product Owner and Scrum Master are aware of the need for this, while at the same time avoiding becoming just an administrative hurdle to take on the way to production. The latter will occur when the motivation for certain rules is unclear, which leads to rubberstamping of certain aspects of governance compliance.