Organizational Pain and Legacy Release Cycles in eCommerce


“We can only do releases after hours. That is a mandate from the CIO.”

That is sad, and very disappointing, but not uncommon. I heard that statement this week as part of a larger discussion about the exhaustion of doing after-hours releases biweekly at 11pm local time, and hot fixes (three, by my count) over the past seven weeks, one of which was at 12am local time and ran until well past 3am. This creates a broad organizational pain that can reduce availability and productivity for multiple days after a release.

It is unfortunate that even in 2020, there is a Y2K mindset that pervades the upper levels of technology management. That mindset is a bit foreign to those of us who grew up as DevOps grew, who embraced continuous integration/continuous delivery (CI/CD) and containers, who think about security-first development, and who, most importantly, think that any web or mobile application should be releasable at 10am on a Tuesday because the code was promoted from QA and the CD pipeline kicked off.

There are only two reasons, in my opinion, that an executive would dictate to an entire organization that releases must be after hours. First, there is a lack of understanding of modern technology and technology consumption. Second, there is a lack of trust in the organization responsible for the software deployment.

The first problem can be solved with education.

Modern technologies make automating testing components straightforward and can prevent bad releases if used properly. This can mean unit tests in code, but also implementation and behavioral tests that verify your application (web, mobile, or otherwise) is performing the way that is expected both before and after a release. Whether or not continuous delivery is ever implemented, these tests should exist. Although human QA can never be duplicated, it will never catch all of the quirky things that can be tested for. By the same token, tests can never fully replace human QA.
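As a rough illustration of a behavioral check that runs both before and after a release, here is a minimal sketch in Python. The function names, the paths, and the idea of passing in a `fetch_status` callable are all assumptions made for this example, not part of any particular tool; in practice `fetch_status` would wrap whatever HTTP client or test harness the team already uses.

```python
from typing import Callable, Dict, List


def run_smoke_checks(fetch_status: Callable[[str], int], paths: List[str]) -> Dict[str, bool]:
    """Return, for each path, whether the application answered with HTTP 200."""
    return {path: fetch_status(path) == 200 for path in paths}


def release_is_healthy(before: Dict[str, bool], after: Dict[str, bool]) -> bool:
    """Healthy means every path that passed before the release still passes after it."""
    return all(after.get(path, False) for path, ok in before.items() if ok)


if __name__ == "__main__":
    # Hypothetical paths for an ecommerce site; replace with real ones.
    paths = ["/", "/cart", "/checkout"]

    # Stand-in for a real HTTP call, so the sketch runs without a network.
    def fake_fetch(path: str) -> int:
        return 200

    before = run_smoke_checks(fake_fetch, paths)
    # ... the release would happen here ...
    after = run_smoke_checks(fake_fetch, paths)
    print("healthy" if release_is_healthy(before, after) else "roll back")
```

Comparing the "after" results against the "before" results, rather than against a fixed expectation, keeps a page that was already broken from being blamed on the release.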

The other part of education is acceptance that, like the news cycle, the internet no longer sleeps. Let us assume that this is an ecommerce site and there is a targeted geographic area of the continental United States, and for this scenario let us assume as well that purchases are spread evenly from 7am to 12am in every US time zone, but never completely stop. This is the long-known math of legacy releasing: an organization has between 3am and 7am Eastern to complete the following:
1) Software release
2) Verification of release
3) Testing and QA process
4) Roll-back call (and if rolling back, the next steps)
5) Rollback
6) Verification
7) Testing and QA process for rollback

That is quite a bit to get done in four hours, when most resources are already exhausted and out of their natural circadian cycle. Being tired breeds mistakes, we all know it, and we all have been there.
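The four-hour figure follows directly from the time zones: midnight Pacific is 3am Eastern, and the Eastern day starts again at 7am. A minimal sketch of that arithmetic in Python, assuming the four continental US zones sit at whole-hour offsets from Eastern (the names and constants below are made up for this example):

```python
# Hours each continental US zone lags behind Eastern time.
ZONE_LAG_HOURS = {"Eastern": 0, "Central": 1, "Mountain": 2, "Pacific": 3}

ACTIVE_START = 7   # 7am local time
ACTIVE_END = 24    # 12am (midnight) local time


def quiet_hours_eastern(zone_lags=ZONE_LAG_HOURS, start=ACTIVE_START, end=ACTIVE_END):
    """Return the Eastern clock hours during which no zone is in its active period."""
    busy = set()
    for lag in zone_lags.values():
        # Local hour h in a zone corresponds to Eastern hour h + lag.
        for local_hour in range(start, end):
            busy.add((local_hour + lag) % 24)
    return sorted(set(range(24)) - busy)


if __name__ == "__main__":
    print(quiet_hours_eastern())  # [3, 4, 5, 6]
```

The hours 3 through 6 are the 3am to 7am Eastern window, and every one of the steps listed above has to fit inside it.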

In this scenario, the lower-but-non-zero purchase period is disrupted by 4 hours, possibly (and probably, based on human nature) losing a conversion to a competitor. Why would anyone do this to their organization or revenue stream if it can be avoided?

That brings us to the second problem, the problem of trust. If, as a leader, I have so little trust in my organization's ability to do a seamless release, then my concern is not whether I will lose conversions, it is how many conversions I will lose. Compare losing few-but-not-none overnight against losing many during the peak daytime period, and of course I will demand that my team release when the impact is smallest.

But why does that trust not exist? Is the team under-educated on modern software development practices? Is there a lack of supporting infrastructure that would allow for Blue/Green or A/B deployments with the legacy application? Is it the education piece, above, where the person issuing the dictate simply has stopped learning about technology innovation?

There is rarely a single reason for trust issues in an organization, and the only way to build trust is to address the concerns one by one and prove that they are (or have become) unfounded.

The pain of after-hours releases extends beyond just the developers. Remember, there is probably a whole support team behind this. The day after a release, every single one of those people is going to be functioning below capacity, or, if there is a kind leader, perhaps everyone involved in the release is given a comp day (note: I have only twice seen an organization that is kind to its legacy release team in this manner). Either way, this deprives the organization of people who may be needed to address a critical late-breaking bug or issue that impacts the application.

The primary way to address this issue is to release when people are awake and cognizant and ready to go. My favorite time for legacy releases is actually at 10am on a Tuesday, provided that the testing is solid and there is, at a minimum, a proper A/B release cycle set up.

Why 10am on a Tuesday? People are starting the week. Any critical issues that surfaced over the weekend have been addressed, and the partiers on the team have had a chance to recover. The mid-morning means that people are probably both awake and have not burned through their mental energy yet. People will be more alert, more responsive, and more reactive to any issues that might arise and will probably resolve those issues more quickly if they do arise.

I will never be a fan of “block a period so we can do a release,” but that does not make it any less necessary for some organizations. Even so, the pain that a legacy mindset forces on an organization can be mitigated with a few tweaks to timing, education, and capability.

If done correctly, not even a few conversions will be lost.


Stephen Sadowski

Leader focusing on quality, delivery, technical debt management, and leadership education about DevOps and SRE practices