When you deploy a production change, you usually have a rollback procedure – documented, even!
But sometimes, after a deploy, things don’t work exactly as expected. That’s when you have to decide which is better: rollback, or fail forward?
If it’s a small, isolated code change and you can operate with the old version, it’s often an easy decision to roll back.
But in more complex situations, it’s sometimes better to accept that the change wasn’t perfect, but:
- is still an improvement overall, and gets you closer to your goal
- or is no worse than the previous situation, but is a step in the right direction
- or moves you toward a new target architecture that can’t be simulated in QA or staging for reasons of time, cost or complexity
- or serves as a commonly understood stake in the ground, or anchor point: “Now that we’re here, we can see the right direction!”
and keep the change and fail forward instead of rolling back.
Generally, to know whether failing forward is an option, you need:
- enough personal and organizational responsibility to accept the risks and handle the consequences
- a clear understanding of the overall IT systems and IT risks
- a clear understanding of the overall business systems and business risks
- staff available to verify the change and fix the small issues that arise
- a good time for the change, picked to minimize stress and risk
- monitoring and application logging tools to evaluate the situation (I’d even suggest rounding out your tools inventory beforehand if failing forward is new to you); a minimal post-deploy check is sketched after this list
- agreement, communicated in advance, that you may fail forward if necessary, based on a calculated rather than reckless risk assessment, and that rollback is still an option
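To make the tooling point concrete, here is a minimal sketch of the kind of post-deploy check I have in mind. The endpoints, names and thresholds are hypothetical placeholders, not from any particular system; the idea is simply to have a fast, scripted answer to “is the new state healthy enough to keep?”

```python
#!/usr/bin/env python3
"""Minimal post-deploy smoke check (illustrative only).

The endpoints and latency budget below are hypothetical; substitute the
health checks that matter for your own application."""

import sys
import time
import urllib.request

# Hypothetical health endpoints for the services touched by the deploy.
CHECKS = [
    ("app health", "https://app.example.com/healthz"),
    ("api health", "https://api.example.com/healthz"),
]

MAX_LATENCY_SECONDS = 2.0  # illustrative threshold; tune to your own SLOs


def check(name: str, url: str) -> bool:
    """Return True if the endpoint answers 200 within the latency budget."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=MAX_LATENCY_SECONDS) as resp:
            elapsed = time.monotonic() - start
            ok = resp.status == 200 and elapsed <= MAX_LATENCY_SECONDS
            print(f"{name}: status={resp.status} latency={elapsed:.2f}s")
            return ok
    except Exception as exc:  # timeouts, DNS failures, connection refused, ...
        print(f"{name}: FAILED ({exc})")
        return False


if __name__ == "__main__":
    results = [check(name, url) for name, url in CHECKS]
    # A non-zero exit gives the deploy pipeline (or the human watching it)
    # a clear signal; deciding to roll back or fail forward is still up to you.
    sys.exit(0 if all(results) else 1)
```

The script itself isn’t the point; what matters is having an agreed-upon, fast way to answer “how bad is it, really?” before committing to either direction.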
Some actual examples of when I have failed forward successfully:
- firewall rule changes that were closer to the final goal, but broke a couple of servers temporarily.
- database schema changes that were correct, but required a day or two of minor internal application updates that were not in the original QA test plan (a sketch of this kind of additive change follows below).
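To make the schema example more concrete, here is a minimal sketch (using SQLite and a hypothetical orders table, not the actual system) of the purely additive kind of change that tends to be safe to fix forward: old code keeps working unchanged, and only the code that wants the new column needs a small follow-up update.

```python
import sqlite3

# Hypothetical, simplified stand-in for a production schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)")

# The deploy: add a new column with a default, so existing INSERTs
# (which don't mention it) continue to succeed unchanged.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'")

# Old code path, untouched by the schema change, still works:
conn.execute("INSERT INTO orders (total) VALUES (19.99)")

# New code path: the minor internal application update that can follow
# over the next day or two instead of blocking the deploy.
conn.execute("INSERT INTO orders (total, currency) VALUES (42.00, 'EUR')")

for row in conn.execute("SELECT id, total, currency FROM orders"):
    print(row)
```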
Some actual examples of when fail forward was not acceptable, and rollback was required:
- an upgrade from httpd 2.0 to 2.4 that turned out to require significant re-QA and updates to the deploy process
- database schema changes that were correct, but required a major application rebuild and re-QA, totalling more than 3 hours of downtime
- changes that affected legacy applications with no budget for developers or QA.
Especially with databases, the arrow of time cannot be reversed: a restore on a busy system loses whatever was written after the restore point, which is why fail forward is the default policy at many SaaS companies. Failing forward also helps with development velocity.