Stop chasing nines

by Jan Ouwens

As individuals, it can feel like there’s not much we can do about climate change. As developers, however, we have options that other people do not: we’re pushing the buttons (literally and figuratively) behind the scenes of some of the biggest websites in the world. This is where we can make a difference.

The IT sector is responsible for about 4% of the EU’s carbon emissions, largely due to data centers and the transmission of data to and from them. That makes it a place where we can cut back. As Holly Cummins argues in her excellent talk “Writing Greener Java Applications”, if you’re reducing your cloud spend, you’re reducing your emissions.

So let’s take a close look at what is running inside those data centers. Jokes about forgetting to shut down the servers of hobby projects aside, many applications use more resources than they need. There are many ways to reduce this, and Cummins discusses several of them in her talk, but let’s look at one specific approach that is often overlooked.

We have decided, as a profession, that application uptime is important. In fact, it’s become so important that we measure it in “nines”: an uptime of 99% is two nines; an uptime of 99.999% is five nines. More nines is better. Many applications have Service Level Agreements (SLAs) of several nines, often self-imposed.
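To make the nines more concrete, here is a small Python sketch that converts an uptime percentage into the downtime it allows per year. It assumes a 365-day year. It also works the other way round, computing the uptime implied by a fixed outage every day, such as a ten-minute maintenance window.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # assumes a 365-day year (ignores leap years)


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes per year a service may be down and still meet the given uptime."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


def uptime_percent_for_daily_outage(outage_minutes: float) -> float:
    """Uptime percentage of a service that is down a fixed number of minutes every day."""
    return (1 - outage_minutes / MINUTES_PER_DAY) * 100


for uptime in (95, 99, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(uptime)
    print(f"{uptime}% uptime allows {minutes:,.1f} min ({minutes / 60:,.2f} h) of downtime per year")

print(f"10 min down every day is {uptime_percent_for_daily_outage(10):.2f}% uptime")
```

By this arithmetic, 95% uptime leaves room for roughly 438 hours (about 18 days) of downtime a year, while five nines leaves barely five minutes. A ten-minute daily window on its own still works out to about 99.31% uptime.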

This often leads to redundant deployments, load balancers, and orchestration platforms like Kubernetes. Do we really need all that?

By contrast, in the Netherlands, many strict Protestant organisations choose to turn off their websites on Sundays. These include newspapers and online stores. They clearly communicate on their sites that they do this, and why. Other websites have a daily maintenance window: the Lego hobbyist site BrickLink, for example, is down for ten minutes every day, which is again clearly communicated on the site.

These websites do just fine.

High-frequency trading platforms may lose substantial revenue from milliseconds of downtime, but does the average person really need to be able to execute financial transactions in the middle of the night?

And of course, if you’re running a life-or-death system, for instance one that matches organ donors with recipients, you might actually need multiple nines of uptime. But how many of us are actually running a life-critical application?

I propose that we stop chasing the nines. An uptime of 95% is fine. If you communicate this well, your users will understand. If you explain why, your users may even applaud you. Pick a service window, and do your deployments and site maintenance during that window, while showing a static page to your users.
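The following minimal Python WSGI sketch shows one way an application could present a static page during a scheduled closure. It rests on assumptions of our own: a hypothetical daily window of 03:00 to 03:10 UTC and a hypothetical `maintenance_middleware` wrapper. Neither comes from the sites mentioned above. During the window, every request gets an HTTP `503 Service Unavailable` response, a `Retry-After` header with the seconds remaining, and a static HTML page.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Callable, Iterable, Optional

# Hypothetical schedule, for illustration only: down every day 03:00-03:10 UTC.
WINDOW_START = time(3, 0)
WINDOW_LENGTH = timedelta(minutes=10)

MAINTENANCE_PAGE = (
    b"<!doctype html><html><body>"
    b"<h1>Scheduled maintenance</h1>"
    b"<p>This site is offline every day from 03:00 to 03:10 UTC so we can "
    b"run a leaner infrastructure. Please come back in a few minutes.</p>"
    b"</body></html>"
)


def window_end(now: datetime) -> Optional[datetime]:
    """Return when the current maintenance window ends, or None outside the window.

    `now` should be timezone-aware; it is normalized to UTC first.
    """
    now = now.astimezone(timezone.utc)
    today = datetime.combine(now.date(), WINDOW_START, tzinfo=timezone.utc)
    # Also check yesterday's window, in case a window runs past midnight.
    for start in (today, today - timedelta(days=1)):
        if start <= now < start + WINDOW_LENGTH:
            return start + WINDOW_LENGTH
    return None


def maintenance_middleware(app, clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc)):
    """Wrap a WSGI app so it serves a static 503 page during the maintenance window."""

    def wrapped(environ, start_response) -> Iterable[bytes]:
        now = clock()
        end = window_end(now)
        if end is None:
            # Outside the window: pass the request through unchanged.
            return app(environ, start_response)
        retry_after = max(1, int((end - now).total_seconds()))
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(MAINTENANCE_PAGE))),
            ("Retry-After", str(retry_after)),
        ])
        return [MAINTENANCE_PAGE]

    return wrapped
```

You could wrap an existing WSGI application with `maintenance_middleware(my_app)`, where `my_app` stands in for your own app, and try it locally with `wsgiref.simple_server.make_server`. This only covers closures during which the application is still running. While the application itself is being redeployed, the static page would more typically be served by whatever sits in front of it, such as a web server or CDN.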

If doing this allows you to permanently turn off some services or deployments, such as redundant servers or orchestration overhead, then you are succeeding! This could significantly reduce your application’s cloud bill, and therefore its emissions. If we all start normalizing uptimes of 95%, it could make a real dent in the IT sector’s contribution to global emissions.

Therefore, if you’re a climate-conscious developer working on a high-uptime system, here is something you can do today: rethink uptime for your application. Does your application really need this SLA, or can you make do with less uptime? If so, what reductions could you make in your cloud infrastructure? Involve your coworkers in this discussion, and if you know developers working for other organizations, ask them as well.

Once you’ve done that and determined that your business won’t be impacted too much, find out how much money the change would save as well. After all, that might be the best way to influence the decision makers in your organization!