Bulletin 23 August. Automating operations and what’s in a name?

Left brains and right brains

At a recent developer event, I happened to participate in a discussion about the role of what we call “operations”, that is, managing and running IT systems, networks, storage and all that. The view, universally it appeared, was that operational IT was on the brink of being automated away, thereby rendering such roles redundant. Discussion turned to the fact that other roles would be available, so nobody needed to worry about jobs.

Which was nice, but irrelevant. Because IT infrastructure is not going anywhere. It only occurred to my flummoxed self some way through the debate that the quite senior, enterprise-based people involved were largely on the development side. I have been characterising these as the right-brain creatives, largely directed by innovation, aspiration and other uplifting motives.

Meanwhile, on the operational side are the left-brain types, who need to work with reality and whose fault it will be if things start to fail. And fail they do, for all manner of reasons, from power shortages to software bugs and everything in between (it is completely relevant that the most famous early computer ‘bug’ was a real insect: a moth found trapped in a relay of the Harvard Mark II in 1947).

Wait, I hear the visionaries say. It’s all about orchestration now. Work things out up front, write it in configuration files, throw it at the pristine racks of servers and it will just, you know, happen. That’s all very well, and I’ve been to the co-located data centre facilities with minimal staff where, indeed, it does all seem to be auto-magical.

Ramifications of this approach are that the configuration work still has to happen, even if specified in YAML (stands for YAML Ain’t Markup Language. Don’t ask). Such work can be given to Site Reliability Engineers, a new super-race of individuals who are absolutely bloody brilliant at defining infrastructure so it will just work. I’m pretty sure that’s the spec.

Or it can be allocated to developers, who, as we all know, love (and I mean LOVE) to spend their time doing things that aren’t development. “If only I could spend more time defining my target infrastructure,” said no developer I have ever spoken to. Okay, sorry, I’m being glib. The point is, however, that the job has to happen, even if the interface moves.

There’s more. In terms of day-to-day, keeping-the-lights-on operations, much of the effort goes into dealing with the consequences of poor or incompatible decisions. These can be, variously, badly architected solutions; applications being used for things they were never intended for; prototypes becoming live products because of shortening timescales; and so on.

Many such challenges are as likely to happen in software as in hardware. Ops people suck their teeth for a reason: they remember the last time a certain thing was tried, and how badly it went for everyone involved. If you speak to someone shaking their head and looking negative, it isn’t because they were born that way, but because they have learned to be so.

We do have some potential hope coming from the latest, greatest trends in tech: I’m speaking about containers, microservices and all that (for the uninitiated, this means defining applications as a set of highly portable modules which can be run anywhere). With such models comes standardisation of everything ‘below’, which leads to less incompatibility, etc etc.

However, these ideas have a way to go. The Kubernetes container orchestration platform may become the de facto standard, for example; but it doesn’t yet have everything it needs to support storage, networking or security, nor is there a generally agreed approach to building a Kubernetes application. We may be standardising one thing, but the rest is still very much to be dealt with.

And then, perhaps all we will have done is shifted the problem. With Kubernetes you can build fantastically powerful, yet complicated applications, bits of which could be running anywhere. And so, guess what, people will do just that, even when it is completely the wrong thing to do. And they will do it badly.

And when they do, somebody will need to be there to pick up the pieces, to work out where things stopped working, to isolate the problem and to feed back information that can prevent it from happening again. Who knows what they will be called, these clever people: something like “operations”, perhaps. No doubt in five years’ time, rooms of experts will tell us that such roles are on the brink of being automated away.

And round we shall go again.
