DevOps Institute

[E7] Incident Management with Damon Edwards of Rundeck

On this episode of the Humans of DevOps Podcast, Jayne Groll sits down with Damon Edwards of Rundeck to discuss incident management and the state of DevOps, SRE, and ITIL.

The lightly edited transcript can be found below.

Intro:
You’re listening to The Humans of DevOps Podcast, a podcast focused on advancing The Humans of DevOps through skills, knowledge, ideas and learning, or the S-K-I-L framework. Here’s your host, DevOps Institute CEO, Jayne Groll.

Jayne Groll:
Hi everyone. This is Jayne Groll, CEO of the DevOps Institute, and welcome to another episode of The Humans of DevOps. I’m particularly delighted today to be joined by Damon Edwards of Rundeck. Damon and I were at DevOps Enterprise Summit recently, and we got into a pretty good discussion about the state of incident management. Which Damon, as often as we get together, it’s usually one of the topics we talk about. So Damon, welcome. Why don’t you introduce yourself and let’s talk a little bit about incident management and the state of DevOps and SRE and ITIL.

Damon Edwards:
Yeah. Hey Jayne, how are you doing? It’s good to be here, and thanks for having me on your show for a change. To me, incidents are where the rubber meets the road in operations, right? So being able to diagnose, resolve incidents, how quickly you can do that, how well you can do that I think is the true test of an organization’s operational capabilities. And it ties in so many things, right? I mean, we’ve had conversations from a technology perspective. We’ve talked a lot about the different management philosophies, right? I think we love to talk about ITIL versus the new kind of SRE and DevOps way of seeing the world. Talked about tooling.

You know, being from Rundeck, we’re very involved with Runbook automation and being able to give people responding to incidents those expert diagnostic and repair commands at their fingertips, right? To collaborate and grasp all that knowledge from around the organization, put it into something useful and spread control across the organization. So to me this is a topic worth talking about and we often forget about it, right? Because we’re so wrapped up in the new world about deploy, deploy, deploy, right? Can we go faster? We forget about so much of what we do goes into day two and beyond. How do we fix the problems that are inevitably going to happen?

Jayne Groll:
And you know what’s interesting, Damon? So you’ve been in the DevOps space, you know, you’re one of the original originals, right? And even when I entered this space, so I have a long history starting in IT at the support functions. I always used to get a little frustrated because up until recently DevOps seemed to stop at deployment, right? Like you just said, deploy, deploy, deploy, and then start again. And those of us that came from operations kind of stood and went, well, now what? What does day two look like? And how do we manage this? And is it different?

Damon Edwards:
If you go back to the early DevOps Days, it was really about that. So when Patrick Debois created the first DevOps Days, it was literally because he saw people for the first time on stage together. It was John Allspaw and Paul Hammond at Velocity, there was a video of them and you heard about it on Twitter, Dev and Ops, 10 plus deploys a day at Flickr, right? So we got freaked out about, wow, they deploy 10 times a day. But what Patrick got excited about was, whoa, here’s an Ops leader and a Dev leader onstage together talking about their problems. And up until that point, if you saw a Dev leader and an Ops leader at the same conference, one of them was at the wrong place.

Jayne Groll:
Right.

Damon Edwards:
And Patrick had spent a lot of time working in both Dev and Ops domains as a consultant and was like, why are these things so different? Why are they so kind of far apart? And so his idea was, why don’t we get people together, Dev and Ops, to talk about their problems and we’ll do it over two days, so it’d be DevOps Days, right? That was his Germanic naming, take the words and shove them together, right? Even though he’s from Belgium, not Germany. And yeah, so really that idea was like, let’s get Dev and Ops together and talk about our problems. Why are we always fighting, right?

And the flashpoint between Dev and Ops is deployment. That was I think the original focus and there were all these new cloud technologies and there was configuration management, whatnot, that came along with it. So people wanted to talk all about, oh, let’s talk about deployment. Right? And I think that came to signify, how well are you doing? How fast can you get things through your pipeline? How fast can you get from an idea to where it’s deployed? You know, we kind of took this detour focusing on up to deployment, like delivery up to deployment. And then now we’re kind of realizing, oh yeah, that’s right, there’s a whole lot to life after deployment. So that was still in a lot of organizations, most organizations, especially most enterprises, being treated as a separate thing, right?

When you finally broke through that deployment finish line, it was really just a mirage, and now you’re back in 1997 on the operation side. And it’s a lot of silos, a lot of ticket queues, very command and control, very functionally, you know, functional alignment instead of horizontal product alignment. Very reactive versus proactive. Kind of back to the way things used to be. So I think now the big focus, and this is I think where the SRE movement has really come on strong, is trying to say, okay, well there’s got to be a better way to do this. How do we apply these DevOps and lean ideas that we’ve done, applied so well on the delivery side, how do we apply them to not just the operation side but the full end to end life cycle, right? The interesting part is not saying, hey Ops, go fix yourself. But it’s also, how do we bring Dev and Ops closer together? So really the Dev activity and the day two operations activity is kind of one consistent thing. We’re all working in the same way.

Jayne Groll:
You know what’s really fascinating about that is that DevOps kind of enters into the realm and everybody gets very excited about DevOps. It crosses over the chasm into the enterprise space, and there’s a lot of cheerleading up to deploy. And then right after deploy, which is really where value is created, right? Value isn’t created until somebody actually uses your product. It kind of fell back into very traditional, very ITIL, very command and control. And it isn’t that there was anything wrong with ITIL, just that ITIL was originally built for Waterfall. I mean that, you know, if you go back eight years before the most recent version, Waterfall was the most predominant approach, very, very linear. So it goes into deploy and it goes afterwards. And unfortunately the incident management, problem management type of processes really kind of slowed down.

And I think worse than that, the feedback loop wasn’t closed. Right? So we were talking about, I’m a really big believer that incident metrics really sit at the heart of your organization. You want to take the pulse of your organization, go look at the incident data, right? Figure out what your users are telling you about what their experience is like, because it’s going to reflect on what’s working and what’s not working. So I know you’ve done a lot of speaking, and I know Rundeck really has, I think, elevated the incident management experience from a support perspective, from an operational perspective so that it can go faster, right? It can avoid these horrible ticket queues. There were really big human problems there. Try to find a level two or even a level three to work on a really complex incident. I used to bake cookies and bring them up to level three and do a little begging to get some of their time. And I think there’s a lot of progress in that direction. Do you want to talk a little bit about that?

Damon Edwards:
There really hasn’t been that much progress, right? I mean that’s kind of, I think the tools have evolved. I think in terms of the platforms and technologies, I think the communication is getting better. But fundamentally if you look at how a lot of operations organizations, especially in enterprises, are structured, it’s very functionally siloed, right? So you put like with like, the database folks with the database folks, and the storage and the Windows servers and the Linux servers and the network team and the firewall team, and so on and so forth. You line things up in these functional silos. You know, that’s kind of how things, the work has to jump between those different silos. That’s kind of the first instance, you know, that’s where the ticket queue really kind of went from being a place where you record problems to a place that runs your life, right?

All work has to flow through that because we’ve got these silos, right? We’ve created these functionally based silos. And so imagine those are kind of vertical, right? And then the work has got to flow horizontally. So kind of theater of the mind here, imagine from left to right. The left’s the light bulb and the right’s the pile of cash, work’s got to flow through all those different functional silos. And in operations, that’s still the dominant way things work. And if you go back to the early days of ITIL in 1989, I think is the first version, which I haven’t read. Version three was where I came in and really read the books and got to understand it. You know, where you got to up to that point is, now instead of just these functional silos, we’re going to have process silos. And they don’t really call them silos.

Just processes, right? But what do we do in a big enterprise? Well, there’s problem management, there’s incident management, there’s release and deployment management. There’s request fulfillment, right? There’s service asset and configuration management. There’s service catalog management, so on and so forth, right? And so we’re going to tell people these are the 26 processes, you know, now they call them practices. These are 26 processes in the ITIL version three perspective. And they each have their own inputs and outputs and triggers and metrics. What are you going to do? Oh, we’re going to go to an organization and say, these are the 26 practices or processes you must become great at. And what are they going to do? They’re going to take 26 people, you know, 26 junior and mid-level managers and say, this is your process.

Here it is. Thou shalt own this process. And what are they going to do, right? They’re going to manage the crap out of that process. You know with their sharp elbows and everything because that’s how they’re going to get ahead. They’re going to show everybody that they’re the best request fulfillment process, you know, management executive on the planet. Right? And what ends up happening is now we’ve got these more vertical silos around these processes. And that feels even more foreign to the people doing the work, even though they live in kind of functional silos in terms of technical specialties. Imagine things, okay, and here’s the incident management process and we’re going to work hard on that. And there’s going to be somebody who’s going to manage all the ticket processes and the flow of work through that. And then it’s like, oh no, we got this problem management.

After enough incidents, we realize there’s a bigger problem here. Now we got to put the problem into there. And that’s going to be a different person running that with different processes and a different set of tickets. And underneath it all you have the same subject matter experts going, you know, what’s going on here, right? I’m dancing through this silo. I’m dancing through that silo. It’s a very kind of disjointed and broken way of working. And I understand that, you know, from the ITIL hype, call them the high priests. I don’t want to sound derogatory but-

Jayne Groll:
I was a high priestess, wait a minute.

Damon Edwards:
You know, the people who’ve been leading the ITIL world, they say, oh, that’s not the way it’s supposed to happen. But when you look at how enterprises are shaped, that’s how it ends up happening. Because of all of this, there’s this fundamental problem that when you look inside large enterprises, the incidents take too long, and there’s too many escalations. Right? And working in that old style of working, it’s very difficult to overcome that.

Jayne Groll:
Yeah. It feels like, I used to say, it was follow the pointing finger, right? Because unfortunately who needs to work on a particular incident, it isn’t always clear, right? The symptoms may show up and point you in one direction to one resource, and then they point their finger at another resource. And then meanwhile you’ve got this kind of play going on at the service desk trying to figure it out. And even then, I mean it became fairly self contained, partially really in response to the fact that the communication between post-production operations and pre-production activities wasn’t really clear.

You know, you do a lot of speaking about the context wagon, and I love that. Can you very briefly just give us an overview of what that is and how it just leads to this crazy delay of time? And of course I would point people to your full presentation, which I’ve seen in a couple of different venues. But give us an overview of that because I think that’s also a segue into talking about old ways of working versus new ways of working. It’s not to say that ITIL is bad, by the way. It just is a reflection of its time. Right? And it’s a reflection of its culture.

Damon Edwards:
That’s exactly it. Before I get to that, I mean, when I do kind of knock the ITIL way, it’s effectively, it’s the framework, right? So the advice within it, the vocabulary, the fact that we talk about the difference between incident management and problem management, we have that vocabulary, is something I tip my hat to, and that ITIL has brought to us. And it’s not the individual content inside it, it’s the framework around it and how the framework is divided into these processes, now called practices. And how there’s an idea of change authority, right? Or change advisory, that kind of drives a lot of command and control, quality by external inspection, which we’ve seen from these other movements just doesn’t work. So yeah, it’s a function of time. And I think that the framework is really the part that turns toxic inside of big enterprises, not the individual advice.

So I want to kind of separate those two things out. But the talk you’re talking about is called, The Last Mile, which is something that I gave as a keynote last year, 2018, at the DevOps Enterprise Summit. And I start off the talk by doing this big growing storyboard, scrolling kind of cartoon of an actual enterprise incident. Right? Everything was great in the front end. They were all agile DevOps, cloud native, go, go, go. You know, when an incident arose, what actually happened, right? And it dives into this kind of siloed ticket driven command and control way of working. But the context wagon specifically, which is funny because a lot of people bring that up to me. And I almost didn’t put it in there, and something I’ve just been aware of for a long time, but kind of became a featured thing, which is, anytime you work in this ticket driven way of working, these tickets are opened and you kind of become part of this…

You’re in the context wagon. There’s only so many things that a human being can think about at once, and it occupies a little piece of your brain, each thing in there. And they talk about context switching in general is a very expensive thing, right? Sometimes it’s a 20 to 30% tax anytime you have to switch between working in one context and working in another. And with this ticket driven way of working, this kind of very shotgun approach to operations, which is, there’s an incident, what are we going to do? Page everybody, right? So we get all the level ones from all the different teams that do the, not me, the don’t point at me kind of game. And then we find, oh we think it’s this, and we escalate up. And they think it’s that, we escalate over there.

But what happens is, each of those people are kind of added to this context wagon, right? Draw a little wagon at the bottom of the screen. And the idea is that it occupies a little bit of their brain, even if they’re not active in that ticket anymore. It’s just something hanging out there of whether it’s, I wonder what happened to? Or, I’m on the CC list. Or I’m just sort of in the background here, can’t really avoid it. And as the incident gets bigger and longer, that context wagon grows more and more. Right? And it’s a great way to waste people’s brain power. And there’s also all kinds of cultural and political destructiveness that comes along with it.

Jayne Groll:
Well and you know, it’s interesting too, and I want to segue into new ways of working. Is that it is the ultimate case study in lack of feedback loops. As this incident works its way through, and I said, it’s a reflection of the time. I remember when we could escalate automatically, right? The automation would do an auto escalation and everybody went, ooh, that’s fantastic. Right? So it is a reflection of the time, but didn’t close feedback loops, right? So if you touched it, you didn’t know what happened afterwards, you had no closure. And that’s why you would kind of throw it around like a hot potato because like, hey, I don’t want this because I don’t even know what’s going to happen with it. So let’s segue to new ways of working. What’s the new approach to incident management that humans, right, should really kind of pay attention to and internalize? You know, outside of the automation, outside of the frameworks, whether it’s SRE or it’s ITIL or whatever. What’s the best way to manage an incident these days?

Damon Edwards:
Well, I think there’s two halves to it, right? So if you imagine there’s like a dial or a wheel, right, that from zero to pointing straight up, that’s kind of like the observe side of things. And then from straight up kind of to the other side is the react, right? And then I think we could probably draw a line underneath learn. But really it’s about observing and reacting. If you think about the observing side, I think a lot has gone on with the observability movement, right? You know, monitoring is about spotting the knowns, things that happened in the past, looking for patterns that happened in the past. But as these systems become more and more complex, right? We’ve got these kind of very complex microservice-based architectures combined with all sorts of traffic, ingress and egress from all different kinds of user activity.

So there’s sort of this technical complex system that’s been created. And then on top of that we’ve got this human complex system that’s acting on that technical complex system. Right? And that’s all this go, go, go DevOps. We’ve divided up into these kind of build and run teams and really broken things down to decouple the organization. I think it was Charity Majors, has this great quote that if you think about, you know, there’s an infinite number of almost impossible failure scenarios, right? So talk to people from organizations that have really invested a lot into resiliency. You know, folks like Netflix, right? You can take down whole Amazon availability zones, you know, you don’t even notice it from using their service. They’ve noticed that the better you get at it, and they’ve invested millions of dollars, right?

In resiliency. The better you get at it, the weirder the problems get, right? So the problems don’t stop, they just get weirder and weirder. The observability side’s very interesting because it’s really all about interrogating the unknowns. It’s a combination of logging, which tells you about this is the event, right? There’s metrics, that’s data points over time. Is this faster or slower than it was before? And there’s tracing, right, looking at all the different events that happen in the context of a single event. So that all goes into observability, which is really about, how do you enable people to interrogate their systems, right? To find the unknowns, right? Because most of what they’re going to be battling is these unknowns. So that’s, I think, the big thing on the observability side. On the reaction side, I think the big thing, and maybe I’m biased coming from Rundeck and I’ll admit that, but there’s other people out there doing it as well.

You know, Runbook automation is back, right? So we had this funny thing where, Jayne you’ll remember this, right? Because in the, call it the classic days, right? Runbooks themselves and Runbook automation were a big deal, right? The idea that there’s all these tools, scripts, tools, you know, procedures you need to know about, but it’s the human to tool action that’s the problem. How do I call the right scripts? Is it, wait, I call this one before that one? Or it’s Tuesday, so it’s dash-e. Or what is it the firewall people told me I should never do with this F5 interface? Because of that, when you go to try to resolve an incident or fix something, what you end up doing is you end up reading a bunch of wikis. But then the wikis, you’re like, is this up to date? Do I not understand what the person is saying?

Or you’re back in those tools and scripts and trying to just figure out, do I have the right version of the script? Do I know where everything is? You know, what we end up doing is just escalating, right? That’s kind of what always would end up happening. So the point of Runbook automation is to capture that knowledge. How do you invoke the tools? What order do you do it in? What APIs do you hit? Instead of trying to pass that knowledge off through either weeks of human to human education, or months of maybe software development. You want to put that into a Runbook automation tool and then you can hand that off to somebody else, right? So the things that only your subject matter experts could do in the past, you can now distribute to the people closest to the problem.

So just like we were distributing the ability to inspect our systems with the observability I was talking about, we want to distribute the ability to take that subject matter expertise of how to go either diagnose. Say hey, what’s the five or six things you would check to see if this service is performing properly, right? Or it’s configured properly, right? Or do some kind of repair action. How would you restart this thing? If there’s a known problem of X, Y or Z, how do you go and fix that, right? Whether it’s resetting the cache or rolling back to a previously known good, or changing some configuration, or just doing some failover to a backup system, you want to encapsulate that as your Runbook automation. And then you use that to give that to other people.

So now you can empower other folks in your organization to do the things that we previously had to rely on your subject matter experts for. So how that would play out is, now you can say, well maybe it’s a build and run team. Or maybe it’s just some L1 in a NOC, you want to give them the ability to run the diagnostics to say, hey, do all the things the database team would do. Do all the things the network team would do. Do all the things the storage team would do to see if this thing is healthy. And so now instead of escalating to everybody, they can start running diagnostics just like that team could. Or they could even take action. Like hey, this service seems to have a problem, there’s three or four things to make sure the traffic’s off it. Then I got to take it out of the load balancer pool, then I could run my restart script.

Then there’s five or six things I got to check to make sure it came back up. Then I’ll put it back into the load balancer pool. You know, that’s a very hard thing to hand off to just anybody. The point of a Runbook automation tool is you just turn that into configuration and say, hey, call these different APIs and different tools in that order. And now you’re able to encapsulate that best practice and hand it off to other people in the organization who can get it done. And by doing that you’re decoupling from having to have these escalation chains, and you’re able to solve incidents quicker. The return of Runbook automation I think is the hot thing going on.
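
[Editor’s illustration] The restart procedure Damon describes (take the service out of the load balancer pool, restart it, check it came back, return it to the pool) might be captured as an ordered, executable runbook along these lines. This is only a minimal sketch under stated assumptions: it is not Rundeck’s API or job format, the `Step`, `Runbook` and `build_restart_runbook` names are hypothetical, and the "service" is an in-memory dictionary standing in for real load balancer and restart tooling. The traffic checks and the five or six health checks he mentions are collapsed into single steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One named action in a runbook; the action returns True on success."""
    name: str
    action: Callable[[], bool]


@dataclass
class Runbook:
    """An ordered list of steps that stops at the first failure."""
    name: str
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        for step in self.steps:
            ok = step.action()
            self.log.append(f"{step.name}: {'ok' if ok else 'FAILED'}")
            if not ok:
                return False  # stop here; the remaining steps are not run
        return True


def build_restart_runbook(service: Dict[str, bool],
                          restart_fixes_it: bool = True) -> Runbook:
    """Hypothetical restart procedure acting on an in-memory stand-in service."""

    def remove_from_pool() -> bool:
        service["in_pool"] = False
        return True

    def restart() -> bool:
        # Assumption: a real step would call a restart script or an API here.
        service["healthy"] = restart_fixes_it
        return True

    def health_check() -> bool:
        return service["healthy"]

    def return_to_pool() -> bool:
        service["in_pool"] = True
        return True

    return Runbook(
        name="restart-service",
        steps=[
            Step("remove from load balancer pool", remove_from_pool),
            Step("restart service", restart),
            Step("post-restart health check", health_check),
            Step("return to load balancer pool", return_to_pool),
        ],
    )


if __name__ == "__main__":
    svc = {"in_pool": True, "healthy": False}
    runbook = build_restart_runbook(svc)
    print("succeeded:", runbook.run())
    for line in runbook.log:
        print(" ", line)
```

Because the order of the steps lives in the runbook rather than in someone’s head, whoever runs it does not need to know which step comes first. If the health check fails, the run stops with the service still out of the pool, and the log shows where a person needs to pick it up.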

Jayne Groll:
And you know, I have to tell you that listening to you talk about it, and you’re right Runbook automation is kind of a little bit of full cycle. But even back then it wasn’t something that developers and operations folks felt comfortable pushing down to L1 or L2, it was always something mysterious, right? So in today’s environment, not only does it help solve incidents faster, it gives the opportunity to do some diagnostics, maybe even some root cause analysis by having these Runbooks that can be run by individuals who maybe are not subject matter experts, right? They’re subject matter experts in kind of generalist knowledge and solving problems. Right? Good detectives make really good operations folks. But it’s also giving them an opportunity to upskill. And one of the things I hated about running a service desk, and I ran small service desks, I ran really large service desks, is that it always felt like a little bit of a tunnel.

You know, people would come out, they’d do their time and then they were waiting to figure out what their next role was going to be. And it never felt like they were there because they really liked what they were doing. Right? So optimizing automation, being able to execute Runbooks, maybe not at L1 triage, but many organizations, level ones, you know, one of their primary metrics is first contact resolution. Really being able to drive that down and giving them the ability to grow their skills. Because as they’re running it, they’re going to see things, they’re going to experience things, they’re going to talk to people that maybe before they were phone takers.

Damon Edwards:
Yeah. You know, Jody Mulkey talked about this in 2015 or ’16, at DevOps Enterprise Summit when he was the CTO at Ticketmaster. And he had talked about how they turned the NOC, or their TOC, they called it, back into operators. They loved it. And they actually, you know, through pushing down control, through Runbook automation, that’s how they got those people to actually not just be escalators.

Jayne Groll:
Right.

Damon Edwards:
Right? You know, look at some lights and make a phone call, but actually look at some lights and take some action. And they loved it. And because it pulled on all that knowledge that they’ve already had and their ability to kind of spot patterns. And it also gave, you know, Jody had this kind of ER metaphor where he gave them a career path that said, hey look, you got the EMTs, right? And after the EMTs you got the emergency room doctors, right? And then you got the emergency room kind of physicians. And then you’ve got the specialist brain surgeons going to come in on call. And they had sort of timing and criteria for each of those different levels.

And then you actually had a career path, right? Which is, hey, I could work my way up this chain and build my knowledge. But the only way that would work is in this kind of faster moving world, which at that point was going towards microservices. Now everything is about microservices and containers and it’s like ephemeral everything. You can’t do that if this Runbook automation is only for the experts. If it’s only for the experts it’s not going to work, otherwise those people will always be buried in these repetitive requests.

And it’s also, it never works if you expect operations to create it for themselves. Part of this is the developers are creating it, they’re handing it off to operations. There’s some kind of code review just like everything else. They decide, yes, this procedure is good or bad. And then they’re using access control to let those other people, to push control to where it must happen. Right? So I think that is the big difference, going from Runbook automation as something for those level three experts to use to be able to act faster, like their own kind of internal tool, to where it’s really a mechanism to take operational control and operational knowledge and turn it into something that’s executable and hand that off to anybody in the organization who needs it. And you can do it in a high velocity, high confidence way.

Jayne Groll:
I love that. And I said, it’s funny because we look at the rise of SRE. And the word people and human, right? Has been a big part of our conversation today. And so while we’re talking about automating toil and we’re talking about automating the reduction of toil, and looking at it creating more of an engineering focus in operations, all of which is fantastic. Let’s not forget that we’re not automating ourselves out of humans. You know what we’re talking about in incident management, a lot of incident management is almost perspective, right? Or almost intuition. I say good detectives.

I’m a generalist by trade. And I think as I moved up through the ladder of IT, that gave me a perspective that other people didn’t have. And I think those that really engage in incident management should feel very proud of what they do. And if they can really supplement and go faster through Runbook automation, there’s more willingness for mentoring, there’s more willingness for knowledge management and sharing. God, I couldn’t buy knowledge in my day. Some day I’ll tell you a story about that, next time we’re at a conference. I did, I tried to buy the knowledge, nobody would give it. So $25 an article, nobody would give it to me. It’s very human. I love it.

Damon Edwards:
Well, what’s interesting is, I think this is a very good point. The automation we’re talking about, it’s more like Iron Man than HAL, right? So it’s really about, it’s automation to support the human, to elevate the human. That’s really the key point. So this Runbook automation is not about, oh, how can we get rid of people, right? It’s about, how can we make those people be able to act swiftly and with confidence and empower them to do things that they normally would have to escalate off to somebody? And I think this whole AI Ops idea has kind of been a false start, right? It hasn’t worked because this idea, it’s like I said before, there’s an infinite number of seemingly impossible failure scenarios, right? And we still need the human brain.

We need the human. If you look at aviation, aerospace, the medical fields, right? Nuclear power plants. They’ve been trying to automate the human out of a job for decades, right? We spent billions of dollars and decades of academics trying to figure out how to take the human out of these complex high consequence systems. And they haven’t been able to do it. This idea that we’re suddenly going to somehow do that with a few startups in Silicon Valley, like it’s not going to happen, right? So I think the strategy to take is, how do you elevate the human? Now that means we want to get a lot of toil off our plates, right? That means we want to make them more efficient. We want to kind of automate the repeatable things. There’s a lot of ways to use automation to serve the human. Fundamentally, you look at the people that excel in this field, and you look at the people that excel in other high consequence complex systems, they excel when they elevate the human. They don’t excel when they replace the human.

Jayne Groll:
You know, if there’s any message that comes out of our conversation today, I think there’s a few takeaways. Certainly automation serves the human, not the other way around. And I think that’s fantastic because good automation, particularly with the perspective of incident management, right? Because you got to remember, at the other end of an incident is somebody that’s trying to use your product. Somebody that’s trying to do something, even if it was in a reactive incident, even if it was something that was observed. Sooner or later it would bubble up, right? So it is a very human event. And there’s a lot of pressure, right? There’s a lot of pressure to get it resolved depending of course on the impact and the urgency of the thing. So first of all, I think the automation serves the human not the other way around. And I think that’s interesting.

It helps make the human better. And you’re right, we’re not going to automate our way out of it. I think it’s important to recognize that when we’re looking at the end-to-end value stream, that it doesn’t end at deployment. That what happens live, you call it day two, it’s kind of like, okay, now we go into production and who cares? No, that’s no longer important. Incidents are going to happen; that’s a given. Incident management, how they’re managed, who manages it, what kind of tools can be available that are more than just a CMDB and a ticketing system, which is kind of the old way of looking at it. It’s not a command and control process. And I think that’s something we’ve got to move away from. We’ve got to stop thinking of it in very linear terms.

So we have to think of it very much more of an agile perspective. And I think also when we start to look at the interaction of development and operations. And whether you want to call operations pre-production, infrastructure, post-production, support, I think there’s a different human experience. And I think you’ve done a really good job of amplifying the fact that it is a very human experience. Developers are going to help to build Runbooks. Infrastructure folks, pre-production, operations are going to put their knowledge into Runbooks. And when that feature product, whatever is being deployed, reaches those that have to manage reliability, manage resilience and manage incidents, those people are going to be equipped and empowered to be able to do their jobs really, really well. And to make sure that service actually delivers value. So it is a new day.

Damon Edwards:
It is. It very much is. And I think just to tease it for maybe our next conversation, you know, if you see what’s really going on, we talk to people who are kind of pushing the boundaries in this space, is to realize that whether it’s just DevOps and SRE or you got different names for it. What’s effectively happening is we’re moving from a world where we have vertically, functionally aligned teams that kind of operate inside a command and control from some… they’re granted authority to do something from some kind of command and control approval system, to a much more horizontally aligned self-regulating system.

So you think about what DevOps and SRE is really all about, it’s about self-regulation. So it’s about empowering the people closest to the problem, aligned to some business value delivery, to make all the decisions needed to build and deliver a high quality system. And that’s a fundamentally different way of a different operating model. A different way of running these technology organizations. And if you look at what’s going on in the high performing organizations and you look at what’s going on in some of the more kind of thoughtful academics in this space, that’s where everything is pointing to is high velocity, self-regulating systems. And at the heart of that is people, right? It’s elevating the people so we get the best value out of them because that is our greatest asset.

Jayne Groll:
Absolutely. And maybe for our next podcast you and I can do a little compare and contrast, and kind of old ways of working versus new ways of working. You know, the good, the bad and the ugly, right? So there’s good in everything. There’s certainly bad in everything. And maybe a little ugly, right? I would love to do this again and really look at it as objective a view as we can give it. To say, okay, what’s the value of this? I think if there’s any message that comes away from this is loyalty should be to your career path and your organization, not necessarily to one way or another of working. It’s all of the above that’s going to make it.

First of all, thank you for spending some time with me. I always enjoy having these conversations with you because you have a very real world approach, which a lot of people don’t. You know a lot of people talk in the way of should, Damon. You know, it shouldn’t be this way, or it should be that way. And truthfully the enterprise is maturing at the speed that it can depending on the organization. And also at the speed of the innovative thinkers, which is also very human. So thank you for sharing that, your insight. I think you have the ability, and no pressure, but you do really have the ability to kind of reshape the way organizations and individuals, particularly in post-production support, think about their job, their responsibilities and the service that they provide, and how they can do that better. So that’s awesome.

I also want to thank you because on December 10th you’re joining us for Global Skillup Day. And so going to share some more of your knowledge with our audience. You know it’s an 18 hour event where every session is a little bit about how. So kind of teased about what. Really looking forward to how.

Okay. Well anyhow, again, Damon Edwards of Rundeck. If they want to learn more about Rundeck, where do they go?

Damon Edwards:
Rundeck.com. R-U-N-D-E-C-K.com.

Jayne Groll:
All right, well, thanks very much again. I’m Jayne Groll. Really delighted to have spent some time with Damon Edwards of Rundeck, and this is The Humans of DevOps Podcast.

Outro:
Thanks for listening to this episode of The Humans of DevOps Podcast. Don’t forget to join our global community to get access to even more great resources like this. Until next time, remember, you are part of something bigger than yourself. You belong.