- How incidents can be considered an opportunity to gather context and skill
- What leaders can do to make this mindset part of an organization’s culture and encourage it across teams (team-first thinking)
- How to get the best value from incidents
Incidents are a great opportunity to gather both context and skill. They take people out of their day-to-day roles and force ephemeral teams together to solve unexpected and challenging problems. Incidents can be a great accelerator. Writing code that fails safely and loudly is one of the key differentiators for more senior engineers, and there’s no better way to learn how than seeing all the different ways that systems fail.
Lisa is a product engineer at incident.io – in fact Lisa was employee #2. Lisa started out as a consultant working at Accenture, before accidentally becoming a developer.
Lisa loves building stuff, but is also interested in how people interact with each other in a work environment – particularly in software engineering. Having seen the ‘old way’ of large-scale waterfall projects at Accenture, Lisa thinks it’s interesting to try and bring some thinking from that environment into a startup. Outside of work, Lisa loves cooking and pretty much any competitive sport (well, the British ones anyway).
Voted Best 25 DevOps Podcasts by Feedspot
Want access to more content like this? Gain the tools, resources and knowledge to help your organization adapt and respond to challenges by becoming a member of DevOps Institute. Engage in one of the fastest-growing DevOps communities today! Get started for free: https://www.devopsinstitute.com/membership/
Have questions, feedback or just want to chat? Send us an email at [email protected]
Lightly edited transcript below
You’re listening to the Humans of DevOps Podcast, a podcast focused on advancing the humans of DevOps through skills, knowledge, ideas and learning, or the SKIL framework.
Jason Baum 00:33
Hey everyone, it’s Jason Baum, Director of Member Experience at DevOps Institute, and this is the Humans of DevOps Podcast. Welcome back. I hope you had another great week. I always hope you have a great week, so I hope this one was even greater than the last. Today we’re going to be talking about incidents, mistakes and do-overs. We’ve talked about blameless culture; it’s one of the core principles of DevOps. But in reality, is it as easy as just saying, “We have a blameless culture”?
In preparing for today’s episode, I was reminded of a quote by Phoebe Waller-Bridge, the creator of Fleabag and showrunner of Killing Eve: “That’s the very reason they put rubbers on the ends of pencils, because people make mistakes.” I love that quote. If you don’t know who she is, Waller-Bridge has made a career of bringing to life unconventional women who make a lot of mistakes. What her series have in common is that her flawed characters get a chance at redemption: moving past their mistakes gives them an opportunity to prove something to themselves and come out stronger and more confident on the other side. I feel like the metaphor she made is a perfect setup for our conversation today.
Incidents are a great opportunity to gather both context and skill. They take people out of their day-to-day roles and force teams to solve unexpected and challenging problems together. Joining me today to discuss this topic is Lisa Karlin Curtis. Lisa is a product engineer at incident.io; in fact, she was employee number two there. She started out as a consultant working at Accenture before accidentally becoming a developer. I’d love to hear how that happened. Lisa loves building stuff, but is also interested in how people interact with each other in a work environment, particularly in software engineering. Outside of work, Lisa loves cooking and pretty much any competitive sport; we definitely have that in common. But I guess she likes the British ones, she says, and I really don’t know that much about them. So Lisa, welcome to the podcast. Thank you for joining me.
Lisa Karlin Curtis 02:49
Hey, lovely to be here. I’m really looking forward to it.
Jason Baum 02:52
Awesome. Are you ready to get human? All right, let’s do it. So, we’re talking about incidents. How can incidents be considered an opportunity to gather context and skill?
Lisa Karlin Curtis 03:07
So I started thinking about this when I was reflecting on how, yeah, I kind of became a software engineer accidentally, and I’ve accelerated quite quickly. I’ve been very fortunate. Part of that is because a lot of the stuff I did before I was an engineer was actually quite useful. But part of it, I realized, was that I started doing this thing where I was basically running towards the fire. Stuff would go wrong, and I’d be like, “Oh, that looks kind of interesting.” And all of the times where I learned most, the step changes in my understanding or my context, were around incidents. Something would go wrong, and either I would learn stuff while we were fixing that problem, or I’d learn a bunch of stuff straight afterwards, when I was reflecting on it and talking to people about it. So I started thinking about this and talking to people about it, and it turns out other people had had the same experience. What a surprise.
So I think there is something very unusual about an incident, which is why it is an incident, right? Something happens that is unexpected, that you didn’t know was going to happen, and then you have to react to it. That pushes people outside their comfort zone, and it pushes you to do things and see things that you wouldn’t otherwise see. I can think of three key areas where it’s really useful. One is about broadening your horizons, because you see the stuff that you wouldn’t see in your day-to-day. One is about teaching you how to build stuff that fails gracefully and in an observable way. I think one of the key differentiators between good software engineering and great software engineering is what happens when the thing that you didn’t think could happen happens. So step one: make it work. Step two: make it work really fast. Step three: make it work really fast and, when you get a negative number that you weren’t expecting, explode really, really loudly, as opposed to just taking the negative number and paying somebody a negative amount of money, or whatever it might be.
And then, sorry, just to finish off, the third is about building your network. You have a whole bunch of connections with different people in your organization, and you work with your team. But in an incident, you often have to work with lots and lots of people from across the organization. That builds bonds that are really important and, I think, really valuable both to you as an individual and to the company.
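Editor’s sketch: the “explode really loudly” idea from the turn above can be illustrated in a few lines of Python. This is a minimal sketch, not code from incident.io or any company mentioned in the episode; the names `schedule_payout` and `UnexpectedAmountError` are hypothetical, and treating zero as invalid is an assumption made for the example.

```python
from decimal import Decimal


class UnexpectedAmountError(ValueError):
    """Raised when a payout amount is outside what the code was designed for."""


def schedule_payout(recipient: str, amount: Decimal) -> dict:
    """Schedule a payout, failing loudly on amounts we did not expect.

    Hypothetical example: a real system would also log and alert here.
    """
    # Stop immediately instead of quietly "paying" a negative amount.
    if amount <= 0:
        raise UnexpectedAmountError(
            f"refusing payout to {recipient!r}: amount {amount} is not positive"
        )
    return {"recipient": recipient, "amount": amount, "status": "scheduled"}
```

The design choice is that the function has exactly two outcomes: it does the thing it is sure about, or it raises with enough detail in the message for whoever is debugging to see which recipient and which amount triggered it.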
Jason Baum 05:25
Yeah, absolutely. If you listen to this podcast, you’ve heard me use parenting as an example often, because I think parenting is so applicable to what goes on in day-to-day life outside of your home. One of the things I was told when I was a new parent was to plan for the unexpected, or plan for the unplannable. I think that’s applicable here. With incidents and mistakes, it’s almost like you need to expect it. It’s going to happen. If you’re building a program, it’s going to have a bug. It’s what happens after it happens that really matters, right? That’s where everything happens.
Lisa Karlin Curtis 06:07
Yeah, I think that’s the differentiator. If you’re building a system, you will predict a certain number of the possible things that are going to happen, and you will make your system behave well in them. And that’s all great. Then you read a book that’s like, “Oh, you should make your system observable,” and you go, “Okay, I will add some log lines and I will add some metrics.” It’s really easy to do that in a way that doesn’t really add any value. We’ve all seen examples of that. We’ve all seen dashboards that don’t really mean anything. The way to get from having read it in a book to actually being able to do it valuably, I think, is just to see it. It’s very difficult to learn that in the abstract. But as soon as you see someone trying to debug a problem, you see what breadcrumbs they are following to get from “our API is slow” to “this is the root cause, I can now fix this problem.” And if you see somebody do that enough times, you start to be able to lay your own breadcrumbs, because you can empathize. You can put yourself into the shoes of the person who’s trying to debug it and be like, “Oh, maybe it’ll be useful for me to have a metric here, because if this specific bit starts to go weird, we want to know about it.”
As you say, it’s all about preparing for the unexpected. And a lot of that, perhaps counter-intuitively, is not about being able to handle every case. It’s about being able to either be sure that what you’re doing is right, or get a human to help you out. In code, that is like throwing an exception or panicking or whatever you want to call it. That’s the most important thing, particularly if you’re building software for things like cars and planes, you know, software that we trust with our lives. Then you need to be sure that if the software sees anything it doesn’t expect, it defers to the human. That’s the same in fintech, which is my background, and it’s the same in lots of bits of software. You’re not really taught that, and if you read it in a book, it doesn’t really land. But once you see it, you can really start to engage with it and do it yourself.
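Editor’s sketch: one way to picture “laying breadcrumbs” is to time each named step of a request, so that “our API is slow” can be narrowed down to a specific step. The `Breadcrumbs` class and the step names below are hypothetical and only meant as an illustration; a real service would more likely use an existing metrics or tracing library.

```python
import time
from contextlib import contextmanager


class Breadcrumbs:
    """Collects how long each named step of a request took (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the class can be tested without sleeping.
        self._clock = clock
        self.durations = {}

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the step raised, so a failing
            # request still leaves a trail to follow.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the step with the largest total duration."""
        return max(self.durations, key=self.durations.get)
```

Usage would look like `with crumbs.step("db_query"): ...` around each part of a handler, then logging `crumbs.durations` at the end of the request. Note that `slowest()` raises `ValueError` if no step has been recorded yet.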
Jason Baum 08:01
So I mentioned blameless culture, and that’s really important in DevOps, with incidents, and really anywhere. But I feel like it’s said so much that it almost becomes a buzzword, and you really question the authenticity when someone says, “Oh, we have a blameless culture.” How do you actually have a blameless culture? How do incidents lead to learning without the person who made the mistake feeling like they’re getting in the way, or, well, yeah, feeling like they made a mistake and let people down? I think that’s inherent nature for all of us, right?
Lisa Karlin Curtis 08:54
Yeah, absolutely. I think a lot of people have a lot of shame around making mistakes at work. That is a very human thing; humans are incredibly susceptible to shame. What that means is that there is so much psychological pressure, when you make a mistake, to try and cover it up. And that is the worst possible thing you can do in a software engineering environment. We know that, and yet all of us still have that moment when we find a mistake was made and we’re like, “Maybe I’ll just fix it, and it will be fine, and no one will ever know.” It’s so hardwired into our brains that you have to work very, very hard to combat it. One of the obvious things you can do as an individual is try to be very open about your mistakes, particularly if you’re in a leadership role, or if you have quite a lot of social capital because you’ve been in that organization for a long time. People will kind of monkey see, monkey do, right? If you do it, other people will copy you and follow your example.
Then there’s another part of it, which is, I think, about failing together. Most technology failures are more complicated than one person making one error. Normally, lots of people made lots of decisions that have all coalesced into a bad thing. There was a famous one at a company I used to work at, where a junior engineer had written some code that was supposed to send an email telling people who weren’t paying that they should pay, basically trying to prompt them and increase conversion. The logic was a bit wrong, and it was actually targeting all the people who were paying. They ran it in staging, and customer support got inundated with requests. The junior engineer is sitting there being like, “I put it in staging, I’m really confused, this is very stressful.” The team jumps in, they go to support, and customers are told, “This was a mistake, don’t worry, your billing’s fine.” Then they start to look back, and it turns out that somebody had seeded staging with production data to run some load tests, and they didn’t anonymize the emails. So basically, they’ve run their code in production, but they didn’t know.
With something like that, the junior engineer is really mortified, because they clicked a button and a bunch of emails went out, and that’s really bad. But I think it’s important to look at that as a group and be like, “Well, how could you possibly have known that?” Why did we put production data in staging? Why did we not anonymize it? Why is staging set up so that it can send unlimited emails to unlimited numbers of people? There are a whole load of other questions, and you can start to look at it as a systemic problem. There’s the Swiss cheese analogy: all the holes have to line up. If you talk about things like that a lot, then people get it and buy into it. At that point, it’s much more comfortable to admit your mistake, because you know that your team is going to gather around you and take accountability. So if you succeed together and fail together, you can build this blameless culture. But if you hang people out to dry, if you mock people, if you’re mean, that’s just going to reinforce the shame that person is already worried about.
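Editor’s sketch: two of the systemic questions raised above (why were the emails not anonymized, and why could staging send unlimited email) can each be pictured as a small safeguard. This is an illustration under stated assumptions, not a description of what the company in the story actually did. `anonymize_email` and `StagingEmailGuard` are hypothetical names, and the default cap of 10 is arbitrary.

```python
import hashlib


def anonymize_email(email: str) -> str:
    """Replace a real address with a stable, undeliverable placeholder."""
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    # .invalid is a reserved top-level domain, so mail to it cannot be delivered.
    return f"user-{digest}@example.invalid"


class StagingEmailGuard:
    """Wraps outbound email in staging with a hard cap and an address check."""

    def __init__(self, max_emails: int = 10):
        self.max_emails = max_emails
        self.sent = []

    def send(self, to: str, subject: str) -> None:
        if len(self.sent) >= self.max_emails:
            raise RuntimeError(
                f"staging email cap of {self.max_emails} reached; refusing to send"
            )
        if not to.endswith("@example.invalid"):
            raise RuntimeError(f"refusing to email non-anonymized address {to!r}")
        # A real implementation would hand off to a mail client here.
        self.sent.append((to, subject))
```

Either safeguard alone would have closed one of the holes in the Swiss cheese; the point of the story is that several had to line up.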
Jason Baum 11:59
So if I’m hearing you, it’s that proactive honesty. When the mistake is made, it’s calling it out for what it is: this happened, how do we address it, what do we do? It’s coming together and getting everybody to rally around it, without pointing the finger.
Lisa Karlin Curtis 12:22
Yeah, I think that’s exactly it. And then you need to combine that with the idea that incidents shouldn’t be a big scary monster. I think there’s a post on our blog about how incidents are no bad thing and you should be declaring more incidents. There are lots of organizations that measure their success on number of incidents, which I think is a really perverse incentive. If incidents become the norm, then mistakes become the norm. If you’re all talking about incidents, and if the information about those incidents is really accessible to people in your organization, then you’ve made a mistake just like everybody else on the team has made a mistake, has made hundreds of mistakes. Whereas if that is all kept hush within the team and it’s not broadcast, then all of a sudden, as far as you’re concerned, that’s the first mistake anyone in the company has ever made. And that’s a really terrifying place to be.
Jason Baum 13:07
Yeah, and we all know that’s not true. And yet we feel it. It’s just inherent human nature, right, to think that your mistake is the worst mistake ever made, and “Oh my god, they’re gonna fire me, or they’re gonna blacklist me, or something bad is gonna happen. I’m never gonna work again. This is the end of my life.” We just have this habit, I think, from when we’re kids through adulthood. I always hoped that feeling would go away, and it never has. Why?
Lisa Karlin Curtis 13:40
I think it’s partly the imposter syndrome thing: the more you progress, the more responsibility you have, the more you can see all the things that you don’t know how to do. But I think there’s another part of it, which is that we inherently strive for perfection. We want ourselves to be perfect, because it’s quite inconvenient that we’re not. With every single decision you make, you’re like, “I think I’m right.” As a software engineer, you spend your entire day going, “Yeah, I think this is right, let’s do it.” And if you’ve got a little voice in your head all the time going, “What if you’re not?”, that becomes really stressful and really tiring. So what we do is we go, “Yeah, I’m right most of the time, this is kind of fine.” Then, when something goes wrong, that punctures that sense of confidence, and that’s really destructive. We get really stressed because we think that the only reason anybody has hired us is because we’re right all the time. And they haven’t. But because believing it makes our day-to-day easier, I think it’s very easy to fall into the trap of almost believing your own rubbish, that you are, in fact, perfect and flawless.
Jason Baum 14:53
It’s so funny, because I would say it’s probably, if not the most common, one of the most common questions that come up in an interview process: “Name an incident where you made a mistake. How did you handle it, and how did you overcome it?” Right? If you haven’t had it, I don’t know, maybe you haven’t ever applied for a job before in your life, because I think it must be the most common one. And yet, when you’re hired to do a job, it’s almost like, yeah, we do strive for perfection, and that question went out the window. It’s like, now I can’t make a mistake.
Lisa Karlin Curtis 15:36
Yeah, it feels like one of those things that there should be a better answer to, but I don’t think there is. I don’t think there’s a silver bullet. You talk about it, you lead from the front, you try and catch it when it does happen, and you hope that slowly but surely that feeling of wanting to hide it, that feeling of shame, becomes less and less strong, and you develop this muscle of overriding it. But, you know, I’m talking about this and evangelizing it, and I still absolutely have that instinct. The only thing I’ve learned is that I have a muscle now: I can recognize it, and I can look it in the face and be like, “We’re not doing that today, because that’s not useful.” But it’s definitely an active thing. It’s not a default.
Jason Baum 16:23
Yeah, yeah. And based on what you’re saying, we have to look at each incident as isolated. We have to take care of it, learn from it, and just move past it, if I’m gathering that right.
Lisa Karlin Curtis 16:40
I think emotionally, absolutely. But there’s another side to that, right? As an organization, I think incidents are a really important source of data for understanding where you should be putting your chips. If you have good reporting, if you can look at your incidents in aggregate, if you have a good way of recording and categorizing them, then you can start to use that data and go, “Oh, this bit of tech seems to cause us a lot of problems,” or, “This process seems to be really risky for us.” That will help us decide where we are going to invest next. So from that point of view, you don’t want to leave them behind at all. And that’s in direct conflict with what you want people to do emotionally, which is to be there in the moment, solve the problem, close it and not worry about it. I think that’s quite difficult to manage, because you are simultaneously telling people to leave it behind and also telling them to constantly be thinking about it, sitting in a quarterly review being like, “What went wrong this quarter? What do we want to invest in to help make our platform more reliable?”
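Editor’s sketch: looking at incidents “in aggregate” can start as something very small, such as counting recorded incidents per component. The `Incident` fields and the function name below are assumptions made for illustration; real incident tooling will have its own data model and reporting.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A minimal, hypothetical incident record."""
    title: str
    component: str
    severity: str


def incidents_by_component(incidents):
    """Return (component, count) pairs, most incident-prone component first."""
    return Counter(incident.component for incident in incidents).most_common()
```

In a quarterly review, the top of that list is a starting point for the “where do we invest next” conversation, not an answer in itself: a component with many low-severity incidents may matter less than one with a single severe outage.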
Jason Baum 17:43
You know what it makes me think of, and apologies ahead of time: American football. We have a quarterback, and the quarterback throws interceptions. It’s a given. Everyone knows it. No matter who it is, even Tom Brady, they’re going to throw interceptions. And yet they strive for perfection, because what athlete doesn’t, right? When interceptions happen, the one thing I would say about that culture is that they get on the phone, or they go next to the offensive coordinator, and they look at what happened in that play. Here’s what happened. Here’s why he didn’t see it. And then they are supposed to forget it, forget it ever happened, and move on. Because how do you move on with the rest of the game if all you’re thinking about is the one big mistake you made? When you were talking about forgetting it, that instantly popped into my head. So many things could be applied to that, I think.
Lisa Karlin Curtis 18:44
Yeah, I think also, when we talk about incidents, there’s nothing really specific to engineering about them. I think engineers talk about them a lot, and we have a language that we discuss them in. But there are loads of examples of incidents that are not engineering. Almost all the stuff we’ve discussed, incidents being a chance to build your network with other people, a chance to touch things that you don’t normally interact with, a chance to watch what your system does when it fails, feels very engineering. But actually, if you’re a customer success team, what happens when your processes fall over? What happens when the person who’s doing all of the glue work has gone on holiday and all of a sudden something bad happens? You’re still stress testing, you’re still finding the edges. It’s just a slightly different environment.
Are you looking to get DevOps certified? Demonstrate your DevOps knowledge and advance your career with a certification from DevOps Institute. Get certified in DevOps Leader, SRE or DevSecOps, just to name a few. Learn anywhere, anytime. The choice is yours. Choose to get certified through our vast partner network, self-study programs, or our new SKILup e-learning videos. The exams are developed in collaboration with industry thought leaders and subject matter experts in the DevOps space. Learn more at devopsinstitute.com/certifications.
Lisa Karlin Curtis 20:08
I think what I find interesting about that is that you start out thinking, “Oh, when things go wrong, it’s bad.” And we’ve had a number of incidents where I would say the net impact on our company has been positive. Somebody reports it, we see it, and we’ve got some really quite good observability, so often we can find it pretty quickly, fix it, and turn it around in, say, half an hour. The customer ends that interaction actually feeling better about us than when they started, which is probably quite counterintuitive, because really, if we just hadn’t broken it in the first place, maybe that would have been better for them. But we’ve joked internally that maybe we should deliberately come up with bugs, because of how much great feedback we get from people when we fix things.
Jason Baum 20:51
Right. I feel like that’s the evolution of any good product: the feedback. So, in thinking about letting it go and thinking about these improvements, I feel like there must be obstacles besides ourselves and the internal turmoil we put ourselves through when we make a mistake or when an incident happens. There are deadlines, and you need to meet them, and when you miss them, that’s a big deal. So how does that play in when incidents happen? How does that impact the feeling we’re already feeling, the shame that you talked about, when we also have this deadline looming over our heads?
Lisa Karlin Curtis 21:43
I think that’s really interesting. I think it’s very, very difficult, because generally you have a trade-off. Normally there’s that triangle of speed and quality and, I can’t remember what’s on the other end of the triangle, number of people, maybe. The idea being that you have to trade off something. Maybe we should just scrap all of that; I’m going to start again. I think there’s a trade-off here. When something goes wrong, the first thing is, “I need to fix the thing that’s broken.” That takes however long it takes, and basically nothing else matters. There is a type of failure mode where you’re just trying to bring your system back up, or resolve the bug, or stop anything getting any worse. That’s a really easy decision, because it’s there, it’s on fire, we’ve got to fix it.
Then you get to a second stage of an incident, which maybe is follow-ups, or maybe you’re still in incident mode, where nothing is on fire anymore, but there are a lot of things you could do that would make it less likely to happen again, or resolve it in a neater way. That’s where you need your strong engineers to come in and make those trade-offs. What is the value of this piece of work? How long is it going to take us? How far in the wrong direction is it from what we thought we were doing? Can we afford to punt it? Can we afford to delay it? You get to this point where you’ve got a deadline and a set of problems, and you need to make a decision about which is more important. That is often a straight shoot-out, because it’s like, “We have four people in our team, we have two weeks, what shall we do?” The answer to that is not to work 16-hour days, because in all likelihood you won’t get anything more done, in my experience. Instead it has to be, “Which of these is more of a risk to us? What happens if we miss the deadline?”
That’s a decision that needs to get escalated to someone who has the authority and the information to make that call. That means you have to make the information really available to them: what is the work we could do to mitigate it? What would it be mitigating? How far behind are we on the deadline? What does it mean if we don’t hit the deadline? Lots of people have this idea of, “Oh, we’ll find a creative solution.” Sometimes there is a creative solution, somebody has overlooked something, and actually it’s all going to be fine. And sometimes there isn’t. There isn’t enough time, and you have to pick something. Identifying that is really important, and so is being really honest, from a motivation and human point of view. The times where I’ve seen that go badly are when people try to have their cake and eat it, and say, “Oh, I know you said you can’t do these two things, but what if I told you you could?” And then there’s another problem, where there is a trade-off and nobody makes the decision. You end up in a situation where the team is all looking at each other, being like, “Do we make the decision now? We don’t think it’s our choice, but I guess no one’s telling us what to do.” And then somebody shouts at them afterwards because they made the wrong decision.
Jason Baum 24:51
I think that leads us right to my next question. What can leaders do to make this mindset part of the culture of the organization and encourage it across all teams?
Lisa Karlin Curtis 25:06
So I think the first thing is to look out for anybody playing the hero. In all organizations that I’ve ever seen, there is a group of people who take on more than their fair share of the burden of dealing with incidents. We could call them heroes. And that is really good until it’s really bad. It’s good because they’re probably very good at dealing with incidents, because they’ve had a lot of practice, and they’re often the people who’ve been at the company for a long time. But it’s bad because it means that nobody else is learning how to do it. All of those benefits that we talked about right at the start, no one else is getting. So they’re kind of gatekeeping the skill needed to debug these issues. That’s very problematic, because it stunts other people’s growth. And when that person burns out, or goes on holiday, or leaves the company, you’re suddenly in a really bad situation. You end up with these really bad key man dependencies.
As a leader, I think it’s really important to identify those patterns. If you’re using tooling, you can look at who’s getting paged and who’s taking your on-call load, and you can look at who’s leading your incidents. Have you got somebody who’s leading 50% of your incidents? That’s probably not a good sign. You can use that data to find those people and then chat to them and be like, “Why are you doing that?” Probably the answer is, “Well, because I think it’s useful.” And that’s great. But now we’re going to have a conversation about why we need to spread this load out across the team. It’s a combination of protecting you and your mental health, frankly, and spreading the knowledge and the experience. So I think that pattern of having those superheroes is really damaging, and it restricts your ability to scale your incident response.
Then, as a leader, the other thing you can do is encourage people to show their working. If you want people to learn from incidents, you need to make that information available to them, and then you need to make it accessible. By available, I mean: have your conversations in a public Slack channel, ideally use some incident tooling so that you can curate those conversations and build a timeline that somebody can interact with, write a post-mortem, and share the post-mortem. By accessible, I mean: try to make it really easy for people to get that information. Have it in a knowledge base that people can look at to find something they’re interested in. Step one is to write the thing, but if you just write a post-mortem that goes into a drawer, no one’s really won at that point. So push it out to people, and make it clear that reading those materials is part of their job and considered a very good use of their time.
That’s a difficult balance, because sometimes you need people to ship stuff. But I think it’s important to talk about this explicitly, and to talk about the fact that if you look at what other people did in their incidents, you can build better software. You’re going to have fewer incidents, or your incidents are going to be less severe and easier to debug. That then means you’ll be able to teach the next generation, and you get this great positive feedback loop if everybody’s talking about it and learning from each other, as opposed to the negative feedback loop where people are keeping it very secret, where people are gatekeeping it, and where not everyone is getting involved.
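Editor’s sketch: the earlier point in this turn about checking who is leading your incidents can be reduced to a short calculation over a list of incident leads. The function names are hypothetical, and the 50% threshold simply mirrors the figure mentioned in the conversation; it is not a recommended standard.

```python
from collections import Counter


def lead_share(leads):
    """Map each person to the fraction of incidents they led."""
    counts = Counter(leads)
    total = len(leads)
    # With no incidents, counts is empty and no division happens.
    return {person: n / total for person, n in counts.items()}


def possible_heroes(leads, threshold=0.5):
    """Return, sorted by name, the people who led at least `threshold` of incidents."""
    shares = lead_share(leads)
    return sorted(person for person, share in shares.items() if share >= threshold)
```

As the conversation suggests, the output is a prompt for a chat with those people about spreading the load, not a performance metric.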
Jason Baum 28:17
I feel like the word of the day is transparency. I feel like that’s pretty much what you’re saying. Not to put a word in your mouth, because I don’t think you’ve said it specifically. But what I’m hearing is transparency, transparency, and transparency. As someone who works with the engineering team, you know, I’m often just relaying information from people to people, to a leadership team, for example. And half the time, with a deadline, it’s because the leadership doesn’t necessarily understand it. They don’t necessarily know what the specific issue is, or what the specific reason is why a deadline isn’t being hit. And that’s where I feel like the culture part can sometimes go awry, right? We allow it to happen when there isn’t transparency.
Lisa Karlin Curtis 29:10
Yeah, I think that when you lack transparency, it gives humans a lot more remit to try and get what they want in bad ways, basically. If you don’t have transparency, you can lie. And you can put forward an argument that suits whatever you think your goal is. In an ideal world, everybody in your organization has exactly the same goal, and they’re all pulling in the same direction. But in reality, people often view their goals as being slightly in conflict with other people’s goals, because they’re trying to get more resources for their team, because they think their problem is the most important thing. And if you don’t have transparency, it’s very difficult to hold people accountable to those things. Whereas if you do, and if people are very honest and open, then the organization can make the right choice for the organization. And if you think about a sort of individual-first versus organization-first type culture, ideally, as an organization, you should be putting your chips on the thing that is most important, not on the thing that has the best argument. And the way that you don’t make that mistake is to use transparency, to be open and honest, and to generate that culture of it being okay to make mistakes. And also of it being okay to say, I don’t think this is so important, and to know that that’s not going to impact your career. And I think that’s one of the reasons why people don’t do that: because there is this view that to get promoted, or to get that job that you really want, or to get influence, you need to be the most important person, you need to be doing the most important thing. And none of us are always doing the most important thing. This week at work, I’m not doing the most important thing. That is very clear. And that’s kind of fun.
But it also means that if somebody else needs an extra pair of hands, I’m going to jump on their thing I’m not going to just pursue with mine because I want to look good. And that’s the sort of team-first thinking that I think you need to try and get into your cultural DNA as an organization.
Jason Baum 31:08
I love that. I would love to hear, in a team update with the company, “This week, I’m not working on the most important thing.” You know, I don’t think we ever hear that. Because everyone does want to be the most important, I think. So, you know, you said accountability. And I hadn’t planned on asking this question, but it does now trigger something in me. Incidents are okay. And they are learning experiences. We just spent the past 30 minutes talking about it. But how does accountability play into this? Not that incidents are not okay, but when do we need to hold accountability? Because I think there needs to be an element of accountability. How does that play in, especially in a blameless culture?
Lisa Karlin Curtis 32:02
I think it’s a really difficult balance. And I think I’d come back to the stuff I was saying about failing together. So I think that, as a team, you take accountability for what happens in your team. There are obviously occasions where somebody goes rogue and does something that the team thinks was a terrible idea. That’s sort of an HR issue, frankly, and I think it’s very separate. But generally, you as a team have made some choices. And you’re now looking at the consequences of those choices. I think that the way to hold teams accountable around incidents is the same way that you hold teams accountable for any kind of delivery. An incident is normally that something that team maintains has broken in some way. And that team has a kind of agreement or a contract with the rest of the org that they will have this service and it will do this stuff. And sometimes it won’t, because incidents happen and mistakes happen. I think you make people accountable by making them transparent, and by making them expose the trade-offs that they’re making. So as an example, say you’re in a team which is under loads and loads of pressure for time, and they’re like, you’ve got to ship this thing as quickly as you can. If you, as the senior engineer or the tech lead, are looking at them saying, cool, we’ll do that, but there’s risk, and these are the things that might go wrong if we do that, are we comfortable with that risk? Then you’re accountable. Because if something goes wrong, it’s either, yeah, I said these things would go wrong, and we talked about it, and we decided we were okay with the risk. Or something completely different has gone wrong, which is maybe significantly more severe than the things that we thought might go wrong. Right, let’s talk about why we thought this was safe, and why that wasn’t in our risk assessment. And so you’re holding people accountable for the trade-offs that they’re making.
You’re not holding people accountable for an individual thing that went wrong. It’s not, why did this happen? It’s, why did you think that we should take this risk, and what pressure was put on you? And let’s look at it at a system level, as opposed to some person pressing a button on a backfill, and it made the database really sad, and now we’re going to run around screaming, saying that they should be fired. And I think the other thing about accountability is that it’s about time. I think incidents are generally a lagging indicator, as opposed to a leading indicator, if that’s terminology people are familiar with. A leading indicator basically means you find out pretty quickly whether a choice you’re making is good or bad. And a lagging indicator is something where a choice that you make has impact some time in the future. And because incidents are a lagging indicator, often the people handling the incidents are not the people who made those trade-offs. And it’s really important to recognize when that is true, and to understand what the root causes are. Whether you go down the five whys route or some other route, have a really meaningful discussion about what choices we could have made to avoid this, and why we didn’t make them. Was it because we had loads of pressure on delivery? Was it because no one was thinking about it, and we just didn’t think it was a risk? And then that’s the problem that you have to solve. And those are the things that you can hold people accountable for, I think.
Jason Baum 35:18
Awesome. Thank you for answering that one. I don’t know, it just popped into my head when you said accountability, because I feel like all we’ve ever heard for years was accountability, accountability. Who’s accountable for this? Who’s accountable for that? All the accountability stuff that people say. And, yeah, it’s hard to be blameless when that’s what the buzzword was before blameless. So now, we’re at the point of the podcast where I like to ask sort of a fun question of you personally, because this is the Humans of DevOps, and we’re all about the humans. So if there was one thing that you could be remembered for, what would that be?
Lisa Karlin Curtis 36:06
I think I would like to be remembered as someone who made systems work better for people.
Jason Baum 36:16
Great. I think that’s, that’s certainly applicable for today’s conversation. Well, thank you, Lisa, so much for joining me today. It was an absolute pleasure.
Lisa Karlin Curtis 36:28
Thanks so much. I really enjoyed it.
Jason Baum 36:31
And thank you for listening to this episode of the Humans of DevOps podcast. I’m going to end this episode the same way I always do, encouraging you to become a member of DevOps Institute to get access to even more great resources just like this one. Until next time, stay safe, stay healthy, and most of all, stay human, live long and prosper.
Thanks for listening to this episode of the humans of DevOps podcast. Don’t forget to join our global community to get access to even more great resources like this. Until next time, remember, you are part of something bigger than yourself. You belong