
DevOps Institute

[E22] Observability: What Is It And Why Are Enterprises Now Paying Attention To It?

DevOps Basics, Podcasts, SRE

August 18, 2020

Josh Atwell, Senior Technology Advocate at Splunk, shares his insight about the rise of interest around observability and why enterprises should pay attention. Josh also provides some simple steps for getting started down the observability path.

The lightly edited transcript can be found below.

Intro:

You’re listening to The Humans of DevOps Podcast, a podcast focused on advancing the humans of DevOps through skills, knowledge, ideas and learning or the SKIL Framework. Here’s your host, DevOps Institute CEO, Jayne Groll.

Jayne Groll:

Hi everyone, it’s Jayne Groll, CEO of the DevOps Institute, and welcome to another episode of the Humans of DevOps podcast. I’m very excited today to invite someone I’ve known for a long time, somebody I respect a great deal, and also somebody I’d call my friend, Joshua Atwell, Senior Technology Advocate at Splunk. Hey Josh.

Josh Atwell:

Hey Jayne, how are you?

Jayne Groll:

I’m good, I’m good. We’re going to have a fun conversation today, Josh, because we’re going to talk about this groundswell of interest in a new pattern known as observability.

Josh Atwell:

Yeah.

Jayne Groll:

And it’s interesting because I looked up the dictionary definition of observed, and it’s really fascinating when you kind of see that observe is to notice, right? To see, to watch. And of course that’s in common language, but you know, suddenly there’s this paradigm shift towards observability where a lot of enterprises are paying attention to this.

Jayne Groll:

So I’m going to ask you to just help us understand, what do you think is behind that groundswell, and why is observability something that enterprises should be paying attention to?

Josh Atwell:

Yeah absolutely. Well, first off, thanks for having me. I’m really excited to be here, and I reciprocate all of those feelings. It’s really great to get to talk to you. As with all things in our industry and every industry, when you see new things emerge, they’re almost always related to pain. And so pain, I think, is what’s really driving the groundswell, and that pain most specifically is around the complexity that our new system architectures have, our new application stacks have.

Josh Atwell:

We’ve really moved a long way from a LAMP stack. We’ve gotten to where we have SaaS all over the place. In fact, I was having a conversation with someone earlier today, talking about a maintenance window that an organization is having, and they’re like, well, you should just have your application distributed in multiple regions so that doesn’t happen, and you don’t have to suffer that outage. I was like, “Look, latency and data gravity and data proximity, those things aren’t easily resolved just because you can have things in multiple regions. There’s a lot of technology that was never built for that.”

Josh Atwell:

So on top of that challenge, there’s the adoption of cloud, adoption of new SaaS services, implementation of containerization, moving applications outside of an operating system into that run space, and now building services and APIs versus everything being rolled into an application. Just the volume and the variety of data, or just the complexity that people have, is creating this tremendous pain. And observability is now a framework, with tool sets that are designed to pull data from the environment and give you that ability to observe and witness and notice what’s going on in your environment that you may not have been able to see previously.

Jayne Groll:

So how is observability different than monitoring? We’ve been monitoring for a long time and organizations have invested a lot in event management, or alert systems, or just monitoring in general. So what’s the difference?

Josh Atwell:

One example that I’ve seen that I really like is, assume you have a pie… Oh, thank you, sweetie. Just got delivered chocolate chip cookies, perfect for the podcast. You are all jealous now.

Jayne Groll:

Send some along, please.

Josh Atwell:

Okay. So one of the ways that I’ve seen it described that I’ve really enjoyed, and I thought was useful is, you can have a pipe with fluid going through it, we’ll just say water, you have a water pipe. Now, you know that water comes in one end of the pipe, water comes out the other end of the pipe. Let’s say there’s some filtration and some other things going on. You’re not always certain what’s happening between the two ends of the pipe because you can’t see inside of the pipe, and so you look for certain things, like maybe changes in temperature, changes in pressure, particulate that shows up on one end that didn’t go in from the other, to try to understand what’s happening within the system inside of that pipe.

Josh Atwell:

Individual components of that would be monitoring. I look at that as that is monitoring, you’re looking at monitoring the temperature, right? I’m monitoring the particulate, I’m monitoring the pressure. When you put all those things together to come to a summation of what’s happening into the environment, that I look at as observability. That is getting that large picture from those independent nodes or those independent sensors, or those independent data streams, to be able to give you that larger picture and larger understanding of what’s happening in that seemingly complex system.
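The pipe analogy can be sketched in a few lines of Python. This is only an illustration and not something discussed on the podcast: the sensor names, units, thresholds, and inferred causes are all assumptions. `monitor` checks each signal on its own, while `observe` combines the same readings to infer what is happening inside the pipe.

```python
from dataclasses import dataclass


@dataclass
class PipeReadings:
    """One snapshot of the independent sensors on the pipe (all hypothetical)."""
    inlet_pressure_kpa: float
    outlet_pressure_kpa: float
    inlet_temp_c: float
    outlet_temp_c: float
    outlet_particulate_ppm: float


def monitor(readings: PipeReadings) -> dict[str, bool]:
    """Monitoring: each signal is checked alone against an assumed threshold."""
    return {
        "pressure_ok": readings.outlet_pressure_kpa >= 180.0,
        "temperature_ok": readings.outlet_temp_c <= 30.0,
        "particulate_ok": readings.outlet_particulate_ppm <= 5.0,
    }


def observe(readings: PipeReadings) -> str:
    """Observability: combine the signals to infer the internal state."""
    pressure_drop = readings.inlet_pressure_kpa - readings.outlet_pressure_kpa
    temp_rise = readings.outlet_temp_c - readings.inlet_temp_c
    dirty = readings.outlet_particulate_ppm > 5.0

    if pressure_drop > 40.0 and dirty:
        return "filter is probably breaking down and shedding material"
    if pressure_drop > 40.0:
        return "filter is probably clogging"
    if temp_rise > 5.0:
        return "something inside the pipe is adding heat"
    return "system looks healthy"


snapshot = PipeReadings(
    inlet_pressure_kpa=250.0,
    outlet_pressure_kpa=200.0,
    inlet_temp_c=20.0,
    outlet_temp_c=21.0,
    outlet_particulate_ppm=2.0,
)
print(monitor(snapshot))  # every individual check passes
print(observe(snapshot))  # the combined view still points at a problem
```

In this snapshot every individual check is green, yet the pressure drop between the two ends suggests a clogging filter, which only shows up when the readings are looked at together.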

Jayne Groll:

Yeah, I think you’re right. I mean, I ran IT ops for a fairly long period of time and when we monitored, it was pretty discrete.

Josh Atwell:

Oh yeah, absolutely.

Jayne Groll:

Yeah, so you were kind of looking at monitoring really through a blinder, depending on what you were looking at. And of course, then we would say, well, everything’s healthy, because mine is healthy. You only looked at a certain parameter or you only looked at a certain component, and as long as that component was working, it also kind of gave way to a little bit of blame. Oh, mine’s working, yours isn’t working.

Jayne Groll:

The other thing I think is interesting is kind of a little bit of a difference between reactive and proactive. So I think observability is so much more proactive, because you’re looking at it from a broader level. Is that a fair understanding, that it is more proactive than reactive?

Josh Atwell:

I think when I look at how you apply a proactive versus reactive lens, I think the outcomes and the requirements are what would drive that. Monitoring is, tell me what’s happened, whereas I look at observability more as, give me a sense of change of state overall. That’s where you have an opportunity to be a little more proactive, because if you’re looking at the system as a whole, and you’re looking at multiple components as they work together with their interdependencies, you start more quickly understanding how a service is being degraded and the impact that it has throughout the system.

Josh Atwell:

If you’re getting a lot of failures on an API call or a service trigger, maybe you have a serverless function that isn’t triggering properly, you can very quickly see the metrics on that start tallying up. And so that gives you an opportunity to say, Hey, this is an abnormal state, abnormal condition, that is going to have a cascading effect. We have a better understanding of that cascading effect. We can either build in redundancies or health remediation to resolve those problems. We need to scale out, scale down, restart, whatever.
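As a rough sketch of the idea of failures tallying up into an abnormal state, the Python below tracks recent call outcomes in a sliding window and suggests a remediation. The class name, window size, thresholds, and action names are hypothetical choices for illustration, not a description of any particular product.

```python
from collections import deque


class FailureRateWatcher:
    """Track the outcome of the most recent calls to one service."""

    def __init__(self, window: int = 20, threshold: float = 0.25) -> None:
        # True means the call failed; old outcomes fall off automatically.
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_abnormal(self) -> bool:
        # Only judge once the window is full, to avoid noise from a few calls.
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.failure_rate() > self.threshold)


def choose_action(watcher: FailureRateWatcher) -> str:
    """Map the observed state to an assumed remediation step."""
    if not watcher.is_abnormal():
        return "none"
    if watcher.failure_rate() > 0.75:
        return "restart"
    return "scale_out"


watcher = FailureRateWatcher(window=10, threshold=0.25)
for failed in [False] * 6 + [True] * 4:
    watcher.record(failed)

print(watcher.failure_rate())
print(choose_action(watcher))
```

With four failures in the last ten calls the window is above the assumed 25% threshold, so the sketch suggests scaling out before the failures cascade further.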

Josh Atwell:

But that’s your proactive nature. I mean, you’re still having to react to changes in the system, but you gain the ability to react and proactively prevent a service outage, or prevent a larger issue that would impact users. Because that’s the other side of the pain, right? I’ve talked about this a lot in various channels and platforms. When I started in IT, all the consumers of IT were people that worked in the company.

Jayne Groll:

Right.

Josh Atwell:

It wasn’t customers. Adrian Cockcroft actually quoted this a while back, and he talked about, today’s IT, the technology is directly touched by customers. It’s these mobile phones. These mobile phones are a big part of that because now I have connectivity to hundreds of companies well beyond just their website.

Jayne Groll:

Yeah, I think that’s really interesting too because you’re right. In my definition of IT, my customer was an internal business unit. That’s who I served, and that’s who my team served, and that’s where we kept the lights on. And we felt like we were contributing because we could be an enabler, a change agent for that. But you’re right, in today’s very interconnected world, the ability to understand from an external point of view what’s happening to your internal system, I like the word infer. Every time I look at a definition of observability, the word infer is there, because you can infer what’s happening, not only from the hey what’s going on in the marketing department, but what’s happening in terms of your business customers.

Jayne Groll:

And I think now, in particular, this crazy year where the only ones that have been able to really serve are those that do provide technology, and looking at capacity, looking at performance, looking at lights on, looking at resilience and reliability, this is a really interesting year where it’s got all of that in the spotlight.

Jayne Groll:

You know, you mentioned data. And so one of the things I find really fascinating, just in general, is the relationship between observability and data generally. I mean, you mentioned very complex systems, and the data can be overwhelming. So you can have very complex systems, you could be getting lots of data from your systems, but making sense of that data, I think, is an overwhelming challenge. And so what happens is we cherry pick the data we like, or the data that’s telling us what we want, and then we just kind of ignore the rest. So what’s the relationship there?

Josh Atwell:

Well, the key thing is that without the data, you have nothing, right? You have no chance to monitor or observe.

Jayne Groll:

Right.

Josh Atwell:

You know, the data is a requirement. The data is also your indicators. Even a tweet from a customer who says they’re unhappy about something, that’s a piece of data, that’s an insight into your system, how it’s being used and how it’s being received. And so the data is that critical element and that critical thread, and you actually highlighted something that’s most important about data, and it’s something that I talk about a lot with our customers and with partners. What questions are you asking of your data?

Josh Atwell:

If your only question of your service is, are you online or not? Well, that’s what you’re always going to work with. But you could also ask your service, are you okay? Are you handling the load okay? Do you need some help? Could we do something to maybe lighten the load on you as a service? I mean, it sounds kind of silly to say that, like you’re anthropomorphizing a service.

Jayne Groll:

Rex and I have that conversation every day.

Josh Atwell:

Yeah, exactly. But you could ask, you could just simply change your question and as such, you have to look at different data, or perhaps look at your data in a new frame of mind or a new light. Generally speaking, you’ll have the data that you need that’s there, like it’s accessible, it’s just a matter of accessing it, putting it into a platform and reviewing it in a manner that can give you some actionable insight, and to be able to answer that question.
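To make "changing the question" concrete, here is a minimal Python sketch that asks two different questions of the same request data. The records, the function names, and the 500 ms latency budget are made up for illustration.

```python
from statistics import quantiles

# Hypothetical request records: (HTTP status code, latency in milliseconds).
requests = [
    (200, 110), (200, 120), (200, 135), (200, 150), (200, 140),
    (200, 900), (200, 1250), (200, 130), (200, 1400), (200, 125),
]


def is_online(records):
    """The narrow question: is the service answering at all?"""
    return any(status < 500 for status, _ in records)


def is_coping(records, p90_budget_ms=500.0):
    """A broader question: is the slow tail of requests within budget?"""
    latencies = [latency for _, latency in records]
    # With n=10, the last cut point approximates the 90th percentile.
    p90 = quantiles(latencies, n=10)[-1]
    return p90 <= p90_budget_ms


print(is_online(requests))  # the service is up
print(is_coping(requests))  # but it is struggling under load
```

The data is the same in both cases. Only the question changes, and the second one shows that a service that is online can still be struggling.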

Jayne Groll:

Yeah, and I think, Josh, you’re hitting on something that I think is really important and that’s that there’s a culture that sits behind that, because if the culture is, let me just ask the question, are you online? Great. Move on.

Josh Atwell:

Mm-hmm (affirmative).

Jayne Groll:

That’s it for me. But there is a culture of observability that encourages you, whoever you are, to ask other questions.

Josh Atwell:

Yeah.

Jayne Groll:

And some of those questions, I know, sound as if it is a human being, like, Hey, are you okay? Can I help you? But at the end of the day, having that kind of mentality of, are you okay, system? Some day the system will talk back and go, yes, I’m okay. But not today.

Josh Atwell:

I hope not.

Jayne Groll:

No, not today.

Josh Atwell:

Well, I’m feeling a little overwhelmed today.

Jayne Groll:

Exactly, exactly. I’ve decided I’m taking the day off. But having that kind of culture where we’re inspired to not only ask the questions or look at the data from a check, check, check, check, but also having more of a relationship with the data, that the data tells us, hey, maybe we do need to do something. Maybe we can offset some of the challenges that are associated with it.

Jayne Groll:

So talk a little bit, what are you seeing in your customers as far as like cultural changes when they start to kind of take on an observability approach?

Josh Atwell:

I think the most interesting component, from a culture standpoint, is that regaining sense of control, and just that you have a handle and understand what is actually happening, how changes you make to the system, how it responds to changes in customer behavior, how things respond, that it’s empowering. And as such, I think people are able to move with more confidence.

Josh Atwell:

I think, for instance, of the early days of cloud adoption. Even if you were gung ho, like I’m going to go and I’m going to develop it, I’m going to build in the cloud, not knowing what’s happening underneath, and all the uncertainty we had in the early days of cloud, that was confidence shaking. It slowed things down. The culture then shifted as we gained confidence and gained better understanding and knowledge and awareness of what the underlying systems were delivering to us and what we could expect from them. That empowered us to do more and to take on bigger challenges.

Josh Atwell:

And so I think that’s the big culture shift is that as you adopt these frameworks, as you learn to ask more from your data, get better sense of what’s happening in your environments, and how your environments are impacted by various variables, and being informed and feeling like you have some sense of knowledge and control. It’s going to lead to confidence, which is then going to lead you to be more effective, not only in delivering on your mission, but in delivering in the systems that you’re using. You’re going to feel confident that what you build is sustainable and manageable and delivering on that mission.

Jayne Groll:

Yeah, and I think there’s an intelligence that’s available today that maybe wasn’t available in the past, and we’re starting to see more of that intelligence built in. And so if you have intelligent process, if you have intelligence built into your data platforms, and into your observability platforms, that makes it a little bit easier.

Jayne Groll:

So all of this sounds great. So I’m an enterprise CTO. I’m really looking at my SRE teams. I’m looking at the future of ops. How do you get started? How do you shift that approach from say a monitoring approach to observability. What are your kind of critical success factors? How do you get going?

Josh Atwell:

I think the first key thing is to recognize that you have visibility gaps that you need to fill.

Jayne Groll:

Pain, right? Pain.

Josh Atwell:

It’s pain, yeah. Interestingly enough, not everybody recognizes their own pain as pain, and more often than not it’s because they haven’t been exposed to solutions that can prevent and alleviate that pain. And so I think that’s the first step, is just being very honest and introspective on how much pain do we have?

Josh Atwell:

One of the things I love when talking to customers about the world of opportunity is asking questions like, do you know with certainty if X relates to Y, and what impact that has on your business, on your customers? And more often than not, they’re like, yes, we’ve thought about that, but we don’t have a great way of answering that. That’s your first step, recognizing that, and then seeking out that answer and looking at that curiosity on how we can do that. Because unless you know that you have pain, you’re not going to look for a remedy for the pain. So that’s my recommendation for a first step.

Jayne Groll:

But I think you said a really interesting word in terms of curiosity. I think everything that I’ve been learning about observability, and again, I’m not in the trenches anymore, but it does require a curious mind, a curious culture that says, let me ask the next question. So maybe you don’t feel the pain, but maybe you recognize that there are gaps, and those gaps you might have overlooked in the past, but again, in today’s very, very fast paced world where we’re putting a strong emphasis on reliability, we’re putting a strong emphasis on resilience. Not that we didn’t in the past, but now we’ve got frameworks around that. Now we’ve got tooling around it. Now we’ve got culture around that.

Jayne Groll:

So part of that groundswell, of course, is coming out of that, but also part of it is just necessity. Things are moving very fast. And if you’re an organization, I was talking to a CIO recently, a 9,000 person organization that’s 200 years old, try shifting that on a dime to operate in a different way. But that’s what they learned. They learned that we can honor the past, but we can’t be married to it. We have to be able to look at different ways to do that. So I think curiosity is an important observation about it.

Josh Atwell:

It’s interesting, this past October at Splunk .conf, I actually had the opportunity to interview a bunch of people. I was working on a video program that we were piloting, and in the interviews, I asked the question, what skill or technology do you think is going to be most critical for IT professionals and technologists in the next two years? And I was shocked when the majority of people said it was going to be either security, which made sense, security skills, or curiosity. Curiosity was the other one.

Josh Atwell:

I was not expecting it. The top skill needed today was communication, the ability to communicate and collaborate and to work well with other people, as we saw more organizations kind of meld in their work, particularly around DevOps and such. But yeah, they said that going forward, the most important skill is going to be curiosity, and recognizing that there’s always something there that can make things better and improve. And I was like, that’s really interesting.

Jayne Groll:

It’s funny you say that because DevOps Institute does its annual project, the Upskilling Enterprise DevOps Skills Report, and so the survey is going to actually launch in a few weeks for 2021, so we’re very excited about that. This past year, in 2020, human skills, well, for the past two years, human skills have been equal to automation and process skills. So curiosity is a human skill, right? At least today it’s a human skill. Someday when the robots take over, maybe not. But today it’s a human skill.

Jayne Groll:

It’s really fascinating when you look at human skills that have really emerged as being critical, empathy, right? Systems thinking, which is very much a part of observability.

Josh Atwell:

Yep.

Jayne Groll:

Design thinking, collaboration, communication, things like that. It will be very fascinating for us to look at the data that comes in from 2020, the year that was not a usual year, and that’s being nice, and to see where the shift goes in terms of, now what do we think the critical skills are? And I’m hoping that curiosity, which again, is very human, is something that emerges that organizations want to groom, they want to hire curious people, and they really want to do some intelligent risk taking. So thank you.

Josh Atwell:

Absolutely.

Jayne Groll:

I think, Josh, that you’ve given us a really nice, we talked about a groundswell, but we also want to be able to take this topic that people are hearing a lot about and make it digestible. In our space we have so many terms. You mentioned serverless, right? So there’s another new term that’s emerging. We have so much new vocabulary that organizations, particularly at the enterprise level, are trying to sift through.

Jayne Groll:

But observability, I mean, we’re seeing that, again, I’ll plug our SKILup Day in a second, we’re seeing that just in terms of the interest in observability, from a practitioner perspective. They don’t necessarily know, or they haven’t necessarily started down the path, but they’re curious, and I think that’s going to be a part of it. So thank you, I always enjoy spending some time with you. Usually we’re in person, so someday soon, I hope, my friend. I really do.

Josh Atwell:

I’m offering up a socially distant hug.

Jayne Groll:

Oh, there you go, yes, a socially distanced hug, I love it, that’s great. So anyhow, if you are listening and you are interested in learning more about observability, DevOps Institute’s SKILup Day in August is all about observability. So we have a great lineup of speakers, we have, as always, the expo hall with sponsors. We have the chat lounge, Murda will be back doing mixology. We’re doing yoga at the beginning of this SKILup Day.

Jayne Groll:

So August 20th is a SKILup Day on observability. Go to the DevOps Institute website, register, it’s free. It’s really a fantastic event, and I really hope that we see more of you there. As always, on the Humans of DevOps podcast, we try to bring you industry thought leaders. Josh, really, thank you so much.

Josh Atwell:

Absolutely.

Jayne Groll:

We’ll talk to you soon.

Josh Atwell:

Sounds good.

Jayne Groll:

Stay well everyone.

Outro:

Thanks for listening to this episode of The Humans of DevOps Podcast. Don’t forget to join our global community to get access to even more great resources like this. Until next time, remember, you are part of something bigger than yourself. You belong.
