- AIOps and how AIOps fits within DevOps Culture
- How do end customers/users/humans benefit from AIOps
Thank you to our sponsor Range!
The Humans of DevOps Podcast was Voted one of the Best 25 DevOps Podcasts by Feedspot.
Want access to more content like this? Gain the tools, resources and knowledge to help your organization adapt and respond to challenges by joining the DevOps Institute Community. Engage in DevOps In The Wild, one of the fastest-growing DevOps communities today! Get started now! https://www.devopsinstitute.com/membership/
Have questions, feedback or just want to chat? Send us an email at email@example.com
Lightly edited transcript below
You’re listening to the Humans of DevOps Podcast, a podcast focused on advancing the humans of DevOps through skills, knowledge, ideas and learning, or the SKIL framework.
Richard Whitehead 00:17
Right, so the good news is people’s jobs are most definitely not at risk. The problem we’re trying to solve is increasing at a greater rate than I think we can solve it. So the expanding nature of the problem, I think, secures people’s employment for a very, very long time. I think in every aspect of any form of digital transformation, when you look at any aspect of the business, it doesn’t matter how much effort, how much code rewriting, how much automation we do. The opportunity (I want to refer to this as an opportunity, not a problem) is increasing so fast that I don’t think anybody’s going to be out of a job anytime soon.
Jason Baum 01:08
Hey, everyone, welcome back. It’s Jason Baum, Director of Member Experience at DevOps Institute, and this is the Humans of DevOps Podcast. I hope you had a great week. We’re glad you came back to join us. So let me take you back to the 80s. The machines are coming to get us. Scenes and lines from The Terminator and its many sequels are forever etched in our brains. It hauntingly depicts a world hell-bent on technological growth that led to the rise of advanced machine learning techniques and artificial intelligence that would ultimately lead to the world’s demise. Well, we’re on our way there, folks. But instead of the doomsday scenarios we’re all picturing, AIOps has emerged as an essential step forward for enterprises in a variety of different industries. Gartner defines AIOps as artificial intelligence for IT operations, which combines big data and machine learning to automate IT operations processes. At its core, it’s all about how IT teams and organizations can use AI to manage data in their environments. Through this approach, teams can employ large-scale datasets, machine learning and automation to make IT ops faster, simpler and more efficient. Many believe AI will not only impact organizations but will become a major facet of our everyday lives through the emergence of new applications. I mean, we already see ML and AI every day with things like Face ID, smart replies, product recommendations, chatbots, you name it. Today, we’re going to talk to Richard Whitehead, Chief Evangelist and CTO of Moogsoft, about what AIOps really means for humans, as opposed to being stuck in the 80s version of this story. Sorry about that. Richard, are you ready to get human?
Richard Whitehead 03:04
I certainly am.
Jason Baum 03:06
First of all, thanks so much for coming on. I appreciate having you here on the podcast. And my apologies for taking it back to the 80s. But every time I talk about AI, my mind immediately goes to the machines are coming for us.
Richard Whitehead 03:25
And certainly, when I started working in this space, I think every single PowerPoint presentation I saw, both internally and externally, had at least some reference to Skynet and the Terminator. So yes, it’s not that far back.
Jason Baum 03:45
Yeah, no, it’s not that far back. And I think people are still scared by it. I have a funny story that I’ve perhaps told in the past, I’m trying to remember, maybe not, about my mother. She has an Amazon device whose name cannot be said, because there’s actually one in the room with me and it will start speaking (Alexa). But we know what we’re talking about. She puts it away. She actually keeps it in a cabinet, and when she wants to use it, she takes it out, plugs it in and then says the magic words to turn it on, because she’s afraid of it listening to her.
Richard Whitehead 04:27
That might be a legitimate fear. It’s always listening; it has to. But yeah, you know, I’m a pragmatist. I don’t fear artificial intelligence, but I do occasionally have a sense of disappointment. You know, when my door camera notifies me that there’s a person outside and it’s clearly a dog. And also some shopping recommendations I get. You know, if I were to buy an item such as a dishwasher, I don’t feel the need to have somebody suggest that I buy a dishwasher for the next six months. So I think, you know, it’s less fear, it’s more disappointment in what AI can bring to you, based on some of the more obvious commercial variants that are out there.
Jason Baum 05:21
Yeah, so let’s talk about what AI is. I feel like that’s a good place to start. You heard the Gartner definition; I wanted to include that because I always feel like having an academic definition of sorts is important to hear. I’ve also heard someone very succinctly put it that AI, artificial intelligence, is simply the problem, and then once it’s solved, it’s no longer AI, which I think is a fascinating way to look at it. I’m curious how you would define it.
Richard Whitehead 05:57
So, my definition is a little broader. To go back to the specific definition, the Gartner definition, which has actually evolved into that definition, because when you put the letters AI together, people automatically assume it means artificial intelligence. So AIOps is the application of artificial intelligence techniques to IT operations. So that’s really it. And that, of course, means that it’s a very broad definition, which means there are a lot of technologies and techniques and solutions out there that all fit into this umbrella definition. So when people talk about artificial intelligence, there’s a general sense that what you’re talking about is technology, or computers, that are in some way attempting to replicate humans, and that’s where the inevitable screenshot of the Terminator robot appears, and so forth. And I think that’s generally true. So, you know, when you think of chess-playing computers and things like that, and that sense has been largely reinforced by some of the early adoptions of AI in things like service desks. So you know, when you call into a service desk, your first interaction is likely to be with some form of AI capability, where it attempts to give you an answer very rapidly without any form of real human interaction. And, you know, in most cases, they’re attempting to sort of pass the Turing test. In other words, you’re talking to a computer, and it’s trying to make the interaction as human-like as possible. So while that’s true for things like service desks, when you dig deeper into some of the technology, and start talking about, you know, the concept of monitoring, observability, remediation, and things like that, it becomes less about attempting to replicate a human and more about attempting to replicate what a human would try and do if they were involved. So it’s not a human interaction; it’s an application of human-based intelligence, but in an automated fashion.
So with that broad definition, you’re incorporating not just that chess-playing type of thing; that’s actually the least of the components. But you are talking about things like machine learning, you’re talking about some sophisticated algorithms that maybe do linear regression, and you’re talking about some techniques that are on the periphery of artificial intelligence, such as natural language processing. You know, I’ve already mentioned two that I’m personally relatively familiar with, which are machine learning and natural language processing. And these are things that you don’t necessarily think of when you’re talking about AI, but they’re absolutely relevant and very pertinent to solving specific problems.
Jason Baum 08:53
And that’s a good lead-in to this next question: what are some examples of problems that we’re looking to solve with AIOps and machine learning specifically? And also, when you hear about, okay, we’re looking to solve problems and automate and speed up some processes, I think some of the misconception then is, well, now I’m going to lose my job. AI is going to replace me. So perhaps you could address both of these in your answer: what’s the problem? And then as we solve it with AIOps, are people’s jobs at risk?
Richard Whitehead 09:36
Right, so the good news is people’s jobs are most definitely not at risk. The problem we’re trying to solve, I think, is increasing at a greater rate than we can solve it. So the expanding nature of the problem, I think, secures people’s employment for a very, very long time. I think in every aspect of any form of digital transformation, when you look at any aspect of the business, it doesn’t matter how much effort, how much code rewriting, how much automation we do; the opportunity (I want to refer to it as an opportunity, not a problem) is increasing so fast that I don’t think anybody’s going to be out of a job anytime soon. In fact, I think if you look at some of the newer roles that are emerging as a result of digital transformation, such as site reliability engineers, developers in general, and operations folks who are emerging in a DevOps-type capacity, you know, that’s an expanding market opportunity, not a shrinking one. So individual teams might be smaller as a result of this technology. But the market opportunity, in general, is such that I think it’s going to be a long time before the demand for people in these roles cools off. So yeah, not a problem there. We’re dealing with an exploding market opportunity. So basically, it sort of comes down to, I think I mentioned it earlier, the notion of automation.
So when we talk about AI replicating human activity, I tend to think of it in this sense: when you’re looking at something that a human would do in their day-to-day line of work, when you’re solving a problem, addressing an incident, debugging something, the question we always ask ourselves is, what are the most common and repetitive tasks that a human performs? Those are the things that I think are easy targets for AI to replicate, because they tend to be the mundane tasks. You know, we tend to refer to them a lot as toil: the stuff that you do every single time, that doesn’t necessarily add value, but it’s just a task that has to be performed in order to move on to the next job of actually resolving the issue or finding the error. And that’s something that I think is sort of overlooked. People tend to think of AI as being an end goal: we’re going to completely replace a human, and you throw data at it and you get a solution at the other end. I tend to think of AI, and we sometimes refer to it at Moogsoft as applied AI, as basically a very small tool; you can take a very small tool to achieve a very specific task. And it could be something as simple as doing a bit of triage, augmenting some information. So that’s one less task you have to do. That’s one less system you have to log into to get some additional information. If that information can be gathered for you and presented to you, that’s one less mundane task you have to perform in order to get to the really important stuff, which is using your human brain to resolve the issue. And so yeah, applied AI is a good way of looking at it.
Jason Baum 13:16
That’s what it’s all about. Right? Getting rid of that mundane. I think that’s the goal.
Richard Whitehead 13:21
So certainly a DevOps ideal, right? Yeah. The daily work.
Jason Baum 13:25
Yeah. Efficiency. So once it’s set up, right, we’ve got it, the mundane is gone, it’s working, you know, everything is being automated. Is it simply plug and play, and now we just let it go? Or, you know, can we trust the machines to continue, just in perpetuity, I guess forever? The machine is learning, and everything is all set?
Richard Whitehead 13:55
Well, there are a couple of angles to that. The first one is, can you just unleash the power of AI and have it do its job? And the second aspect of that is, you know, is it a one-time deal? Do you just set it up once and let it run? So to address the “can you unleash it” question: one of the challenges that I’ve had is dealing with very conservative-minded IT operations folks when trying to bring in something as fuzzy as AI to a previously incredibly deterministic world, where everything is well understood and every action has a very well-understood reaction. One of the challenges is, well, how trustworthy is it? Is it going to get the same results each time? And the answer is, well, not always, because if the input data is different, then it might respond differently. So from a trustworthiness standpoint, you have to sort of take a step back and think, well, there are many different types of AI technology. Even within something like machine learning, there’s the concept of supervised and unsupervised machine learning. And so if you just want to throw some data at the system and have it do its thing, you’re probably describing unsupervised machine learning. There are certain areas where that’s very applicable. That’s particularly trustworthy in areas where there’s no real learning that needs to be done. I think a lot of the concern that people have over machine learning and AI is where training has to occur, and how accurate the training is. But there are certain techniques that just work. So you don’t need to build a model; you just react to the data that’s coming in. And so an example would be, you have a flow of data coming into a system, and you’re looking at that data in real time and trying to identify patterns. So you’re not necessarily comparing it to a historical model; you’re just looking at the data as it is, in real time, trying to determine patterns.
So that’s a good example of unsupervised learning, because there’s no training model; you’re looking at the data in real time and coming up with an answer. So that’s a good example of something where you can just turn it on and let it do its magic. There are other areas where, you know, training becomes more of an important component. And I think, from our standpoint, when applying those techniques to an operations-type environment, that’s where the human becomes important, because in the supervised model, the training is done by a human. So the system would say to you, this is something that I determined from the input data, what do you think? And the human has the opportunity to train it. So, you know, in practical terms, that might be the ability to tag data, or press a button to give it a thumbs up or a thumbs down. And with that sort of human-guided supervised learning, again, it becomes trustworthy, because the human has provided the input. It’s not something that the system has determined on its own; you’re actually giving it some sort of positive affirmation. So if the model is good, it’s because a human has trained it to be good, based on their current knowledge.
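[Editor’s illustration] The two modes Richard describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Moogsoft’s implementation: it assumes events arrive as (timestamp, text) pairs and groups them in a single pass by word overlap (Jaccard similarity) within a time window, with no trained or historical model. The names `group_events` and `record_feedback`, the 0.5 threshold, and the 60-second window are all hypothetical choices made for the example. The thumbs-up or thumbs-down step is reduced to storing a human label on a group.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Group:
    tokens: set        # tokens of the first event in the group
    last_seen: float   # timestamp of the most recent event in the group
    events: list = field(default_factory=list)
    feedback: list = field(default_factory=list)  # human labels (True/False)


def group_events(events, threshold=0.5, window=60.0):
    """Single-pass grouping with no training and no historical model.

    Each event joins the first group that was active within `window`
    seconds and whose tokens overlap by at least `threshold`;
    otherwise it starts a new group.
    """
    groups = []
    for ts, text in events:
        tokens = set(text.lower().split())
        for g in groups:
            recent = ts - g.last_seen <= window
            if recent and jaccard(tokens, g.tokens) >= threshold:
                g.events.append((ts, text))
                g.last_seen = ts
                break
        else:
            groups.append(Group(tokens=tokens, last_seen=ts, events=[(ts, text)]))
    return groups


def record_feedback(group: Group, thumbs_up: bool) -> None:
    """Human-guided step: a person confirms or rejects a grouping."""
    group.feedback.append(thumbs_up)


# Hypothetical sample events, invented for illustration.
events = [
    (0.0, "disk full on host db01"),
    (5.0, "disk full on host db02"),
    (8.0, "login failed for user alice"),
    (200.0, "disk full on host db01"),
]
groups = group_events(events)
record_feedback(groups[0], thumbs_up=True)
```

In this sketch the first two disk events land in one group, the login failure gets its own, and the late disk event opens a new group because the earlier one has aged out of the window. A real system would presumably feed the stored labels back into how groups are formed; here they are only recorded.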
The tools we use as a team have a direct influence on how we work together and the success we create. We built Range with that in mind. By balancing asynchronous check-ins and real-time collaboration, Range helps remote and hybrid dev teams build alignment and get time back on the calendar. Range connects dozens of apps, like Jira and GitHub, in one place, so everyone can share progress and updates on work, making standups more focused and engaging for everyone. Visit userange.com/devops to learn more and try Range free.
Jason Baum 18:15
Interesting. So as a follow-up to that, does the risk of getting it wrong play into the decision of whether to just unleash the machine and let it go, as you’re describing, as opposed to having a human on the other end helping it?
Richard Whitehead 18:41
Well, the good news is, in most IT operations environments, the relative level of risk is fairly low. But not in every case, obviously. And that’s where a lot of the concern, I think, comes from. I have no idea who coined the phrase, but I like it, which is: to err is human; to really mess it up, you need a computer. And that’s one of the challenges with automation: you can really make a problem worse by fully automating some kind of reaction to it. Risk is certainly an issue. When you look at some of the stories in the press about artificial intelligence, nobody ever really publishes the good stories. That just happens. That’s life. We’re all used to that. We take that for granted. It’s the negative sides of AI that get a lot of publicity. And, you know, there’s a lot of concern about bias in learning models, and some of those sorts of issues. And that’s really sort of a big data problem, where you’re dealing with large amounts of data from questionable sources that have been used to train models. And from my standpoint, the way you mitigate that risk is you move away from third-party data, and you try to focus solely on your environment. So don’t use external training data. And you can do that in an IT operations environment. It’s much easier to do that if you’re not dealing with, say, medical data from the last 10 years that may or may not be tainted by some poor-quality data that was introduced and that you have no control over. In an IT operations environment, you’re dealing with infrastructure and technology that’s in your control. So you can build models and do training from high-quality data that has good provenance; you know where it came from. So for a lot of those concerns, like I say, that are based on poor-quality models and poor-quality data from questionable sources, the good news is IT operations has less of a concern with that data, because we know where it comes from.
Jason Baum 21:06
So with all of that, and it sounds like there’s a lot of management that has to go on behind the scenes, who’s doing that, who’s going to manage the solution? How has the tech team changed? How is work being distributed? Where does AIOps play into this now? Do you need a data scientist? Do existing team members take on new roles? How are you structuring it?
Richard Whitehead 21:36
Right. So yes, we obviously have first-hand experience with that as a technology provider in that space. And the answer to “Do we need a data scientist?” is: if you’re going to build a solution yourself, if you’re going to roll your own, as it were, then yes, you’re going to need a data scientist. We have data scientists on board as part of our team. They’re slightly outside the engineering team. And just like in every other organization, they have different skills. They come from different backgrounds. They’re more scientists than they are engineers; the programming languages tend to be more Python- and R-focused, and so forth. So, different people, absolutely. If you’re in IT operations, you probably shouldn’t necessarily be looking at getting a data scientist on board, because there are technologies out there, commercial technologies, open source technologies, where that work has been done for you. And I think when people ask me, you know, am I going to have to retrain my staff? I chuckle and say, no, the impact of AI on operations is minor; it’s almost trivial compared to some of the seismic shifts we’ve already seen in the last five to 10 years. As operations people, as we shift to this everything-as-code type of environment, we now have operators who look just like software engineers. They’re conversant in one, two, maybe even three programming languages. They’re fully conversant with code repositories. And that shift is far bigger than anything that the introduction of AI is ever going to change. So no, you’re not going to have to become a data scientist just to operate this. The technology is going to be in a form that’s easily consumable. It’s going to look like software, it’s going to act like software, you’ll treat it like software. You’re not going to be building models yourself; the technology is going to be doing that for you. So no, I don’t think you’ll need a data scientist.
But absolutely, you’re going to need to have people who are very conversant with software and infrastructure as code and that sort of thing.
Jason Baum 24:04
So where do AIOps and MLOps fit within DevOps culture?
Richard Whitehead 24:12
At the end of the day, it’s just technology. It’s a tool. Okay, so it’s neither a good fit nor a bad fit. It’s just technology. Good AIOps technology will fit very well, because it just looks like software, it reacts like software, you can configure it as code. The changes you make are going to be very easy to work with. The technology will offer both a strong UI and strong APIs, so the technology can fit into and be integrated into a DevOps toolchain. It’s just part of it, part of the value stream. It shouldn’t stick out as necessarily being something that’s a standalone industry or a standalone job title. You shouldn’t have to hire an AIOps engineer. It’s just technology.
Jason Baum 25:18
So what are you excited about with the future of AIOps? What’s coming down the pipeline that should get us all excited?
Richard Whitehead 25:28
I think, you know, for me, as somebody who was involved in the very early stages, the first thing is just the adoption of it. It’s the fact that we’ve made that shift from “this is scary, I don’t know if I can trust it” to “gosh, I can’t imagine life without it. Do you remember what it was like 10 years ago, when we had to do this stuff ourselves? How dull and boring was that?” AIOps also brings some stability. And there’s a certain irony to that, because when we talk about things like the fuzzy logic of AI, people think of that as being kind of non-deterministic and scary. The reality is, it makes systems much more robust. So the ability for a system to be able to adapt means that when you get certain changes, AI adapts along with them and becomes very flexible, and it means that the total cost of ownership, the maintenance of an AI system, drops significantly because it’s adaptive. And that’s, I think, really significant. That’s another thing that just improves your daily life: knowing that when you plug something in, yes, you’re going to have to maintain it, but it’s not something that’s going to be a full-time job. It’s not something that every single day you’re going to have to touch and tweak. And I think people forget that when they talk about automation. You hear that term, no-ops, floating around, like, well, we just fully automate everything, and that’s it, the humans can go on vacation and never touch it again. Well, life’s not like that. One of the benefits, one of the goals, even, of digital transformation is the ability for things to change at a blistering pace; you want things to be incredibly reactive and very dynamic. And you throw into that the natural entropy of any system, and change is absolutely guaranteed, and the rate of change is accelerating. So nothing’s ever going to be installed and forgotten about.
Most of us are not dealing with a telecommunications environment, where you install a switch, and then you love it and take care of it for 25 years. Everything changes dramatically. So having a system that’s at least a little bit adaptive, and doesn’t require constant attention, you know, that’s something that makes people very, very happy. And I think that’s something I’m looking forward to seeing people benefit from.
Also, just generally looking at new opportunities. So as I mentioned, as we start to deploy AIOps in production environments, it’s the little things that are the game changers, the little benefits that are multiplied over, you know, hundreds of times a week that make everybody go, yeah, okay, this is really cool, I’m glad we installed that, that made a big difference. Expanding that to do some other intriguing use cases, finding new use cases, is something I’m really excited about.
Jason Baum 28:46
It sounds like you know it’s working when you kind of forget about it. Right? That’s the end goal. So look, we’re coming up to the end. I could talk about this subject forever. I think it’s fascinating. I love hearing you speak about it. Gosh, I can’t believe we’re here, right? At this point where some of these mundane tasks are just no longer going to be a thing, or are already not a thing. So I do like to ask a thinker. This isn’t like a gotcha question, though sometimes it is. Today’s is not. So what’s one question you wish I’d asked you? And how would you have answered it?
Richard Whitehead 29:39
Um, just from sort of a personal point of view as a tinkerer and an experimenter, you know, I wish we had more time to talk about natural language processing. You know, I’ve been doing this for a very long time. Somebody asked me, how long have you been writing regex, Richard? And it’s measured in decades. I think it might be three decades now. And for me, you know, I joke that I’ve only been writing regex for 30 years, so I’m a relative noob; I’m still learning. And then along comes natural language processing. And by using NLP, you can do things in a couple of seconds that would take maybe, I don’t know, 30 minutes to express as a regular expression. And, you know, for me, there are certain things that I enjoy doing from years ago. You know, I still write code using vi. And I still spend a lot of time on the command line on Linux systems. But if I never have to write another regex again, I’d be a happy person. So the power of things like natural language processing impresses me, and it also improves my daily life. So there you go. I answered that question.
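[Editor’s illustration] To make the regex comparison concrete, here is a small Python sketch. It is not a real NLP pipeline: plain whitespace tokenization plus a digit check stands in, very roughly, for what NLP tooling does automatically. The log lines, the `DISK_RE` pattern, and the `template` helper are all made up for illustration. The point is only that a hand-written regex covers one message format, while a token-based rule generalizes across formats without a new pattern per message.

```python
import re

# Hand-written regex: matches exactly one message format.
DISK_RE = re.compile(r"^disk usage (\d+)% on (\S+)$")


def template(message: str) -> str:
    """Replace any token containing a digit with a placeholder.

    Messages that differ only in hosts, IDs, or numbers collapse to
    the same template, with no per-format pattern to maintain.
    """
    tokens = message.split()
    return " ".join("<*>" if any(c.isdigit() for c in t) else t for t in tokens)


# Hypothetical log lines, invented for illustration.
logs = [
    "disk usage 91% on db01",
    "disk usage 97% on db02",
    "connection timeout after 30s to cache7",
    "connection timeout after 45s to cache2",
]

# The regex only recognizes the disk messages it was written for.
regex_hits = [m for m in logs if DISK_RE.match(m)]

# The token rule groups both message families without extra patterns.
templates = sorted({template(m) for m in logs})
```

Here the regex finds the two disk messages and nothing else, while the token rule reduces all four lines to two templates. Covering the timeout messages with regex would mean writing and maintaining a second pattern, which is the kind of toil Richard is happy to leave behind.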
Jason Baum 31:07
Great. Awesome. I love it. You should have been interviewing yourself. And you would also have gotten through that line better than I just did. Well, I really appreciate your time, Richard, and you educating us on AIOps, MLOps, and, you know, how it fits into DevOps as a tool and just, in general, makes our lives easier and is not coming to cause doomsday. So I really appreciate you coming on.
Richard Whitehead 31:36
It’s all good. It’s not Skynet.
Jason Baum 31:38
Thank goodness. If anyone names their company Skynet, I think I’d question that. Well, maybe it’s just funny, I don’t know. Well, thank you so much, Richard, I really appreciate your time. And thank you for listening to this episode of the Humans of DevOps Podcast. I’m going to end this episode the way I always do, encouraging you to become a member of DevOps Institute to get access to even more great resources just like this one. Until next time, stay safe, stay healthy, and most of all, stay human. Live long and prosper.
Thanks for listening to this episode of the Humans of DevOps Podcast. Don’t forget to join our global community to get access to even more great resources like this. Until next time, remember, you are part of something bigger than yourself. You belong.