DevOps Institute

A Great Partnership: Site Reliability Engineering (SRE) and DevOps

Achieving Speed, Quality and Reliability All at Once

By Eveline Oehrlich, DevOps Institute and Andreas Prins, StackState

DevOps is not a prescriptive methodology but was born out of the need to improve the software development lifecycle (SDLC) for both the development and the operations teams. When we think of DevOps, we are talking about an approach and/or a team that is responsible for both the development and operations of software. Some organisations have implemented DevOps by automating the continuous integration/continuous deployment (CI/CD) process, tasks, and workflows. We consider this to be a subset of DevOps.

Site Reliability Engineering (SRE) is another methodology that is being implemented as a standardised set of engineering practices to balance the speed of feature development with the operational reliability risks. While DevOps and SRE are related and one supports the other, we feel that it is necessary to share definitions, similarities, differences, and most importantly, how SRE can support DevOps. 

The following is intended to help you answer the question: SRE, DevOps, or both? We will look at the definitions of each, what they have in common, and where they differ. We know that any successful journey starts with a solid understanding.

Definition

  • SRE: Google’s practice of Site Reliability Engineering (SRE) is an approach to operations that prioritises user-centric measurement, shared accountability, and collaborative, blameless learning.
  • DevOps: The term is a combination of the words development and operations. It is intended to represent a collaborative approach to the tasks performed by an organisation’s application development and IT operations teams.

Target audience

  • SRE: A specific role, such as Site Reliability Engineer (SRE), or a specific team, such as an SRE team.
  • DevOps: Not targeted at a specific person or team, but rather at the organisation, including development, infrastructure, and operations teams (and, with the evolution of DevSecOps, the security team).

What is it

  • SRE: An established best practice from Google that applies engineering practices to running systems in production.
  • DevOps: A cultural shift, introduced in 2008, that accelerates the flow of work from development to operations (and everything in between), including automation, behaviours, measurements, and sharing (CALMS).

What it does

  • SRE: Ensures that systems (software applications, services, and more) are available, resilient, efficient, and compliant with the organisation’s policies, and eliminates toil.
  • DevOps: Delivers software faster and with better quality with each release.

 

What SRE and DevOps Have in Common

Ways of Working

  • Adopt specific ways of working.
  • Solve communication problems and break down silos between different organisational units.
  • Improve the team or the organisation rather than the individual.
  • Leverage a collaborative culture with shared ideas, processes, practices, and technologies, with the goal of streamlining product development to maximise business value.
  • Embrace feedback loops, a blameless culture, and psychological safety.
  • Leverage different team topologies (e.g., a central team or a coach squad model).

Production System

  • Improve efficiency, productivity, and customer (end-user) satisfaction.
  • Improve the speed and quality of applications and services.
  • Save costs through improved processes and automation that eliminate wasted effort and duplication.

Tooling

  • The toolboxes of both are similar, or even overlap, but are used for different automation purposes.

How SRE and DevOps are Different

Focus

  • SRE: Optimising the operations of software or services.
  • DevOps: Optimising the software development process.

Team members

  • SRE: Software engineers who improve operations through automation and more.
  • DevOps: Developers, operations staff, testers, and infrastructure engineers with knowledge of their domain and a desire to improve the way software is delivered.

Role or title

  • SRE: A specific role with the title of Site Reliability Engineer.
  • DevOps: The role and title may vary depending on the organisation and the state of DevOps.

Scope

  • SRE: Typically remains within the IT processes and organisation.
  • DevOps: May extend to business stakeholders.

Typical metrics

  • SRE: Service Level Agreements (SLA), Service Level Indicators (SLI), and Service Level Objectives (SLO).
  • DevOps: Number of deployments, lead time from code commit to release, number of failed deployments, and time it takes to recover from failure.

Automation perspective

  • SRE: Reduce all the repetitive operational tasks (or toil).
  • DevOps: Continuously integrate, deliver, and deploy in a consistent way once the developer commits the code.

Impact on customer experience

  • SRE: Brings a focus on scalability and reliability.
  • DevOps: Focuses on delivering new capabilities, features, and functionality to customers quickly and with quality.
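
To make the typical metrics on both sides more concrete, here is a minimal Python sketch of how they might be calculated. It is an illustration under stated assumptions, not a standard implementation: the function names, the `Deployment` record, and the sample numbers are all hypothetical, and the SLI shown is a simple availability ratio (successful requests divided by total requests). An SLA is a contractual commitment rather than a calculation, so it is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SRE side: an SLI expressed as the fraction of successful requests."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic means nothing failed
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """SRE side: the SLO is met when the measured SLI reaches the target."""
    return sli >= slo_target


@dataclass
class Deployment:
    """Hypothetical record of one deployment (not from any real tool)."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def delivery_metrics(deployments: list[Deployment]) -> dict:
    """DevOps side: the four delivery metrics listed in the comparison."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved count towards recovery time.
    recoveries = [
        d.recovered_at - d.deployed_at
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "deployments": len(deployments),
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "failed_deployments": len(failures),
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Sample usage with made-up numbers.
sli = availability_sli(good_requests=99_950, total_requests=100_000)
print("SLO met:", meets_slo(sli, slo_target=0.999))

start = datetime(2023, 1, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               recovered_at=start + timedelta(hours=5)),
]
print(delivery_metrics(history))
```

The two halves are kept separate on purpose: the SLI and SLO functions look only at how the running service behaves, while the delivery metrics look only at how changes reach production. In practice these numbers would come from monitoring and CI/CD tooling rather than hand-built records.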

Stop asking if you need SRE and DevOps

The key thing to understand is that SRE and DevOps are complementary and work together to streamline operations, eliminate organisational silos, and deliver high-quality software faster. It is important to note that while DevOps focuses on the delivery of the application and/or service, SRE can improve the operation of the application and/or service. Here is why you need both:

  • Both provide process acceleration in complementary areas.
  • Both bring different skill sets to the organisation.
  • Both provide for career progression within your organisation. 
  • Both improve the customer experience.

Know before you go

The bottom line is to adopt the practices and principles that work and are possible in your organisation. There are situations where you should discuss whether it is right to adopt both or to start with one. Here are some things to consider:

  • Consider the current state of your software development journey: One perspective is the state and maturity of the speed and quality of your software and services. In the initial stage of accelerating software development, DevOps may be the focus; later you may want to focus on scale and reliability.
  • Infrastructure is outsourced: Not all DevOps journeys will require SRE, as organisations that have fully outsourced their IT infrastructure to cloud providers will only need a subset of SRE. In this case, engineers monitor and manage the interaction of applications and their cloud resources, and act as the point of contact for support escalations to their cloud providers.
  • Recognise the impact of cloud and cloud native: Cloud and cloud-native software development is the next wave around the corner, and most organisations are still moving to the cloud. Serverless, data platforms, and many cloud-native components are disrupting the current pace of software development.
  • The relatively new concept of a platform team: If you’re not ready for either SRE or DevOps, you might want to investigate the topic of platform engineering. These teams are being set up to support the adoption and utilisation of modern cloud platforms.
  • Remote work and hybrid working methods: The pandemic may be over, but it has changed the way teams work and collaborate, and it will continue to do so. Remote working, while not directly a tech trend, is disrupting the way we collaborate: we pair over Discord, we brainstorm over Miro, and we hold company meetings over Teams. A new way of working demands asynchronous collaboration and a different way of documenting everything.
  • Acceptance and use of Artificial Intelligence (AI): AI is coming into our space. On the operations side, it has been there for a while in areas such as event correlation. More recently it has arrived with GitHub Copilot, which accelerates the speed of development, and with ChatGPT, which helps us to get quick knowledge on certain topics. AI is there to augment human understanding and accelerate delivery even further.

We’re excited to be part of the SRE, DevOps and other journeys, and we predict a bright future ahead for all of us in IT. By aligning people with the right processes and leveraging the right technologies and their associated trends, IT can do more than ever before. Welcome to the digital speed of 2023.
