By Eveline Oehrlich, Chief Research Officer, DevOps Institute
IT Leaders today are facing daunting challenges including shrinking budgets, continuous firefighting modes, an explosion of events from an ever more complex business technology ecosystem, and the continuous effort to increase the velocity and reliability of infrastructure, applications, and services. The drive toward digital services has forced these leaders to adopt new operating models such as Site Reliability Engineering (SRE). Site Reliability Engineering is defined as “an engineering discipline devoted to helping an organization sustainably achieve the ‘appropriate level of reliability’ in their systems, services, and products.”
SRE has become a must-have engineering practice for enterprises seeking to accelerate digital transformations or re-engineer their interfaces to digital-first. As enterprises are implementing SRE in their respective teams by developing and adjusting the best practices introduced by Google, the operating model continuously gains attention from decision makers within IT and the business.
This blog summarizes findings from our first-ever Global SRE Pulse survey. Our inaugural survey of over 460 SRE leaders and practitioners from both midsize and large enterprises provides a snapshot of the state, practices, health, activities and automation adoption across the globe. Here are the key findings from the ‘Global SRE Pulse’ 2022 report.
Daunting Challenges and Digital Transformation Require New Operating Models
The challenges of IT Leaders continue as the hype cycle of technologies and the continuous demand from both internal and external customers shape the application and services landscape. SRE is here to stay, and it is a prevalent way to manage apps and services today.
- Digital transformation requires modern ways to operate. More than half of survey respondents who have adopted SRE perceived their company as leaders across customer experience and speed of innovation. The adoption of SRE played a big part in their success.
- SRE is leveraged in both running the business and serving customers. When asked where SRE is leveraged today (the software your company builds or the set of services SRE teams interact with), 56% of survey respondents said they leverage SRE for operating their Systems of Engagement (SOE) and 42% for their Systems of Record (SOR).
- Faster, better, more reliable and cost-effective applications are great benefits of SRE. When asked how they would describe their company today compared to their competitors across customer experience, quality of products, offerings, processes, services and innovation, we found that 52% would describe their company as being a leader.
- SRE enhances collaboration, increases IT value from the business perspective, and is an essential engineering function for digital transformation. With the rise in complexity and the craze for digital transformation across global organizations, enterprises are adopting SRE for improved collaboration between development and operations and to continuously improve the reliability and health of applications and services for their customers and business partners.
Implementing SRE is a Journey
SRE requires an all-day, everyday commitment. Working in an SRE team is a rewarding experience. It provides a great opportunity for individuals to re-energize their career, be part of and belong to new engineering practices, learn new things, make a difference, and improve compensation, but it does not come without challenges.
- The biggest challenge is finding the right skills for SRE to work. Eighty-five percent (85%) of survey respondents cite the lack of staff with the necessary skills as their biggest challenge when implementing SRE. Additional challenges cited in the report include “value of SRE is not understood” (71%), “don’t have time to implement SRE” (53%), “lack of tools in place” (55%) and “lack of management support” (44%). When analyzing the challenges across the different company sizes, there were no significant differences.
- The biggest source of toil is process issues and new releases. While eliminating toil across computing resources is one focus area, eliminating toil across different processes is another topic for SREs to focus on. We found that 27% of our survey takers cite that business process issues are their number one source of toil. Process examples could be the onboarding of users or the interaction with customers. Nineteen percent (19%) of our survey takers cite “application release” as their main source of toil. When your business is digital, revenue flows thanks to the value your software provides. Development work that hasn’t been shipped to production yet is not producing any value; however, iterating and releasing software too quickly can also cause problems.
- Metrics, oh my. Service-Level Agreements (SLAs) are adopted by 85% of our survey respondents. SLAs are essential to managing service expectations between IT and the business teams. For example, an SLA that promises 99.95% uptime for a specific service or application is typical. Unfortunately, many of our survey respondents say that they do not have a Service-Level Objective (SLO) (which would be the 99.95%) or a Service-Level Indicator (SLI). Metrics such as SLAs, SLOs and SLIs are somewhat of a mixed landscape and are still challenging to implement and manage.
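To make the relationship between these metrics more concrete, here is a minimal Python sketch. It is not taken from the survey. It assumes a common convention in which an SLI is the ratio of good events to total events, the SLO is the 99.95% target from the example above, and the measurement window is 30 days. The "error budget" (the share of failures the SLO still permits) is likewise an added illustration, and all function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityReport:
    sli: float               # measured availability, between 0 and 1
    slo_met: bool            # True when the SLI is at or above the SLO
    budget_remaining: float  # fraction of the error budget left (negative if overspent)


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability target permits over the window.

    The 30-day default window is an assumption made for this example.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def evaluate(good_requests: int, total_requests: int, slo: float = 0.9995) -> AvailabilityReport:
    """Compare a request-based SLI against an SLO (assumed: good / total events)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= good_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    sli = good_requests / total_requests
    allowed_failures = (1.0 - slo) * total_requests   # failures the SLO tolerates
    actual_failures = total_requests - good_requests
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return AvailabilityReport(sli=sli, slo_met=sli >= slo, budget_remaining=budget_remaining)


if __name__ == "__main__":
    print(f"Allowed downtime: {allowed_downtime_minutes(0.9995):.1f} minutes per 30 days")
    print(evaluate(good_requests=999_700, total_requests=1_000_000))
```

Under these assumptions, a 99.95% target leaves roughly 21.6 minutes of downtime in a 30-day window, and a service that failed 300 of 1,000,000 requests would still have about 40% of its error budget left.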
IT Service Management is Critical, but Observability is Also High on SREs’ Radar
No business can afford downtime of critical services or applications, as customer satisfaction, confidence, and revenue are at stake. To reduce downtime and ensure ongoing services, IT automation solutions are essential.
- IT Service Management automates core processes. Core processes such as incidents, service requests, problems, changes, and IT assets must be managed. Many of these different processes are automated through IT Service Management (ITSM) solutions. Nearly all our survey respondents are leveraging ITSM automation tools today.
- The second most adopted automation tools are observability and monitoring platforms. Twenty-nine percent (29%) indicate they leverage observability and monitoring tools and techniques everywhere. The goal of observability is to improve digital business service performance. This is only possible if it’s used everywhere, as it must provide end-to-end insight across a hybrid-cloud ecosystem.
Working in an SRE Team is a Rewarding Experience…and Pays Well
While the pandemic has brought on many changes, the technology sector is doing well. However, many individuals are evaluating their current job and are looking for a change. While some are talking about the ‘Great Resignation,’ we do believe that there is a ‘Great Reshuffle’. If you’re reevaluating your IT career and you’re looking for a change, SRE could be for you.
- Site Reliability Engineers feel a sense of belonging, are energized and have expanded their skills. SRE energizes individuals and aligns them with the business. When we asked our survey participants what leads them to an SRE role, we found that over 50% of respondents agreed that they had expanded their skills and capabilities. Forty-four percent (44%) strongly agree that they are more engaged and excited about their job. Thirty-six percent (36%) strongly agree that they are more valued as a team member, and 34% of survey respondents feel more valued and appreciated.
- SRE pays more. Besides being energized, the compensation within SRE is higher. Fifty-two percent (52%) of our survey respondents indicated that they agree (strongly, or somewhat) that their compensation has improved.
The Global SRE Pulse shares promising findings around SRE adoption and the benefits of improving system reliability. The results demonstrate that SRE enhances development and operations collaboration, increases IT value from the business perspective, and is an essential engineering function for digital transformation. As we look ahead, SRE will continue to play an important role in driving value for modern and complex software environments – especially in teams’ efforts for continuous improvement.
The full research findings are part of the Global SRE Pulse 2022 report that is now available for download.