June was an eventful month for DevOps Institute. We declared June as Site Reliability Engineering (SRE) month! To celebrate and help the community SKILup on all things SRE, we hosted several events and shared new resources dedicated to Site Reliability Engineering. But first…
DevOps Institute continued to highlight DevOps Certified Professionals in our Certification Spotlight blog series. Featured ‘DevOps Certified Professionals’ hold multiple accredited DevOps Institute certifications. The blogs share insights into why they chose to become certified, how they learn and study best, and how each certification has benefitted their career.
You can read Niladri Choudhuri’s certification spotlight here and Marc Hornbeek’s certification spotlight here. Stay tuned for new certification content and watch for #DevOpsCertified posts on Twitter and LinkedIn to help us celebrate the accomplishments of the newly certified Humans of DevOps.
We also continued to highlight contributions from our DevOps Ambassadors throughout the month of June. You can see all of their contributions on our blog here, as well as our Medium publication, The Humans of DevOps, here.
Chief Ambassador Helen Beal and several DevOps Institute Ambassadors also came together to discuss why Site Reliability Engineering is important. For insightful responses from DevOps practitioners all around the world, read the article here.
To kick off the Site Reliability Engineering festivities, several ambassadors hosted a Global DevOps Institute Ambassador CrowdChat. The CrowdChat was a great way to dive into SRE and discuss one of the fastest-growing enterprise roles and sets of operational practices for managing services at scale. See the full CrowdChat here.
On June 18, we hosted the main event – SKILup Day: Site Reliability Engineering! The one-day virtual conference centered around the people, process and technology aspects that are currently shaping SRE. SKILup Day featured ‘how-to’ insight from speakers Dinesh Sekar, Aaron Rinehart, Ravi Lachhman, Andrew Chee, Jayne Groll, Dave Stanke, Amy Tobey, Marcel Birkner and Shelby Spees.
Attendees also had an opportunity to learn about the history of SRE directly from experts at Google and experience two live Q&A panels.
In addition to a full day of sessions, the event offered a scavenger hunt, networking lounge, exhibit hall, resource library, and even a DevOps inspired mixology class!
Didn’t get a chance to attend June’s SKILup Day dedicated to Site Reliability Engineering? We’ve got you covered with a quick round-up of the key themes that emerged from the sessions and conversations around the importance of the SRE role.
Why devote a full day to learning more about the SRE function?
SRE has grown into a global community and is no longer just a Google phenomenon, and SRE practices have spread quickly in recent years. According to Dinesh Sekar, SRE Transformation and Competency Development at Standard Chartered Bank, Glassdoor reported 54,000 open positions for Site Reliability Engineers as of early June 2020 in both the tech and non-tech sectors. Jayne Groll, CEO of DevOps Institute, also alluded to the importance of the SRE trend in her session, saying “SRE is the most innovative approach to ITSM since the early days of ITIL®.” In addition, according to the Upskilling 2020: Enterprise DevOps Skills Report, SRE adoption rose from 10% in 2019 to 15% in 2020.
There is a huge focus on the skills necessary for the SRE role. And professional development is often part of the day-to-day.
Throughout June’s SKILup Day, many of the presenters examined the competencies that are necessary for the discipline. These include not only problem-solving abilities but also open communication, the ability to handle complexity, product management expertise, empathy, and an innate curiosity and desire to learn continuously.
“If they are allergic to doing things by hand over and over again and if they have computer skills to take the work they’ve been doing by hand and make a computer do it instead, they’ve got a good character trait for SRE,” says Benjamin Treynor Sloss, VP 24×7, Google.
Success is not in the code but should be measured by value to the customer.
“Don’t make humans do boring, repetitive work. They should be focused on the value stream,” Shelby Spees, Developer Advocate at Honeycomb, shared during her presentation, ‘Fast and Simple: Observing Code and Infra Deployments at Honeycomb.’ Dave Stanke of Google reiterated this in his session, which focused on how SREs can align technical work to user benefit. Customers determine the quality of the product, so engineers should understand customer problems and be involved in the user experience.
What’s your error budget?
Most of the sessions touched on the concept of an error budget, which gives teams room to question assumptions and conduct experiments to find out what the user really cares about. This goes hand in hand with chaos engineering and embracing risk. An error budget is a typical metric that sets SREs apart from DevOps teams, according to Ravi Lachhman, Evangelist at Harness, who compared the two job functions in depth during his presentation.
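For readers new to the idea, the arithmetic behind an error budget can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something shown in the sessions: the `ErrorBudget` class, its field names, and the sample numbers are all hypothetical, and it assumes a simple request-based SLO measured over a fixed window.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served during the measurement window
    failed_requests: int  # requests that failed during the same window

    @property
    def allowed_failures(self) -> float:
        # The budget is whatever the SLO leaves over: (1 - target) * traffic.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget still available for experiments and risky changes.
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # A negative remainder means the SLO has been missed for this window.
        return self.remaining < 0


# Illustrative numbers only (assumed, not taken from any session).
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"remaining budget: {budget.remaining:.0f}")
print(f"budget exhausted: {budget.exhausted}")
```

While budget remains, a team has room to ship changes or run chaos experiments. Once it is exhausted, the usual practice is to shift attention toward reliability work.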
“If we stop blaming humans, we can start to build more robust systems that create a situation where humans don’t have the fear of making errors because maybe it’s safe to do them or maybe they are aware of safety boundaries or we can stop them in the first place by not making the error possible,” says Amy Tobey, Staff SRE at Blameless.
Leverage observability to reduce risk from external dependencies.
Andrew Chee, Senior Sales Engineer at LightStep, stressed the importance and role of observability when migrating from monolithic to microservices applications. Why? Because of the complexity introduced by microservices. Teams need to plan in advance around best practices for Service Level Objectives (SLOs), API contracts and developer productivity in general. In order to “crack the case,” Andrew Chee suggested monitoring the performance of new services from developers as they are being built.
Advice to organizations wanting to implement SRE: Start where you are.
Most speakers stressed simplicity when starting out. Marcel Birkner stated that “keeping tools to a bare minimum” and “reducing complexity” were some lessons learned when implementing SRE at Instana. Jayne Groll also highlighted simplicity and eliminating toil as guiding principles of SRE.
For a quick recap of all the sessions, check out the sketches below. You can also still watch the videos of the sessions and download the slide decks by viewing the SRE SKILup Day on-demand.
There are plenty of events, fresh content, and exciting announcements in the pipeline. We’ve designated July as Continuous Delivery Ecosystem month! Join us for the virtual SKILup Day on July 16.