DevOps Institute

Our DevOps Journey is Incomplete without Data

DataOps

March 30, 2020

By: BMK Lakshminarayanan

Every company is a ‘software company’; as the saying goes, “software is eating the world”. Along similar lines, I recently heard that every company, regardless of size, is a ‘data company’.

It’s true that somewhere or other every organization produces, consumes, analyzes and reports on data, and makes decisions to promote, buy, sell, acquire, expand, down-size and so on based on that data.

DevOps momentum has driven rapid growth in new tools for CI/CD and ARA (Application Release Automation), and in frameworks for enabling application delivery at pace.

When it comes to continuous delivery, modern architecture patterns and practices like microservices, our delivery teams face challenges with data. In this post, I discuss some of the challenges that I have gone through and share some ideas and concrete steps to help further understand the challenges around data in the DevOps space.

If we can only go as fast as our weakest link, then data, data management, data architecture, and associated practices need our attention and love.

Who Owns the Data?

Let’s dive deep into the ‘data’ problem. The typical mentality in any enterprise is that the data team or data management team owns the data. There is a prevalent idea that they are the only team that protects data from unauthorised access, maintains the standards and conventions and ‘owns’ the data, and that they are the very last line of defence in the organisation. The question is: how did we get into this situation? The term ‘silo’ in DevOps hits me hard, as I see that the data team is the biggest silo in the entire organisation and value stream.

The Problem: is it Data, the Data People or Both?

To borrow a term from object-oriented programming, I see that the ‘enterprise data team’ inherits a lot from its predecessors: the process, standards, procedures, planning, execution, operations and management. The challenge is recognizing and accepting the opportunity to improve, modernize and simplify, i.e. to apply lean principles. Making things developer-friendly is frequently the most significant barrier. Sometimes I feel a mix of emotions: anger, sadness and pity.

There is a misconception that the ‘old school’ way of doing things is the best way, and that DevOps and continuous delivery are just for applications and not for data.

If improving our daily work and the way we work is more important than the work itself, why could we not help them and take them along this DevOps journey?

We Are Responsible for our System Complexity

IMHO, the roots of the problem for most organizations are:

  1. How we think about data: application, operational, analytical and intelligence. This includes data from apps, log files, monitoring, performance, core systems, lakes, files and message hubs
  2. Struggling to understand the reality of real-time vs batch and the impact of that on our business decisions
  3. How we move data within an enterprise between different producers and consumers, including the data in business events and the flow of that event data between different systems
  4. How to ‘unlock’ the data and make it available for the right users and the right use cases
  5. Data constraints that we live with due to tools, processes and practices

Let us honestly answer the following questions. My list is long, but let’s start with:

  1. How many of us know how many ETL jobs exist in our enterprise? How many of them are still active?
  2. For storing scripts, do we trust a ‘shared network’ drive more than a ‘source control’ system?
  3. Do we believe that we are sourcing the data only one time from the ‘source’ system/systems?
  4. Do we have or provide clarity to our development community on Systems of Record vs Systems of Engagement?

Our system complexities stem from our thinking, particularly our thinking about data. We tend to compromise a lot because we can’t get the data in the right way, at the right time, in the proper format.

Sometimes we impose these constraints on ourselves; classic examples are the constraints that come from data governance, data security, data normalization and data centralization.

Microservices Era and Big Ball of ‘Data’ Mud

If our systems are distributed, why is our data centralised? Centralising the data has benefits, but is it the right way to build distributed systems and microservices architectures?

We started thinking about and building “microservices” and distributed architecture patterns to break down the monolith and enable “enterprise agility”. Our attempts to apply this thinking to the “database” were not fruitful: either we did not pay attention or we did not make the effort. When challenges arise, the old habits come back and bite us.

With Domain-Driven Design principles and Event-Driven Architecture, we can secure support from the “Dev” community but not from “Data”. Why?

  • We believe that building massive databases, even with modern “enterprise-grade” database tools, is the right thing to do.
  • We believe that applying the same 30-year-old naming conventions and standards, even to modern databases, is the right thing to do.
  • We believe that preventing “Developers” from accessing “data” in Dev and Test, and making them do TBD (Ticket Based Development), is the right thing to do.
  • We believe that not giving support folks “read” access to the “PROD” database, while expecting them to do “Production Support”, is the right thing to do.

Our understanding is changing and our approach towards these beliefs is changing; I see a bright future for the developer community in dealing with data.

Breakthrough Challenges

It is not easy, but it is not impossible.

1) Naming Conventions: I would file this under developer productivity, because meaningful data model, schema, table and column names are important for developers and Ops to do the right work. Instead of referring to multiple documents for names and acronyms, we need more meaningful naming standards and conventions. It is “freedom” from the constrained naming conventions of history. If your enterprise has inherited old conventions with limited character lengths, now is the time to free them up. #DeveloperProductivity #DataArchitecture

2) Domain, Boundary, Schema: DDD education is a fun exercise with the data and data modelling teams, but it is a tough ask of our data friends. Traditionally we have built monolithic applications with monolithic databases; in some cases, even our dependent applications have or had database/schema-level integrations. If we are breaking the monolith and taking the route of DDD and Bounded Contexts, then we need to move to a database/service boundary. Try the following with your Data Architecture and Data Modelling teams:

a) Request for Database/Service Boundary (if this fails, try the next)
b) Request for Schema/Service Boundary

After a round of discussion with my data friends, we agreed on the Schema/Service Boundary. With this approach we were able to limit dependencies and cross-domain pollution: each service gets a service account on its schema for read and write access, and each domain is aligned to (at least) a schema.
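As a rough sketch of what a schema/service boundary can look like, assuming a PostgreSQL-style database: each service gets its own schema and its own service account, and only that account is granted read and write access to that schema. The `svc_<name>` role naming, the helper `boundary_ddl` and the service name used below are hypothetical illustrations, not the setup we actually ran.

```python
import re
from typing import List

# Only allow plain lowercase identifiers so a service name cannot smuggle SQL in.
_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


def boundary_ddl(service: str) -> List[str]:
    """Return PostgreSQL-style DDL giving one service its own schema and account.

    Hypothetical helper: the role naming convention (svc_<service>) is an
    assumption made for this example.
    """
    if not _IDENTIFIER.match(service):
        raise ValueError(f"unsafe service name: {service!r}")
    role = f"svc_{service}"
    return [
        f"CREATE SCHEMA IF NOT EXISTS {service};",
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT USAGE ON SCHEMA {service} TO {role};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {service} TO {role};",
    ]


for statement in boundary_ddl("orders"):
    print(statement)
```

Because the grants never mention another service's schema, a service that wants another domain's data has to go through that domain's API or events rather than its tables.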

3) Flow of Data Model Change Requests: This is another area we improved significantly. Initially, we needed to fill in a SharePoint form to request a data model change. The request went to the DA (Data Analyst & Data Modeller), who produced the script; it was reviewed by the DBA and by the developer who requested it, then applied to the DEV database, and only then could the developer start developing the feature. We realised that we could not fit this into our two-week sprint.

We could only go as fast as our weakest link.

There were some bold discussions and decisions around enabling the developers to make the changes. We:

a) Educated our developers on standards

b) Enabled developers to use a local (dev) database

c) Enabled the DA to review changes in parallel

d) Followed a PR (pull request) workflow for the entire lifecycle so that the comments and communication flow were transparent and visible to everyone

4) Adopted Modern Practices: Are continuous integration and automated deployment only for application source code? What about databases? This is where we had our breakthrough: we helped our DBAs adopt our source control systems. Instead of mailing scripts around, we pulled them from source control. We enabled the DBAs to do automated database deployment. We broke the tradition of DBAs logging in and manually rolling out changes, and used automated deployment via pipelines instead.
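To make the idea of automated database deployment concrete, here is a minimal sketch of a migration runner a pipeline step could call. It is an illustration under assumptions, not the tooling we used: SQLite from the Python standard library stands in for the real database, and the `schema_history` table, the `apply_migrations` function and the script names are hypothetical. In a real pipeline the scripts would be read from files in source control rather than from an in-memory dictionary.

```python
import sqlite3
from typing import Dict, List


def apply_migrations(conn: sqlite3.Connection, migrations: Dict[str, str]) -> List[str]:
    """Apply scripts that have not run yet, in name order; return the names applied."""
    # The history table records which scripts have already been rolled out.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM schema_history")}
    applied = []
    for name in sorted(migrations):
        if name in done:
            continue  # already deployed in an earlier pipeline run
        conn.executescript(migrations[name])
        conn.execute("INSERT INTO schema_history (name) VALUES (?)", (name,))
        conn.commit()
        applied.append(name)
    return applied


# Hypothetical scripts, as they might sit in a repository next to the application code.
scripts = {
    "001_create_customer.sql": "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT);",
    "002_add_email.sql": "ALTER TABLE customer ADD COLUMN email_address TEXT;",
}

conn = sqlite3.connect(":memory:")
print(apply_migrations(conn, scripts))  # first run applies both scripts
print(apply_migrations(conn, scripts))  # second run finds nothing new to apply
```

Recording applied scripts in the database itself is what makes the step safe to re-run on every pipeline execution, so nobody has to log in and decide by hand which scripts are still pending.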

5) Relational or NoSQL? Like most enterprises, we were thinking everything was relational and forcing everything to be relational. But in some scenarios, we wanted to implement the ‘document’ database approach of treating an entity as a document or graph instead of just a table made of several columns. Our front-end was requesting a JSON payload via the API while our database was serving relational tables. Our radical thinking helped us push the agenda for JSON schema and payloads. This helped the development team a great deal, as they were able to roll out changes quickly.
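The difference is easier to see in a small sketch. Here an entity is stored and served as one JSON document, so the API can return what the database holds without reassembling it from several tables. This is only an illustration: SQLite with a text column stands in for whichever document-capable database you use, and the `order_document` table, the `save_order` and `load_order` helpers and the sample order are all hypothetical.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# One row per entity; the whole entity lives in the JSON body.
conn.execute("CREATE TABLE order_document (order_id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def save_order(order: dict) -> None:
    """Store the order as a single JSON document keyed by its id."""
    conn.execute(
        "INSERT OR REPLACE INTO order_document (order_id, body) VALUES (?, ?)",
        (order["order_id"], json.dumps(order)),
    )
    conn.commit()


def load_order(order_id: str) -> Optional[dict]:
    """Return the stored document in the shape the API serves, or None if absent."""
    row = conn.execute(
        "SELECT body FROM order_document WHERE order_id = ?", (order_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Hypothetical payload with nested line items, as a front-end might request it.
save_order({
    "order_id": "A-1001",
    "customer": {"name": "Example Customer"},
    "lines": [{"sku": "SKU-1", "quantity": 2}, {"sku": "SKU-2", "quantity": 1}],
})
print(load_order("A-1001"))
```

Adding a field to the payload here means changing the document, not altering several tables, which is where the faster roll-out comes from. The trade-off is that the structure has to be enforced elsewhere, for example with a JSON schema.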

How Do We Measure Our Progress?

How can we fit the database changes for a feature within the sprint? What kind of turnaround time do we have for database changes with quality in mind? How consistent is our process across application source code and database schema? We needed to build release pipelines and reduce the cognitive load on the team of doing different things for different components.
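One way to put numbers on the turnaround question is to record when each database change was requested and when it was deployed, then look at the median and at how many changes fit inside a sprint. The sketch below assumes a two-week sprint, as mentioned earlier; the function name `turnaround_report` and the sample timestamps are made up for illustration and are not measurements from our teams.

```python
from datetime import datetime, timedelta
from statistics import median

SPRINT = timedelta(days=14)  # assumption: two-week sprints


def turnaround_report(changes):
    """Summarise (requested_at, deployed_at) pairs for database changes."""
    durations = [deployed - requested for requested, deployed in changes]
    within_sprint = sum(1 for d in durations if d <= SPRINT)
    return {
        "median_days": median(d.total_seconds() for d in durations) / 86400,
        "share_within_sprint": within_sprint / len(durations),
    }


# Hypothetical sample: three change requests and their deployment times.
sample_changes = [
    (datetime(2020, 3, 2), datetime(2020, 3, 4)),
    (datetime(2020, 3, 2), datetime(2020, 3, 6)),
    (datetime(2020, 3, 2), datetime(2020, 3, 22)),
]
print(turnaround_report(sample_changes))
```

Tracking the same two numbers for application changes makes it visible whether the database path is still the weakest link.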

Learnings and Advice

Problems are not unique and there are common themes; you are not the first one to solve these. Many have already worked hard and found answers to your hardest questions. It is a matter of reaching out for help.

Spread and Share: You need a platform to share the learnings and experience with the rest of the organisation to motivate them and help them to achieve more.

Show and Tell: Move away from PowerPoint presentations and show people the actual code, the screen and the work-in-progress.

Keep Challenging: You are not alone, so do not give up. This is continuous learning and education. Sometimes you need to take medicine over several days to feel better or be cured.

If you need to go faster, you need discipline. Help the teams, draw the lines, get them focused, and hear what they have to say. Be sensible with your data architecture and design; architect and design systems in such a way that you can do continuous delivery.
