DataOps: The New DevOps of Analytics

For example, an engineer can provision the environment for their code to run using containers, which makes the actual handover a non-issue because the container includes everything necessary to run the desired software.

In other words, containers have become the common language between the two teams, as these containers now solve the age-old problem of how to get software to run reliably when moved from one computing environment to another.

In the same vein, DataOps is a team sport and needs its own common language.

Gartner calls this data literacy or data as a second language.

Using modern approaches – like self-service data preparation tools that come equipped with their own built-in data operations – data practitioners such as data analysts, data engineers, and data scientists can not only collaborate and co-develop insights in a zero-code environment, but also streamline the delivery of work into the rest of the organization.


Large Scale, Global Consumption and Provisioning

A key goal of DevOps is to make software available at speed across many regions and countries while accommodating a large number of users.

However, accomplishing this requires a unified management environment, where monitoring, cataloging, concurrent usage, and logging take place via a centralized control plane.

In DataOps, the same is true, as success necessitates a unified catalog of data assets and data preparation flows, along with versioning and monitoring of the environment.
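As a rough illustration of what a unified, versioned catalog might track, here is a minimal Python sketch. The `Catalog` class, its methods, and the asset names are hypothetical and not taken from any particular product. It only shows how data assets and preparation flows could be registered in one place, with each change recorded as a new version and written to a simple log for monitoring.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """One version of a data asset or preparation flow (hypothetical model)."""
    name: str
    kind: str            # "asset" or "flow"
    version: int
    definition: dict
    registered_at: datetime


@dataclass
class Catalog:
    """A toy central catalog: every registration creates a new version."""
    entries: dict = field(default_factory=dict)    # name -> list of versions
    audit_log: list = field(default_factory=list)  # simple monitoring trail

    def register(self, name: str, kind: str, definition: dict) -> CatalogEntry:
        history = self.entries.setdefault(name, [])
        entry = CatalogEntry(
            name=name,
            kind=kind,
            version=len(history) + 1,
            definition=definition,
            registered_at=datetime.now(timezone.utc),
        )
        history.append(entry)
        self.audit_log.append(f"registered {kind} {name} v{entry.version}")
        return entry

    def latest(self, name: str) -> CatalogEntry:
        return self.entries[name][-1]

    def get(self, name: str, version: int) -> CatalogEntry:
        # Versions are 1-based, so version 1 is the first list element.
        return self.entries[name][version - 1]


catalog = Catalog()
catalog.register("customers", "asset", {"columns": ["id", "email"]})
catalog.register("customers", "asset", {"columns": ["id", "email", "region"]})
catalog.register("clean_customers", "flow", {"steps": ["dedupe", "trim"]})

print(catalog.latest("customers").version)
print(catalog.get("customers", 1).definition["columns"])
print(catalog.audit_log)
```

Keeping every version rather than overwriting the entry is the design choice that matters here: anyone in the organization can see what a data asset looked like at an earlier point and what has changed since.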

Where the Similarities End and Why Understanding the Differences Is Important

Although the similarities between these two concepts are uncanny in many respects, there are several important differences, which include:

1. Relevance and Timeliness

My colleague Dave Levinger, VP of DevOps, summed it up best when he said, “You can run your business using an old product, but you cannot run your business on old data. While both get old, today’s data is obsolete far more quickly.”

While DevOps can survive using set guidelines and approaches, data pipelines cannot be rigid and programmed to deliver the same logic continuously.

Data changes all the time and it needs to be re-discovered and re-inspected constantly.

However, this approach is difficult in the DataOps world, as technical resources are scarce and typically don’t understand the data’s business context.

Business users are the ones that understand its context, but they often don’t have the technical expertise needed to gain a complete and accurate picture.

This is where augmented data preparation comes in handy.

Augmented data preparation software uses machine learning to discover new sources, patterns, and anomalies in data – with little to no human intervention required.

Further, data prep solutions offer both an accelerated delivery of data pipelines and an uplift in data accuracy, creating a more agile environment for DataOps.
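The idea of constantly re-inspecting data can be made concrete with a small sketch. The Python below is not how any augmented data preparation product works. It is a simple, rule-based stand-in (no machine learning) that compares a new batch of records against a baseline profile and reports new columns, missing columns, and numeric values far from the baseline mean. The function names, the sample records, and the threshold of three standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def profile(records):
    """Summarize a batch: its columns, plus mean/stdev per numeric column."""
    columns = set()
    for row in records:
        columns.update(row)
    stats = {}
    for col in sorted(columns):
        values = [row[col] for row in records
                  if isinstance(row.get(col), (int, float))]
        if len(values) >= 2:  # stdev needs at least two data points
            stats[col] = (mean(values), stdev(values))
    return {"columns": columns, "stats": stats}


def inspect(baseline, batch, threshold=3.0):
    """Compare a new batch against a baseline profile and list what changed."""
    findings = []
    batch_columns = set()
    for row in batch:
        batch_columns.update(row)
    for col in sorted(batch_columns - baseline["columns"]):
        findings.append(f"new column: {col}")
    for col in sorted(baseline["columns"] - batch_columns):
        findings.append(f"missing column: {col}")
    for i, row in enumerate(batch):
        for col, (mu, sigma) in baseline["stats"].items():
            value = row.get(col)
            if isinstance(value, (int, float)) and sigma > 0:
                # Flag values more than `threshold` standard deviations away.
                if abs(value - mu) / sigma > threshold:
                    findings.append(f"row {i}: unusual {col}={value}")
    return findings


# Hypothetical sample data: yesterday's orders versus today's batch.
baseline_records = [
    {"order_id": "A1", "amount": 20.0},
    {"order_id": "A2", "amount": 22.0},
    {"order_id": "A3", "amount": 19.0},
    {"order_id": "A4", "amount": 21.0},
]
new_batch = [
    {"order_id": "A5", "amount": 20.5, "channel": "web"},
    {"order_id": "A6", "amount": 950.0, "channel": "store"},
]

findings = inspect(profile(baseline_records), new_batch)
for finding in findings:
    print(finding)
```

Even a check this simple surfaces the kind of change a rigid pipeline would silently pass through: a column that did not exist yesterday and a value that does not look like anything seen before. A business user would still need to decide whether either one matters.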


2. Varied Agenda and Alignment of Goals

Unlike DevOps, where both teams are technical and product-centric, DataOps involves both technical and business people.

On one hand, business users are now involved in the ideation and creation of data products, and tools such as self-service data preparation provide the playing field for them by simplifying data blending and cleansing into a visual and intuitive interface.

On the other hand, data in many cases is a corporate asset, and therefore requires the same regimen of governance and auditability as any other corporate asset.

This requires centralized technical teams to ensure data integrity and security, as well as operationalization of only the trusted and accurate data assets to the broader organization.

While in DevOps both parties are interested in building a high-quality, scalable product and delivering it to the market, in DataOps the business teams are more interested in the data discovery and analytics part of data projects, and less in their security and governance aspects.

Addressing the needs of both groups calls for a technology that brings data preparation and data operations together, allowing everyone to partake, regardless of their own agendas.


3. The Ecosystem

DevOps took root in 2009, and today an entire ecosystem has been built around the practice, which includes source code management tools and the continuous integration, delivery, and testing suite of products.

It also includes how an organization approaches project management, document management, monitoring, support, and ticketing.

In DataOps, this ecosystem is pretty bare-bones, as this is still a relatively new concept, but with time the DataOps ecosystem will grow as well.

The business background of DevOps and DataOps is analogous: businesses today need to move fast.

Whether it is product or data development, innovation and time to market remain the core foundations of gaining a competitive advantage.

In this agile world, just as developers and operations teams need to co-design, co-develop and co-own products, data developers and data operations need to work hand-in-hand in a self-service, integrated environment to create new data strategies and insights rapidly and continuously.

About the Author

Farnaz Erfan is the Senior Director and Head of Product Marketing at Paxata, a pioneer and leader in enterprise-grade self-service data preparation for analytics.
