The Fiction of the Data Lifecycle

For me, it comes down to two things. The stuff that happens to and with data is really important to know about.

And it is something we want to be able to draw, because we want to communicate it to people and make sure we understand it.

People like tidy.

Tidy makes things easy to grasp.

And it makes things easy to sell.

The diagram shows up frequently where people are trying to sell something (software, simplified processes) to people who know they need to care about data, but find the whole thing a bit complex.

These are not bad things.

The bad things happen when we start to assume that this little diagram actually represents reality.

When we start to assume that Thing X will have occurred before Thing Y.

That all data goes through a definable gate as it moves through the different, consecutive stages of its life.

Not every human gets married.

Not every dataset gets published.

It also casts data management in a negative role, as something that exists to force data through these sequential pipes of process.

The diagram suggests that we manage data so that it follows the lifecycle.

I think we manage data so that it’s good, and trustworthy, and safe to use.

And I think we do that throughout the data lifecycle, however we express it or draw it.

Yes, we need to talk about the things that happen to and with data.

Yes, we maybe need to draw that as a picture sometimes.

But the visual shorthand, with its promise of tidy and consecutive steps, can mislead as easily as it can inform when it’s used in a technical setting.

We owe it to our data to challenge this narrative where it becomes a lazy crutch for our thinking.

Where it fundamentally doesn’t match reality.

Where it brings about the wrong outcomes for our data and our businesses.

So, no more pictures? Well, I’m a big fan of pictures, and a few years ago I tried to draw what I felt the real data lifecycle is, based on my experience with well-established, feral operational data.

Here’s what I drew in 2016:

[Diagram: the 2016 data lifecycle]

Everyone is going to have a different perspective on this, and emphasise different things.

For me it’s really important to recognise that a lot of stuff is happening at the same time.

That for most operational data, there’s no easy “start”.

It’s not comprehensive (it doesn’t even begin to show how data breeds and leaves a trail of new little datasets in its wake), but it felt like a picture of the most important things we needed to talk about.

But I got something wrong in that diagram.

Data management isn’t a little process ticking along on its own.

All these things need to happen with good data management.

So here’s what I’d draw now:

[Diagram: the revised data lifecycle]

Good data management is not the harness that forces data to jump through the hoops of the data lifecycle.

It’s not a step in the lifecycle, or a box to be ticked.

It’s the entire context for the conversation, the canvas on which we draw the data lifecycle.

Even if what we draw isn’t always tidy.
