We Found Structure in a Structureless Place: A Smarter Approach for Data Science Solutions

Identifying Data Sources is circled too.

Yes, you may have noticed “Identifying Data Sources” is coupled with “Decomposing the Ask” in the Analytics Paradigm Diagram.

Just as important as clarity around any problem is the ability to do something about it.

Even with a defined ask, a brilliant team, and a well-thought-out plan, inaccessible data leads to a dead end.

So, don’t jump into building project plans or begin modeling without first securing the data source(s).

In a lot of “agile” shops, project management elements are often overlooked.

By overlooked I, of course, mean purposefully ignored in an attempt to short-circuit the process.

Documentation, strategy, and meetings have all become dirty words in the name of getting things done quickly.

Sure, there are legitimate reasons for the trepidation, but fostering this type of culture is dangerous.

As a company or team scales, a lack of structure will result in work redundancies, missed deliverables, and frustrated teammates.

From here, normalization of deviance begins: the markers surrounding what is acceptable slowly shift in the wrong direction.

Before you know it, expectations are unclear, and the answers to the simplest questions are unknown:

Who is your team?
Who are the subject matter experts (SMEs) you can rely on?
Who are the stakeholders?
Is there a budget constraining the project?
What are the deliverables and what are their respective timelines?
What are the resources and tools available?

“How” conforms to the 80/20 rule.

Answer the question and take ownership during the first phase of the project process, because the next part is time-consuming and mulligans aren’t an option.

The extracting, transforming, and loading (ETL) of data will consume 80% of the project time while the analysis comprises the remaining 20%.

Of course, ETL is encompassed under the wide umbrella of “data engineering”, and it can mean many things.

At this point, a specific subset of readers has checked out: business intelligence (BI) analysts, management consultants, younger data scientists… OK, basically everyone who isn’t a data engineer.

However, the experienced consultant recognizes that even though organizations separate these roles, the work is inextricably linked.

Data engineering is the “cousin” of data science, and it’s our job to eliminate ETL barriers to move a project forward through this phase.

Think through ETL in terms of the “data hierarchy of needs”.

Thinking this way can help demystify data engineering and quickly identify barriers.

Exactly like Maslow’s hierarchy, there is a tiered system in place from lowest to highest level of data “needs”.

In order to load, you must first transform.

In order to transform, you need to first extract.
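The dependency just described (extract, then transform, then load) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the column names, and the list standing in for a warehouse table are all hypothetical, not anything from a real system.

```python
import csv
import io

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """patient_id,admit_date,length_of_stay
1,2019-01-03,4
2,2019-01-05,
3,2019-01-09,2
"""

def extract(raw_text):
    """Extract: read raw rows exactly as the source provides them."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: drop rows missing a length of stay and cast types."""
    cleaned = []
    for row in rows:
        if not row["length_of_stay"]:
            continue  # incomplete record; skipped rather than guessed
        cleaned.append({
            "patient_id": int(row["patient_id"]),
            "admit_date": row["admit_date"],
            "length_of_stay": int(row["length_of_stay"]),
        })
    return cleaned

def load(rows, target):
    """Load: append cleaned rows to the target (a list standing in for a table)."""
    target.extend(rows)
    return len(rows)

# Each step consumes the output of the one before it.
warehouse_table = []
loaded_count = load(transform(extract(RAW_CSV)), warehouse_table)
print(loaded_count, "rows loaded")
```

Each function only accepts what the previous one produces, so a missing or inaccessible source shows up at the very first call rather than halfway through the analysis.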

This seems like common sense to anyone who’s created ETL jobs, but there are so many ways in which teams fall apart simply because they don’t know the existing systems and do not have a well thought out plan!

Ensure you are creating and maintaining a data dictionary, review fundamental components of data engineering, and work through these questions:

How do we collect data?
How do we move and store data?
How will we explore and transform data?
How should we aggregate and label data?
How will we learn and optimize data?

[Figure: Data Hierarchy of Needs]

“What” is left to do?

Oh right, the actual analysis.

At this point, there is a well-executed plan, and it’s time to go to work.

Because this is a post about process and not the technical pieces behind the process, I’ll set aside Python, R, ML, and viz to discuss the analytics cycle instead (pictured below).

In this cycle, deliverables and answers depend on what your stakeholders require.

If you’re building dashboards (everyone’s favorite tool), the final deliverable might be descriptive only.

Provide summary statistics and highlight outliers, patterns, and trends.
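For a descriptive deliverable, a first pass might look something like the sketch below. The admission counts are invented for illustration, and the 1.5 × IQR fence is one common rule of thumb for flagging outliers, not the only one.

```python
import statistics

# Hypothetical daily admission counts, invented for illustration.
daily_admissions = [42, 38, 45, 41, 39, 44, 40, 97, 43, 41]

# Summary statistics.
mean = statistics.mean(daily_admissions)
median = statistics.median(daily_admissions)
stdev = statistics.stdev(daily_admissions)

# Flag outliers with the 1.5 * IQR rule of thumb.
q1, _, q3 = statistics.quantiles(daily_admissions, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in daily_admissions if x < lower_fence or x > upper_fence]

print(f"mean={mean}, median={median}, stdev={stdev:.1f}")
print("outliers:", outliers)
```

Here the mean (47) sits well above the median (41.5) because of a single extreme day, which is exactly the kind of detail a dashboard should surface rather than hide.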

If you’re embarking on a value-creation journey, then become more diagnostic and make some inferences.

Start to include feature engineering for new insights, perform deeper data and/or text mining, and look at correlation to understand relationships between data elements.

Finally, if you’re lucky, you’ll go on to perform the fun predictive and prescriptive analytics, such as machine learning models and neural networks.
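As a small example of the diagnostic step, here is a sketch of a Pearson correlation between two data elements. The staffing and adverse-event figures are hypothetical, and the `pearson` helper is written out by hand only to keep the example self-contained.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures, invented for illustration.
nurse_hours_per_patient = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
adverse_events = [14, 13, 11, 10, 8, 7]

r = pearson(nurse_hours_per_patient, adverse_events)
print(f"r = {r:.3f}")  # strongly negative for this made-up data
```

A strong correlation like this one is a prompt for further questions, not proof of cause, which is why it belongs in the diagnostic stage rather than the final recommendation.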

But look how much work we had to do to get here!

[Figure: Analytics Cycle Diagram]

Regardless of the deliverable, start simple, expand outward, and no matter what, stay agnostic towards the data set.

Never be afraid to produce a minimum viable product (MVP).

Clients love their deliverables, but as long as expectations are managed, a client will be happier with preliminary findings sooner than with super complex data science (much) later.

There will inevitably be holes in the data, a lack of answers, or answers you weren’t looking for.

Therefore, be sure to capture and acknowledge those limitations in your findings.

Never underestimate the power of “I don’t know.”

Embrace these words as a positive.

Finally, remember to tell a story.

If I show up to a presentation with a PowerPoint and say, “98,000+ patients die each year due to medical error in hospitals,” you might lift an eyebrow and go back to playing on your phone.

If instead, I say, “There are five people in this room.

One of you will experience medical error next time you’re admitted to a hospital.

In fact, there’s so much adverse harm being done, the yearly death rate is the equivalent of losing one commercial jumbo jet airliner full of 270 passengers each day.

By the way, all of these deaths are preventable based on the root cause data.

You asked me to investigate the potential impact of adverse events, and these are my findings.”

You might actually put your phone down and look up at my slide containing a pictograph of 365 airplanes.

That’s a story.
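The jet comparison rests on simple arithmetic, and it is worth checking any number you plan to put on a slide. A quick sketch, using only the figures quoted above (the variable names are my own):

```python
# Figures quoted in the story above.
annual_deaths = 98_000      # "98,000+ patients die each year"
passengers_per_jet = 270    # "one commercial jumbo jet airliner full of 270 passengers"
days_per_year = 365

# Deaths per day implied by the annual figure.
deaths_per_day = annual_deaths / days_per_year

# Annual total implied by one full jet per day.
jet_equivalent_per_year = passengers_per_jet * days_per_year

print(f"{deaths_per_day:.1f} deaths per day")
print(f"{jet_equivalent_per_year:,} per year at one full jet a day")
```

Roughly 268 deaths a day is close to a 270-seat aircraft, and one full jet daily works out to 98,550 a year, which is consistent with “98,000+”.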

[Figure: A Story]

Regardless of how powerful the analysis is, if the story isn’t compelling, it will be impossible to create valuable change within the organization.

Follow the analytics cycle, but remember the purpose of a process approach is to answer a question and tell a story.

Never forget T. S. Eliot’s famous quote:

“Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”

Go forth and keep structure at the top of your mind.

This may have been a welcome review or a frustrating reminder of what’s missing in your current organization.

In the future, I will write about specific deliverables, but this topic is foundational.

Unlike the technical solutions, the need for a structured process approach is unchanging.

Most of all, I hope it was encouraging and something you can implement on your team.

When you’re working this week, think about your process and revisit it regularly.

If you feel like you’re getting stuck, zoom out.

If you feel like you’re overwhelmed and not making progress, perhaps lean on advice from the productivity gurus, such as David Allen’s “Getting Things Done” methodology or my favorite alternative, Graham Allcott’s “Productivity Ninja”.

Do so, and you too will find exponential value in your solutions as an analyst and a consultant.

~JLB.
