Four questions to help accurately scope analytics engineering projects

For us, auditing a data set means producing straightforward metrics on top of the table or tables in question that can be compared to the outputs of another system (typically the system the data was extracted from).

For instance, we frequently calculate orders and revenue and compare those metrics with the results we get natively in Shopify.

Both of these metrics are very straightforward to calculate, and if the numbers match you immediately have a high degree of confidence that the data set is in good shape.
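
As a minimal sketch of what such an audit query can look like (the analytics.orders table and its columns are assumptions for illustration, not a reference to any particular project), you might aggregate order counts and revenue by month and line the totals up against Shopify's own reporting:

-- Hypothetical audit query: monthly order counts and revenue
-- from a modeled orders table, to compare against Shopify's dashboard.
select
    date_trunc('month', ordered_at) as order_month,
    count(*) as order_count,
    sum(order_total) as revenue
from analytics.orders
group by 1
order by 1;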

Problems with your raw data are typically cross-cutting: they’ll affect all records in a table or all records in a certain date range.

If you audit 2-3 metrics, you can generally feel pretty good about the data quality overall.

If your organization has previously used the data from a given ingestion pipeline, it’s incumbent on you to invest in learning what’s already been built.

If you’re using dbt, do not just say “screw it” and start your own work from scratch.

The primary source of analytics engineering tech debt and errors is multiple code bases attempting to transform and analyze fundamentally the same data.

Two code paths equate to double the surface area for errors and infinitely more opportunity for confusion.

It’s not always easy to follow the work that’s been done before you, but it’s critically important that you integrate into an existing code base rather than starting from scratch.
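
If the team is on dbt, integrating usually means building your new model on top of the models that already exist via ref() rather than re-transforming the raw data in a second code path. A minimal sketch, assuming an existing orders model (the model and column names here are hypothetical):

-- models/customer_orders.sql (hypothetical model name)
-- Builds on the team's existing orders model via ref() instead of
-- re-deriving orders from raw source data in a parallel code path.
select
    customer_id,
    count(*) as order_count,
    sum(order_total) as lifetime_revenue
from {{ ref('orders') }}
group by 1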

I’ve seen teams that are unwilling to write code collaboratively in this way; instead, every analyst builds their own silo.

This is a recipe for an incredibly unproductive data team.

New analysis typically builds on existing concepts by attaching new data to data that has already been modeled.

It’s typical to start with your application database or shopping cart and then spider outwards into other data systems: event tracking, customer success, advertising, email, etc.

As you bring each one of these systems into your pipeline and your data model, you’ll need at least one key to join the data.
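
As a sketch of what that join looks like when the key does exist (the table names and the user_id key here are assumptions for illustration), joining event-tracking data onto modeled customers might be:

-- Hypothetical join: event-tracking data onto modeled customers.
-- This only works if both systems actually carry a shared key
-- (assumed here to be user_id).
select
    customers.customer_id,
    customers.email,
    events.event_name,
    events.occurred_at
from analytics.customers as customers
left join analytics.tracking_events as events
    on customers.customer_id = events.user_id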

It seems like that would be fairly straightforward, but often it just isn’t.

This stuff happens constantly, and knowing where to look for problems before you hit them is absolutely critical.

If you’re missing a key between two systems, it can stop an entire analytics project dead in its tracks, because often a) adding the key involves bringing in stakeholders from other parts of the org, and b) there is no way to retroactively get the data.

Make sure to check your keys before diving in.
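
A quick coverage check during scoping will tell you whether the key is actually populated before you commit to the project; a minimal sketch, again with assumed table and column names:

-- Hypothetical key-coverage check: what share of tracking events
-- actually carry the user_id needed to join back to customers?
select
    count(*) as total_events,
    count(user_id) as events_with_user_id,
    round(100.0 * count(user_id) / count(*), 1) as pct_with_key
from analytics.tracking_events;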

If they’re missing, go through the effort of working with the stakeholders to create them.

This “process instrumentation” is a critical part of what makes a great analytics engineer: you can’t always expect that business processes will naturally spit out the data you need, and sometimes you’ll need to roll up your sleeves and make sure it gets gathered.

As a consultant, I have to have a strong process: if I mis-scope a sprint, it probably means that I’m not going to be getting enough sleep over the coming two weeks.

Or maybe it means that we lose $$ on the project—the stakes are real.

That’s why I (and everyone at Fishtown) am so focused on being good at scoping.

On an internal data team, the stakes are just as high, but the feedback typically isn’t as clear or immediate.

Your stakeholders will notice if you consistently miss deadlines or fail to deliver key results; they just might not say anything to you.

It’s often hard to give direct feedback to a colleague: when’s the last time you said something like “Your team’s miss on that deadline led me to miss my committed OKR for the quarter, and I’m pissed”?

Even if it isn’t made explicit, consistent failure by a data team to deliver predictable outcomes damages trust, and thereby damages the team’s ability to make an impact on the larger org.

Avoid this outcome by identifying issues during the scoping phase of a project and by communicating the limitations of your approach to stakeholders.

Original. Reposted with permission.

Bio: Tristan Handy is currently building Fishtown Analytics to help venture-funded companies implement advanced analytics through building tools that facilitate an opinionated analytics workflow.
