High Volume and Multi-source Analytics

Or, what about data you might want to buy from data brokers and syndicators?  Data from third parties brings yet more disparate data silos that don’t just magically co-locate with, line up with, or join to a company’s own data.

This introduces yet more data entropy.

It’s a big challenge.

Data lakes attempted to solve this, but they add a lot of complexity and have their own challenges. You can use ETL or data federation to bring disparate data together. But there are two faster approaches you can try today.

Solution 1: Data Sharing

The cloud database provided by Snowflake has a native capability called Snowflake Data Sharing, a.k.a. the Data Sharehouse, which lets companies that each use Snowflake share views of data with each other without actually copying or moving any data.
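As a rough sketch of how such a share might be set up programmatically, assuming the snowflake-connector-python package; all account, database, and object names below are hypothetical:

```python
import snowflake.connector

# Provider side: create a share, grant access to one table, and
# expose the share to the consumer's Snowflake account.
# (Hypothetical names throughout.)
provider = snowflake.connector.connect(
    account="provider_acct", user="provider_user", password="***"
)
cur = provider.cursor()
cur.execute("CREATE SHARE IF NOT EXISTS customer_360_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE customer_360_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE customer_360_share")
cur.execute(
    "GRANT SELECT ON TABLE sales_db.public.customer_summary "
    "TO SHARE customer_360_share"
)
cur.execute("ALTER SHARE customer_360_share ADD ACCOUNTS = consumer_acct")

# Consumer side: mount the share as a read-only database. No rows are
# copied; queries read the provider's storage directly.
consumer = snowflake.connector.connect(
    account="consumer_acct", user="consumer_user", password="***"
)
consumer.cursor().execute(
    "CREATE DATABASE partner_data FROM SHARE provider_acct.customer_360_share"
)
```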

Data Sharing also allows one company to join its own data with data from another company, which helps logically coalesce second- and third-party data. This virtually collapses what would otherwise be separate data silos into each other, reducing dataset entropy without expending a lot of energy.
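Once mounted, the shared objects behave like local read-only tables, so the cross-company join is just an ordinary SQL join (continuing the hypothetical names from the sketch above):

```python
import snowflake.connector

# Hypothetical consumer-side join: the company's own orders table
# joined against a table arriving through the mounted share. The
# blend happens in SQL, with no ETL and no data movement.
conn = snowflake.connector.connect(
    account="consumer_acct", user="consumer_user", password="***"
)
cur = conn.cursor()
cur.execute("""
    SELECT o.customer_id,
           o.total_spend,
           p.segment              -- attribute supplied by the partner
    FROM   my_db.public.orders AS o
    JOIN   partner_data.public.customer_summary AS p
           ON o.customer_id = p.customer_id
""")
for row in cur.fetchmany(10):
    print(row)
```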

Solution 2: Multi-source BI analysis

Another approach requires going up to the level above the data tier: end-user tooling, such as BI tools.

Some BI tools have multi-source analytical capabilities that allow for data blending across disparate backend data sets, storage approaches, and query engines. This approach also lets some of the more advanced end users do this blending on their own, with less setup and data engineering required.

What’s needed is advanced data blending that users can invoke directly from their dashboards. This includes creating cohorts and applying them directly from reports: taking a set of customer IDs from the result of an analysis on one source and using it to filter an analysis on another source. Users can then visualize the two results side by side.
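A minimal sketch of that cohort flow outside any particular BI tool, assuming one source is a SQL engine (sqlite stands in for any warehouse) and the other is a flat file readable with pandas; all file and column names are hypothetical:

```python
import sqlite3
import pandas as pd

# Source A: order history in a SQL engine. Build a cohort of
# high-spend customers from one analysis.
conn = sqlite3.connect("orders.db")
cohort = pd.read_sql_query(
    "SELECT customer_id FROM orders "
    "GROUP BY customer_id HAVING SUM(amount) > 1000",
    conn,
)
cohort_ids = set(cohort["customer_id"])

# Source B: clickstream events stored under a different engine
# entirely. Filter it by the cohort computed on source A.
events = pd.read_csv("clickstream.csv")
cohort_events = events[events["customer_id"].isin(cohort_ids)]

# The blended result can now be charted next to the source-A analysis.
print(cohort_events.groupby("page")["customer_id"].nunique())
```

A BI tool with native multi-source blending does essentially this under the hood, but driven from a dashboard control rather than a script.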

By using these two approaches together with modern cloud-native data storage, data engines, and BI tooling, relatively simple yet hyper-scalable architectures can start to tame data entropy, handle the exponentially growing size, speed, and number of data sources, and still deliver solid value and insights to end users.

About the Author

Justin Langseth is Co-Founder & Chairman of Zoomdata.

Justin co-founded Zoomdata in 2012.

Previously, he co-founded and was CTO of Claraview (sold to Teradata in 2008) and then Clarabridge (spun off from Claraview).

Prior to Claraview, he was co-founder and CTO of Strategy.com, a former real-time data monetization and insights subsidiary of MicroStrategy.

He is the lead inventor on 16 granted technology patents related to data monetization, data personalization, and real-time, unstructured, and big data.

He graduated from the Massachusetts Institute of Technology, where he received an SB in Management of Information Technology from the MIT Sloan School of Management.
