What Happened to Hadoop? And Where Do We Go from Here?

Monte Zweben, CEO of Splice Machine, has an interesting take on what happened to Hadoop, specifically three main reasons behind its downfall:

Schema-on-Read was a mistake

First, the so-called best features of Hadoop turned out to be its Achilles heel.

With the schema-on-write restriction lifted, terabytes of structured and unstructured data began to flow into the data lakes.

With Hadoop’s data governance framework and capabilities still being defined, it became increasingly difficult for businesses to determine the lineage of their data. As a result, they lost trust in that data, and their data lakes turned into data swamps.
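To make the distinction concrete, here is a minimal Python sketch of the two ingestion styles. It is an illustration only, not how Hadoop or any particular database implements them: the `SCHEMA` dictionary, the function names, and the sample records are all hypothetical. Schema-on-write rejects a bad record at ingestion, while schema-on-read stores everything and only discovers the problem at query time.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"order_id": int, "amount": float}


def conforms(record, schema=SCHEMA):
    """Return True if record is a dict holding every schema field with the expected type."""
    return isinstance(record, dict) and all(
        isinstance(record.get(name), kind) for name, kind in schema.items()
    )


def write_with_schema(table, raw_line):
    """Schema-on-write: parse and validate at ingestion, so bad input never lands."""
    record = json.loads(raw_line)  # raises ValueError on malformed JSON
    if not conforms(record):
        raise ValueError(f"rejected at write time: {raw_line!r}")
    table.append(record)


def write_raw(lake, raw_line):
    """Schema-on-read: store the line untouched; nothing is checked at ingestion."""
    lake.append(raw_line)


def read_with_schema(lake):
    """Apply the schema at query time and split stored lines into usable and unusable."""
    usable, unusable = [], []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except ValueError:
            unusable.append(raw_line)
            continue
        if conforms(record):
            usable.append(record)
        else:
            unusable.append(raw_line)
    return usable, unusable


if __name__ == "__main__":
    incoming = [
        '{"order_id": 1, "amount": 9.5}',
        '{"order_id": "two", "amount": 3.0}',  # wrong type for order_id
        "not json at all",
    ]
    table, lake = [], []
    for line in incoming:
        write_raw(lake, line)  # the lake accepts every line
        try:
            write_with_schema(table, line)  # the table rejects bad lines
        except ValueError as err:
            print(err)
    usable, unusable = read_with_schema(lake)
    print(len(table), len(lake), len(usable), len(unusable))
```

In this toy setup the lake happily accepts every line, and the unusable ones surface only when someone reads the data back, which is roughly the path from data lake to data swamp that Zweben describes.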

Hadoop complexity and duct-taped compute engines

Second, Hadoop distributions provided a number of open source compute engines (Apache Hive, Apache Spark, and Apache Kafka, to name a few), but this turned out to be too much of a good thing.

These compute engines were complex to operate, and duct-taping them together required specialized skills that were difficult to find in the market.

The wrong focus – the data lake vs. the application

Third and most importantly, data lake projects began to fail because Hadoop clusters often became the gateways of enterprise data pipelines. They filter, process, and transform data that is then exported to other databases and data marts for downstream reporting, and that data almost never finds its way to a real business application in the operating fabric of the enterprise.

As a result, the data lakes ended up being a massive set of disparate compute engines, operating on disparate workloads, all sharing the same storage.

This is very hard to manage.

The resource isolation and management tools in this ecosystem are improving but they still have a long way to go.

Enterprises were not able to shift their focus away from using their data lakes as inexpensive data repositories and toward using them as platforms that consume data and power mission-critical applications.

Many organizations are concerned about the recent developments in the Hadoop ecosystem and are under pressure to demonstrate the value of their data lake.

It’s critical for enterprises to determine how they can successfully modernize their applications after the fall of Hadoop, and the best strategy for getting there.

Hadoop was once the most over-hyped technology, and today that moniker belongs to AI.

Beware of the hype cycle; you may have to answer for its effects one day.

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideBIGDATA.

In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.
