Monitoring Apache Spark – We’re building a better Spark UI

By Jean-Yves Stephan, Data Mechanics.

The Spark UI is the open source monitoring tool shipped with Apache Spark, the #1 big data engine.

It generates a lot of frustration among Apache Spark users, beginners and experts alike.

Data Mechanics is a Y Combinator startup building a serverless platform for Apache Spark (an alternative to Databricks, AWS EMR, Google Dataproc, or Azure HDInsight) that makes Apache Spark easier to use and more performant.

In this article, we present our ambition to replace the Spark UI and Spark History Server with a free and cross-platform monitoring tool for Spark called the Data Mechanics UI.

The project is at the prototype phase, but we’d love your feedback before we push it to production.

The familiar Spark UI (jobs page).

It’s hard to get a bird’s-eye view of what is going on.

The Spark UI lacks essential node metrics (CPU, memory, and I/O usage).

The Spark History Server (which renders the Spark UI for terminated Spark apps) is hard to set up.

This GIF shows our prototype Data Mechanics UI in action!

What is new about it? Let’s go over the main sections.

Summary statistics

The Data Mechanics UI – Summary Statistics.

This section shows the duration of the app, the total amount of resources it consumed (CPU uptime), and the total duration of all its Spark tasks, which should be close to the CPU uptime if your app is well parallelised.
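If you want to approximate these numbers yourself, the Spark event log already contains what you need. The sketch below is a minimal illustration, not the Data Mechanics implementation: it assumes an uncompressed, JSON-lines event log using Spark’s standard event and field names (`SparkListenerExecutorAdded`, `SparkListenerTaskEnd`, `"Launch Time"`, `"Finish Time"`, and so on), it assumes `spark.task.cpus = 1`, and it defines CPU uptime as cores × executor lifetime, which is our own assumption. The function name `summary_statistics` is hypothetical.

```python
import json
from typing import Iterable


def summary_statistics(event_log_lines: Iterable[str]) -> dict:
    """Approximate app duration, CPU uptime and total task time from a Spark event log.

    Assumption: CPU uptime = sum over executors of (cores x executor lifetime).
    This is an illustrative definition, not necessarily the one the Data Mechanics UI uses.
    """
    app_start = app_end = None
    executors = {}  # executor id -> [cores, added_ms, removed_ms or None]
    task_ms = 0

    for line in event_log_lines:
        event = json.loads(line)
        kind = event["Event"]
        if kind == "SparkListenerApplicationStart":
            app_start = event["Timestamp"]
        elif kind == "SparkListenerApplicationEnd":
            app_end = event["Timestamp"]
        elif kind == "SparkListenerExecutorAdded":
            cores = event["Executor Info"]["Total Cores"]
            executors[event["Executor ID"]] = [cores, event["Timestamp"], None]
        elif kind == "SparkListenerExecutorRemoved":
            if event["Executor ID"] in executors:
                executors[event["Executor ID"]][2] = event["Timestamp"]
        elif kind == "SparkListenerTaskEnd":
            info = event["Task Info"]
            # Assumes spark.task.cpus = 1, so task wall time equals core time.
            task_ms += info["Finish Time"] - info["Launch Time"]

    if app_start is None or app_end is None:
        raise ValueError("event log has no application start/end (app still running?)")

    # Executors never removed are assumed to live until the application ends.
    cpu_uptime_ms = sum(
        cores * ((removed if removed is not None else app_end) - added)
        for cores, added, removed in executors.values()
    )
    return {
        "app_duration_s": (app_end - app_start) / 1000,
        "cpu_uptime_core_s": cpu_uptime_ms / 1000,
        "total_task_time_core_s": task_ms / 1000,
        # Close to 1.0: executors were kept busy. Much lower: CPU time was wasted.
        "parallelism_efficiency": task_ms / cpu_uptime_ms if cpu_uptime_ms else 0.0,
    }


# A toy log: one 4-core executor alive for 60 s that ran a single 30 s task.
log = [
    '{"Event": "SparkListenerApplicationStart", "Timestamp": 0}',
    '{"Event": "SparkListenerExecutorAdded", "Timestamp": 0, "Executor ID": "1", '
    '"Executor Info": {"Total Cores": 4}}',
    '{"Event": "SparkListenerTaskEnd", "Task Info": {"Launch Time": 1000, "Finish Time": 31000}}',
    '{"Event": "SparkListenerApplicationEnd", "Timestamp": 60000}',
]
stats = summary_statistics(log)
print(stats["parallelism_efficiency"])  # → 0.125
```

The ratio of total task time to CPU uptime is the number to watch: the toy log pays for 240 core-seconds but uses only 30, so 87.5% of the CPU time sat idle.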

This information – surprisingly hard to get! – is critical if you care about your infrastructure costs.

Recommendations

The Data Mechanics UI – Recommendations.

This section builds on the auto-tuning feature of the Data Mechanics platform, which continuously optimizes infrastructure parameters and Spark configurations to boost performance and stability, based on the history of past runs of a given application.

This section gives developers high-level, actionable feedback, such as:

Executors CPU Usage

The Data Mechanics UI – Executors CPU Usage.

This screen lets you visually align system metrics on CPU utilization with the different Spark phases of your app.

In a couple of seconds, you should see whether your app spends most of its time on an expensive shuffle operation, whether a lot of resources are wasted due to inefficient parallelism, or whether it is bottlenecked by I/O or CPU-intensive operations.

This information is critical to understanding your application’s performance and making smarter choices.

You can then click on a specific job or stage to dive deeper into the problematic phase.
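The core idea of this screen, lining up node-level CPU samples with the time windows of Spark stages, can be sketched in a few lines. Everything here is illustrative: the `StageWindow` record, the `cpu_by_stage` function, the sample format and the 50% threshold are assumptions, not the Data Mechanics implementation. In practice, stage windows could come from the `Submission Time` and `Completion Time` that Spark records for each stage, and CPU samples from whatever node-metrics collector you run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class StageWindow:
    """Hypothetical record of when a Spark stage ran (seconds since app start)."""
    stage_id: int
    start_s: float
    end_s: float


def cpu_by_stage(
    samples: List[Tuple[float, float]],
    stages: List[StageWindow],
    low_threshold: float = 0.5,
) -> dict:
    """Average cluster CPU utilization (0.0 to 1.0) during each stage's time window.

    `samples` are (timestamp_s, utilization) pairs from some node-metrics collector.
    Stages averaging below `low_threshold` (an arbitrary cut-off) are flagged:
    low CPU while a stage runs often points to I/O waits or too few tasks.
    """
    report = {}
    for stage in stages:
        window = [u for t, u in samples if stage.start_s <= t < stage.end_s]
        avg: Optional[float] = mean(window) if window else None
        report[stage.stage_id] = {
            "avg_cpu": avg,
            "low_cpu": avg is not None and avg < low_threshold,
        }
    return report


samples = [(0, 0.90), (1, 0.95), (2, 0.20), (3, 0.10)]
stages = [StageWindow(0, 0, 2), StageWindow(1, 2, 4)]
for stage_id, row in cpu_by_stage(samples, stages).items():
    print(stage_id, row)
```

A per-stage average is coarser than a time-series chart, but it surfaces the same signal: a stage that holds executors while leaving their CPUs mostly idle is the one to dive into.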

Executors Peak Memory Usage

The Data Mechanics UI – Executors Peak Memory Usage.

This screen shows you the memory usage breakdown for each executor when the total memory consumption was at its peak.

Again, you’ll immediately see if you’re flirting with your container memory limits (perhaps hitting OutOfMemory issues) or, on the contrary, if your memory is largely overprovisioned.
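To make “flirting with your container memory limits” concrete, here is a minimal sketch that compares an executor’s peak memory with the size of its container. It relies on Spark’s documented default for `spark.executor.memoryOverhead` (10% of executor memory, with a 384 MiB minimum; some setups, such as non-JVM jobs on Kubernetes, use a larger factor). The helper names, the memory breakdown keys and the 90% / 50% thresholds are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def container_limit_mib(executor_memory_mib: int, overhead_mib: Optional[int] = None) -> int:
    """Approximate executor container size: spark.executor.memory + memoryOverhead.

    When spark.executor.memoryOverhead is not set, Spark defaults it to
    10% of executor memory with a 384 MiB minimum. Off-heap and PySpark
    memory settings, which also add to the container, are ignored here.
    """
    if overhead_mib is None:
        overhead_mib = max(384, int(0.10 * executor_memory_mib))
    return executor_memory_mib + overhead_mib


def classify_peak(peak_breakdown_mib: Dict[str, float], limit_mib: int,
                  high: float = 0.9, low: float = 0.5) -> str:
    """Label an executor's peak memory (sum of its breakdown) against its container limit.

    The 90% / 50% thresholds are arbitrary illustrative values.
    """
    ratio = sum(peak_breakdown_mib.values()) / limit_mib
    if ratio >= high:
        return "near limit: OutOfMemory risk"
    if ratio <= low:
        return "overprovisioned"
    return "ok"


limit = container_limit_mib(4096)  # 4096 + max(384, 409) = 4505 MiB
peak = {"jvm_heap": 3200, "off_heap": 700, "other": 400}  # hypothetical breakdown, 4300 MiB
print(limit, classify_peak(peak, limit))  # → 4505 near limit: OutOfMemory risk
```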

Memory issues are the most common source of crashes in Apache Spark.

OutOfMemory comes in two flavors:

So this screen should give you critical information to make and keep your Spark applications stable.

For technical reasons, the Data Mechanics UI will not be implemented in open-source Spark.

But it will work on top of any Spark platform, entirely free of charge.

To use it, you’ll need to install an agent – a single jar attached to Spark.

The code for the agent will be open-sourced, and we’ll provide init scripts to install it automatically on each major Spark platform.
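How the agent hooks into Spark is not specified here. One common way to attach a single jar to a Spark application is to ship it with `--jars` and register a listener class through `spark.extraListeners`; the sketch below only builds such a `spark-submit` command. The helper name `spark_submit_with_agent`, the jar path and the listener class `com.example.agent.EventLogListener` are hypothetical, and the actual Data Mechanics agent may be installed differently (for example through the init scripts mentioned above).

```python
import shlex
from typing import List, Sequence


def spark_submit_with_agent(app: str, agent_jar: str, listener_class: str,
                            extra_args: Sequence[str] = ()) -> List[str]:
    """Build a spark-submit command that attaches an extra jar and registers a listener.

    `--jars` and `spark.extraListeners` are standard Spark options. The jar path and
    listener class are placeholders, not the real agent's packaging.
    """
    return [
        "spark-submit",
        "--jars", agent_jar,                                  # ship the agent jar with the app
        "--conf", f"spark.extraListeners={listener_class}",  # instantiate it as a SparkListener
        *extra_args,                                          # other options must precede the app
        app,
    ]


cmd = spark_submit_with_agent(
    "my_job.py",
    "/opt/agent/monitoring-agent.jar",      # hypothetical path
    "com.example.agent.EventLogListener",   # hypothetical class name
)
print(shlex.join(cmd))
```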

Once this is done, you’re all set! The agent will send the Spark event logs to the Data Mechanics backend infrastructure, which will serve the Data Mechanics UI in your web browser.

Initially, it will only be available for terminated apps (a few minutes after they’ve run), so it will be more of a Spark History Server than a live Spark UI replacement.

We hope it’ll be useful to you nonetheless!

Data Mechanics is a managed platform for Apache Spark, like Amazon EMR, Google Dataproc, Databricks, and others.

Our serverless features make Spark easier to use and more performant.

It is deployed inside our customers’ cloud account on a Kubernetes cluster that we manage for them, and it is available on AWS, GCP, and Azure.

The Data Mechanics UI will be a great complement to this platform — it would give Spark developers the high-level feedback about their code that they need to develop, scale, and maintain stable and performant Spark applications.

Our ambition is to simplify Spark monitoring not just for our customers but for the greater Apache Spark community.

But it’s a big undertaking! If you think you can benefit from it, sign up with your email using this form so that we can notify you when it’s ready.

The more people sign up, the harder we’ll work to release this ASAP.

Thanks!

Bio: Jean-Yves Stephan, a former software engineer and Spark infrastructure lead at Databricks, is now the Co-Founder and CEO at Data Mechanics, a serverless platform making Apache Spark easy to use and performant.
