Introduction: In this post, I have written down AWS Glue and PySpark functionalities that can be helpful when thinking of…
nullable
Schema Evolution in Merge Operations and Operational Metrics in Delta Lake
Try this notebook to reproduce the steps outlined below. We recently announced the release of Delta Lake 0.6.0,…
Automated Data Quality Testing at Scale using Apache Spark
With an open source library from Amazon: Deequ. By Tanmay Deshpande, Jun 29. I…
Cleaning PySpark DataFrames
Yes, there is an empty cell in literally every row. Here's where we benefit from passing column names to subset: df…
Big data analytics: Predicting customer churn with PySpark
They could be subscribing to a competitor’s business, or abandoning the service altogether. In case you missed it,…
Customer Churn Prediction with PySpark on Sparkify Data
By om tripathi, Feb 21. This is Udacity’s capstone project, using Spark to analyze user behavior data…
Understanding Customer Churning with Big Data Analytics
I have come up with the 7 features below. Representative User Interactions: We would reasonably expect some of the other user interactions…