Introduction: In this post, I have written down AWS Glue and PySpark functionality that can be helpful when thinking of…
Schema Evolution in Merge Operations and Operational Metrics in Delta Lake
Try this notebook to reproduce the steps outlined below. We recently announced the release of Delta Lake 0.6.0,…
False discovery rate, Type-M and Type-S errors in an underpowered A/B test
If we interpreted the p-value as: “Given that the true difference between two versions is zero, probability of observing the…
Predicting Cancer with Logistic Regression in Python
Understanding the data, logistic regression, testing data, confusion matrices, ROC curve. Andrew Hershy, Jul 1. Introduction: In my first logistic…
Automated Data Quality Testing at Scale using Apache Spark
With an open source library from Amazon: Deequ. Tanmay Deshpande, Jun 29. I…
Cleaning PySpark DataFrames
Yes, there is an empty cell in literally every row. Here's where we benefit from passing column names to subset: df…
Machine Learning Classification with Python for Direct Marketing
The question is timeless but not rhetorical. In the next few minutes of your reading time, I will apply a…
How to deal with outliers in a noisy population?
Dario Makaric, Jun 11. Defining outliers can be a straightforward task. On the…
Python and LRU Cache
Jose Alberto Torres Agüera, May 26. Since Python version 3.2, we can use a decorator named functools.lru_cache(); this function…
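As a rough illustration of the decorator that teaser mentions, here is a minimal sketch using `functools.lru_cache` from the standard library. The `fib` function and the `maxsize=128` setting are illustrative assumptions, not taken from the linked post.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # maxsize chosen for illustration only
def fib(n: int) -> int:
    """Naive recursive Fibonacci; cached results avoid repeated work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize
print(result, info)
```

Without the decorator, the same recursive definition would recompute each smaller value many times; with it, each distinct argument is computed once and then served from the cache.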
Boolean Logic Using the Scala Compiler
In this way, we check every single possible combination of boolean inputs to the function and whether the resulting expression…
Beginning Python Programming—Part 4
There is a keyword that allows us to skip over a loop iteration. It is the continue keyword. Using the…
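A small sketch of what skipping a loop iteration with `continue` can look like. The `odd_numbers` helper is a hypothetical example, not code from the linked post.

```python
def odd_numbers(limit: int) -> list[int]:
    """Collect the odd numbers below limit, skipping even ones with continue."""
    odds = []
    for n in range(limit):
        if n % 2 == 0:
            # Skip the rest of this iteration and move on to the next n.
            continue
        odds.append(n)
    return odds


print(odd_numbers(10))
```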
Learning to add git hook tasks: PHP-CS-Fixer
What I’ll show you below is one way to set that all up that will make use of the git…
Beginning Python Programming — Part 3
Familiarizing ourselves with using operators and none. Bob Roebling, May 20. If you are just stumbling onto this…
Optionals, Operators in Swift
We have no idea if we will have data or not until runtime, or when the application is running. The…
Understanding what AUC and ROC are in Machine Learning models
Vinícius Rodrigues, Oct 26, 2018. The AUC and ROC curves are among…
Introduction to Formality (Part 1)
First, because it keeps the language simple, obviously. After all, this can be easily solved with macros outside the “official…
How to assess a binary Logistic Regressor with scikit-learn
Bookmark this Python function that makes assessing your binary classifier easy. Greg…
How to interpret a binary Logistic Regressor with scikit-learn
Greg Condit, Apr 17. Functionality Overview: Logistic Regression is a valuable classifier for its interpretability.…
Kotlin when Expression
Well, I'm sure you already have it. The warning we had in our first example becomes an error, which means…
Kotlin Unleashed: when Expression
Well, I'm sure you already have it. The warning we had in our first example becomes an error, which means…
The Most Impactful Offensive Players of the 2018–19 NBA Season
Ahmed Cheema, Mar 25. The best basketball players make their…
Big data analytics: Predicting customer churn with PySpark
They could be subscribing to a competitor’s business, or abandoning the service altogether. In case you missed it,…
Obioha uche
Obioha uche, Mar 21. Working with Paginated API Response Data in Laravel. This wouldn’t be my first Medium post if I had been…
Rating Sports Teams — Elo vs. Win-Loss
I added player 1’s error and player 2’s error in each matchup to my total error. So the real Brier…
Customer Churn Prediction with PySpark on Sparkify Data
om tripathi, Feb 21. This is Udacity’s capstone project, using Spark to analyze user behavior data…