Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best…

# mean

## Arithmetic, Geometric, and Harmonic Means for Machine Learning

Calculating the average of a variable or a list of numbers is a common operation in machine learning. It is…

## A Gentle Introduction to Jensen’s Inequality

It is common in statistics and machine learning to create a linear transform or mapping of a variable. An example…

## Why using a mean for missing data is a bad idea. Alternative imputation algorithms.

By Kacper Kubara, Jun 24. Photo by Franki Chamaki…

## Evolution of Traditional Statistical Tests in the Age of Data

Well, it’s simple: we use a theorem and make an assumption based on it. *Enter* the Central Limit Theorem. In particular,…

## Evaluating the effectiveness of feature selection algorithms for automating the feature selection process

And how can we use accumulated knowledge to improve the FS algorithm? Common feature selection methods. Many feature selection methods have been…

## All About Means

Well, for the arithmetic mean, the geometric mean, and even some of the other means, we have some kind of…

## Finding a Difference that Matters

Lower values mean it’s less likely that the means are equal. For the Tukey HSD, which calculates the difference of…

## Understanding Confidence Interval

Remember that for a frequentist, there is one true population mean, independent of how many times you draw a sample.…

## How to Manually Scale Image Pixel Data for Deep Learning

Images are composed of matrices of pixel values. Black and white images are a single matrix of pixels, whereas color images…

## Kalman Filters: A step by step implementation guide in python

This article will simplify the Kalman Filter for you. Hopefully you’ll learn…

## Why Sample Variance is Divided by n-1

Explaining high school statistics that your teachers didn’t teach. By Eden Au, Feb 20. Photo by Tim Bennett on Unsplash. If you…

## An Introduction to the Bootstrap Method

The related statistical concepts covered: basic calculus and the concept of a function; mean, variance, and standard deviation; distribution function (CDF) and probability density function…

## Unstructured data is an oxymoron

Strictly speaking, “unstructured data” is a contradiction in terms. Data must have structure to be comprehensible. By “unstructured data” people…

## Optimizing Jupyter Notebooks – A Comprehensive Guide

To abbreviate the code, we introduce the sum() method, a generator expression, and removal of pow(). Already by doing these three changes…

## First Impressions of GPUs and PyData

Currently my favorite approach is to use NumPy functions as a lingua franca, and to allow the frameworks to hijack…

## Which hypothesis test to perform?

Based on the data, can we conclude that the mean intraocular pressure of the population differs from 14 mm Hg? Step…

## A/B testing: the importance of Central limit theorem

In this article, I will explain the practical benefits of this theorem and its importance in A/B testing. A central limit…

## Hypothesis Testing: how to determine significance

The main question we are interested in answering is: Does discount amount have a statistically significant effect on the amount of…

## Music for Data Scientists? Music by Data Scientists? …What…?!

By Foster Provost, NYU. Mean Reversion’s first album released today (Oct 17, 2018). Mean Reversion is the collaboration of data scientist songwriters…
