Data cleaning is a critically important step in any machine learning project. In tabular data, there are many different statistical…

# number

## Extended floating point precision in R and C

The GNU MPFR library is a C library for extended precision floating point calculations. The name stands for Multiple Precision…

## When is round-trip floating point radix conversion exact?

Suppose you store a floating point number in memory, print it out in human-readable base 10, and read it back…
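
The question can be probed directly with IEEE 754 doubles (Python's `float`). The sketch below, which is illustrative and not code from the post, prints a value with a chosen number of significant digits and checks whether reading it back returns the identical float. `round_trips` is a hypothetical helper name.

```python
def round_trips(x: float, digits: int) -> bool:
    """Print x in base 10 with `digits` significant digits, read it back,
    and report whether the exact same double comes back."""
    text = f"{x:.{digits}g}"
    return float(text) == x

if __name__ == "__main__":
    x = 0.1 + 0.2  # not exactly 0.3 in binary floating point
    for digits in (15, 16, 17):
        print(digits, f"{x:.{digits}g}", round_trips(x, digits))
    # repr() gives the shortest decimal string that reads back to the same float
    print(repr(x), float(repr(x)) == x)
```

For binary64, 17 = ⌈53 log₁₀ 2⌉ + 1 significant digits always suffice for the trip double → decimal → double, while 15 digits are enough in the other direction (decimal → double → decimal).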

## Covid-19, your community, and you — a data science perspective

By Jeremy Howard and Rachel Thomas, fast.ai Co-Founders. We are data scientists; that is, our job is to understand how to…

## ChaCha RNG with fewer rounds

ChaCha is a CSPRNG, a cryptographically secure pseudorandom number generator. When used in cryptography, ChaCha typically carries out 20 rounds…
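
To make "rounds" concrete, here is a minimal Python sketch of the ChaCha block function with the round count as a parameter, so ChaCha8, ChaCha12 and ChaCha20 differ only in `rounds`. The quarter round and the state layout (4 constant words, 8 key words, 1 counter word, 3 nonce words) follow RFC 8439; the function names are hypothetical, and this is an illustration, not production cryptography.

```python
MASK32 = 0xFFFFFFFF

def rotl32(v: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) | (v >> (32 - n))) & MASK32

def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d

COLUMNS = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15))
DIAGONALS = ((0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))

def chacha_block(key_words, counter, nonce_words, rounds=20):
    """Return the 16 output words of one block; `rounds` must be even."""
    if rounds % 2:
        raise ValueError("rounds must be even")
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,  # "expand 32-byte k"
             *key_words, counter & MASK32, *nonce_words]
    if len(state) != 16:
        raise ValueError("expected 8 key words and 3 nonce words")
    x = list(state)
    for _ in range(rounds // 2):
        # One double round: four column quarter rounds, then four diagonal ones
        for i, j, k, l in COLUMNS + DIAGONALS:
            x[i], x[j], x[k], x[l] = quarter_round(x[i], x[j], x[k], x[l])
    # Add the input state back in, as the block function specifies
    return [(w + s) & MASK32 for w, s in zip(x, state)]

if __name__ == "__main__":
    key = list(range(8))  # toy key for illustration only, not a real secret
    nonce = [0, 0, 0]
    for r in (8, 12, 20):
        print(r, [f"{w:08x}" for w in chacha_block(key, 1, nonce, rounds=r)[:4]])
```

Each double round costs eight quarter rounds, so cutting from 20 rounds to 8 removes 60% of the work per block.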

## A new take on the birthday problem

Vitalii Tymchyshyn and Andrii Khlevniuk posted a new paper here entitled “On the average number of birthdays in birthday paradox…
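
For reference, here is the classic birthday-paradox calculation, not the paper's result: the probability that at least two of n people share a birthday, and the expected number of people gathered before the first shared birthday. The sketch assumes 365 equally likely days and ignores leap years; the function names are hypothetical.

```python
def p_shared(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

def expected_people_until_match(days: int = 365) -> float:
    """E[N], where N is the number of people at the first repeated birthday.
    Uses E[N] = sum over n >= 0 of P(N > n) = P(first n birthdays distinct)."""
    total, p_distinct = 0.0, 1.0
    for n in range(days + 1):
        total += p_distinct               # P(first n birthdays all distinct)
        p_distinct *= (days - n) / days   # extend to n + 1 people
    return total

if __name__ == "__main__":
    print(round(p_shared(23), 4))
    print(round(expected_people_until_match(), 2))
```

With 23 people the probability is just over 1/2, and the expected number of people until the first match comes out a little under 25.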

## Any number can start a factorial

3177! = 2019…000. The factorial of 3177 is a number with 9749 digits, the first of which are 2019, and the…
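
One way to check a claim like "3177! has 9749 digits and starts with 2019", and to search for the first factorial beginning with a given digit string, is to estimate the leading digits from log₁₀(n!) with `math.lgamma` and confirm each candidate with exact integer arithmetic. This is a sketch of that approach, not necessarily the post's method; `first_factorial_with_prefix` is a hypothetical helper, and the prefix is assumed not to start with 0.

```python
import math
import sys

# Python 3.11+ caps int-to-str conversion at 4300 digits by default;
# large factorials exceed that, so lift the cap where the setting exists.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

def leading_digits_estimate(n: int, k: int) -> int:
    """Approximate first k digits of n! from the fractional part of log10(n!)."""
    log10_fact = math.lgamma(n + 1) / math.log(10)
    frac = log10_fact - math.floor(log10_fact)
    return int(10 ** (frac + k - 1))

def first_factorial_with_prefix(prefix: str, limit: int = 100_000):
    """Smallest n < limit whose factorial's decimal expansion starts with prefix."""
    target, k = int(prefix), len(prefix)
    for n in range(1, limit):
        # Rounding can shift the estimate by one unit near digit boundaries,
        # so treat near misses as candidates and confirm exactly.
        if abs(leading_digits_estimate(n, k) - target) <= 1:
            if str(math.factorial(n)).startswith(prefix):
                return n
    return None

if __name__ == "__main__":
    digits = str(math.factorial(3177))
    print(len(digits), digits[:4])
    print(first_factorial_with_prefix("2019"))
```

The float estimate keeps the search cheap, and the exact check on the rare candidates means a rounding error can only cost time, never a wrong answer.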

## Splitting lines and numbering the pieces

If you want to see line numbers, use your editor.” That way of thinking looks at the tools one…

## Predicted distribution of Mersenne primes

We’ll construct a plot below using Python. Note that the conjecture is asymptotic, and so it could make poor predictions…
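
The excerpt doesn't name the conjecture. Assuming it is the Lenstra–Pomerance–Wagstaff conjecture, under which the number of Mersenne primes 2^p − 1 with p ≤ x grows like (e^γ / ln 2) ln x, here is a small Python sketch that finds the Mersenne prime exponents below a bound with the Lucas–Lehmer test and compares the running count with that prediction. It prints a table rather than plotting, and the helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def is_prime(n: int) -> bool:
    """Trial division; fine for the small exponents used here."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def is_mersenne_prime(p: int) -> bool:
    """Lucas–Lehmer test: for an odd prime p, 2**p - 1 is prime iff s_(p-2) == 0."""
    if p == 2:
        return True  # 2**2 - 1 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def predicted_count(x: float) -> float:
    """Conjectured (asymptotic) number of Mersenne primes 2**p - 1 with p <= x."""
    return math.exp(EULER_GAMMA) / math.log(2) * math.log(x)

def mersenne_exponents(limit: int):
    """All p <= limit for which 2**p - 1 is prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and is_mersenne_prime(p)]

if __name__ == "__main__":
    for i, p in enumerate(mersenne_exponents(1300), start=1):
        print(f"p = {p:5d}  actual = {i:2d}  predicted = {predicted_count(p):5.2f}")
```

At p = 127, for example, the prediction is roughly 12.4 against 12 actual exponents; since the conjecture is asymptotic, agreement at small p is not guaranteed.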

## More bc weirdness

Actually no. It assumes that any single letter that could be a hex number is one. But in numbers with…

## Estimating vocabulary size with Heaps’ law

Heaps’ law says that the number of unique words in a text of n words is approximated by V(n) = K n^β where…
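
Taking logs turns Heaps’ law V(n) = K n^β into a straight line, log V = log K + β log n, so K and β can be estimated by ordinary least squares on the log-log scale. A minimal Python sketch, assuming whitespace tokenization and lowercasing (both simplifications of mine, not the post's); the function names and the file path are hypothetical.

```python
import math

def vocabulary_growth(words, step=100):
    """Return (n, V(n)) pairs: the number of distinct words among the first n."""
    seen = set()
    points = []
    for n, w in enumerate(words, start=1):
        seen.add(w)
        if n % step == 0:
            points.append((n, len(seen)))
    return points

def fit_heaps(points):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    if len({n for n, _ in points}) < 2:
        raise ValueError("need at least two distinct values of n")
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(my - beta * mx)
    return K, beta

# Usage with a plain-text file (hypothetical path):
# words = open("corpus.txt", encoding="utf-8").read().lower().split()
# K, beta = fit_heaps(vocabulary_growth(words))
```

Fitting in log space weights every sample point equally in relative terms, which suits a power law whose values span several orders of magnitude.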

## Proving that a choice was made in good faith

This is something I’ve helped companies with. It may be impossible to prove that a choice was not deliberate, but…

## Detecting a short period in an RNG

The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look…
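
The excerpt doesn't show how the short period was found. One generic way is to iterate the generator and record the index at which each state first appears; the first repeated state gives the tail length μ and the period λ. The sketch assumes the Cliff generator's commonly quoted form x_{n+1} = |100 ln x_n| mod 1; that formula and the helper names are assumptions, not taken from the post.

```python
import math

def cliff_step(x: float) -> float:
    """One step of the Cliff generator (assumed form): |100 ln x| mod 1."""
    if x <= 0.0:
        raise ValueError("the Cliff generator is undefined at 0")
    return abs(100.0 * math.log(x)) % 1.0

def find_cycle(step, x0, max_steps=10**6):
    """Iterate x_{n+1} = step(x_n); return (mu, lam), the index where the cycle
    starts and its length, or None if no state repeats within max_steps."""
    first_seen = {}
    x = x0
    for n in range(max_steps):
        if x in first_seen:
            mu = first_seen[x]
            return mu, n - mu
        first_seen[x] = n
        x = step(x)
    return None

if __name__ == "__main__":
    try:
        print(find_cycle(cliff_step, 0.1, max_steps=10**5))
    except ValueError:
        print("orbit reached 0")
```

Storing every state costs memory proportional to μ + λ; Floyd's or Brent's cycle-finding algorithm gives the same answer in constant memory if the orbit is long.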

## Fixed points of the Cliff random number generator

I ran across the Cliff random number generator yesterday. Given a starting value x₀ in the open interval (0, 1),…
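
Assuming the generator has the form x ↦ |100 ln x| mod 1, its fixed points can be located by hand. On the branch where ⌊−100 ln x⌋ = k, i.e. x in (e^{−(k+1)/100}, e^{−k/100}], the map equals −100 ln x − k, so a fixed point solves −100 ln x − k − x = 0. That function is strictly decreasing, positive at the left end and negative at the right, so each k ≥ 0 contributes exactly one fixed point. The bisection sketch below is my own derivation for illustration, not the post's code; `fixed_point` is a hypothetical name.

```python
import math

def cliff_step(x: float) -> float:
    """Assumed form of the Cliff generator: |100 ln x| mod 1."""
    return abs(100.0 * math.log(x)) % 1.0

def fixed_point(k: int, iterations: int = 200) -> float:
    """Fixed point on the branch where floor(-100 ln x) = k, found by bisection
    on h(x) = -100 ln x - k - x over (exp(-(k+1)/100), exp(-k/100)]."""
    lo = math.exp(-(k + 1) / 100)  # h(lo) = 1 - lo > 0
    hi = math.exp(-k / 100)        # h(hi) = -hi < 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if -100.0 * math.log(mid) - k - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for k in range(5):
        x = fixed_point(k)
        print(k, x, cliff_step(x) - x)
```

The map's slope near these points has magnitude about 100/x > 1, so the fixed points are repelling: an orbit started near one moves away from it rather than settling there.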

## Data breach trends

This post gives a crude, back-of-the-envelope calculation to address the question. We won’t look at the number of breaches per se…

## Number of feet in a mile

Here are a couple amusing things I’ve run across recently regarding the number of feet in a mile. Both are…

## Bootstrapping at scale in Snowflake

The answer, of course, is that we need a “good enough” alternative. We’re sampling after all, so the level of…

## How Machine Learning Can Lower the Search Cost for Finding Better Hikes

By Perry Johnson, Jul 9. I recently went on a weekend camping trip…

## Scraping and Exploring the Entire English Audible Catalog

By Toby Manders, Jul 2. Last week I wrote a script using the HTML-Requests package for Python…

## The Political Twittersphere of the UK

An analysis of how the constituent parties and members of the UK government differ in their…

## Visualisation of Information from Raw Twitter Data — Part 2

Let's check it out! For this, we need to download and import the Botometer Python library and get a key to…

## A beginner’s guide to Kaggle’s Titanic problem

By Sumit Mukhija, Jun 22. Since this is my first post, here’s a brief introduction of what…

## Beginning Python Programming — Part 14

An introduction to multi-threading. By Bob Roebling, Jun 20. In part 13 of Beginning Python Programming, we covered…

## Classification of Moscow Metro stations using Foursquare data

By Stanislav Rogozhin, Jun 12. This post is the capstone project of the Coursera IBM Data…

## Downloading Data From Twitter Using the REST API

This is the second article in a series of publications about acquiring data from Twitter and using it to gain…
