Data Science for Fitness: 50 is the new 30 — Part I

Luis Miguel Sánchez, Feb 24

The following article describes an experience that, along with the algorithmic music composition algos (sans neural nets) I developed in 2013–2014, is one of the most rewarding projects I have undertaken: Data Science for Fitness.

In this series on practical applications of Data Science (aren’t you tired of tutorials around MNIST, Reddit, Yelp datasets, etc.?), I plan to tell mini-stories (how I got there, why, when, etc.) and touch on subjects related to the one in the title.

This is not a straight code review, although some code will be covered.

These mini-stories will be more about data strategy, data science and its practical applications around a topic, instead of just code.

Coming back to fitness, I became sensitive to this subject in September 2008.

Back then I was working for Lehman Brothers, ‘ground zero’ of the financial crisis, where I had built a great deal of expertise developing and launching structured finance deals around exotic data sets.

After Lehman’s collapse, I took a long, hard look at myself in the mirror.

I saw a depressed, physically unfit (technically, obese) Wall Street executive who had specialized in an area of finance that was taking a lot of heat.

But I also saw my inner data scientist.

Quants are essentially data scientists with a lot of time series analysis and financial backgrounds.

Quants are problem solvers, and at that point I had big problems: the gradual deterioration of my health due to my obesity and work-related stress.

Fitness contrast: mid 30s vs late 40s

The quant/data scientist in the mirror looked back at me and said, “You can solve your fitness problem with data!”

Let me stop right now to tell you that if you’re interested in learning about a quick fix to lose weight fast, then you might as well move to another article because this story is not for you.

Or if you are looking to learn data science, or programming, then this article is not for you either.

If, on the other hand, you’re looking for unique knowledge to empower you to accomplish your fitness goals, to look at things in new ways, if you like data science, and you are thinking about improving your fitness, maybe this article will provide the right motivation for you, and maybe get you in the right direction to develop your own research and fitness program.

Back to the story: I was a fat quant looking for a rational method of getting in shape, one in line with my quant mentality.

A good quant knows that asking the right question brings you halfway to the answer.

So, in 2008 I committed to researching the topic, developing a scientific and unbiased analysis of my own data, and applying the findings to myself.

As a good quant / data scientist, I am borderline obsessed with data.

In the interview I had with the “Data Science Handbook”, I even tell the story of how in 1992 I wrote an Expert System (basically, an analytic hierarchy process, a structured technique for organizing and analyzing complex decisions, based on mathematics and psychology) to decide if I should propose to my wife…

Anyway, I have been manually collecting certain types of data since 1985 for whatever analysis I was into.

Over the years, I have collected and stored data first in written form, then on 5 1/4-inch disks, 3 1/2-inch floppies, magnetic tapes, zip disks, CDs, DVDs, hard drives, you name it.

In 1990 I got my first email and Unix (the granddaddy of Linux) account and that year started to add some rudimentary types of online data collection to my offline data collection, since I had access to the incipient Internet.

But first I had to tweak certain tools, or develop my own, since Mosaic (the first widely popular graphical browser) had not even been invented yet, and I had to use Gopher, IRC and FTP to collect data.

Many devices, many types of data captured over the years

Regarding fitness, my data collection was very sparse from 1987 to 2000: a few data points a week using 2–3 devices.

However, since 2000, my data has become much denser, with hundreds of data points per day coming from over 30 different devices and services.

The chart above shows some of the devices and services I have used to capture my own body parameters over the years.

With the diverse datasets I had collected, plus knowledge gathered from papers published in medical journals, conversations with people over 40 in good shape, data scraped from sites such as bodybuilding.com, etc., and my own experience, I asked myself:

Could I use a data science approach to help me get in shape?

Could I use data science and elements of quantitative analysis to find what would work best for me in a relatively short period of time?

Could I write code to consolidate data from the multiple hardware, software and web-based systems that I use and have used?

Could I program custom metrics, test hypotheses, and develop near-real-time alerts for deviations away from my fitness goals?

Could I use classification models for the analysis?

Could I use regression models?

Could I derive actionable intelligence from my data?

The answer is yes to all.
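To make the "alerts for deviations away from my fitness goals" idea concrete, here is a minimal sketch of what such a check could look like. It is an illustration only, not the code from this project: the `Goal` class, the parameter names, the targets and the tolerance bands are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    """Target value and allowed relative deviation for one parameter (hypothetical)."""
    target: float
    tolerance: float  # e.g. 0.10 means +/- 10% around the target


def find_deviations(readings, goals):
    """Return one alert message per reading that falls outside its goal band."""
    alerts = []
    for name, value in readings.items():
        goal = goals.get(name)
        if goal is None:
            continue  # no goal defined for this parameter
        deviation = (value - goal.target) / goal.target
        if abs(deviation) > goal.tolerance:
            alerts.append(
                f"{name}: {value} is {deviation:+.0%} from target {goal.target}"
            )
    return alerts


# Illustrative numbers only, not real data.
goals = {
    "daily_calories": Goal(target=2500, tolerance=0.10),
    "protein_g": Goal(target=180, tolerance=0.15),
}
readings = {"daily_calories": 3000, "protein_g": 175}

for alert in find_deviations(readings, goals):
    print(alert)
```

In a real setup the readings would come from the consolidated database rather than a hard-coded dictionary, and the tolerance bands would have to be tuned per person.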

In the Facebook post above, shared privately only among close friends, you can see my peak shape in January 2012 and my fitness evolution since 2009.

A very important factor in my program was the incorporation of domain expertise, in particular from people over 40 and in good shape (a good combination of theoretical and practical knowledge).

Thanks again to Gregg Avedon and Steven Herman, whose knowledge complemented and enhanced my own knowledge, and provided inspiration.

Below are some of the parameters captured over many years.

I intend to share an extract of my data and some code in my GitHub or Bitbucket account at some later date.

Please follow me on Sourcerer if you want to stay up to date, since 99% of my code is in private repos, and Sourcerer consolidates all my coding activity in a very nice format.

Somewhere in the Caribbean, in 2010

Some of my programming language/library specific expertise, percentile rank against all other users in GitHub, GitLab, BitBucket, etc., and the top areas of concentration of my code

A little review about Sourcerer: It still has a few bugs, but the work these guys are doing is GREAT and very useful, especially for people like me, whose code is mostly in private repos and with very little contribution to open source projects (Wall Street vs Silicon Valley. Can anybody relate?).

Using machine learning, Sourcerer analyzes your code and ranks your coding skills (commits, lines of code, code frequency, style, etc.) against ALL other users in GitHub, BitBucket, GitLab, etc., and summarizes your expertise by technology, programming languages, etc.

None of your proprietary code in private repos is shared with Sourcerer; it is simply analyzed.

To the left is a sample of my Sourcerer profile.

If you are a coder with public and private repos in many places, you should definitely check it out.

Coming back to the article, the parameters in bold below are the ones I will attempt to explain in future installments of “Data Science for Fitness: 50 is the new 30”.

They are the ones with the highest impact on my transformation from fat to fit, and the ones with the highest explanatory value in the machine learning models I developed.

They are:

Total daily calories consumed

Breakdown of calories consumed (calories from protein, carbohydrates, fat)

Caloric expenditure per exercise: Weightlifting, snowboarding, running, bicycling, golfing, other

Muscle mass

Fat mass (bodyfat %)

Visceral fat

VO2 Max

Excess post-exercise oxygen consumption (EPOC)

Recovery time

Training effect

T-3 Total Test*

T-4 Total Test*

Average body temperature*

Total fat

Saturated fat

Fiber

Cholesterol

Total weight

Body age

Blood sugar level

Body mass index

The results of my system were outstanding.

This is a summary:

Approximate peak-to-trough key measures: from a top total weight of 250 lbs in 2008 to a lowest weight of 190 lbs in 2009–2011.

More remarkable than the 72 lbs of body fat loss in 2009–2011 is the 30 lbs of muscle gain in 8 years, which is metabolically very difficult if you are in your 40s or 50s and do not use anabolic steroids.

Hell, it is difficult to gain that much even in your 20s or 30s!

Another great help came from Gregg Avedon from Florida (18 times on the cover of Men’s Health magazine), a fitness model and writer for said magazine.

Gregg looks great in his early 50s, and his “health food recipes” had a lot of the key nutrition ratios I later found out worked well for me.

We kept up some interesting communication via Facebook, etc., for tips and progress reports.


I said that the results were outstanding because we need to consider that over the age of 30, everybody suffers from age-related sarcopenia, responsible for a loss of 3% to 5% of muscle mass each decade (it accelerates at older ages, to between 0.5% and 1% of muscle mass loss per year after the age of 50).

Even if you are active, you’ll still have some muscle loss.

Any loss of muscle matters because it lessens strength and mobility.

So, in a way, reversing muscle loss is a way to look and feel younger.
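To put those sarcopenia rates in perspective, here is a small back-of-the-envelope sketch. It assumes the losses compound, that the per-decade rate applies from age 30 to 50 and the per-year rate after 50, and that nothing is done to counter them. The function name and this simplified model are illustrative assumptions, not a medical formula.

```python
def remaining_muscle_fraction(age, per_decade_rate, per_year_rate_after_50):
    """Fraction of age-30 muscle mass left at `age`, assuming compounding losses."""
    # Decades spent between 30 and 50 (capped at two decades).
    decades_30_to_50 = (min(age, 50) - 30) / 10 if age > 30 else 0.0
    # Years spent past 50, where the faster per-year rate applies.
    years_after_50 = max(age - 50, 0)
    fraction = (1 - per_decade_rate) ** decades_30_to_50
    fraction *= (1 - per_year_rate_after_50) ** years_after_50
    return fraction


# Low and high ends of the ranges quoted in the text.
for label, decade_rate, year_rate in [("low", 0.03, 0.005), ("high", 0.05, 0.01)]:
    at_50 = remaining_muscle_fraction(50, decade_rate, year_rate)
    at_60 = remaining_muscle_fraction(60, decade_rate, year_rate)
    print(f"{label} estimate: {at_50:.1%} left at 50, {at_60:.1%} left at 60")
```

Under these assumptions the cumulative loss by age 50 comes out at roughly 6% to 10% of the muscle mass you had at 30, which is why a net gain over the same period is notable.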

In addition to the benefits of diet and exercise, I also experienced massive muscular growth for a short period of time (check out my 1/2012 picture above), using a controversial “technique” called “glycogen super compensation”, which I will briefly explain in the last article of the series I am planning.

Another friend and mentor from the over-50 crowd: Steven Herman.

Steven is from NYC, a former Madison Avenue executive who is now a fitness model and instructor as a pastime.

Looking great in his mid 50s.

Glycogen super compensation was not part of my normal fitness routine, but a “side experiment” to try to quantify certain parameters for muscle growth.

It is not sustainable over time, so I do not recommend it to anybody, but you can read more about it here.

So how did I achieve those gains against all odds?

Quantitative analysis / data science and discipline.

But it all started with data collection and data consolidation.

The start: Data Consolidation

The chart below shows my total weight fluctuation over the years, during which I tested numerous weight loss systems (Nutrisystem, Atkins, etc.), with mixed results.

The shaded areas represent the different characteristics of the knowledge captured and analyzed.

Fitness data and knowledge were acquired over the period 1985 to 2008 (23 years of random trial and error and training without a proper plan).

From 2008 and on, I analyzed my own data and included domain expertise from people in the field of fitness and nutrition, optimized for a very specific, double objective function: max fat loss and max muscle gain (tricky since it is metabolically impossible to accomplish both simultaneously).

In order to use all the different data types I had captured and spread across different media, I had to consolidate them into a single database.

I chose MongoDB, since:

It is a schema-less database, so my code defines my schema, which to me is perfect for data science.

Because it stores data in the form of BSON (binary JSON), it can store very rich, granular data, including arrays and nested documents.

For example, I can see the total calories consumed in a workout, but in a more granular inspection, I can see the second-by-second caloric consumption captured by one of my devices (e.g., FitBit real-time heart rate data, calculated calories, etc.), all in the same MongoDB “document”.

“Volatility” higher than a certain threshold for some key fitness and nutrition parameters will not let you get “six pack abs”, no matter how hard you work out.

MongoDB supports dynamic queries.

It is easy to scale.

No complex joins are needed.

Performance tuning is easy compared to relational databases.

It enables fast access to data because frequently used data is kept in memory.

It supports search by regex as well.

MongoDB follows a regular release cycle for new versions, like Docker, etc.

In a nutshell, I find that it is the best db to work with time series and the different features present in diverse time series.
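As a rough illustration of the "coarse summary plus granular samples in one document" idea, here is a minimal sketch using a plain Python dictionary, which is the shape a MongoDB document takes on the Python side. The field names, the device name and all values are made up for the example; they are not the schema used in this project.

```python
import json
from datetime import datetime, timedelta

start = datetime(2019, 2, 1, 7, 0, 0)

# One workout as a single nested document: summary fields at the top level,
# granular per-second samples in an embedded array.
workout = {
    "type": "running",
    "device": "example-tracker",  # hypothetical device name
    "start": start.isoformat(),
    "samples": [
        {
            "t": (start + timedelta(seconds=i)).isoformat(),
            "heart_rate": hr,
            "calories": kcal,
        }
        for i, (hr, kcal) in enumerate([(92, 0.10), (95, 0.11), (99, 0.12)])
    ],
}

# The coarse view is derived from the granular samples stored in the same document.
workout["total_calories"] = round(sum(s["calories"] for s in workout["samples"]), 2)

print(json.dumps(workout, indent=2))
```

With a running MongoDB server and the PyMongo driver, a dictionary like this could be stored as-is with `collection.insert_one(workout)`; the sketch stops short of that so it runs without a database.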

(As another example, all my algorithmic music dbs were initially stored in HDF files and are now being migrated to MongoDB.)

Below is a chart representing the different stages of this project, and the tools I used:

In the next posts, I’ll explain some of the important features of my system and expand a little more, in no particular order, on some of the key findings:

The cumulative workload and type of workload for a given unit of time (frequency and intensity) is a VERY important aspect of an optimized fitness program.

Nutrition: Types of foods, timing of meals and quantity of food all matter.

Type & intensity of workouts: Relationship to: a) improving fitness and muscle mass, b) overtraining and decreasing muscle mass, and c) maintaining muscle mass.

Time series matter.

Psychological aspects: Compliance to yourself, quantification and motivation.
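The first finding above, cumulative workload per unit of time, can be sketched in a few lines. This is only an illustration under stated assumptions: the sessions are invented, intensity is a hypothetical 1–10 self-rating, and "load" is defined here as minutes times intensity, which is one simple convention and not necessarily the metric used in this project.

```python
from collections import defaultdict
from datetime import date

# (day, minutes, intensity on a 1-10 scale): made-up sessions
sessions = [
    (date(2019, 2, 4), 60, 7),
    (date(2019, 2, 6), 45, 8),
    (date(2019, 2, 9), 30, 5),
    (date(2019, 2, 12), 60, 6),
]


def weekly_load(sessions):
    """Map (ISO year, ISO week) to session count (frequency) and summed load."""
    weeks = defaultdict(lambda: {"sessions": 0, "load": 0})
    for day, minutes, intensity in sessions:
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        weeks[key]["sessions"] += 1                  # frequency
        weeks[key]["load"] += minutes * intensity    # duration x intensity
    return dict(weeks)


for week, stats in sorted(weekly_load(sessions).items()):
    print(week, stats)
```

Tracking both numbers per week, rather than only total minutes, is what makes it possible to tell a week of many easy sessions from a week of a few hard ones.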

In the chart above, I have found that most people who do not make any progress or very little progress (the bulk of the population) are working out under patterns “B” and “C” and they do not have any idea what their optimal values are since they are not collecting data, and if they are collecting it, they don't know how to analyze it and place it in the context of their own goals.

As of February 2019, I am more in a pattern “C” situation, since it has been difficult for me to keep the strict compliance I had with my own fitness program a while back.

Nevertheless I have managed to keep most of my acquired muscle mass, and can switch to pattern “A” again for 8–12 weeks to get back in shape from my current baseline.

Since this project covered so many areas in data science and also in fitness, I would love to hear your feedback as to where I should focus my next article:

Data collection?

Data visualization?

Machine learning models and validation?

Other?

Until then, I hope to hear your comments/feedback.

Please share/clap if you liked this post.

You can connect with me via Twitter and LinkedIn, and if you are curious about my time series analysis and simulations applied to algorithmic music composition (time series was very important for my fitness project), you can listen to my AI-generated music on Apple Music, Spotify, and SoundCloud (aside from launching earlier than Google’s Magenta project, to me, my algo music sounds better. You tell me).

Cheers

Parts of this story were originally published in my personal blog a few years back.

All the pictures and charts are mine and/or have permission to post.

