Training Data: What Is It? Who’s Doing It?

Or is Mturk more of a platform for training those A.I.s?

Better Understanding with Insight on A.I. and Training Data

Without getting too in depth about Artificial Intelligence, a simplified way to understand it is as a “pipeline of algorithms”.

See all those bubbles around A.I.? Think of them as algorithms that feed into the system and allow it to become intelligent.

It takes in data from specific domains, performs a chain of calculations, and gives out a prediction.
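To make the “pipeline of algorithms” idea concrete, here is a minimal sketch in which each stage is just a function whose output feeds the next, and the last stage produces the prediction. The spam-detection task, the stage names, and the word list are all hypothetical and chosen only for illustration; the rule in `predict` is hand-written here, which is exactly what a trained system would replace with something learned.

```python
from typing import Any, Callable, List

# A "stage" is any function that takes data and returns transformed data.
Stage = Callable[[Any], Any]

def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Feed data through each stage in order; the last stage's output is the prediction."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stages for a toy "is this message spam?" predictor.
def lowercase(text: str) -> str:
    return text.lower()

def count_spammy_words(text: str) -> int:
    spammy = {"free", "winner", "prize"}  # made-up word list
    return sum(word in spammy for word in text.split())

def predict(score: int) -> str:
    return "spam" if score >= 2 else "not spam"

pipeline = [lowercase, count_spammy_words, predict]
print(run_pipeline(pipeline, "FREE prize for every winner"))  # spam
```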

One key distinction between traditional algorithms and A.I. algorithms is that the first kind is manually programmed, while the second learns from data.

All this is possible because of its training data, which teaches it to connect inputs to the right outputs based on thousands or millions of examples.
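The difference between “manually programmed” and “learns from examples” can be shown with a toy sketch: one function has its cutoff typed in by a programmer, the other picks its cutoff by checking which value agrees best with labeled examples (a one-feature “decision stump”). The exam data and function names are hypothetical; real A.I. systems learn far more parameters, but the principle of fitting to labeled examples is the same.

```python
from typing import List, Tuple

# Traditional algorithm: a programmer picks the rule by hand.
def manual_rule(hours_studied: float) -> bool:
    return hours_studied > 4  # threshold chosen by a human, not by data

# Learning algorithm: choose the threshold that best fits labeled examples.
def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Return the threshold t that makes the rule `x > t` agree with the most labels."""
    xs = sorted({x for x, _ in examples})
    candidates = [xs[0] - 1] + xs  # also allow "everything is positive"
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical labeled data: (hours studied, passed the exam?)
examples = [(1, False), (2, False), (3, False), (5, True), (6, True), (8, True)]
t = learn_threshold(examples)
print(t)      # 3
print(4 > t)  # True: a student who studied 4 hours is predicted to pass
```

Change the labels and the learned threshold changes with them, while `manual_rule` stays fixed until a human edits it.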

What Kind of Training Data?

Well, relating to Mturk purposes, clean and labeled data is very desirable.

Raw data from the real world is messy and usually needs humans to categorize it, so the algorithms know what to ‘train’ off of and the A.I. system knows what to ‘learn’.
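Because individual human labelers make mistakes, a common practice on crowdsourcing platforms is to have several workers label the same item and keep the majority answer. The sketch below shows that idea only; it is not the method of any particular company, and the file names and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def aggregate_labels(worker_labels: Dict[str, List[str]]) -> Dict[str, str]:
    """Collapse several workers' labels per item into one label by majority vote.

    On a tie, Counter.most_common returns the tied label that was seen first.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in worker_labels.items()}

# Hypothetical raw answers from three (or two) workers per image.
raw = {
    "img_001.jpg": ["cat", "cat", "dog"],
    "img_002.jpg": ["dog", "dog", "dog"],
    "img_003.jpg": ["cat", "dog"],
}
print(aggregate_labels(raw))
# {'img_001.jpg': 'cat', 'img_002.jpg': 'dog', 'img_003.jpg': 'cat'}
```

In practice a tie like `img_003.jpg` would more likely be sent to another worker than resolved by order.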

Sometimes, even with a good system, things don’t work out as intended because of real-world situations that weren’t accounted for.

So when that happens, give the task to something more intelligent: a human.

Tech companies can go one of two ways in how they inform the public about what their technology is actually built on.

Google’s military drone A.I. project was open about how it used ‘low-paid workers’ from the crowdsourced data-labeling company CrowdFlower, which renamed itself ‘Figure Eight’.

Companies may take the deceptive road as well, as the software company Expensify did.

Expensify, an automation company built on handling the annoying task of compiling all kinds of expense reports, developed its next amazing “SmartScan” technology to handle it.

But in reality, just like the famous ‘Turk’ deception, it was secretly just humans in the background: many of them actually Mturk workers!

Enough About Mturk, YOU Help Train A.I. All the Time

Google Captchas: You know, those little tests websites use to make sure you’re a human and not one of those lying, deceitful robots.

Google uses its captchas as a great way to collect useful data.

Think about those weird, blurry words you have to make out: your correct answers can help character recognition for Google’s vision work and Google Books.
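The word captchas are commonly described as pairing a word whose answer is already known with a scanned word the computer could not read: if you get the known word right, your answer for the unknown word counts as one vote, and once enough people agree, that transcription is accepted. The sketch below models that scheme in simplified form; the class name, the vote threshold `VOTES_NEEDED`, and the word IDs are hypothetical, not Google’s actual implementation.

```python
from collections import Counter, defaultdict
from typing import Dict

# Hypothetical number of agreeing answers before a transcription is trusted.
VOTES_NEEDED = 3

class WordChallenge:
    """Toy model of a two-word captcha: one known word, one unknown scanned word."""

    def __init__(self) -> None:
        self.votes: Dict[str, Counter] = defaultdict(Counter)
        self.solved: Dict[str, str] = {}

    def submit(self, control_answer: str, typed_control: str,
               unknown_id: str, typed_unknown: str) -> bool:
        """Return True if the user passes; if so, record their guess for the unknown word."""
        if typed_control.strip().lower() != control_answer.lower():
            return False  # failed the known word: ignore the other guess entirely
        guess = typed_unknown.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= VOTES_NEEDED:
            self.solved[unknown_id] = guess
        return True

c = WordChallenge()
for typed in ["morning", "moming", "morning", "morning"]:
    c.submit("castle", "castle", "scan_42", typed)
print(c.solved)  # {'scan_42': 'morning'}
```

The known word is what lets the system tell a careful human from a careless or automated answer before trusting the new data.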

How about those uneven and skewed numbers? Sounds like a good application for Google Street View, helping confirm addresses.

Lately, I’ve felt Google’s captchas have focused mainly on those little squares where you pick out the cars, traffic lights, or street signs from a set of images.

I find it very unlikely that Google isn’t using that data for Waymo, the self-driving car project that began at Google.

Facebook: It has one of the most advanced facial recognition systems in the world.

Each person has numerous photos in which they, or their friends, have tagged them.

That gives Facebook all the data needed to develop a great system that can automatically tag your friends’ faces when you post pictures.

Collecting and entering that data cost Facebook $0: users had already done it.

Are We Google’s Guinea Pigs, or Anyone’s?

Google’s depth-sensing research team needed tons of footage in which mobile cameras viewed a static scene from multiple angles.

Now, where would you get that? From the viral Mannequin Challenge of 2016–17, of course. The team used 2,000 YouTube challenge videos to train their AI model to predict depth from videos in motion.

What is amazing about Artificial Intelligence is that it is built on our human intelligence, yet often manages to transcend it and go beyond what was ever thought possible.

Training data, a crucial factor in how well a system works, is widely available through the internet, which is a significant propeller for today’s A.I.

Don’t forget that, for now, there are no ‘stringent’ copyright restrictions on what you can use as training data.

