How I created over 100,000 labeled LEGO training images

Some of the 300,000+ images I captured while leaving the machine running for a few days.

By Daniel West · Feb 28

If you are a hobbyist or researcher working on an AI project, it’s quite likely that you’ve run into the unfortunate situation of having to generate a large amount of labeled training data.

Of course, having spent all your funding taking some sketchy GPUs off the hands of an unkempt yet thankful Bitcoin miner, you can’t afford to outsource your annotation pipeline at the moment.

As part of my quest to build a universal LEGO sorting machine (more articles to follow soon!), I’ve come across this problem myself.

Although much of the training data is generated automatically, the project still requires a relatively small but still sizeable set of good old human-labeled ground-truth images.

Most deep learning tutorials and courses seem to assume that you will always be handed a nice tidy dataset, ready and waiting to be worked on, but for most real-world applications you won’t be classifying centred, handwritten digits into the categories 0 through 9.

In a real AI project, much, if not most, of the hard work is simply acquiring the data.

I’ve been through the hard yards in this regard, and I thought it would be helpful to share some of the lessons and tips I’ve learned.

A sneak preview of the LEGO sorting machine prototype in action.

The Problem

The recognition component of the LEGO sorting machine works by funnelling LEGO elements, one at a time, along a belt underneath a camera.

To begin the labelling process, I left the machine running for a few days and collected around 300,000 unlabeled images of LEGO elements.

I probably won’t be able to label them all, but hey, the images are easy to collect.

Why not go crazy?

There are tens of thousands of distinct LEGO part numbers which the machine could potentially be trained to recognize.

However, I’ve excluded a number of categories: at the moment I don’t care about distinguishing between ‘Tile, Round 2 x 2 With Pizza Pattern’ and ‘Tile, Round 2 x 2 With Chinese New Year’s Soup Pattern’.

But before you start thinking I’ve clearly made things too easy on myself, at the end of the day I’m still left with over 2500 part categories.

Even dealing with this many part numbers is arguably overkill, so deduplication was used heavily.

(If you’re feeling hungry for some Chinese New Year’s Soup, you can currently buy a 2 x 2 tile for yourself off Bricklink for only AU $22.)


Anyway, in order to obtain the labeled data to use for training, I now needed to manually assign one out of 2500+ possible part numbers to each individual image of a LEGO element that the camera had taken while the machine was running.

The purpose of all supervised learning is to transfer the knowledge of an existing black-box neural network (in this case, a human brain) to a new neural network (usually running on a computer).

Your human-brain-black-box-classifier is great at the teaching task but unfortunately it is tragically slow.

It’s also probably needed for other tasks that we haven’t yet figured out how to offload to some GPU running in an Amazon warehouse.

We might start getting into trouble if we do too much of this.

Our overall goal will be to lower the amount of time it takes our brain-classifier to label a single image.

There are 3 key steps I was able to take to speed up the process by a massive amount.

As with all great engineering solutions, the common theme of the three steps is to avoid as much work as possible.

Step 1: Cheat (AKA: Work the Problem)

The very best thing you can do to label a large number of data samples effectively is to take advantage of the means by which the samples are generated in the first place.

Often, you will have some level of knowledge about the sample generation process, and it’s a great idea to abuse this power as much as possible.

In my case, I noticed that the camera has a ~10–20 frame window to take a snap of the part as it moves along on the belt.

So, instead of taking just one image, the camera takes every single frame the part is completely in view and stores those 10–20 images together as a bundle.

An example bundle.

So while my 300k or so images are all unlabeled, I still have access to a very valuable set of meta-information that tells me every bundle of 10–20 images must have the same label.

This means that my labelling speed is increased by 10–20x compared to attempting to label each image individually.
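The bundle trick can be sketched in a few lines. This is only an illustration of the idea, assuming each captured frame carries some bundle identifier; the names here are mine, not the actual pipeline code:

```python
from collections import defaultdict

def bundles_from_frames(frames):
    """Group per-frame image records into bundles by a shared bundle id.

    `frames` is a list of (bundle_id, image_path) tuples; the bundle id is
    hypothetical metadata recorded by the capture rig as a part passes by.
    """
    bundles = defaultdict(list)
    for bundle_id, image_path in frames:
        bundles[bundle_id].append(image_path)
    return dict(bundles)

def propagate_label(bundles, bundle_id, part_number):
    """One human decision labels every frame in the bundle at once."""
    return {path: part_number for path in bundles[bundle_id]}

frames = [(7, "img_001.png"), (7, "img_002.png"), (8, "img_003.png")]
bundles = bundles_from_frames(frames)
labels = propagate_label(bundles, 7, "3001")  # part 3001 is 'Brick 2 x 4'
```

One tap labels every frame in the bundle, which is where the 10–20x multiplier comes from.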

As an aside: Capturing multiple images of each part also gained me massive benefits with respect to actual classification performance.

I’ll cover more details in a future article, but if you feel that you could pull off something similar for your own project, give it a try!

Step 2: Be lazy (AKA: Streamline the Process)

For any real-world case, creating your annotations by manually entering values one at a time into `labels.csv` is going to be extremely slow at best, and completely impractical at worst, especially if your annotations are more complicated than simple labels (e.g. bounding boxes or segmentation masks).

So, in almost all cases, it’s worth spending a bit of time up front to create an annotation utility.

My first version of the LEGO label utility was very simple but was the bare minimum to allow me to get labelling done at a decent speed.

Since I don’t know which part number is associated with a particular part name, I needed to add in a simple text search utility which would allow me to search something like ‘Brick 2 x 4’ or ‘Plate 1 x 2’ when presented with a fresh bundle of images to label.

Without the search tool, I’d have to use Google or scroll through my parts database to look up the correct part number for an element.

The bare minimum for usability, but we can do better.
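The core of such a search utility is tiny. Here is a minimal sketch, with a made-up part table standing in for the real parts database:

```python
# Illustrative slice of a part-number -> part-name table.
PARTS = {
    "3001": "Brick 2 x 4",
    "3023": "Plate 1 x 2",
    "3068": "Tile 2 x 2",
}

def search_parts(query, parts=PARTS):
    """Case-insensitive search: every word of the query must appear in the name."""
    words = query.lower().split()
    return [
        (number, name)
        for number, name in parts.items()
        if all(word in name.lower() for word in words)
    ]

search_parts("brick 2 x 4")  # matches part 3001
```

Even this naive substring matching beats googling part numbers one at a time.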

I was quickly fed up with having to manually type in part numbers, and doing it all while sitting at a computer is a bit of a drag.

So, I decided to leverage my web dev experience and spent a few hours putting together a simple web app.

This ended up being a big win — not only was the user experience smoother which meant I was able to label faster, but because it ran on my phone I was also able to substitute some of the time I usually spend mindlessly scrolling through Twitter with labelling LEGO bricks.

It’s actually quite the calming experience (especially compared to the Twitter scrolling).

Label Utility 2.0: Search, then tap the icon that matches the bundle at the top.

Compared to manually googling the part number and entering it into labels.csv, the speedup from using a simple labelling app was about 2–5x on average, and that’s not including the convenience of being able to do the labelling from anywhere.

Using a web app as a labelling app also has a very appealing secondary upside: it becomes easy to include other people in the labelling process, simply by giving them a link.

I haven’t attempted this yet as part of my project but I’d be eager to try it, especially when factoring in the next step.
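The server side of such a labelling app can be very small. The sketch below is a minimal WSGI endpoint of my own devising (the route and parameter names are assumptions, not the author’s actual app) that records one label per request:

```python
from urllib.parse import parse_qs

# In-memory label store; a real app would persist this to disk or a database.
labels = {}

def label_app(environ, start_response):
    """Record a label from a request like /label?bundle=7&part=3001."""
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    bundle = qs.get("bundle", [None])[0]
    part = qs.get("part", [None])[0]
    if bundle and part:
        labels[bundle] = part
        status, body = "200 OK", b"labeled"
    else:
        status, body = "400 Bad Request", b"missing bundle or part"
    start_response(status, [("Content-Type", "text/plain")])
    return [body]

# Serve locally with the standard library:
# from wsgiref.simple_server import make_server
# make_server("", 8000, label_app).serve_forever()
```

Anyone with the link can then contribute labels from their phone.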

Step 3: Get someone else to do it (AKA: Use AI-Assisted Annotating)

While you’ve been working on labelling your data, you’ve been prototyping your actual model and doing mock training runs on the side, right? Of course you have! Once you’ve annotated a decent amount of data (let’s say 10–30% of what you’ll need for your finished product), you should have the capability to train your network to do a kinda crappy but semi-decent job at your task.

If you’re building a classifier for example, maybe instead of getting a top-1 accuracy of 95%, you can get a top-5 accuracy of 90%.

This is the situation I found myself in after having labelled about 30,000 images by hand using my first two strategies.

I realized that by hooking up this ‘proto-classifier’ to my labelling utility, I could drastically improve my labelling speed even further.

Instead of needing to do a text search for every single bundle of images, I could first present the top guesses from the proto-classifier, one of which would be correct 90% of the time.
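Turning the proto-classifier’s output into suggestions is just a top-k ranking over its class probabilities. A minimal sketch, with an illustrative helper name and a hypothetical four-class softmax output:

```python
def top_k_suggestions(probs, part_numbers, k=5):
    """Return the k most likely part numbers given classifier probabilities.

    `probs[i]` is the predicted probability of class i, and `part_numbers[i]`
    is the part number for class i (the mapping is assumed, not the real one).
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [part_numbers[i] for i in ranked[:k]]

# Hypothetical softmax output over four classes:
probs = [0.05, 0.60, 0.25, 0.10]
parts = ["3001", "3023", "3068", "3040"]
suggestions = top_k_suggestions(probs, parts, k=2)  # ["3023", "3068"]
```

The labelling app then renders one button per suggested part, with text search as the fallback.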

Label Utility 3.0: Just choose one of the AI’s suggested options! The text search is still there for when the AI fails.

This gave me another 2–10x speedup — it was especially effective for weird parts for which I wouldn’t easily be able to remember the name.

I am now able to label an entire bundle of images in 1–2 seconds on average.

That’s 5–10 images per second! Not bad for a human!

By implementing the three key steps above in my labelling pipeline, I was able to achieve a 40x to 1000x speedup in annotation.
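The combined figure follows directly from multiplying the per-step speedups quoted above:

```python
# Per-step speedups from the article, as (low, high) ranges.
bundling = (10, 20)    # Step 1: one label covers a whole bundle of frames
streamline = (2, 5)    # Step 2: labelling app vs. manual CSV editing
ai_assist = (2, 10)    # Step 3: tapping an AI suggestion vs. text search

low = bundling[0] * streamline[0] * ai_assist[0]
high = bundling[1] * streamline[1] * ai_assist[1]
print(f"Combined speedup: {low}x to {high}x")  # 40x to 1000x
```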

Without employing any outside help, I have already labeled over 100,000 images, and that was mostly just in the spare time I’d usually spend scrolling through social media.

Finally, here’s a video of real-life usage of the label utility.

