Diving into Google’s Landmark Recognition Kaggle Competition

Pre-sizing the images effectively reduces the 500+ gigabyte training set to ~50 gigabytes.

Much more usable.

More importantly, I already know that when training on a dataset this large, time is going to be an issue.

By pre-sizing the images on the hard drive, we don’t have to redo the resizing over and over during the training epochs.

Just in case, I kept the original files if I need a different size later.
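A minimal sketch of the kind of one-off resize pass I mean, assuming Pillow and made-up directory paths (the 256-pixel bound is just an example size):

```python
import os
from PIL import Image
from tqdm import tqdm

src_dir = "/home/user/data/google"            # hypothetical paths
dst_dir = "/home/user/data/google_resized"
os.makedirs(dst_dir, exist_ok=True)

# One-off pass: shrink every image to fit within 256x256 (aspect ratio kept)
# and save a JPEG copy, leaving the originals untouched on disk.
for name in tqdm(os.listdir(src_dir)):
    try:
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        img.thumbnail((256, 256))
        img.save(os.path.join(dst_dir, name), "JPEG", quality=90)
    except OSError:
        print("skipping unreadable file:", name)
```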

Check if files are empty

Another unfortunate discovery was that several files were corrupted and contained no data.

Incomplete records can cause the model to fail partway through a training run (which can take up to 9 hours).

It is much better to check first.

```python
import os
from tqdm import tqdm

directory = "/home/user/data/google"
filenames = os.listdir(directory)

# Are there any empty files?
for filex in tqdm(filenames):
    try:
        filex = directory + "/" + filex
        if os.stat(filex).st_size == 0:  # 0 bytes means a bad file
            print(filex)
    except OSError:
        print(filex)
```

Any files that failed I replaced with a picture of the same type of landmark.

The most common landmarks

With so many files, another big concern is just the sheer number of landmark classes.

The most common, “id 138982”, has over 10,000 instances, while some landmarks have only 1.

We have 203,094 different classes here, and we really should think about reducing that number.

Dimensionality is hard

If you look at other CNN datasets, you have some simple ones like MNIST (10 classes) and more complex ones like ImageNet (1,000 classes).

So a dataset with 200,000 categories is crazy.

So let’s make this easier and make it only as tough as ImageNet.

We take the 1,000 most common classes, which use only 11.1% of the dataset.
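For reference, picking the top classes is just a value_counts filter over the training CSV; a sketch with a hypothetical path (the column names follow the competition’s train.csv):

```python
import pandas as pd

train = pd.read_csv("/home/user/data/google/train.csv")  # hypothetical path

# Keep only the rows belonging to the 1,000 most frequent landmark classes.
top_classes = train["landmark_id"].value_counts().index[:1000]
subset = train[train["landmark_id"].isin(top_classes)]

print(f"kept {len(subset) / len(train):.1%} of the images")  # roughly 11% here
```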

Not surprisingly, it trains quickly and gets up to around 50% with ResNet34.
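The training loop itself was plain fast.ai; something like this v1-era sketch, where the path, dataframe, and column names are illustrative rather than my exact notebook:

```python
from fastai.vision import *

# `subset` is the top-1,000-class dataframe from above; images sit in one folder
# and file names are "<id>.jpg".
data = ImageDataBunch.from_df(
    "/home/user/data/google_resized", subset,
    fn_col="id", label_col="landmark_id", suffix=".jpg",
    valid_pct=0.2, size=224, bs=64,
).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(5)
```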

Through lots of experimentation, I get something that looks like this (I like old-school pen and ink).

ResNet34 experiments for test 1

I certainly had trouble with the GAP score, but I think most people were nonplussed by it.

My implementation of the GAP score didn’t match the public Kaggle score exactly.

However, it did show I was on the right path.
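For context, GAP (Global Average Precision) ranks every prediction by confidence and averages the precision at each correct hit. A minimal sketch, assuming one labelled prediction per image (my actual version is in the repo linked in the references):

```python
import numpy as np

def gap_score(preds, confs, targets):
    """Global Average Precision (micro-AP) with one prediction per image."""
    order = np.argsort(-np.asarray(confs))                       # most confident first
    correct = (np.asarray(preds)[order] == np.asarray(targets)[order]).astype(float)
    precision_at_i = np.cumsum(correct) / np.arange(1, len(correct) + 1)
    return float((precision_at_i * correct).sum() / len(correct))

# Tiny example: two correct high-confidence hits, one miss at the bottom -> ~0.67
print(gap_score(preds=[3, 7, 3], confs=[0.9, 0.5, 0.8], targets=[3, 2, 3]))
```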

A larger sample of classes (top 100, top 1,000, top 2,000, etc.), higher accuracy, and a higher validation GAP score all led to a higher Kaggle score.

However, as the dataset grows, training takes longer and prediction accuracy drops.

Disappointment moving toward the 2nd Stage test

I started very strong in the competition, with a 1st Stage public score that put me in the top 30.

I was very excited about this as I felt my training was only getting better.

My categories and accuracy were increasing rapidly.

However, my training rapidly hit a wall with 20,000 categories.

I only got to around 30% accuracy with 7,000 different categories (I couldn’t even get predictions for all of the new categories).

Although I tried several different solutions at that point, I was too distracted by the sister competition (Google Landmark Retrieval 2019) to go back and fix errors.

Areas for improvement

Data cleaning and Landmark vs. no-landmark

When taking a look across all the training data, I noticed that landmark 138982 was the most common one and decided to investigate.

Looking through 200 images for this class, I found no landmarks.

For example, see below.

We have insides of laboratories, classrooms, landscapes, people in front of trees, and nothing that looks like a landmark.

Some were in color and some were black and white (the black-and-white ones appear with a green tint), and quite a few had a little tag at the bottom.

It turns out they are from the library of ETH Zurich, and I still have no idea why they are considered a landmark.

After all, would a photo in the Library of Congress “be” the Library of Congress? That doesn’t make sense.

Overcome by hubris, I neglected to include a no-landmark class or do any data cleaning to check whether a landmark was actually present.

In hindsight, this was a colossal mistake; detecting landmarks vs. non-landmarks was part of many of the top solutions.

Competitors discussing landmark vs. non-landmark:
2nd Place — Doesn’t discuss, just puts out external data
8th Place
27th Place

Finding the nearest neighbors

Seeing the images that were closest to the test images was also extremely important.

I used nmslib to help with this.

Based on my ResNet50 training, nmslib would compare the softmax predictions of one test image vs. the predictions made on my validation set.

Then it would find the K closest validation images to that test image.

See my scribbles explaining it below.

If it were ImageNet, it would make sense that dog images would have higher predictions of dogs.

If the K closest neighbors in the validation set are dogs, it is very likely the test image is a dog.

Likewise, if it is an image of category 12345, it should be ranked very highly in its 12345ishness.

If the five nearest validation images are 12345, it probably is.
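A rough sketch of the nmslib part, assuming the prediction matrices and labels have already been saved to disk (the file names are made up):

```python
import numpy as np
import nmslib

val_preds = np.load("val_preds.npy")      # (n_val, n_classes) softmax outputs
val_labels = np.load("val_labels.npy")
test_preds = np.load("test_preds.npy")    # (n_test, n_classes)

# Build an approximate nearest-neighbour index over the validation predictions.
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(val_preds)
index.createIndex({"post": 2}, print_progress=True)

# For each test image, look up the 5 closest validation images and their labels.
neighbours = index.knnQueryBatch(test_preds, k=5, num_threads=4)
for ids, dists in neighbours[:3]:
    print(val_labels[ids], dists)
```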

Now, my mistake here was using the softmax outputs across thousands of classes instead of using the last 512 features.

Trying to work with 200k categories eats lots of RAM.

I was up to 62 GB when I crashed.

With the calculations getting wildly astronomical, I discarded this approach from my final submission and went to work on the other competition.
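For reference, grabbing the compact features instead of the softmax would look something like this (illustrative only, using a plain torchvision ResNet34, whose pooled feature layer is 512-wide; ResNet50’s is 2048):

```python
import torch
import torch.nn as nn
from torchvision import models

# Chop off the classification head so the network outputs the pooled features.
backbone = models.resnet34(pretrained=True)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

with torch.no_grad():
    batch = torch.randn(8, 3, 224, 224)            # stand-in for a batch of images
    feats = feature_extractor(batch).flatten(1)    # shape: (8, 512)

print(feats.shape)  # feed these into nmslib instead of the softmax vectors
```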

A few more problems

Also, since there were some distracting images, the KNN would miscategorize entire subjects.

For example, consider this image: as you can see, it prominently features people posing for the photographer, and it matches closest to the photos below.

These comparisons focus on the people, not on the landmarks.

Even then, it misses whether they are modern or old-timey photos.

I think with better architecture and more categories, I wouldn’t have had this problem.

The best uses of KNN were in the following solutions:
1st Place
8th Place
15th Place
20th Place
27th Place

Better architecture

Why ResNet50? I was having trouble saving models with ResNeXt architectures, started DenseNet too late, and EfficientNet functionality was still being built.

Next time, I will go from ResNet34 to ResNeXt or the new EfficientNet.

Slowly stepping up categories

I also attempted to increase the number of training categories slowly: start with 1,000, then 2,000, then 3,000, etc.

The thinking was that it is faster to learn a smaller number of categories, and that carrying that learning into a broader set would make it more accurate.

Like adding in additional flashcards when learning another language.
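The mechanics of the step-up look roughly like this in plain PyTorch (fast.ai wraps this differently; the names and sizes here are illustrative): carry the trained weights for the first 1,000 classes into a wider final layer, then keep training.

```python
import torch
import torch.nn as nn
from torchvision import models

def expand_head(model, old_classes, new_classes):
    """Widen the final layer, copying the already-trained weights for the old classes."""
    old_fc = model.fc
    new_fc = nn.Linear(old_fc.in_features, new_classes)
    with torch.no_grad():
        new_fc.weight[:old_classes] = old_fc.weight
        new_fc.bias[:old_classes] = old_fc.bias
    model.fc = new_fc
    return model

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 1000)
# ... train on the top 1,000 classes here ...
model = expand_head(model, old_classes=1000, new_classes=2000)
# ... then keep training on the top 2,000 classes, and so on ...
```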

However, it did not significantly help training or accuracy, and I ditched it.

Perhaps the steps up were too large, and I should have had smaller ones.

Conclusions

Large datasets are hard.

Fun but hard.

KNN is extremely valuable, but don’t break your computer
Love multi-GPU processing
Move onto larger datasets quickly

Again, big thanks to Fast.ai, which taught me how to even get this far!

References:
My Github code — Placeholder GAP for Pytorch/Fast.ai
Kaggle Competition
NMSlib
EfficientNet

