Google Landmark Recognition using Transfer Learning

We needed the images to be labelled by class in the training and validation sets, so we organised our datasets accordingly. After data pre-processing, our method used the VGG16 pre-trained model on the data, together with DELF for dealing with difficult test images. DELF matches local features between images; we discuss it in detail in the later half. In the end we used a sample test set drawn from the training data to compute an accuracy score, and our model was able to reach up to 83% accuracy (claps!/gasps!).

I. Obtaining the Image Data

The data came in the form of CSVs with image URLs and landmark IDs. Our sample of 2000 classes contained class labels (landmark IDs) from 1000 to 2999. Data pre-processing for this project can be broadly divided into the following steps:

a) Train-validation-holdout split: Since the files in the test folder were mostly junk and did not contain any landmarks, we had to make our own holdout set from training.csv for testing our final models. We took 1% of the images from each class to make the holdout dataset. Next came the train-validation split on the remaining 99% of the data: 20% of the images from each class were labelled as the validation set and the remaining 80% were used for training. At the end of this step the data distribution was as follows. The test set had 1,183 rows.
The validation set had 31,651 rows and the training set had 130,551 rows.

b) Fetch the image files: With these splits ready (image URLs plus unique IDs), the next step was downloading the images into the appropriate folders. Three separate folders were created, one each for the train, validation and holdout sets. Resized images (96×96) were downloaded into their respective folders (on GCP), which took about 9 hours to complete.

c) Make the directory structure: The train and validation data had to be in a specific directory format for us to be able to use Keras functionality. We transformed our data into a structure where each class is a subfolder inside the Train/Validation folder, and each subfolder contains all the image files belonging to that class.

II. Data Preprocessing

Data cleaning: Some of the URL links were broken and the downloaded files were corrupt. All such files were removed using a filter on file size (>1000 bytes) before moving the files into their specific directories. Next, we found that some classes were missing from the validation folder.
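The per-class split from step (a), 1% to holdout and then a 20/80 validation/train split on the remainder, can be sketched with pandas. This is a minimal sketch, not our exact script: the column name `landmark_id` and the function name are our assumptions.

```python
import pandas as pd

def stratified_split(df, label_col="landmark_id",
                     holdout_frac=0.01, val_frac=0.2, seed=42):
    """Split rows into train/validation/holdout sets, sampling per class."""
    # 1% of each class goes to the holdout set.
    holdout = df.groupby(label_col).sample(frac=holdout_frac, random_state=seed)
    remaining = df.drop(holdout.index)
    # 20% of each class's remaining rows become validation; the rest are training.
    val = remaining.groupby(label_col).sample(frac=val_frac, random_state=seed)
    train = remaining.drop(val.index)
    return train, val, holdout
```

Sampling inside each `groupby` group is what keeps every landmark ID represented at roughly the same ratio across the three sets.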
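Step (c), one subfolder per class, can be sketched like this. The helper name, the `(image_id, landmark_id)` record shape, and the `.jpg` naming are our assumptions for illustration.

```python
import os
import shutil

def arrange_by_class(records, src_dir, dest_dir):
    """Move <src_dir>/<image_id>.jpg into <dest_dir>/<landmark_id>/ so that
    Keras can infer each image's class label from its folder name."""
    for image_id, landmark_id in records:
        class_dir = os.path.join(dest_dir, str(landmark_id))
        os.makedirs(class_dir, exist_ok=True)
        src_path = os.path.join(src_dir, f"{image_id}.jpg")
        if os.path.exists(src_path):
            shutil.move(src_path, os.path.join(class_dir, f"{image_id}.jpg"))
```

With this layout in place, `ImageDataGenerator(...).flow_from_directory("Train/")` picks up one class per subfolder, which is the directory convention Keras expects.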
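The file-size filter used in data cleaning can be sketched as below; a minimal version, where the function name is ours and we assume anything at or under the 1000-byte cutoff is a broken download.

```python
import os

def remove_corrupt_downloads(folder, min_bytes=1000):
    """Delete downloads too small to be real images; broken URLs yield tiny, corrupt files."""
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Keep only files strictly larger than min_bytes (the >1000-byte filter).
        if os.path.isfile(path) and os.path.getsize(path) <= min_bytes:
            os.remove(path)
            removed.append(name)
    return removed
```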
