Histopathologic Cancer Detector – Machine Learning in Medicine

We can freeze the low-level feature extractors and focus only on the top-level classifiers. But what if our dataset is very different from the original dataset (ImageNet)? In fact, our histopathologic cancer dataset seems to fall into this category. Instead of freezing specific layers and fine-tuning only the top-level classifiers, we are going to retrain the whole network on our dataset. Even though this is not going to be as fast as fine-tuning only the top classifiers, we still leverage transfer learning, because training starts from pre-initialized weights and a well-tested CNN architecture.

In our Histopathologic Cancer Detector we are going to use two pre-trained models: Xception and NASNet.

Model

This is our model’s architecture, with the Xception and NASNet architectures concatenated side by side, and this is how it looks in code.

Keep in mind that the above model is a good starting point, but achieving a top score would certainly require refining it, so don’t hesitate to experiment with the architecture and its parameters.

Data Augmentation

While our dataset of 170,000 labeled images may look sufficient at first sight, to strive for a top score we should definitely try to enlarge it. One way to do this artificially is data augmentation.

Data augmentation means modifying an original image so that it looks different but still holds its original content. To do this, we can, for example, zoom, shear, rotate, and flip images. Take a look at the following example of how we can ‘create’ six samples out of a single image.

The data augmentation code used in the Histopathologic Cancer Detector project looks as follows.

Training

Finally, we can proceed to the training phase. We are going to train for 12 epochs and monitor the loss and accuracy metrics after each epoch.

Besides the training and validation plots, let’s also check the Receiver Operating Characteristic (ROC) curve, since the area under it is Kaggle’s evaluation metric for this competition.

Our top validation accuracy reaches ~0.96. This means that we correctly classify ~96% of the validation samples, that is, we correctly tell whether a given image contains a tumor or not.

What’s Next?

After reading this article, you should be aware of how powerful machine learning solutions can be in solving real-life problems. Think about it this way: we’ve developed an impressive tumor identifier in about 300 lines of Python code.

I encourage you to dive deeper into such areas because, besides the obvious benefits of learning new and fascinating things, we can also tackle crucial real-life problems and make a difference.

Don’t forget to check the project’s GitHub page: gsurma/histopathologic_cancer_detector, a CNN histopathologic tumor identifier (github.com/gsurma/histopathologic_cancer_detector).

Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.
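The article describes a model that runs Xception and NASNet side by side on the same input and concatenates their features. It also describes retraining the whole network instead of freezing the feature extractors. Below is a minimal Keras sketch of that idea under stated assumptions; it is not the project's actual code. The `build_model` helper and its `train_backbones` flag are hypothetical. So are the other details: the 96×96×3 input shape, `NASNetMobile` as the NASNet variant, global average pooling, the 0.5 dropout rate and the single sigmoid output.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import NASNetMobile, Xception


def build_model(input_shape=(96, 96, 3), weights="imagenet", train_backbones=True):
    """Hypothetical helper (not the project's code): Xception and NASNetMobile side by side."""
    inputs = keras.Input(shape=input_shape)

    # Pre-trained backbones without their original 1000-class ImageNet classifiers.
    xception = Xception(include_top=False, weights=weights, input_shape=input_shape)
    nasnet = NASNetMobile(include_top=False, weights=weights, input_shape=input_shape)

    # True retrains the whole network (the article's choice); False freezes the
    # feature extractors so that only the new classifier head is trained.
    xception.trainable = train_backbones
    nasnet.trainable = train_backbones

    # Feed the same image to both backbones and pool each feature map to a vector.
    xception_features = layers.GlobalAveragePooling2D()(xception(inputs))
    nasnet_features = layers.GlobalAveragePooling2D()(nasnet(inputs))

    # Concatenate the two feature vectors side by side.
    merged = layers.Concatenate()([xception_features, nasnet_features])
    merged = layers.Dropout(0.5)(merged)  # assumed regularization rate

    # One sigmoid unit: estimated probability that the patch contains tumor tissue.
    outputs = layers.Dense(1, activation="sigmoid")(merged)
    return keras.Model(inputs, outputs)
```

With `weights="imagenet"`, Keras downloads the pre-trained weights on first use. Passing `weights=None` builds the same architecture from random initialization, which gives up the transfer-learning benefit the article relies on.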
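The article's augmentation creates extra samples by zooming, shearing, rotating, and flipping images, turning one image into six. The dependency-free sketch below covers only flips and right-angle rotations, which rearrange pixels without interpolation, and it likewise produces six new samples from one image. It represents a grayscale image as a list of pixel rows, and all function names are hypothetical. Zoom and shear need interpolation and are left out.

```python
from typing import Callable, List

Image = List[List[int]]  # assumption: a grayscale image stored as rows of pixel values


def flip_horizontal(image: Image) -> Image:
    # Mirror left-right: reverse each row.
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    # Mirror top-bottom: reverse the order of the rows.
    return [list(row) for row in image[::-1]]


def rotate_90(image: Image) -> Image:
    # Rotate 90 degrees clockwise: reverse the rows, then transpose.
    return [list(column) for column in zip(*image[::-1])]


def rotate_180(image: Image) -> Image:
    return rotate_90(rotate_90(image))


def rotate_270(image: Image) -> Image:
    return rotate_90(rotate_180(image))


def transpose(image: Image) -> Image:
    # Mirror along the main diagonal.
    return [list(column) for column in zip(*image)]


AUGMENTATIONS: List[Callable[[Image], Image]] = [
    flip_horizontal, flip_vertical, rotate_90, rotate_180, rotate_270, transpose,
]


def augment(image: Image) -> List[Image]:
    """Create six new samples from one image; each keeps the original image's label."""
    return [transform(image) for transform in AUGMENTATIONS]
```

For NumPy image arrays, `np.fliplr`, `np.flipud` and `np.rot90` perform the same flips and rotations. In a Keras pipeline, the `RandomFlip`, `RandomRotation` and `RandomZoom` preprocessing layers apply random versions of these transforms, including interpolating ones.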
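The training phase runs for 12 epochs and checks loss and accuracy after each one. A hedged Keras sketch of such a loop follows. The `train` helper, the Adam optimizer with a 1e-4 learning rate, and the extra AUC metric are assumptions, not details from the article. Binary cross-entropy is used on the assumption that the model ends in a single sigmoid unit.

```python
from tensorflow import keras


def train(model, x, validation_data, y=None, epochs=12, learning_rate=1e-4):
    """Hypothetical helper (not the project's code): compile a binary classifier and fit it.

    `x` and `y` go straight to `model.fit`, so `x` may also be a tf.data.Dataset or a
    generator yielding (images, labels) batches, in which case `y` stays None.
    """
    model.compile(
        # Assumed optimizer and learning rate; the article does not specify them.
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        # Binary cross-entropy pairs with a single sigmoid output unit.
        loss="binary_crossentropy",
        # Accuracy as monitored in the article, plus ROC AUC, the Kaggle evaluation metric.
        metrics=["accuracy", keras.metrics.AUC(name="auc")],
    )
    history = model.fit(x, y, validation_data=validation_data, epochs=epochs, verbose=0)

    # Keras stores one value per epoch for each metric; "val_" keys come from validation_data.
    metrics = history.history
    for epoch in range(len(metrics["loss"])):
        print(
            f"epoch {epoch + 1}: "
            f"loss={metrics['loss'][epoch]:.4f} accuracy={metrics['accuracy'][epoch]:.4f} "
            f"val_loss={metrics['val_loss'][epoch]:.4f} val_accuracy={metrics['val_accuracy'][epoch]:.4f}"
        )
    return metrics
```

The returned dictionary holds the per-epoch values needed to draw training and validation plots.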
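The evaluation relies on the ROC curve, whose area (AUC) is the Kaggle metric, and on validation accuracy (~0.96 in the article). The sketch below computes ROC points, their AUC, and accuracy from predicted probabilities using only the standard library, treating label 1 as "tumor". The function names and the 0.5 accuracy threshold are assumptions; in practice, scikit-learn's `roc_curve` and `roc_auc_score` do the same job.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate), lowering the threshold step by step."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")

    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for label, score in zip(labels, scores) if int(score > threshold) == label)
    return correct / len(labels)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(roc_curve(labels, scores)))  # 0.75
    print(accuracy(labels, scores))  # 0.75
```

AUC measures how well the scores rank tumor patches above healthy ones across all thresholds. Accuracy depends on a single threshold, which is why the two numbers can differ.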
