Comparison of the best NSFW Image Moderation APIs 2018

A comprehensive benchmark of multiple image content filtering API providers across different categories like Nudity, Pornography and Gore.

A human being can instinctively decide whether what they are seeing is inappropriate or not.

I decided against using the open source YACVID 180 image dataset as it relies primarily on using nudity as a measure of NSFW content. Collection of NSFW images is a tedious, time consuming and downright painful task, hence the low image count. The dataset has been open-sourced and is available for download here. [WARNING: Contains Explicit Content]

Here is a sheet containing the raw predictions of the APIs on each image in the dataset.

Metrics:

Each of the classifiers is evaluated on universally accepted metrics such as:

True Positives (TP): If a classifier called something NSFW and it was actually NSFW
True Negatives (TN): If a classifier called something SFW and it was actually SFW
False Positives (FP): If a classifier called something NSFW and it was actually SFW
False Negatives (FN): If a classifier called something SFW and it was actually NSFW
Accuracy: If the model makes a prediction, can you trust it?
Precision: If the model says an image is NSFW, how often is it right?
Recall: Of all the NSFW images, how many does it identify?
F1 Score: The harmonic mean of Precision and Recall; it is often similar to accuracy.

I evaluated the following APIs for content moderation:

Amazon Rekognition
Google
Microsoft
Yahoo
Algorithmia
Clarifai
DeepAI
Imagga
Nanonets
Sightengine
X-Moderator

Performance Across Categories

I first evaluated each of the APIs by category to see how they perform at detecting each of the different types of NSFW content.

Pornography / Sexual Acts

The Google and Sightengine APIs really shine here, being the only ones able to detect all the pornographic images correctly.

Microsoft and Imagga have the worst performance in this category.

Links to original images: Porn19, Porn7, Porn18, Porn14

The images that are easy to identify are explicitly pornographic.

Most of them predicted NSFW content with a very high confidence.

Links to original images: Porn6, Porn2, Porn10, Porn3

The images that were difficult to identify were hard because of occlusion or blurring.

The definition of what is considered nudity has always been subject to debate and, as is clear from the images that are difficult to identify, the providers mostly fail in cases where one could argue these are SFW.

Links to original images: Nudity10, Nudity7, Nudity13, Nudity14

The images that are easy to identify had clearly visible nudity and are explicit.

Suggestive nudity is almost as easy for a machine to identify as nudity, but the places where it makes a mistake are images which normally look like SFW images but have some aspects of nudity.

Links to original images: Suggestive13, Suggestive10, Suggestive2, Suggestive8

Once again none of the providers got the easy to identify images wrong.

This indicates that these algorithms find it easier to identify artificially generated images than naturally occurring images.

Links to original images: SimulatedPorn1, SimulatedPorn16, SimulatedPorn19, SimulatedPorn9

All the providers have perfect scores and high confidence scores.

Links to original images: SimulatedPorn15

The one image that Imagga got wrong could have been construed as maybe not porn if you didn’t look long enough.

Gore

This was one of the most difficult categories, as the average detection rate across APIs was less than 50%.

However, even on the best performing images, 4/12 providers got the images wrong.

Links to original images: Gore7, Gore9, Gore17, Gore18

There was no discernible pattern in the images that were difficult to predict.

X-Moderator also does very well here.

Links to original images: SFW15, SFW12, SFW6, SFW4

The easy to identify images had very little skin showing and would be very easy for a human to identify as SFW.

Only 1 or 2 providers got these images wrong.

Links to original images: SFW17, SFW18, SFW10, SFW3

The difficult to identify SFW images all had a higher amount of skin showing or were Anime (high bias towards Anime being porn).

Which raises the question of whether these are truly SFW.

Overall Comparison

Looking at the performance of the APIs across all the NSFW categories, as well as their performance in correctly identifying safe for work (SFW) content, I saw that Nanonets has the best F1 score and average accuracy, and thus performs consistently well across all categories.

Google, which does exceptionally well in detecting the NSFW categories, marks too many pieces of SFW content as NSFW and thus gets penalized in its F1 score.

By Provider

I compared the top 5 providers by accuracy and F1 score to showcase the differences in their performance.
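
To make the metrics used throughout this comparison concrete, here is a minimal sketch of how accuracy, precision, recall and F1 can be derived from the four counts (TP, TN, FP, FN). It assumes each API response has already been reduced to a boolean "NSFW" flag per image. The names `Scores` and `score`, and the sample labels, are illustrative assumptions and are not taken from the benchmark sheet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    tp: int  # predicted NSFW, actually NSFW
    tn: int  # predicted SFW, actually SFW
    fp: int  # predicted NSFW, actually SFW
    fn: int  # predicted SFW, actually NSFW

    @property
    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        actual_nsfw = self.tp + self.fn
        return self.tp / actual_nsfw if actual_nsfw else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score(actual: Sequence[bool], predicted: Sequence[bool]) -> Scores:
    """Count TP/TN/FP/FN, where True means NSFW and False means SFW."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    return Scores(tp, tn, fp, fn)


# Made-up labels for six images (True = NSFW); not data from the benchmark.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
result = score(actual, predicted)
print(result, result.accuracy, result.precision, result.recall, result.f1)
```

A provider that flags too many SFW images as NSFW accumulates false positives, which lowers precision and therefore F1 even when its recall on the NSFW categories is high.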
