How fuzzy matching improves your NLP model

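To make the idea of a 0 to 100 string score concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration under stated assumptions, not the scorer used in this post: `simple_ratio` and `extract` are hypothetical helper names, `difflib.SequenceMatcher` stands in for the library's scorers, and `COUNTRIES` is a tiny made-up sample rather than the full country dataset, so scores may differ from the results reported in this article.

```python
from difflib import SequenceMatcher

# Hypothetical sample list; the article itself queries a full country dataset.
COUNTRIES = [
    "Japan",
    "Yemen",
    "United States",
    "United Arab Emirates",
    "Hong Kong SAR China",
]


def simple_ratio(a: str, b: str) -> int:
    """Return a case-insensitive similarity score between 0 and 100."""
    matcher = SequenceMatcher(None, a.lower(), b.lower())
    return round(100 * matcher.ratio())


def extract(query: str, choices: list[str], limit: int = 2) -> list[tuple[str, int]]:
    """Return the `limit` best (choice, score) pairs, highest score first."""
    scored = [(choice, simple_ratio(query, choice)) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


if __name__ == "__main__":
    for location in ["Hong Kong", "jepen", "United tates"]:
        print(extract(location, COUNTRIES))
```

A plain ratio like this only measures how many characters two strings share in order. Library scorers typically layer extra heuristics on top (partial matches, token sorting), which is why choosing a scorer can change the ranking.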
Using different algorithms to score strings between 0 and 100.

To see whether we can get back the expected location name, let's check whether Japan (entered as “jepen”) and United States (entered as “United tates”) are returned.

    # Assumes `from fuzzywuzzy import fuzz, process` and the `countries` list loaded earlier
    # Default scorer is Weighted Ratio (WRatio)
    for location in ['Hong Kong', 'jepen', 'United tates']:
        result = process.extract(location, countries, limit=2)
        print(result)

The result is

    [('Hong Kong SAR China', 90), ('Congo – Kinshasa', 57)]
    [('Japan', 60), ('Yemen', 60)]
    [('United States', 96), ('United Arab Emirates', 86)]

The expected country is listed first in each case, although “jepen” scores 60 for both Japan and Yemen, so that match is a tie.

QRatio: A quick ratio comparison for strings.

    # QRatio
    process.extract('Hong Kong', countries, scorer=fuzz.QRatio, limit=3)

The result is

    [('Hong Kong SAR China', 64), ('Kongo', 57), ('Togo', 46)]

There are also UWRatio (same as WRatio) and UQRatio (same as QRatio) in case you have to deal with Unicode.

Take Away

To access the project template, you can visit this GitHub repo.

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. You can reach me via my Medium blog, LinkedIn or GitHub.

Reference

Fuzzywuzzy in Python (Original)
Fuzzywuzzy in Java
Country dataset
