VerbiAge: Using NLP to help writers craft age-specific writing

Sameeran Kunche is an Insight alumnus from the Winter 2018 Insight Data Science Fellowship in Silicon Valley. In this blog post, Sameeran shares how he developed his Insight project, VerbiAge, an app that helps authors write book descriptions for K-12 readers.

A book description is an advertisement. Choosing the best words for one is particularly difficult for books marketed to the K-12 demographic, where it is unclear whether a young reader, a parent, or an educator is making the decision to buy the book.

During my Insight Data Science Fellowship, I focused on using natural language processing (NLP) to build a tool that would alleviate some of that uncertainty for writers. The app makes predictions with a model trained on thousands of book descriptions, giving a writer a sense of how the words they’ve used stack up against those in other book descriptions.

Book descriptions and lists of recommended reading by grade level are both publicly available from the California Department of Education. The Department published lists of recommended books in four grade categories spanning K-12, with roughly 2,500 book descriptions per category. Because the categories are roughly equal in size, I treated the classes as balanced.

Training a classifier

The workflow I used to train the classifier is a common one for text classification problems. Each feature a model uses is a dimension along which an observation is represented. As the number of dimensions grows, the amount of data needed to cover the feature space at the same density grows exponentially, so a fixed-size training set becomes increasingly sparse. This is often referred to as the curse of dimensionality. To mitigate the effects of sparsity on the trained model, I generated and inspected two curves that describe a model’s training: the validation curve and the learning curve.

Validation curve

Although the model presented here was trained solely on tf-idf values of unigrams to help writers craft book descriptions for K-12 readers,
many opportunities for additional feature engineering, dimensionality reduction, and alternative use cases for the tool still remain open for exploration.
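
The post does not say which classifier sat on top of the unigram tf-idf features. What follows is therefore only a minimal sketch of the general idea behind the app: tf-idf vectors, one centroid per grade category, and a per-category similarity score for a new description.

Several pieces are assumptions for illustration, not the VerbiAge implementation or the California Department of Education data:

- the nearest-centroid (Rocchio-style) rule;
- the helper names `fit_idf`, `fit_centroids`, and `score`;
- the grade-band labels;
- the toy descriptions.

```python
import math
import re
from collections import Counter

# Sketch only: nearest-centroid scoring is an assumed stand-in, since the
# post does not name the classifier it trained on tf-idf unigrams.


def tokenize(text):
    """Lowercase the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every unigram seen in training."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Same smoothing as scikit-learn's TfidfVectorizer default (smooth_idf=True).
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length; empty stays empty."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(doc, idf):
    """Raw term count times idf, L2-normalised; unseen words are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in counts.items()})


def fit_centroids(docs, labels, idf):
    """Sum each grade category's tf-idf vectors, then L2-normalise the sum."""
    sums = {}
    for doc, label in zip(docs, labels):
        sums.setdefault(label, Counter()).update(tfidf(doc, idf))
    return {label: l2_normalize(acc) for label, acc in sums.items()}


def score(description, idf, centroids):
    """Cosine similarity between a new description and each category centroid."""
    vec = tfidf(description, idf)
    return {label: sum(v * c.get(t, 0.0) for t, v in vec.items())
            for label, c in centroids.items()}


# Toy training set; the grade-band labels are hypothetical, not the CDE categories.
TRAIN = [
    ("a silly puppy learns to count colorful balloons", "K-2"),
    ("the little bunny finds a friend at the farm", "K-2"),
    ("a brave girl solves a mystery at summer camp", "3-5"),
    ("friends build a treehouse and discover a secret map", "3-5"),
    ("a teen struggles with friendship and identity in middle school", "6-8"),
    ("an orphan joins a rebellion against a cruel empire", "6-8"),
    ("a haunting novel about war, grief, and moral responsibility", "9-12"),
    ("a classic exploration of ambition, power, and political corruption", "9-12"),
]

docs, labels = zip(*TRAIN)
idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)
scores = score("a puppy and a bunny play on the farm", idf, centroids)
print(max(scores, key=scores.get))  # → K-2
```

Returning a score for every category, rather than a single label, fits the app's stated goal of showing a writer how their wording compares across grade levels. For real data, a library implementation such as scikit-learn's `TfidfVectorizer` would normally replace the hand-rolled vectorizer.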

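A validation curve and a learning curve can both be produced by retraining a model in a loop and recording training and validation accuracy. The post does not say which hyperparameter its validation curve swept. This stdlib-only sketch assumes it is vocabulary size, which ties it to the dimensionality concern discussed above.

- `validation_curve` sweeps the number of kept unigrams (`max_features`).
- `learning_curve` sweeps the number of training descriptions.

The `CentroidModel` class, the word pools, and the grade-band labels are all hypothetical. The two functions only mimic the idea behind scikit-learn's `sklearn.model_selection.validation_curve` and `learning_curve`, not their signatures.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z']+", text.lower())


class CentroidModel:
    """Hypothetical stand-in classifier: tf-idf over the `max_features` most
    frequent training unigrams, then nearest centroid by cosine similarity."""

    def __init__(self, max_features):
        self.max_features = max_features

    def _vec(self, doc):
        # L2-normalised tf-idf vector; words outside the kept vocabulary are dropped.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(x * x for x in vec.values()))
        return {t: x / norm for t, x in vec.items()} if norm else {}

    def fit(self, docs, labels):
        freq, df = Counter(), Counter()
        for doc in docs:
            tokens = tokenize(doc)
            freq.update(tokens)
            df.update(set(tokens))
        # Keeping only the top-k words caps the number of dimensions at k.
        vocab = [t for t, _ in freq.most_common(self.max_features)]
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
        sums = {}
        for doc, label in zip(docs, labels):
            sums.setdefault(label, Counter()).update(self._vec(doc))
        self.centroids = {}
        for label, acc in sums.items():
            norm = math.sqrt(sum(x * x for x in acc.values()))
            self.centroids[label] = {t: x / norm for t, x in acc.items()} if norm else {}
        return self

    def predict(self, doc):
        vec = self._vec(doc)
        scores = {label: sum(x * c.get(t, 0.0) for t, x in vec.items())
                  for label, c in self.centroids.items()}
        return max(scores, key=scores.get)

    def accuracy(self, data):
        return sum(self.predict(doc) == label for doc, label in data) / len(data)


def validation_curve(train, val, feature_counts):
    """(k, train accuracy, validation accuracy) for each vocabulary size k."""
    docs, labels = zip(*train)
    rows = []
    for k in feature_counts:
        model = CentroidModel(k).fit(docs, labels)
        rows.append((k, model.accuracy(train), model.accuracy(val)))
    return rows


def learning_curve(train, val, sizes, max_features):
    """(n, train accuracy, validation accuracy) for the first n training examples."""
    rows = []
    for n in sizes:
        docs, labels = zip(*train[:n])
        model = CentroidModel(max_features).fit(docs, labels)
        rows.append((n, model.accuracy(train[:n]), model.accuracy(val)))
    return rows


# Synthetic corpus: hypothetical grade bands and made-up word pools, not CDE data.
POOLS = {
    "K-2": "puppy bunny farm count colors friend play".split(),
    "3-5": "mystery camp treehouse map adventure secret team".split(),
    "6-8": "identity rebellion school courage quest rival crush".split(),
    "9-12": "war grief ambition corruption identity justice memory".split(),
}
SHARED = "a the and of story book about".split()


def make_corpus(n_per_class, seed=0):
    rng = random.Random(seed)
    data = [(" ".join(rng.choices(pool, k=5) + rng.choices(SHARED, k=5)), label)
            for label, pool in POOLS.items() for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


data = make_corpus(50)
train, val = data[:160], data[160:]
for row in validation_curve(train, val, [1, 5, 50]):
    print("features=%d  train=%.2f  val=%.2f" % row)
for row in learning_curve(train, val, [8, 40, 160], max_features=50):
    print("n_train=%d  train=%.2f  val=%.2f" % row)
```

The usual readings apply:

- If validation accuracy is still rising as features or samples are added, the model is still limited by features or data.
- A widening gap between training and validation accuracy is the usual sign of overfitting to a sparse, high-dimensional vocabulary.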