I wrote a Python program to calculate the most commonly used words in subreddits. Here’s what I found…

I used a set instead of a regular list for more efficient lookup time: O(1) vs. O(n).

Yikes, I’m still adding new words to this set.

The modification to check for the common words.

Now I have an ordered mapping of unique words to a subreddit, so I just have to display it. Python has a neat library called Matplotlib which can represent data beautifully. I used the pie chart component to display the data and picked the top 10 words.

Using Matplotlib.

The results were glorious. I tested it out on a few subreddits, each with a sample size of 10,000 comment posts. Here’s what the subreddits had to say.

Warning, lots of foul language ahead! (It is Reddit, after all.)

Our dear President is a hot topic over at the politics subreddit.

For a subreddit about atheism, religion sure is discussed a lot.

Our president is yet again the highlight of a lot of news.

Lots of positivity from this subreddit, to be expected from a supportive community.

As a football fan, I can confirm that ‘goal’ and profanity are the most commonly used phrases.

Not surprising to see Facebook here, with all the controversies recently.

This was a really fun hack to do, and it turned out to be more technically challenging than I anticipated. You can check out the code for it along with my other projects here.

Thanks for reading! I hope you learned something, whether it was about Reddit or tech.
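The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the original code from the project: the names `COMMON_WORDS`, `top_words`, and `sample_comments` are hypothetical, the list of common words is deliberately tiny, and the comments are made-up sample data rather than anything fetched from Reddit. It shows the two ideas mentioned in the post: a set for fast "is this a common word?" checks, and picking the top N words by count.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny set of filler words to ignore.
# A real list would be much longer. A set gives O(1) average lookups,
# where a list would need an O(n) scan for every word.
COMMON_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "in", "that"}


def top_words(comments, n=10):
    """Return the n most frequent (word, count) pairs, skipping common words."""
    counts = Counter()
    for comment in comments:
        # Lowercase the text and pull out runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in COMMON_WORDS:
                counts[word] += 1
    # most_common sorts by count, highest first.
    return counts.most_common(n)


# Made-up sample data standing in for real subreddit comments.
sample_comments = [
    "The goal was amazing",
    "What a goal, what a match",
    "That match is the best of the season",
]

print(top_words(sample_comments, n=3))
```

To chart the result, the pairs can be split into labels and sizes and handed to Matplotlib's `pyplot.pie(sizes, labels=labels)`.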
