Exploring Reddit’s ‘Ask Me Anything’ Using the PRAW API Wrapper

A brief PRAW tutorial for future Reddit analysts

Alexander Shropshire, May 14

PRAW + Python can be used to quickly access Reddit’s API.

At the time of this writing, Reddit.com sits as the 5th most popular website on the internet in the United States.

Most of you probably have spent time on the site, but for those unfamiliar, Reddit is a massive collection of community-driven forums, or “subreddits”, where people share news, content, and opinions on almost any topic.

One of the most popular communities, Ask Me Anything (r/IAmA), connects famous or interesting figures with everyday Redditors, giving an anonymous audience the chance to put questions directly to some of the most compelling people in the world.

In this article, I’ll be focusing on this specific community to guide current and future data scientists at a high level through the process of connecting to basic information housed in Reddit’s API using PRAW — a Python “wrapper,” which is like an add-on package that simplifies sets of API calls into easy-to-use functions for users.

This particular walkthrough is more about quickly setting up the connection, avoiding any deep dive into business applications (contact Reddit Partnerships for that).

That said, I hope to leave you with not only a new data skill but also a collection of some of the most compelling AMAs to ever make it to Reddit!

Step 1: Install or Update PRAW in your Terminal

    # to install for the first time
    pip install praw

    # to update
    pip install --upgrade praw

Step 2: Create and/or Log In to Your Reddit Account to Begin Authenticating via OAuth

Create an account or log in to an existing account on Reddit.com, and ensure you’ve verified your email.

When you finish that, open your list of applications in your Reddit preferences, and click on the somewhat hard-to-find button to ‘Create an app’.

Be mindful of Reddit’s API guidelines.

Name your app and your use case (‘script’ is good for basic, innocent exploration), write a brief note about your plans and input a required redirect URL (I used my GitHub URL for this).

After you click ‘create app’, your page will update to reveal both a personal use script key, which is often referred to as a ‘client id’ in API tutorials, and a secret key, which you may know as an ‘API key’.

When using an API that requires an API key and password, you should never hardcode these values into your main file.

To keep your API keys from being publicly broadcast, check out this great instructional Jupyter notebook, which has more detailed notes on setting up a .secrets/ directory to store your sensitive information in a local JSON file and load it when needed.
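As a rough sketch of that idea, the snippet below reads a JSON credentials file and fails early if a key is missing. The helper name load_credentials and the REQUIRED_KEYS tuple are hypothetical, and the four key names are assumed to match the credentials file used in this article. The demo writes placeholder values to a throwaway temporary file, so nothing here touches real secrets.

```python
import json
import tempfile
from pathlib import Path

# Assumption: these are the keys the credentials file in this article holds.
REQUIRED_KEYS = ("client_id", "api_key", "username", "password")


def load_credentials(path):
    """Read a JSON credentials file and check that every expected key is present."""
    with open(path) as f:
        params = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"credentials file is missing: {', '.join(missing)}")
    return params


# Demo with a throwaway file holding placeholder values (not real secrets).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "credentials.json"
    demo_path.write_text(json.dumps({
        "client_id": "XXXX",
        "api_key": "XXXX",
        "username": "example_user",
        "password": "example_password",
    }))
    demo_params = load_credentials(demo_path)

print(sorted(demo_params))  # ['api_key', 'client_id', 'password', 'username']
```

In a real project you would point load_credentials at a file kept outside version control, and the early KeyError makes a typo in the file easier to spot than a failed login later on.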

Your credentials.json file should look something like this:

    {
      "client_id": "XXXXXXXXX-XXXX",
      "api_key": "XXXXXXXXXXXX_XXXXXX-XXXXXXX",
      "username": "<your Reddit username>",
      "password": "<your Reddit password>"
    }

Step 3: Create your first Authorized Reddit Instance

Load your secret keys from credentials.json, or your similarly named file:

    # Load secret keys from credentials.json
    import json

    url = 'https://www.reddit.com/'
    with open('/Users/<Your CPUs User>/.secrets/credentials.json') as f:
        params = json.load(f)

Import the PRAW wrapper and authorize a Reddit instance:

    import praw

    reddit = praw.Reddit(client_id=params['client_id'],
                         client_secret=params['api_key'],
                         password=params['password'],
                         user_agent='<name it something descriptive> accessAPI:v0.0.1 (by /u/<yourusername>)',
                         username=params['username'])

Step 4: Obtain a Subreddit Instance from your Reddit Instance

To obtain a subreddit instance, pass the subreddit’s name when calling subreddit on your reddit instance.

For example:

    subreddit = reddit.subreddit('iama')

    print(subreddit.display_name)  # Output: iama
    print(subreddit.title)         # Output: I Am A, where the mundane...
    print(subreddit.description)

Some other subreddit attributes for basic analysis:

created_utc: Time the subreddit was created, represented in Unix Time.
description: Subreddit description, in Markdown.
display_name: Name of the subreddit.
id: The ID of the subreddit.
name: Full name of the subreddit.
subscribers: Count of subscribers.

Step 5: Obtain a Submission Instance from your Subreddit Instance

To gather some data on the submissions within the subreddit of interest, we can use a for loop to iterate through a specified number of submissions, sorted by controversial, gilded, hot, new, rising, or top.

    # iterating through the 10 submissions marked hot
    for submission in subreddit.hot(limit=10):
        print(submission.title)  # Output: the submission's title
        print(submission.score)  # Output: the submission's upvotes
        print(submission.id)     # Output: the submission's ID
        print(submission.url)    # Output: the URL

Some other submission attributes for basic analysis:

author: Provides an instance of Redditor.
created_utc: Time the submission was created, represented in Unix Time.
distinguished: Whether or not the submission is distinguished.
link_flair_text: The link flair’s text content (sort of like a submission category within a subreddit), or None if not flaired.
name: Full name of the submission.
num_comments: The number of comments on the submission.
over_18: Whether or not the submission has been marked as NSFW.
spoiler: Whether or not the submission has been marked as a spoiler.
stickied: Whether or not the submission is stickied.
title: The title of the submission.
upvote_ratio: The percentage of upvotes from all votes on the submission.
url: The URL the submission links to, or the permalink if a self-post.
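To make a few of these attributes concrete, here is a small sketch that copies them from a submission into a plain dict and converts created_utc from Unix Time into a readable UTC datetime. The helper submission_to_row and the FIELDS tuple are hypothetical names, and the demo uses a stand-in object with made-up values rather than a live PRAW submission, so it runs without a Reddit connection.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Attributes copied as-is; all of them appear in the list above.
FIELDS = ("title", "score", "num_comments", "upvote_ratio", "link_flair_text")


def submission_to_row(submission):
    """Copy a few submission attributes into a plain dict."""
    row = {name: getattr(submission, name) for name in FIELDS}
    # Assumption: str() of the author gives a username; author may be None.
    row["author"] = str(submission.author) if submission.author else None
    # created_utc is Unix Time (seconds), so convert it to an aware datetime.
    row["created"] = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return row


# Stand-in object with made-up values, used instead of a real submission.
fake_submission = SimpleNamespace(
    title="Example AMA",
    score=100,
    num_comments=25,
    upvote_ratio=0.9,
    link_flair_text="Science",
    author="example_user",
    created_utc=0,
)

row = submission_to_row(fake_submission)
print(row["created"].isoformat())  # 1970-01-01T00:00:00+00:00
```

With a real connection, the same helper could be called on each submission inside the loop over subreddit.hot(limit=10).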

Step 6: Create a Pandas DataFrame of Basic Submission Stats Taken From the ‘Ask Me Anything’ Subreddit

I chose to make a 200-row, 7-column DataFrame made up of the top submissions on the r/IAmA subreddit, where many notable Ask Me Anything discussions take place.

This may take quite a while depending on the size of the data being taken from the subreddit and the speed of your local or virtual machine, so I like to include print statements in the for loop so that I can track progress.

    ama_title = []
    time = []
    num_upvotes = []
    num_comments = []
    upvote_ratio = []
    link_flair = []
    redditor = []
    i = 0

    for submission in subreddit.top(limit=200):
        i += 1
        ama_title.append(submission.title)
        time.append(submission.created_utc)
        num_upvotes.append(submission.score)
        num_comments.append(submission.num_comments)
        upvote_ratio.append(submission.upvote_ratio)
        link_flair.append(submission.link_flair_text)
        redditor.append(submission.author)
        if i % 5 == 0:
            print(f'{i} submissions completed')

This will print out a note for every 5 submissions the for loop iterates through.

Once it has finished, you can throw your data into a Pandas DataFrame for turnkey wrangling and analysis.

    import pandas as pd

    ama_df = pd.DataFrame({
        'ama_title': ama_title,
        'time': time,
        'num_comments': num_comments,
        'num_upvotes': num_upvotes,
        'upvote_ratio': upvote_ratio,
        'link_flair': link_flair,
        'redditor': redditor
    })
    ama_df.head(10)

This should display the first 10 rows of the new DataFrame.

Step 7: Exploring Data

The simplest way to help us quickly gauge the most meaningful AMAs is to plot the leaders by most commented (num_comments), most upvoted (num_upvotes), and most positive (upvote_ratio), then do the same after grouping the rows by topic/category (link_flair), taking the mean of these same stats for each group.
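A minimal sketch of that ranking and grouping step, assuming pandas is installed. The rows in demo_df are made up for illustration and only stand in for the real ama_df; the variable names top_commented and by_topic are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for ama_df; the values are not real Reddit data.
demo_df = pd.DataFrame({
    "ama_title": ["AMA A", "AMA B", "AMA C", "AMA D"],
    "num_comments": [1200, 300, 4500, 800],
    "num_upvotes": [50000, 20000, 90000, 30000],
    "upvote_ratio": [0.91, 0.97, 0.88, 0.95],
    "link_flair": ["Science", "Athlete", "Science", "Athlete"],
})

stats = ["num_comments", "num_upvotes", "upvote_ratio"]

# Leaders for a single stat: the 2 most commented rows.
top_commented = demo_df.nlargest(2, "num_comments")[["ama_title", "num_comments"]]

# Mean of each stat per topic, ordered by average comment count.
by_topic = (
    demo_df.groupby("link_flair")[stats]
    .mean()
    .sort_values("num_comments", ascending=False)
)

print(top_commented)
print(by_topic)
```

Swapping "num_comments" for "num_upvotes" or "upvote_ratio" gives the other two rankings. If matplotlib is available, calling .plot.barh() on one column of by_topic is one way to turn a ranking into a bar chart.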

Most Engaging AMA Figures

Some of the people in these lists are unsurprising celebrity figures: Obama, Bill Gates, Bernie Sanders, Robert Downey Jr.

There are, however, some really interesting surprises that I urge you to read if you have the time.

Leah Remini discusses her remarkable journey toward emotional and spiritual freedom from the Church of Scientology. Other standouts include Ken Bone, a viral undecided voter during the 2016 presidential election; Anatole Konstantin, an open and honest survivor of Joseph Stalin’s dictatorship; and Julian Assange, the recently detained WikiLeaks founder, journalist, and computer scientist.

Some of the purely positive AMAs made me really appreciate some of the non-super-celebrity discussions.

There was an engineer who implemented new feature suggestions on the spot in his free version of Photoshop (ivanhoe90), access to space scientists from NASA’s New Horizons team aiming beyond Pluto (Alan Stern), software developers and hiring managers at NASA’s JPL, and the 2nd person to walk on the moon, Buzz Aldrin.

These extremely compelling adventurers are joined by comedy legend Eric Idle, who brought the entire Monty Python crew in on his AMA fun to connect with fans.

Most Engaging AMA Topics

It’s no surprise that professional athletes garner an engaged following on the web due to their high visibility, expansive media coverage, and likely support of enormous corporations like the NBA, UFC (Ronda Rousey), sports agencies, and apparel brands.

It’s no secret that these AMAs are sometimes booked alongside efforts to promote a product, an event, or a brand.

The Nonprofit topic appears in all three bar charts due to Bill Gates on behalf of his foundation, net neutrality activists, and ACLU supporters.

A community-first platform like Reddit is truly a great hub for those seeking to organize to incite change or, perhaps, to talk about trolling your fast food competitors on Twitter (Wendy’s social team).

Highly-upvoted Newsworthy Event discussions were led by Sid Fischer, a school shooting survivor who was in the 3rd room shot into at Stoneman Douglas High School in Parkland, FL, and, on a somewhat lighter note, a guy who dressed up as the Monopoly Man to photobomb former Equifax CEO Richard Smith’s Senate hearing.

Crime/Justice leads the pack for most positive AMA topics due to a fairly unanimous cause for celebration spurred by a topic that proudly reads “Idaho passed an Ag-Gag law that made it a crime to expose animal abuse on factory farms. We sued them and won.”

The famous O.J. Simpson murder trial prosecutor, Christopher Darden, also took a moment to open up to the AMA community in July of 2017.

Elon Musk and the new CEO of Reddit Steve Huffman helped boost the upvote ratio of the Business topic.

I hope I was able to expand the research potential of those interested in data science and those newly connected to Reddit’s API via PRAW!

Please reach out with any questions, comments, thoughts, or ideas for future articles.

Thank you for reading,
Alex

For detailed API connection & exploration code, check out my GitHub! For additional details on getting the most out of PRAW, check out the documentation.

Let’s connect on LinkedIn!
