How to build a PostgreSQL database to store tweets

On the other hand, we will create a second table that stores information about the tweets: the creation date, the text (the tweet itself), the retweet count, and the user id. The user id will be our FOREIGN KEY (a column that refers to the primary key of another table), relating this table to our user table.

We will define a function that, once called, connects to our database with the credentials (using psycopg2.connect) and creates a table whose name contains the term we want to search for. For this last step, Postgres lets us set up a cursor that encapsulates the query and reads its results a few rows at a time instead of executing the whole query at once. We will take advantage of that: we create a cursor (using database.cursor()) and then execute our queries to create first the user table and then the table that holds the tweets.

We need to consider some points here:

- It is important to use the IF NOT EXISTS clause in the CREATE TABLE query. Otherwise, Postgres raises an error saying that the table already exists, and the code stops executing.
- We need to declare the data type of each column (VARCHAR, TIMESTAMP, etc.), which columns are the primary and foreign keys, and, for the foreign key, which column it REFERENCES.
- After executing the queries, it is important to commit them (database.commit()); otherwise, no changes will be persisted. Finally, we close the cursor and the connection to the database.

Afterwards, we need to define a function that will help us store the tweets. This function follows the same logic we used to create the tables (connect to the database, create a cursor, execute the query, commit it, close the connection), but uses the INSERT INTO command instead.

When creating the user table, we declared that the user id would be our primary key, so we need to be careful about how we insert users when we store the tweets. If the same user has two tweets, the second time the function is executed it will raise an error, because that user id is already in the table and primary keys have to be unique. We can use the ON CONFLICT clause here to tell Postgres that, if the user id is already in the table, it doesn't have to insert it again. The tweet itself is still inserted into the tweets table and references that user id in the user table.

There are two ways to capture tweets with Tweepy. The first is the REST search API, tweepy.API, which searches against a sampling of recent public tweets published in the past 7 days. The second is streaming tweets in real time with the Twitter streaming API. The two differ in that the REST API pulls data from Twitter, while the streaming API pushes messages to a persistent session.

To stream tweets in Tweepy, an instance of the class tweepy.Stream establishes a streaming session and sends messages to an instance of the StreamListener class.
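
The table-creation and storage steps described above can be sketched as follows. This is only a minimal sketch under some assumptions, not the exact code behind this post: the table name twitter_user, the tweets_<term> naming scheme, the column names and sizes, and the helper names are all hypothetical. The helpers take an already opened connection, so closing it is left to the caller.

```python
import re

USER_TABLE = "twitter_user"  # hypothetical name for the user table


def open_connection(**credentials):
    """Open a PostgreSQL connection with psycopg2.connect."""
    # Imported here so the helpers below work with any DB-API style connection.
    import psycopg2
    return psycopg2.connect(**credentials)


def tweets_table_name(term):
    """Build a table name from the search term (the naming scheme is an assumption)."""
    # Keep only lowercase letters and digits so the term is safe to use as an identifier.
    cleaned = re.sub(r"[^a-z0-9]+", "_", term.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError("search term has no usable characters")
    return "tweets_" + cleaned


def create_tables(conn, term):
    """Create the user table and the tweets table if they do not exist yet."""
    table = tweets_table_name(term)
    cur = conn.cursor()
    # The user table comes first because the tweets table references it.
    cur.execute(
        f"""CREATE TABLE IF NOT EXISTS {USER_TABLE} (
                user_id VARCHAR(32) PRIMARY KEY,
                user_name VARCHAR(255)
            );"""
    )
    cur.execute(
        f"""CREATE TABLE IF NOT EXISTS {table} (
                created_at TIMESTAMP,
                tweet TEXT,
                user_id VARCHAR(32) REFERENCES {USER_TABLE} (user_id),
                retweet_count INTEGER
            );"""
    )
    conn.commit()  # without the commit, nothing is persisted
    cur.close()
    return table


def store_tweet(conn, term, user_id, user_name, created_at, text, retweet_count):
    """Insert the user (skipped if already present) and then the tweet."""
    table = tweets_table_name(term)
    cur = conn.cursor()
    # ON CONFLICT ... DO NOTHING avoids the unique violation for a repeated user id.
    cur.execute(
        f"INSERT INTO {USER_TABLE} (user_id, user_name) VALUES (%s, %s) "
        "ON CONFLICT (user_id) DO NOTHING;",
        (user_id, user_name),
    )
    cur.execute(
        f"INSERT INTO {table} (created_at, tweet, user_id, retweet_count) "
        "VALUES (%s, %s, %s, %s);",
        (created_at, text, user_id, retweet_count),
    )
    conn.commit()
    cur.close()
```

With real credentials (the values here would be placeholders), we could call open_connection(dbname=..., user=..., password=..., host=...), pass the returned connection to create_tables and store_tweet, and call close() on it once we are done. The values are passed to execute as parameters rather than formatted into the query string, so the driver takes care of quoting the tweet text.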
