Learn To Build Your Own Historical Stock Price Data Sets

Learn How To Source and Build a DB of Stock Prices

Costas Andreou, May 20

When it comes to implementing any sort of strategy or model, the first thing anyone needs is data.

Whether that is to calculate the Value at Risk of your portfolio, or to work out a leading indicator for stock price movements, data comes first.

Sometimes that data is available for free, which is great; other times, however, it is not.

You either have to pay a hefty subscription fee or, worse, the data is simply not available.

So, how do you go about obtaining your own data set? Well, given that the data set is nowhere to be found, historical data is out of the question.

So, what we can do instead is source daily data and build our own repository.

Every day, we can scrape the data we are after and, slowly but surely, build our own historical data set.

How To Get Started

The first thing we need to get started is a source of daily data.

In this example, we will be looking at UK stocks (even though there could potentially be places where we can find historical data).

In this example case, I have identified www.lse.co.uk as a potential source of our daily stock prices.

We can attempt to source data for the stocks with tickers TSTL and SOLI.

Data Scraping

As we have seen in depth in my previous article, Data Scraping Stock Prices with Regex and BeautifulSoup, we can connect to a website using the urllib library, navigate the HTML with BeautifulSoup, and even pull out the information we are after using regex.

As a first step, we will define the ticker we are after in a variable, connect to the correct URL, and then extract the information we are after:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl, re, datetime, pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

Ticker = 'TSTL'
url = 'http://www.lse.co.uk/SharePrice.asp?shareprice=' + Ticker
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

tags = soup('span')
for line in tags:
    if re.search('"sp_sharePrice sp_' + Ticker + '_MID" data-field="sharePrice"', str(line)):
        print(Ticker, re.findall(r'([0-9]+\.[0-9]+)', str(line))[0])
```

which returns exactly what we are after. We can therefore turn this into a callable function with only some minor modification:

```python
# Calls the lse website to get the EOD price for a ticker
def EOD_Price_Call(Ticker):
    url = 'http://www.lse.co.uk/SharePrice.asp?shareprice=' + Ticker
    html = urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, "html.parser")
    tags = soup('span')
    for line in tags:
        if re.search('"sp_sharePrice sp_' + Ticker + '_MID" data-field="sharePrice"', str(line)):
            return (Ticker, re.findall(r'([0-9]+\.[0-9]+)', str(line))[0])
```

This allows us to call the function as EOD_Price_Call('TSTL').
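To see what the regex match and extraction are doing, here is a self-contained sketch run against a hypothetical snippet of the page markup (the span contents and the price are made up for illustration):

```python
import re

# Hypothetical example of the span the scraper matches (made-up price)
line = '<span class="sp_sharePrice sp_TSTL_MID" data-field="sharePrice">123.50</span>'

if re.search('"sp_sharePrice sp_TSTL_MID" data-field="sharePrice"', line):
    # Escaping the dot ensures we match a literal decimal point
    price = re.findall(r'([0-9]+\.[0-9]+)', line)[0]
    print(price)  # 123.50
```

Note that the dot in the pattern is escaped; an unescaped dot would match any character, not just a decimal point.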

This pretty much concludes all the data scraping we needed to do.

The next thing we need to figure out is how to save this data in our database.

Creating the Database and Saving Our Data

As we have seen in depth in my previous article, From Excel to Databases, Python allows us to quickly spin up a database and store our data. To do that, we can use SQLite. We can create a database called FinanceExplained and then a table called Price, if it doesn't already exist. We can do this with:

```python
import sqlite3, time

conn = sqlite3.connect('FinanceExplained.sqlite')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Price (COB DATE, Ticker VARCHAR(5), Price Numeric)')
```

We can then create a function that takes in a date and stores the ticker prices we scraped earlier.

We can also check whether the stock information already exists for that date; if it does, we can skip the part where we try to source the data.

```python
def UpdateDB(date):
    cur.execute('SELECT * FROM Price WHERE COB = ?', (date,))
    try:
        cur.fetchone()[0]
        print('Entry in the db already exists')
    except:
        print('Inputting in the DB')
        for tick in Tickers:
            temp = EOD_Price_Call(tick)
            cur.execute('INSERT INTO Price VALUES(?, ?, ?)', (date, temp[0], temp[1]))
            print('Input', date, temp[0], temp[1])
            time.sleep(3)
        conn.commit()
```

We have to be careful, however, to ensure that the date we submit when we call this function is the correct one.
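The duplicate check relies on cur.fetchone() returning None when no row matches, so indexing the result raises and drops us into the except branch. A minimal sketch of that pattern against an in-memory database (the date and price values are made up for illustration):

```python
import sqlite3

# In-memory database so the sketch leaves no file behind
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Price (COB DATE, Ticker VARCHAR(5), Price Numeric)')
cur.execute('INSERT INTO Price VALUES(?, ?, ?)', ('2019-05-17', 'TSTL', 123.5))
conn.commit()

# A matching date returns a row; a missing date returns None
cur.execute('SELECT * FROM Price WHERE COB = ?', ('2019-05-17',))
print(cur.fetchone() is not None)  # True
cur.execute('SELECT * FROM Price WHERE COB = ?', ('2019-05-18',))
print(cur.fetchone() is None)  # True
```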

If we were simply to submit today's date, that could be incorrect.

We have to ensure that we pass in the last working day.

Working Out the Last Working Day

When it comes to working with dates, we will almost always need the datetime and time packages.

Pandas also offers some advanced functions that we will utilise.

The way we can approach this is to first determine whether today (i.e. when we run our script) is a business day. Pandas already offers us the bdate_range() function, which we can use to check whether a date is a business day or not.

```python
import datetime, pandas as pd, time

# Function that tells you whether the date passed is a business day
def is_business_day(date):
    return bool(len(pd.bdate_range(date, date)))
```

So, if we were to call is_business_day('19/05/2019'), it would return False; that day, a Sunday, is not a business day.
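As a side note, if pandas is not to hand, a plain Monday-to-Friday check can be sketched with the standard library (an approximation: like bdate_range with default settings, it ignores public holidays):

```python
import datetime

# Standard-library alternative: Monday-Friday check (ignores public holidays)
def is_business_day(date):
    return date.weekday() < 5

print(is_business_day(datetime.date(2019, 5, 19)))  # False (a Sunday)
print(is_business_day(datetime.date(2019, 5, 20)))  # True (a Monday)
```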

Next up, we need to figure out the last working date.

Once again, we can build a new function using the one we just built as the base:

```python
# Finds the last business day on or before the date passed in
def last_business_day(date):
    i = 0
    while not is_business_day(date - datetime.timedelta(days=i)):
        i = i + 1
    return date - datetime.timedelta(days=i)
```

This function returns a date: the last business day on or before the date we passed in as an argument.
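To sanity-check the walk-back logic, here is a self-contained sketch with a simple Monday-to-Friday rule standing in for pd.bdate_range (an assumption made so the example needs only the standard library):

```python
import datetime

def is_business_day(date):
    # Simplified Monday-Friday rule, standing in for pd.bdate_range
    return date.weekday() < 5

def last_business_day(date):
    # Walk backwards one day at a time until we hit a business day
    i = 0
    while not is_business_day(date - datetime.timedelta(days=i)):
        i = i + 1
    return date - datetime.timedelta(days=i)

# Sunday 19 May 2019 walks back to Friday 17 May 2019
print(last_business_day(datetime.date(2019, 5, 19)))  # 2019-05-17
```

Passing a date that is already a business day returns that same date, since the loop exits immediately at i = 0.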

Putting It All Together

Now that we have all the pieces coded up, putting it all together is easy.

All we need to do is check whether today is a business day.

If it is, we call our UpdateDB function, which invokes the data scraping function and saves the data.

Otherwise, we need to figure out the last working day, and call the UpdateDB function with that date.

Therefore, putting it all together, it looks like this:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl, re, datetime, pandas as pd
import sqlite3, time

conn = sqlite3.connect('portfoliodb.sqlite')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Price (COB DATE, Ticker VARCHAR(5), Price Numeric)')

today = datetime.date.today()

# ------- Set Up -------
Tickers = ('TSTL', 'SOLI')
# ----------------------

# Function that tells you whether the date passed is a business day
def is_business_day(date):
    return bool(len(pd.bdate_range(date, date)))

# Finds the last business day on or before the date passed in
def last_business_day(date):
    i = 0
    while not is_business_day(date - datetime.timedelta(days=i)):
        i = i + 1
    return date - datetime.timedelta(days=i)

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Calls the lse website to get the EOD price for a ticker
def EOD_Price_Call(Ticker):
    url = 'http://www.lse.co.uk/SharePrice.asp?shareprice=' + Ticker
    html = urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, "html.parser")
    tags = soup('span')
    for line in tags:
        if re.search('"sp_sharePrice sp_' + Ticker + '_MID" data-field="sharePrice"', str(line)):
            return (Ticker, re.findall(r'([0-9]+\.[0-9]+)', str(line))[0])

# Checks for an existing entry, then scrapes and stores each ticker price
def UpdateDB(date):
    cur.execute('SELECT * FROM Price WHERE COB = ?', (date,))
    try:
        cur.fetchone()[0]
        print('Entry in the db already exists')
    except:
        print('Inputting in the DB')
        for tick in Tickers:
            temp = EOD_Price_Call(tick)
            cur.execute('INSERT INTO Price VALUES(?, ?, ?)', (date, temp[0], temp[1]))
            print('Input', date, temp[0], temp[1])
            time.sleep(3)
        conn.commit()

# Check if today is a business day - if not, determine the last business day
if is_business_day(today):
    UpdateDB(today)
else:
    UpdateDB(last_business_day(today))

cur.close()
```

Closing Thoughts

In this article, we explored how you can scrape data off the internet and how you can build your own database of historical data.

This will enable you to carry out any analysis you may require.
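To keep the data set growing without manual effort, the script can be scheduled to run automatically; for example, on Linux, a cron entry along these lines would work (the interpreter path and the file name daily_prices.py are hypothetical placeholders for wherever you save the script):

```shell
# Run the scraper shortly after the market closes, Monday to Friday
# (install with: crontab -e)
30 17 * * 1-5 /usr/bin/python3 /home/user/daily_prices.py
```

Since the script itself works out the last business day, an occasional missed run on a weekend does no harm; only weekday runs insert new rows.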

Let me know how you put this into practice, and make sure to follow me if you want to see my future blogs about Finance and Technology.
