Basic Data Wrangling & Visualization with an ETF

Basic Data Wrangling & Visualization with an ETFLester LeongBlockedUnblockFollowFollowingJun 25OverviewAs promised from a prior post, we’re finally going through some coding examples.

On choosing a topic to share with you, I actually got the idea from taking one 10 minute breaks from performing Deep Work — a productivity technique from Cal Newport [1}.

For those of you that need a refresher, Deep Work is:A type of work performed in a distraction-free environment that focuses your concentration to create greater value that others will have great difficulty to replicate.

I personally go super try hard for around 40–50 minutes and need a 10–15 minute break afterwards.

During one of these breaks, my colleagues were talking about the market and bitcoin.

Boom — that’s when the next topic for this article came up.

Today, we’ll go over some basic time series data wrangling and visualization in Python.

Then we’ll top it off with a moving average trading strategy.

Now just for the record, I’m not saying this strategy is good or anything, but just want to show you what you can do after some data manipulation and visualization.

With the administration stuff out of the way, let’s get to it!Terms of Time Series & Data WranglingWhen learning new things, I always like to build from the ground up.

This way, we’re all on the same page and can build more complex models later down the line.

Basically, I would rather err on the side of explaining a little too much than not enough.

Personally, I always hated being confused, when someone made a huge logical jump.

Definition of Time Series: Time series is a type of data that is taken over time, and the data points are known to be related to each other [2].

Examples of time series include stock prices, daily temperature in Machu Picchu, the speed of a baby turtle running into the ocean, etc.

Definition of Data Wrangling: Data wrangling is a process of transforming raw data into cleaner data that makes models better [3].

For example, you get a ton of tweet data from Twitter.

Sticking that raw text data in a machine learning NLP model would not work out too well.

But if you remove unnecessary words, clean up special characters, structure the data for input and outcomes, etc.

, then you just might have a decent model that can predict or classify.

Basic Time Series Data Wrangling & VisualizationIn Python, it’s standard practice to import packages first and assign them shortcuts to save time.

For example, pandas will be called as pd from now on.

import pandas as pdimport matplotlib.

pyplot as plt%matplotlib inlineNow, let’s import the NASDAQ etf from csv into a dataframe (think of it like a worksheet in Excel) [4].

qqq = pd.

DataFrame.

from_csv(‘QQQ Historical Data.

csv’)print(qqq.

head())It’s always good to examine what the data looks like, so you have a better feel of what you are analyzing.

Below is a common format for stock and etf data:Price Open High Low Vol.

Change %Date 2014-12-31 103.

25 104.

50 104.

91 103.

10 24.

37M -1.

03%2015-01-02 102.

94 103.

76 104.

20 102.

44 31.

31M -0.

30%2015-01-05 101.

43 102.

49 102.

61 101.

14 36.

52M -1.

47%2015-01-06 100.

07 101.

58 101.

75 99.

62 66.

21M -1.

34%2015-01-07 101.

36 100.

73 101.

60 100.

48 37.

58M 1.

29%print(qqq.

shape)The shape command shows us how many rows and columns in our dataframe, in that order.

(1127, 6)print(qqq.

describe())Describe shows us general summary statistics.

Open High Low Pricecount 506.

000000 506.

000000 506.

000000 506.

000000mean 102.

880198 103.

857688 101.

770731 102.

864565std 17.

100460 17.

089441 17.

049718 17.

081374min 74.

040000 74.

830000 72.

000000 74.

05000025% 86.

740000 87.

715000 85.

620000 86.

73000050% 105.

260000 106.

300000 104.

040000 105.

07500075% 117.

812500 118.

677500 116.

722500 117.

745000max 133.

500000 133.

500000 132.

220000 133.

280000loc vs.

ilocOkay, so here’s a little command that always got me confused, when I first started learning Python.

When we use loc, it gets us a labeled row, while iloc gets us the indexed data point.

Below are examples of each:print(qqq_2015.

loc[‘2015–03–16’])Price 106.

7Open 105.

73High 106.

74Low 105.

62Vol.

25.

67MChange % 1.

29%Name: 2015-03-16 00:00:00, dtype: objectprint(qqq.

iloc[0, 0])103.

25Below is a sample code on how to grab a specific time series, in this case all of 2015 of the QQQ etf.

qqq_2015 = qqq.

loc[‘2015–01–01’:’2015–12–31']Plotting is pretty straight forward in python.

You can tell it what size you want the plot to be, what time series to plot, and when to display as seen below:plt.

figure(figsize=(10, 8))qqq[‘Price’].

plot()plt.

show()To make a new column in a dataframe, you use a similar pattern like making a new variable.

Below we will create a new column of getting the price difference and making a daily return.

The point of doing these two things is to see how much our trading strategy will gain or lose in a day.

qqq[‘PriceDiff’] = qqq[‘Price’].

shift(-1) — qqq[‘Price’]qqq[‘Return’] = qqq[‘PriceDiff’] /qqq[‘Price’]Another part of a trading strategy is telling the algorithm the direction of the price.

With this formation, the strategy can know whether the ETF or stock is going up or down.

From that, we can tell the strategy what to do next — i.

e.

trade the price to go up (long) or down (short).

qqq[‘Direction’] = [1 if qqq[‘PriceDiff’].

loc[ei] > 0 else 0 for ei in qqq.

index ]print(‘Price difference on {} is {}.

direction is {}’.

format(‘2015–01–05’, qqq[‘PriceDiff’].

loc[‘2015–01–05’], qqq[‘Direction’].

loc[‘2015–01–05’]))The final element in making a moving average strategy is making a moving average, which can be done with .

rolling() and .

mean().

Here, we’ll go with a 50 moving average, which shows the average closing price of the last 50 days.

Below is the code with the plot:qqq[‘ma50’] = qqq[‘Price’].

rolling(50).

mean()plt.

figure(figsize=(10, 8))qqq[‘ma50’].

loc[‘2015–01–01’:’2015–12–31'].

plot(label=’MA50')qqq[‘Price’].

loc[‘2015–01–01’:’2015–12–31'].

plot(label=’Price’)plt.

legend()plt.

show()Build Moving Average Trading StrategyWe have all the components for a trading strategy.

Adding from the code above, we can make a 50 simple moving average (SMA) and 100 simple moving average (SMA) cross over strategy.

Once the 50 SMA goes above the 100 SMA, we will trade for the intent that etf will go higher.

Once the 50 SMA goes below 100 SMA, we will close our long or not trade at all.

qqq[‘MA50’] = qqq[‘Price’].

rolling(50).

mean()qqq[‘MA100’] = qqq[‘Price’].

rolling(100).

mean()qqq = qqq.

dropna()The code below sets up how many shares we buy, and allows us to graph of the returns during our trades:qqq[‘Shares’] = [1 if qqq.

loc[ei, ‘MA50’]>qqq.

loc[ei, ‘MA100’] else 0 for ei in qqq.

index]qqq[‘Close1’] = qqq[‘Price’].

shift(-1)qqq[‘Profit’] = [qqq.

loc[ei, ‘Close1’] — qqq.

loc[ei, ‘Price’] if qqq.

loc[ei, ‘Shares’]==1 else 0 for ei in qqq.

index]qqq[‘Profit’].

plot()plt.

axhline(y=0, color=’red’)Looking at this for the first time, it’s pretty hard to tell if we are profitable or not.

Another way to look at it is the price gained from the share we traded with the .

cumsum() code — making a new column and plotting it:qqq[‘wealth’] = qqq[‘Profit’].

cumsum()qqq[‘we alth’].

plot()plt.

title(‘Total money you win is {}’.

format(qqq.

loc[qqq.

index[-2], ‘wealth’]))The plot above shows us how much money we gained from trading 1 share of QQQ etf, so we gained $48.

03 trading 1 share (43.

12% total return).

43.

12% is not a bad return, but it kind of is bad compared to just buying and holding the etf without trading it (68.

57% total return) from the same time period.

ConclusionCongrats, you’ve just went through some data wrangling, visualization, and built a moving average trading strategy!Image taken from a random YouTube profile [5]When you sit back and think about what you covered, it was really a lot.

You learned how to import data from CSV’s, make dataframes, look at rows, look at data points, create new columns, a few statistical functions, and plotting them for a simple moving average trading strategy.

I understand I explained more on the data science side, than the finance side.

If anything is unclear, comment below or send me a message!In future articles, we’ll go over more coding walk-throughs with a mix of data science business ideas.

Special thanks to Dr.

Wan, I modified ideas from one of his courses to build this article [6].

Disclaimer: All things stated in this article are of my own opinion and not of any employer.

Investing carries serious risk.

Please consult your financial professional before making any investment decision.

Lastly, this post contains affiliate links and thank you for your support.

[1] C.

Newport, Deep Work: Rules for Focused Success in a Distracted World (2016), Grand Central Publishing[2] NIST, Introduction to Time Series Analysis, https://www.

itl.

nist.

gov/div898/handbook/pmc/section4/pmc4.

htm[3] Trifacta, What is Data Wrangling?, https://www.

trifacta.

com/data-wrangling/[4] Investing.

com, Invesco QQQ Trust Historical Data, https://www.

investing.

com/etfs/powershares-qqqq-historical-data[5] TT KR, YouTube Profile, https://www.

youtube.

com/channel/UCkAqayQ4Q8XD6CwWSxD6DbA[6] X.

Wan, Build a simple trading strategy, https://www.

coursera.

org/learn/python-statistics-financial-analysis.. More details

Leave a Reply