Pandas and SQL together, a Premier League and Player Scouting Example

Pandas and SQL together, a Premier League and Player Scouting ExampleStephen FordhamBlockedUnblockFollowFollowingMay 14Pandas, SQL, Excel data transferThe Python Pandas library makes transferring tabular data from one platform to another simple.

This tutorial aims to show you how to move data between a Pandas DataFrame, an SQL database and a Excel spreadsheet, all through the use of Pandas.

So wherever your data initially resides, you can transfer it into a format you feel comfortable analyzing and manipulating.

Converting/Reading an SQL Table into a Pandas DataFrameIt is often necessary to shuttle data from one platform to another.

It may be useful to transform an SQL Table/Query into a Pandas DataFrame.

In order to begin, as a prerequisite, 3 modules must be installed.

These include; pandas, sqlalchemy, and PyMySQL.

Here, the Premier League Table is in the ‘football’ Table within MySQL.

The goal is to transform this Table into a Pandas DataFrame.

To start, import the Pandas module as the alias pd, following convention, and import the sqlalchemy module.

Following this, it is necessary to make a connection between Python and the MySQL database.

Fortunately, the sqlalchemy module provides a function called .

create_engine() that enables this connection to be established.

The string inside the .

create_engine() function is called the connection string.

The create_engine() function produces an Engine object based on a URL.

These URLs are usually formatted with a username, password, hostname, and a database name.

The typical format of a database URL is:For clarity: ‘mysql+pymysql’ indicates that I am using mysql as my database management system, and functions within the module pymysql to interface with this database management system (DBMS).

username = root, password =****, hostname = localhost, and database name = football_db.

To match the Engine Configuration using your particular DBMS, you should consult https://docs.

sqlalchemy.

org/en/13/core/engines.

html for the connection string formatting specifics.

Now, when I run the code in my Jupyter Notebook, the DataFrame appears.

To confirm, I can simply use the type function which indicates I have a Pandas DataFrame.

A useful question here would be; why would we want to transform the SQL football Table into a Pandas DataFrame?To illustrate, one of many possible reasons, it is now possible to style the DataFrame and thereby enhance its readability for intended audiences.

Here, two useful functions are defined; one which colours the goal difference column using conditional formatting, green if the goal difference is positive and red if it is negative.

We can even go one step further, and define a function which magnifies the cells as the user scrolls over them.

This would not be possible in MySQL.

Reading an SQL Query into a Pandas DataFrameTo read a query from SQL into a Pandas DataFrame, we first write the query in its native language, here, SQL.

I have assigned this query the variable name query_1, and as it extends over multiple lines I have included the query within triple quotes.

This query can then be read by adding its variable name and the engine as parameters to the pd.

read_sql_query() method.

As shown below, the query is now a Pandas DataFrame.

Exporting the Football DataFrame to an Excel fileExporting the newly created DataFrame is possible using the pd.

to_excel() method.

Specify the filename of your choosing, here I chose ‘goals_game_stats.

xlsx’, as this seems appropriate.

Include the filename extension .

xls if you have the xlwt module installed or the .

xlsx extension if you have the openpyxl module installed.

Its seems unnecessary to include the index specified with the Pandas DataFrame, so I simply set the optional parameter index to False, and the index is then subsequently omitted from the corresponding Excel file.

The Pandas Parsers really do make data transfer that bit easier!From Excel to MySQL using Pandas… and a bit of scouting for Barcelona F.

CTo go full circle, I thought I would finish this tutorial, by demonstrating how Pandas can be used to export an Excel File to MySQL for querying.

As all the examples have been based on the Premier League, I think it would be good to use a new data-set.

To this end, lets investigate the best football players in the world.

Here is a brief look at the players and some of the columns in the Excel Spreadsheet.

The data-set below was sourced from Kaggle.

I save this data-set as a comma separated values (CSV) file, and use the pd.

read_csv() method, with the encoding set to ‘latin-1’ to parse the Excel file.

To confirm parsing has worked, I use the head method on the players_data DataFrame.

The DataFrame displayed below skips the middle columns.

If we want to check all 55 columns are present, a nice method to use is the pd.

set_options() where I set the ‘display.

max_columns’ to 60 to be inclusive of all my columns.

The video now shows all the columns are included, and have been correctly parsed.

I can now connect to my ‘football_db’ through my Python Script.

I then write a query to create a Table.

The Table name player_name seems appropriate.

Following this, I specify the columns and their corresponding datatypes, before executing this query using my cursor.

Now the magic!I use the DataFrame.

to_sql () method, in my case, players_data.

to_sql() method, and write this DataFrame to the newly created Table, ‘player_data’, connect with the engine mentioned earlier, omit the index, and set the ‘if_exists’ argument to ‘append’ to add the data to the Table.

The Table that was originally from Excel has now been imported into the MySQL database.

It is perfectly set up for querying now.

The Best Players, Best Defenders and Expensive Barcelona Targets…Let’s see who the 5 best players in the World are based on their ‘Overall’ and ‘Potential’ metrics.

Or maybe we want to know who are the most impressive defenders.

Ideally, the ideal defender should have good heading accuracy, reactions and marking attributes!With Barcelona’s shock exit from the Champions League at the hands of Liverpool, summer recruitment will be a key objective at the Catalan Club.

As I have given myself temporary head-scout status at Barcelona, it would be ideal to recruit a player with core Barcelona-esque attributes.

I therefore need to query for a player that has a high work rate, is attacking minded, and has Finishing and Acceleration beyond 80.

To run this query, I omit Messi, since he is already at Barcelona, and exclude Ronaldo on the basis he is destined to stay at Juventus F.

C.

Barcelona, if you need a chief-scout please get in touch!The query results turn up 5 players.

With Barcelona rekindling their interest in Antoine Griezmann this summer to boost their Champions League ambitions next year, he could prove to be a costly recruit at a cool €78 M Euros according to our data-set.

SummaryThis tutorial has focused on how tabular data can be moved between an SQL database, a Pandas DataFrame and Excel.

The Python Pandas library makes it simple to move data, in particular I like using Pandas when I need to import data from a DataFrame/SQL database to Excel.

Hopefully, this tutorial has demonstrated how data transfer can be achieved.

Let me know if you like the examples presented here, or whether I should switch to more financial/healthcare based examples in forthcoming tutorials.

.. More details

Leave a Reply