Name Your Favorite Excel Function and I’ll Teach You its Pandas Equivalent

(source)By combining the INDEX/MATCH formulas, you can get around the restrictions of a VLOOKUP.

In a nutshell, VLOOKUP scans the leftmost column of a given range until it lands on the first instance of a given “search_key”.

It then returns the value of the cell in the index provided, which will always be to the right.

But again, as I said above, with INDEX/MATCH you can search a whole spreadsheet (by column or by row) for values instead of being forced to only search the left-most column with a VLOOKUP.

Check out this monthly podcast dashboard I built for Recode while working as a Traffic Analyst at Vox Media to see how I leveraged a SMALL function nested in a INDEX/MATCH function to display episode-level data.

Now let’s move on to the fun part… how to do this with Pandas!I’ll show you four ways to code this in Python leveraging the Pandas library, but first, let’s get some context on our goal.

We have two DataFrames named df_1 and df_2 (seen below).

df_1df_2Our goal is to match the last_name column in df_2 with the correct rows in df_1 to get something like this:Solution 1: Using .

map()code: df_1[‘last_name’] = df_1.

id.

map(df_2.

set_index(‘id’)[‘last_name’].

to_dict())Here we create a new column, last_name, for df_1.

The values of that new column are created by mapping the ids from df_1 to the last names in df_2 where those ids match.

By calling the .

to_dict() method on our series df_2.

last_name we get a key : value pair of the index which would be: {0: ‘Snow’, 1: ‘Targaryen’, 2: ‘Tyrell’, 3: ‘Lannister’, 4: ‘Stark’}.

In this case the .

map() method would not work, as the values in our id column for df_1 do not match the keys in this dictionary.

So we passed the .

set_index() method first to change the index to the ids of df_2 which would give us this dictionary: {1003: ‘Snow’, 1004: ‘Targaryen’, 1005: ‘Tyrell’, 1002: ‘Lannister’, 1006: ‘Stark’}.

We can now use the .

map() method to map the values of our Series (id) to the corresponding values of our new dictionary.

A fellow data scientist, Bernard Kurka, helped me figure out this first solution.

Check out some of his Medium posts too.

Solution 2: Using .

join()code: df_1 = df_1.

join(df_2.

last_name)The .

join() method will join columns of another DataFrame.

By default it does a left join, but you can specify this by passing the how= hyperparameter and changing it to ‘right’, ‘outer’, or ‘inner’}.

Solution 3: Using .

merge()code: df_1 = df_1.

merge(df_2, how=’left’, on=’id’)The .

merge() method merges a DataFrame or named Series objects with a database-style join.

Here I specified how I wanted it joined with how= and on what column/index by specifying the on= hyperparameter.

Solution 4: Using pd.

concat()code: df_1 = pd.

concat([df_1, df_2.

last_name], axis=1, join=’inner’)The pd.

concat() method concatenates pandas objects along a particular axis with optional set logic along the other axes.

So here I passed the pandas objects I wanted concatenated as a list.

The first being a DataFrame and the second being a Series.

By setting axis=1, I specified the axis to concatenate along, with 1 pertaining to columns and 0 to index (rows).

The join= hyperparameter specifies how to handle indexes on other axis(es).

I think it is really cool that we’ve seen there are several ways to do the exact same thing.

If you ever find yourself needing to use one of these methods I hope you can reference this as a guide on how to do so.

I’d love to turn this into a series where I teach you how to do more things you love doing in Excel with Python and libraries like Pandas, so let me know in the comments below what you’d like to see next.

As always, thank you for reading!.

. More details

Leave a Reply