Jazz & Bossa Nova: Siblings (?)

And vice-versa?

What chords are common to Jazz and Bossa Nova?

1. Importing Libraries

First off, let’s import the required libraries: Matplotlib and Seaborn for data visualization, Pandas for data analysis, and Bs4 and Requests for web scraping.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

2. Web Scraping

Now that the libraries have been imported, which songs will we scrape from the chords site, and how do we create a mechanism that scrapes many at once?

Luckily, the site has a ‘most accessed’ page with the 100 most accessed songs for every genre. In every genre, each one of the 100 songs has a link that leads to a page with its chords. This is perfect. All we need to do is scrape the 100 URLs on the bossa nova and jazz pages, and then scrape the chords for each of the 200 songs.

    url_list = []
    genres = ['bossa-nova', 'jazz']
    for genre in genres:
        # Each genre has its own 'most accessed' page
        url = 'https://www.cifraclub.com.br/mais-acessadas/' + genre + '/'
        site = requests.get(url)
        soup = BeautifulSoup(site.text, 'html.parser')
        songs = soup.find('ol', {'id': 'js-sng_list'})
        for link in songs.find_all('a'):
            links = link.get('href')
            # Prepend the domain to each href to get a full URL
            url_list.append('http://www.cifraclub.com.br' + links)

With this piece of Python code, we now have a list of 200 links: the first half holds the 100 most accessed bossa nova songs, and the second half the 100 most accessed jazz songs.
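To see the link-collection logic in isolation, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library’s html.parser and runs on a small made-up HTML snippet, so SAMPLE_HTML, the song paths in it and the LinkCollector class are illustrative assumptions rather than the site’s real markup. Only the 'js-sng_list' id and the domain prefix come from the code above.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a 'most accessed' page; the real markup may differ.
SAMPLE_HTML = """
<a href="/ignored/">outside the list</a>
<ol id="js-sng_list">
  <li><a href="/artist-a/song-1/">Song 1</a></li>
  <li><a href="/artist-b/song-2/">Song 2</a></li>
</ol>
<a href="/also-ignored/">outside again</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> found inside <ol id="js-sng_list">."""

    def __init__(self):
        super().__init__()
        self.inside_list = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'ol' and attrs.get('id') == 'js-sng_list':
            self.inside_list = True
        elif tag == 'a' and self.inside_list and attrs.get('href'):
            self.hrefs.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'ol':
            self.inside_list = False


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Same idea as in the scraping loop: prepend the domain to each href.
sample_urls = ['http://www.cifraclub.com.br' + href for href in collector.hrefs]
print(sample_urls)
```

Links outside the ordered list are ignored, which is the reason for looking up the list by its id before collecting the anchors.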

Now, let’s scrape the chords from each one of these songs.

This is analogous to what we did before, except now we’re interested in the chords’ text, and not in links.

    all_chords = []
    # Slices exclude their end index: [0:100] is the 100 bossa nova songs,
    # [100:199] is the first 99 jazz songs (the 200th URL is skipped, see below)
    genres_urls = [url_list[0:100], url_list[100:199]]
    for genre_urls in genres_urls:
        chords_list = []
        genre_chords = []
        for url in genre_urls:
            site = requests.get(url)
            soup = BeautifulSoup(site.text, 'html.parser')
            chords_raw = soup.find('pre')
            for chords in chords_raw.find_all('b'):
                # The tags are removed with replace(); str.strip('<>b/') would
                # also eat a trailing flat sign (Bb would become B)
                chord = str(chords)
                chord = chord.replace('-', 'm').replace('<b>', '').replace('</b>', '').replace('/', '')  # continues on the next line
                chord = chord.replace('(', '').replace(')', '').replace('+', 'M').replace('dim', '°').replace('maj', 'M')
                chords_list.append(chord)
        genre_chords.append(chords_list)
        all_chords.append(genre_chords)

all_chords holds one entry per genre, each wrapping a single flat list: all_chords[0][0] contains every chord from the bossa nova songs, and all_chords[1][0] every chord from the jazz songs.

I’ve also standardized the notation by removing any <b> or </b> tags that came from the HTML.

Also, I’ve removed parentheses and slashes and changed pluses and minuses to ‘M’ and ‘m’, so that the algorithm treats A7/9 and A7(9), or D7+ and D7M, as the same chord.

I had a little problem here because, for some unknown reason, 1 out of the 200 URLs led to a page with no chords, but rather a guitar software file.

Since it was just one URL, I decided to remove it. It was the 200th song, so I just changed url_list[100:200] to url_list[100:199].
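The standardization step can also be pulled out into a small function and checked without touching the network. The sketch below is one way to do it, assuming the replacement rules described above; normalize_chord is a hypothetical helper name, not part of the original code. It also shows why stripping characters with str.strip is risky for flats: strip removes any of the given characters from both ends of the string, so a trailing ‘b’ disappears.

```python
def normalize_chord(raw):
    """Standardize one chord as it appears in the page's HTML, e.g. '<b>A7(9)</b>'."""
    chord = raw.replace('<b>', '').replace('</b>', '')
    chord = chord.replace('-', 'm').replace('/', '')
    chord = chord.replace('(', '').replace(')', '')
    chord = chord.replace('+', 'M').replace('dim', '°').replace('maj', 'M')
    return chord


# Different spellings of the same chord collapse to one name.
print(normalize_chord('<b>A7/9</b>'), normalize_chord('<b>A7(9)</b>'))
print(normalize_chord('<b>D7+</b>'), normalize_chord('<b>D7M</b>'))

# str.strip removes characters, not substrings, from both ends,
# so it would also eat the flat sign of a chord such as Bb.
stripped = '<b>Bb</b>'.strip('<>b/')
print(stripped)
print(normalize_chord('<b>Bb</b>'))
```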

3. Making a Python Dictionary

Now that we’ve web scraped and standardized every chord in every song in both genres, let’s find out how many times each chord comes up.

    def freq_chords(chords_list):
        """Map each distinct chord to the number of times it occurs."""
        dic = {}
        for unique_chord in set(chords_list):
            count = 0
            for total_chords in chords_list:
                if unique_chord == total_chords:
                    count += 1
            dic[unique_chord] = count
        return dic

    bossa_dic = freq_chords(all_chords[0][0])
    jazz_dic = freq_chords(all_chords[1][0])

4. Pandas Data Frame

Now, let’s turn each dictionary into a Pandas DataFrame. For aesthetics’ sake, I’ve renamed the occurrences column ‘Count’.

    bossa_df = pd.DataFrame.from_dict(bossa_dic, orient='index')
    bossa_df.columns = ['Count']
    jazz_df = pd.DataFrame.from_dict(jazz_dic, orient='index')
    jazz_df.columns = ['Count']

5. Data Analysis

Now, let’s answer the questions from the beginning of the post.

What are the most common chords in each genre?

    bossa_df.sort_values(by='Count', ascending=False, inplace=True)
    jazz_df.sort_values(by='Count', ascending=False, inplace=True)

    f, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 5))
    ax1 = sns.barplot(x=bossa_df[:10].index, y=bossa_df[:10].Count, ax=axes[0])
    ax1.set_title('Bossa Nova')
    ax2 = sns.barplot(x=jazz_df[:10].index, y=jazz_df[:10].Count, ax=axes[1])
    ax2.set_title('Jazz')
    for ax in f.axes:
        plt.sca(ax)
        plt.xticks(rotation=90)
    plt.show()

Figure: Most common chords in each genre

Insight 1) As we can see, the most common chords in Bossa Nova are minor 7th chords. In Jazz, they are mostly natural major chords.

Insight 2) Furthermore, the most common chord in Jazz is almost twice as frequent as the most common chord in Bossa Nova.

How many different chords does each genre have?

    print(bossa_df.count())
    print()
    print(jazz_df.count())

    print(bossa_df.std())
    print()
    print(jazz_df.std())

Insight 3) Bossa Nova songs use more distinct chords than Jazz songs.

Insight 4) Jazz chords have a much bigger standard deviation. This is because a few chords are so frequent in Jazz (the most common ones are almost twice as common as those in Bossa Nova) that the other chords’ occurrences deviate much more from the mean.

What chords occur more in jazz and less in bossa nova? And vice-versa?

Let’s create a new DataFrame holding the difference between the bossa nova chord occurrences and the jazz chord occurrences. We’ll drop any NA values, because they represent chords which only exist in one genre.

    dif_df = bossa_df - jazz_df
    dif_df.columns = ['Difference']
    dif_df.sort_values(by='Difference', inplace=True)
    dif_df = dif_df.dropna()

We now have a column in the DataFrame which tells us how much more frequent a chord is in Bossa Nova than in Jazz. Negative values mean the chord is more frequent in Jazz. The closer the value is to 0, the more evenly the chord occurs in both genres.
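To see why NA values appear and what dropna() leaves behind, here is a plain-Python sketch of the same subtraction using dictionaries instead of DataFrames. The counts are invented for illustration and chord_difference is a hypothetical helper; pandas does the equivalent alignment on the index labels, producing NA wherever a chord is missing from one side.

```python
def chord_difference(bossa_counts, jazz_counts):
    """Return bossa minus jazz counts for chords present in both genres.

    Chords found in only one genre are skipped, which mirrors the rows
    that become NA in the DataFrame subtraction and are then dropped.
    """
    shared = set(bossa_counts) & set(jazz_counts)
    diff = {chord: bossa_counts[chord] - jazz_counts[chord] for chord in shared}
    # Sort ascending, so the most Jazz-leaning chords come first.
    return dict(sorted(diff.items(), key=lambda item: item[1]))


# Made-up counts, only to illustrate the mechanics.
bossa_counts = {'Am7': 40, 'G': 5, 'D7M': 12}
jazz_counts = {'Am7': 10, 'G': 30, 'C': 25}

difference = chord_difference(bossa_counts, jazz_counts)
print(difference)
```

In this toy data, D7M and C each appear in only one genre, so neither survives into the result.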

    f, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
    # dif_df[-10:] selects the last 10 rows, whatever the number of rows left after dropna()
    sns.barplot(x=dif_df[-10:].index, y=dif_df[-10:].Difference, ax=axes[1])
    sns.barplot(x=dif_df[:10].index, y=dif_df[:10].Difference, ax=axes[0])
    for ax in axes:
        plt.sca(ax)
        plt.xticks(rotation=90)

Here we have the top and the bottom 10 chords. The graph on the left shows the chords that occur the most in Jazz and the least in Bossa Nova. On the right, we have the opposite.

Insight 5) Natural chords are not only the most common chords within Jazz; Jazz also uses them a lot more than Bossa Nova does.

Insight 6) Bossa Nova almost never uses natural minor chords, preferring to always throw in an extension (7th, 9th, etc.).

What chords are common to Jazz and Bossa Nova?

I’ll impose a tolerance of 10 occurrences, i.e., a chord can occur no more than 10 times more in either Jazz or Bossa Nova.

    # range(-10, 10) covers differences from -10 through 9
    common = dif_df[dif_df['Difference'].isin(range(-10, 10))]

So, the number of chords that fit this condition is pretty big. The DataFrame has 107 rows (chords), so we need to find a way to either filter or categorize them. Let’s try to separate them into chords with a 6th, 7th, 9th, 11th or 13th, plus diminished chords, and see the data through this lens.

    contains6 = common[common.index.str.contains('6')]
    contains7 = common[common.index.str.contains('7')]
    contains9 = common[common.index.str.contains('9')]
    contains11 = common[common.index.str.contains('11')]
    contains13 = common[common.index.str.contains('13')]
    containsdim = common[common.index.str.contains('°')]

I’ll now count how many of these chords carry each of the extensions (6th, 7th, etc.). Keep in mind that a chord, say G7/9, will be counted both in ‘7th’ and in ‘9th’.

    sns.barplot(x=['6th', '7th', '9th', '11th', '13th', 'Dim'],
                y=[len(contains6), len(contains7), len(contains9),
                   len(contains11), len(contains13), len(containsdim)])

Insight 7) The 7th is by far the most frequent extension among the chords which Jazz and Bossa Nova have in common.
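The tally above relies on plain substring matching on chord names. A minimal sketch of the same idea without pandas, using a few made-up chord names written the way they look after the slash has been removed, shows how one chord can land in several buckets. count_extensions is a hypothetical helper, not part of the original code.

```python
def count_extensions(chord_names, markers=('6', '7', '9', '11', '13', '°')):
    """Count how many chord names contain each marker as a substring."""
    return {marker: sum(marker in name for name in chord_names)
            for marker in markers}


# Invented examples; 'G79' is how G7/9 looks once the slash is removed.
sample_chords = ['G79', 'Am7', 'C6', 'D713', 'B°', 'F']

tally = count_extensions(sample_chords)
print(tally)
```

Here ‘G79’ is counted under both ‘7’ and ‘9’, and ‘D713’ under both ‘7’ and ‘13’, which is the double counting mentioned above.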

What about inversions? How many of these chords in common are inversions?

    # With the slash removed, D/F# is stored as DF#, so a name holding
    # two different note letters is treated as an inversion
    inversions = []
    notes = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
    for i in range(0, len(common)):
        name = common.iloc[i].name
        for letter in notes:
            if name.find(letter) != -1:
                for letter2 in notes:
                    if name.find(letter2) != -1:
                        if letter != letter2:
                            if name not in inversions:
                                inversions.append(name)

Let’s plot the graph again, adding the inversions.

    sns.barplot(x=['6th', '7th', '9th', '11th', '13th', 'Dim', 'Inv.'],
                y=[len(contains6), len(contains7), len(contains9),
                   len(contains11), len(contains13), len(containsdim),
                   len(inversions)])

Insight 8) Both Jazz and Bossa Nova really like inversions!

Conclusion

Despite both genres being known for complex harmonies, bossa nova seems to have not only more chords, but also more chords with extensions.

I doubt this means bossa nova is harmonically more complex, at least not significantly. I’d bet the chord names in jazz are just not as deterministic as in bossa nova. You are free to play a D7 or a D7/9 in place of a D; the composer just won’t dictate which one. If true, this makes sense, as jazz has a very strong improvisational component, which isn’t as present in bossa nova.

Moreover, when choosing between a G, a G9 or a G7/13, bossa nova composers apparently tend to avoid just a plain G. Of all the extensions in bossa nova, the minor 7th is clearly the most popular, while plain minor chords are very unpopular. If you play Am, and not Am7, it will most likely sound strange. Minor 7ths are as common as they are expected in bossa nova.

Finally, inversions are also very popular in both jazz and bossa nova. They are great for presenting a chord in different ways. D and D/F# are essentially the same chord, but are ‘heard’ differently. This is useful for both genres, which look for ways to reinterpret chords.
