Ranking Programming Languages by Wikipedia Page Views

A demonstration of using APIs and plotting in python

Bryant Crocker, Apr 6

There are many different ways to rank the popularity of programming languages.

The Tiobe Index is a popular site that ranks programming languages by popularity.

I’ve also seen articles where people look at counts of job listings or counts of Stack Overflow questions to rank the popularity of programming languages.

Today I am going to use the Wikimedia API to rank a list of programming languages by their total page views since the start of 2018.
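Ranking by total views comes down to two steps: sum each page's daily counts, then sort the totals. The pandas sketch below shows that step on made-up numbers. The pages "A", "B" and "C" are hypothetical placeholders, not real articles.

```python
import pandas as pd

# Made-up daily view counts for three hypothetical pages (illustration only).
daily = pd.DataFrame({
    "article": ["A", "A", "B", "B", "C", "C"],
    "views":   [10, 20, 50, 5, 1, 2],
})

# Total views per page, largest first: this ordering is the ranking.
totals = daily.groupby("article")["views"].sum().sort_values(ascending=False)

# Turn the ordering into 1-based ranks; tied pages would share the lower rank.
ranks = totals.rank(ascending=False, method="min").astype(int)

print(list(totals.index))  # ['B', 'A', 'C']
```

The `method="min"` argument only affects ties. It gives tied pages the same rank, which suits a popularity table.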

Luckily, Python has the handy requests library, which makes getting data from APIs extremely easy. It is a third-party package installed with pip, not part of the standard library.

I defined a simple function to get page-view data from the Wikimedia Pageviews API.
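The heart of such a function is building the URL for the per-article endpoint. The sketch below isolates that step. `build_pageviews_url` is a hypothetical helper. Two details are assumptions based on how Wikipedia titles appear in URLs: spaces become underscores, and the title is percent-encoded so that characters like the `+` in "C++" survive.

```python
from urllib.parse import quote

# Per-article daily endpoint for English Wikipedia, all access methods and agents.
BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        "en.wikipedia.org/all-access/all-agents/{}/daily/{}/{}")


def build_pageviews_url(page_name, start_date, end_date):
    """Hypothetical helper: page title plus YYYY/MM/DD dates -> request URL."""
    # Assumption: underscores instead of spaces, and every reserved character
    # percent-encoded (safe="" also encodes '/' inside a title).
    title = quote(page_name.replace(" ", "_"), safe="")
    # The endpoint takes dates as YYYYMMDD, so drop the slashes.
    start = start_date.replace("/", "")
    end = end_date.replace("/", "")
    return BASE.format(title, start, end)


print(build_pageviews_url("C++", "2018/01/01", "2018/12/31"))
```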

API request function:

```python
import datetime
from urllib.parse import quote

import pandas as pd
import requests


def wikimedia_request(page_name, start_date, end_date=None):
    '''
    A function that makes requests to the Wikimedia Pageviews API.

    Parameters
    ----------
    page_name : string
        The title of the Wikipedia page you would like page views for.
    start_date : string
        A date string YYYY/MM/DD indicating the first date the request should return.
    end_date : string
        A date string YYYY/MM/DD indicating the last date the request should return.
        Defaults to the system date.

    Returns
    -------
    df : pandas DataFrame
        A dataframe with the article name and the number of page views.
    '''
    # get rid of the / in the date: the API expects YYYYMMDD
    sdate = start_date.replace('/', '')
    if end_date is None:
        # default to the current date
        edate = datetime.datetime.now().strftime('%Y%m%d')
    else:
        edate = end_date.replace('/', '')
    # titles use underscores instead of spaces and are percent-encoded (e.g. C++)
    title = quote(page_name.replace(' ', '_'), safe='')
    url = ('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/'
           'en.wikipedia.org/all-access/all-agents/{}/daily/{}/{}'
           .format(title, sdate, edate))
    # Wikimedia asks clients to send a descriptive User-Agent;
    # this one is a placeholder, replace it with your own contact details
    headers = {'User-Agent': 'pageviews-demo (your-email@example.com)'}
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    # convert the JSON items to a dataframe
    df = pd.DataFrame(r.json()['items'])
    # timestamps are YYYYMMDDHH with the hour fixed at 00 for daily data,
    # so drop the hour and parse the rest as a date
    df['timestamp'] = pd.to_datetime(df['timestamp'].str[:-2], format='%Y%m%d')
    # set timestamp as index
    df.set_index('timestamp', inplace=True)
    # return the article and views columns
    return df[['article', 'views']]
```

Once this function is defined, it can easily be used with a list comprehension to create a dataframe of page views for multiple pages.

I define a list of 20 programming languages that I am interested in (admittedly a somewhat random list, this post is mostly to demonstrate a technique), and then do exactly that.

```python
start_date = '2018/01/01'  # the start of 2018

names = ['Python (programming language)', 'R (programming language)',
         'Java (programming language)', 'Scala (programming language)',
         'JavaScript', 'Swift (programming language)', 'C++',
         'C (programming language)', 'Clojure', 'C Sharp (programming language)',
         'F Sharp (programming language)', 'Julia (programming language)',
         'Visual Basic .NET', 'Perl', 'Haskell (programming language)',
         'Go (programming language)', 'Ruby (programming language)', 'PHP',
         'Bash (Unix shell)', 'TypeScript']

# one request per page, stacked into a single long dataframe
dfs = pd.concat([wikimedia_request(x, start_date) for x in names])
```

I personally love Python list comprehensions because they allow for clean, readable code.

Most of the things in this article would take much more code in other languages.

Graphing Popularity

Now that I have the data on programming languages, I can use pandas and matplotlib to create a simple plot showing which languages have had the most page views since the start of 2018.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# total views per article since the start of 2018
totals = dfs.groupby('article')['views'].sum()
# barh draws the first row at the bottom, so sort ascending
# to put the most-viewed page at the top
totals.sort_values(ascending=True).plot(kind='barh', color='C0', figsize=(12, 6))
plt.title('Total Wikipedia Page Views')
sns.despine()
plt.xlabel('Count of Page Views')
plt.ylabel('')
plt.show()
```

- In most language ranking lists, Python, Java, JavaScript and C fight for the top spot.
- From my data, it looks like Python is the most popular.
- Python use has been growing over the last few years.
- This is good for me, because Python is the main language I use.

I'm also interested in the trends in monthly page views for each language's page.

Which languages are showing an upward trend in interest? Which are showing a downward trend?
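Answering those questions starts from monthly totals rather than noisy daily counts. The sketch below uses synthetic data, a constant 100 views per day, purely for illustration. It shows the daily-to-monthly step and why the final, partial month is dropped before looking for a trend. It uses the `"MS"` (month start) alias because that alias is accepted across pandas versions.

```python
import pandas as pd

# Synthetic daily counts: 100 views per day from 2018-01-01 to 2018-03-15.
days = pd.date_range("2018-01-01", "2018-03-15", freq="D")
views = pd.Series(100, index=days, name="views")

# Sum the daily counts into calendar months, labelled by each month's first day.
monthly = views.resample("MS").sum()

# March is only half observed, so drop the last month before fitting a trend.
complete = monthly[:-1]
print(complete.tolist())  # [3100, 2800]
```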

I'd like to make simple regression plots to evaluate how interest in each language changes over time.

This is a bit hard to do in Python because seaborn's regplot does not accept datetime values on the x-axis.
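The usual workaround is to convert each date to a number before fitting. The numpy sketch below uses synthetic weekly data, with values rising by 70 per week, so the expected slope is exactly 10 views per day. It uses Python's date ordinals as the numeric x-axis, then converts an ordinal back into a date label.

```python
from datetime import date

import numpy as np
import pandas as pd

# Synthetic series (illustration only): one point per week, +70 views each week.
dates = pd.date_range("2018-01-01", periods=6, freq="7D")
values = np.array([1000, 1070, 1140, 1210, 1280, 1350], dtype=float)

# Convert each timestamp to its proleptic Gregorian ordinal (0001-01-01 is day 1),
# giving a plain numeric x-axis that a regression can use.
ordinals = np.array([d.toordinal() for d in dates])

# Least-squares line. Subtracting the first ordinal is optional; it only keeps
# the numbers small. The slope is in views per day either way.
slope, intercept = np.polyfit(ordinals - ordinals[0], values, deg=1)
print(round(slope, 6))  # 10.0

# Going the other way turns a numeric tick back into a readable label.
print(date.fromordinal(int(ordinals[-1])).strftime("%m/%Y"))  # 02/2018
```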

I've adapted code from this Stack Overflow answer to create regression plots with time series in Python.

```python
from datetime import date

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def tsregplot(series, ax=None, days_forward=10, color='C0'):
    '''
    A function that draws a seaborn regression plot for a time series.

    Parameters
    ----------
    series : pandas Series with a DatetimeIndex
        The values to plot against time.
    ax : matplotlib axes object
        The axes to draw on; if None, the current axes are used.
    days_forward : int
        How many days of padding to add after the last point on the x-axis.
        This is set at 10 by default so that all points show.
    color : string
        A matplotlib color string.

    Returns
    -------
    ax : matplotlib axes object
        The axes with the regression plot.
    '''
    series = series.reset_index()
    series.columns = ['date', 'value']
    # regplot cannot use datetimes, so convert each date to an integer ordinal
    series['date_ordinal'] = pd.to_datetime(series['date']).apply(
        lambda d: d.toordinal())
    ax = sns.regplot(data=series, x='date_ordinal', y='value', ax=ax, color=color)
    ax.set_xlim(series['date_ordinal'].min() - 5,
                series['date_ordinal'].max() + days_forward)
    ax.set_ylim(series['value'].min() * 0.9, series['value'].max() * 1.1)
    ax.set_xlabel('date')
    # turn the ordinal tick positions back into month/year labels
    ticks = ax.get_xticks()
    ax.set_xticks(ticks)
    ax.set_xticklabels([date.fromordinal(int(t)).strftime('%m/%Y') for t in ticks])
    return ax
```

I can create a simple for loop to plot a time series regression plot for each of the 20 languages:

```python
fig, axes = plt.subplots(10, 2, figsize=(16, 20))
axes = axes.ravel()
for i, (article, name) in enumerate(zip(dfs['article'].unique(), names)):
    selected = dfs[dfs['article'] == article]
    # monthly totals (labelled by month start); drop the last, incomplete month
    monthly = selected['views'].resample('MS').sum()[:-1]
    tsregplot(monthly, ax=axes[i])
    axes[i].set_title(name)
plt.tight_layout()
plt.savefig('trendplots.png')
```

A few things to note:

- Based on Wikipedia page views, Python interest has actually been declining.

- Based on Wikipedia page views, there may be an increase in interest in Julia, but it is more likely noise.
- Based on Wikipedia page views, interest in Java is growing and interest in Scala is shrinking. Scala was created as a simpler, more modern alternative to Java. A friend who is a big Java user has told me that more and more simple, modern libraries have been coming out for Java. Maybe this is why.
- There is steep growth in interest in TypeScript, which has been documented in many articles.
- Based on Wikipedia page views, interest in Go has been increasing modestly. Go is often described as the Python killer for server-side code. I've played with Go a tiny bit myself, but I don't find it very useful for what I do (mostly data science).
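You can put a rough number on whether a trend like Julia's is probably noise. One common check is the slope's t-statistic: the fitted slope divided by its standard error. As a rule of thumb, |t| well above about 2 points to a real trend, and values near 0 point to noise. The post itself does not compute this. The sketch below runs it with numpy on synthetic monthly data, and `slope_t_stat` is a hypothetical helper.

```python
import numpy as np


def slope_t_stat(y):
    """Hypothetical helper: (slope, t) for a least-squares line through equally spaced points."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)  # month index 0, 1, 2, ...
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # Standard error of the slope in simple linear regression.
    se = np.sqrt(residuals @ residuals / (len(y) - 2) / np.sum((x - x.mean()) ** 2))
    return slope, slope / se


rng = np.random.default_rng(0)
# Synthetic data: 24 months of pure noise vs. a clear upward trend.
noise_only = 5000 + rng.normal(0, 300, size=24)
trending = 1000 + 50 * np.arange(24) + rng.normal(0, 10, size=24)

for label, series in [("noise", noise_only), ("trend", trending)]:
    slope, t = slope_t_stat(series)
    print(f"{label}: slope={slope:.1f} views/month, t={t:.1f}")
```

For a proper p-value, you could compare the t-statistic against a t-distribution with n - 2 degrees of freedom.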

Conclusion

There are many different ways to rank programming languages.

I haven't been able to find anyone else who has used Wikipedia page views to rank programming languages. Yes, someone has likely already done it, but it's not common enough to find articles about it.

I think that Wikipedia page views are a pretty reasonable way to gauge interest in programming languages.

When I am interested in a new language, Wikipedia is usually the first place I look.

I think it is also safe to assume that many others do the same.

The code can be found here.
