Data Science and Political Risk: What alternative data might be telling us about Trump, Venezuela, Cuba, M. Rubio

Are things in Venezuela reaching a breaking point? This is the sort of thing you can analyze for hedge funds, and it presents a great deal of opportunity for machine learning and artificial intelligence tools.

If you look around, most finance-related machine learning projects out there are just "financial technical analysis systems" rebranded as "artificial intelligence" by people with no domain expertise in finance.

But news analysis, court filings, people's opinions around key political issues, activism, etc. all have value when properly analyzed.

And in my opinion, the "return" is much better when they are used to analyze debt rather than equity.

Anyway, since I still know a lot about what's going on in Venezuela from several traditional and non-traditional sources, it looks to me like this Connecticut hedge fund might be making the right move.

In this post I will not analyze thousands of sources (as in the projects I have run in the past) but a single source, for simplicity's sake and due to time constraints.

From idea, to writing code, to Tableau visualizations, to graphs, to writing this story, I did not want to dedicate more than 4 hours.

The source is Change.org, which presents many advantages (and some disadvantages).


Change.org

According to their web site, Change.org is the fastest growing social change platform in the world, empowering more than 200 million people to create change in their communities. People on Change.org work with decision makers to find new solutions to the big and small issues that impact their lives. According to them, they have had 33,409 victories in 196 countries.

I chose this site because it captures, more or less transparently, the intent of people like you and me towards a cause.


Change.org search: 297 results containing the word Venezuela.

Examples of unstructured data in this page

However, since petitions are pretty much free-form on this site, a great deal of NLP is needed to disambiguate terms and structure the data. For example, UN = ONU (UN in Spanish). Also, since petitions can be written in any language, translation is an additional challenge.
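The kind of disambiguation described above could start from a simple alias table. The sketch below is only an illustration, not the code in the repo: the `ALIASES` table and the `normalize_entity` helper are hypothetical names, and only the ONU/UN pair comes from this post; the other aliases are assumed examples.

```python
import re

# Hypothetical alias table mapping variants to one canonical label.
# Only the ONU -> UN pair comes from the post; the rest are assumed examples.
ALIASES = {
    "onu": "UN",
    "naciones unidas": "UN",
    "united nations": "UN",
    "oea": "OAS",
    "organization of american states": "OAS",
}


def normalize_entity(raw: str) -> str:
    """Return the canonical label for a free-form addressee string."""
    # Lowercase, drop punctuation and collapse whitespace before the lookup.
    key = re.sub(r"[^\w\s]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Unknown strings are returned (trimmed) so nothing is silently dropped.
    return ALIASES.get(key, raw.strip())


print(normalize_entity("O.N.U."))            # UN
print(normalize_entity(" United Nations "))  # UN
print(normalize_entity("Corte Suprema"))     # Corte Suprema
```

A lookup table like this only covers exact variants; misspellings and longer free-form phrases would still need fuzzy matching or manual review.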

The project is as follows:

1. Data collection & cleaning
2. Translation and disambiguation
3. Labeling
4. Visualization

As I learned when I thought about this over the weekend, Change.org used to have an API, which has been deprecated. Therefore, I had no other way to get the data but to code a Python scraper using BeautifulSoup.

The cleaned and disambiguated data set as of March 2, 2019 is in this GitHub repository, as well as the scraper I wrote, which is self-explanatory.

Python BeautifulSoup code

Below is a cut & paste (looks like the formatting in Medium is not correct). You can run this code and pass the Python dictionary to pandas to build a data frame. The exported data, with “Pro Maduro” and “Against Maduro” labels, petition relevance and disambiguated entities, is in the repo. If you know some basic NLTK and scikit-learn, you can build your own classifier and disambiguation, which I don't cover here.

    from selenium import webdriver
    from bs4 import BeautifulSoup
    import pandas as pd
    import requests
    import numpy as np


    class Petition():
        """Features in the web page."""

        def __init__(self):
            """Features in summary of search."""
            self.link = ""
            self.to = ""
            self.body = ""
            self.origin = ""
            self.supporters = ""
            self.created = ""
            self.status = ""
            self.title = ""
            self.image = ""
            self.creator = ""


    petition_list = []
    url_seed = 'https://www.change.org'
    url_action = '/search?q=Venezuela'
    url = url_seed + url_action

    # Read the pagination links on the first results page to get the page count.
    page_data = requests.get(url)
    menu = BeautifulSoup(page_data.text, 'lxml')
    pages_list = []
    for i, page in enumerate(menu.find_all('a', class_='phxxxs js-pagination-link')):
        pages_list.append(int(page['data-page-number']))

    # Each results page is requested by offset, in steps of 10.
    pages_to_parse = np.array(list(range(0, max(pages_list))))
    offsets = pages_to_parse * 10

    pt = Petition()
    data_dict = {}
    j = 0
    for page in offsets:
        options = webdriver.ChromeOptions()
        options.add_argument('headless')
        # Selenium 3 style call; Selenium 4 expects service=Service(path) instead.
        driver_chrome = webdriver.Chrome('path_to_your_chrome_driver', options=options)
        url = url_seed + url_action + '&offset=' + str(page)
        driver_chrome.get(url)
        arepas = BeautifulSoup(driver_chrome.page_source, 'lxml')
        for i, arepa in enumerate(arepas.find_all(class_='search-result')):
            try:
                new_link = url_seed + arepa.find('a', class_='link-block js-click-search-result')['href']
                pt.link = new_link
                new_to = arepa.find('div', class_="type-s").text
                new_to = new_to.split('Petition to ')[1]
                pt.to = new_to
                pt.title = arepa.find('h3', class_="mtn mbn prxs xs-mbs").text
                pt.image = arepa.find('div', class_="flex-embed-content flex-embed-cover-image ")
                # Assumed: the image URL is read from the div's inline style attribute.
                pt.image = 'http:' + pt.image['style'].split("url('")[1][:-3]
                pt.origin = arepa.find('li', class_="type-ellipsis mrs").text.strip()
                search = arepa.find_all('ul', class_="hidden-xs list-inline type-s type-weak")
                for x, item in enumerate(search[0].find_all('li')):
                    if x == 0:
                        pt.creator = item.text.strip().title()
                    if x == 1:
                        pt.supporters = item.text.split('supporters')[0]
                        pt.supporters = int(pt.supporters.replace(',', ''))
                    if x == 2:
                        pt.created = item.text.strip()
            # bare except, I know
            except:
                continue
            data_dict[j] = {'link': pt.link, 'to': pt.to, 'title': pt.title,
                            'image': pt.image, 'origin': pt.origin,
                            'creator': pt.creator, 'supporters': pt.supporters,
                            'created': pt.created}
            # Advance the key so each petition gets its own row.
            j += 1
        driver_chrome.close()

    df = pd.DataFrame(data_dict).transpose()
    df = df.sort_values(by='supporters', ascending=False)
    # The output file name is a placeholder; recent pandas versions may need .xlsx.
    df.to_excel('path_to_your_output.xls')

Visualization of results

The relevant petitions in Change.org were classified into two classes, “Against Maduro” and “Pro Maduro”.

Since the corpus is small, manual tagging was preferred.

A petition to, for example, the “International Court of Justice” saying “Investigate the Crime of X”, was classified as “Against Maduro” since it is about the lack of trust in the Venezuelan judicial system, etc.
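For a larger corpus, manual tagging would not scale, and a first automated pass could pre-sort petitions for human review. The sketch below is a plain keyword heuristic, a deliberately simple stand-in for an NLTK or scikit-learn classifier, and not the method used for the labels in the repo. The keyword sets and the `label_petition` function are assumptions made up for illustration.

```python
# Hypothetical keyword sets; a real project would learn these from tagged data.
AGAINST_KEYWORDS = {"crímenes", "crimenes", "lesa", "masacre", "investigate", "investiguen"}
PRO_KEYWORDS = {"hands", "regime", "orden", "ejecutiva", "prohibit"}


def label_petition(title: str) -> str:
    """Return a first-pass label based on keyword hits in the title."""
    # Split on whitespace, strip surrounding punctuation, and lowercase.
    words = {w.strip(".,:;!?()#@\"'").lower() for w in title.split()}
    against = len(words & AGAINST_KEYWORDS)
    pro = len(words & PRO_KEYWORDS)
    if against > pro:
        return "Against Maduro"
    if pro > against:
        return "Pro Maduro"
    # Ties and titles with no keyword hits go to a human.
    return "Needs manual review"


print(label_petition("US Hands off Venezuela"))
print(label_petition("Investiguen Gobierno de Maduro por crímenes de Lesa Humanidad"))
```

Anything the heuristic cannot decide is routed to manual review, which keeps the human judgment described above in the loop.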

Volumes and Time Series

The result spans petitions from Q2 2012 to now.

Distribution of “Pro Maduro” vs “Against Maduro”

If we look at the time series of the classified petitions, another insight emerges: a strong “Pro Maduro” push shows up in 2019, of a kind we have not seen since Q1 2015.
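A quarterly series like the one described here can be built by bucketing each petition's creation date into a quarter and counting per label. The sketch below assumes the dates have already been parsed into `datetime.date` objects and that each record carries a `label` field; the record layout, the function names, and the sample rows are hypothetical, not data from the repo.

```python
from collections import Counter
from datetime import date


def quarter_key(d: date) -> str:
    """Format a date as a quarter label such as '2019-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def count_by_quarter(records):
    """Count petitions per (quarter, label) pair."""
    return Counter((quarter_key(r["created"]), r["label"]) for r in records)


# Hypothetical sample rows, for illustration only.
sample = [
    {"created": date(2015, 3, 10), "label": "Pro Maduro"},
    {"created": date(2019, 2, 1), "label": "Pro Maduro"},
    {"created": date(2019, 2, 20), "label": "Pro Maduro"},
    {"created": date(2017, 4, 3), "label": "Against Maduro"},
]

for (quarter, label), n in sorted(count_by_quarter(sample).items()):
    print(quarter, label, n)
```

Weighting each petition by its supporter count instead of counting it once would be a small change: add `r["supporters"]` to the counter rather than 1.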

Distribution of “Pro Maduro” vs “Against Maduro” petitions over time

If we look at the relative sizes of the petitions and their originators (person and location), we see an interesting insight: the petition with the most supporters is an “Against Maduro” petition addressed to the International Court of Justice.

If we “zoom in” on the “Pro Maduro” petitions, we see something interesting: the largest petition does not have a valid originator, and most of the others were started by “Robert Naiman”, which does not sound like a Venezuelan name.

Origin of petitions

Let’s look at a summary table containing “Pro Maduro” and “Against Maduro” data.

Pro Maduro Table

Of 10 “Pro Maduro” petitions, we see that the largest originated in Venezuela but has an anonymous originator. Furthermore, of the remaining 9, 3 originated in Washington D.C. with somebody called “Robert Naiman”, and these account for practically all the push we see in 2019, together with the “Hands Off Venezuela” petition, which has only 416 signatures and was started by someone called “Alex Campbell” in the United States. (For a video of Maduro’s kick-start of his “Hands off Venezuela” campaign, click here.)

These petitions don’t look to me like organic, sincere petitions concerning the issues of Venezuela, but like an anti-Trump political issue.

Upon further investigation, Robert Naiman appears to be some sort of lobbyist based in Washington D.C. According to Wikipedia, "Robert Naiman is Policy Director at Just Foreign Policy and on the board of directors at progressive news organization Truthout. He has master's degrees in Economics and Mathematics from the University of Illinois. He writes on U.S. foreign policy at Huffington Post and is a frequent commentator on the region's events".

In contrast, the “Against Maduro” petitions look very organic, originated mostly by Venezuelans in Venezuela and all over the world.

The charts contain many more entries than the ones displayed.

“Against Maduro” petitions mostly originate in Spain, Venezuela, the United States, Ecuador, Canada, and Trinidad and Tobago, while the “Pro Maduro” petitions mostly originate in Venezuela (but anonymously) or in the United States via Robert Naiman.

Tone of petitions

This was an interesting aspect to explore, given the time constraints I set for myself. The easiest way I could think of to capture the meaning of the petitions was to generate a word cloud weighted by the number of signatures. Although I could have used custom Python code to generate the word cloud, I used this service.

These are the steps:

1. Concatenated the title string of each petition, repeated as many times as the integer of its relative frequency + 1
2. Eliminated stop words
3. Eliminated the word “Venezuela”
4. Capped words with high frequency (trial and error), until the word cloud looked fine
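The steps above could be sketched in plain Python roughly as follows. This is not the code behind the published word clouds: the function name, the tiny stop-word list, the `cap` value, and the reading of "relative frequency" as a petition's integer percentage of all signatures are assumptions made for illustration.

```python
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller list per language.
STOP_WORDS = {"the", "of", "in", "to", "for", "de", "la", "el", "por", "en", "a", "y"}


def weighted_word_counts(petitions, cap=50):
    """Count title words, weighting each title by its share of all signatures."""
    total = sum(p["supporters"] for p in petitions)
    counts = Counter()
    for p in petitions:
        # Assumed weighting: integer percentage of all signatures, plus 1,
        # so that even the smallest petition contributes once.
        weight = int(100 * p["supporters"] / total) + 1
        for word in p["title"].lower().split():
            word = word.strip(".,:;!?()\"'")
            if word and word not in STOP_WORDS and word != "venezuela":
                counts[word] += weight
    # Cap dominant words so they do not drown out the rest of the cloud.
    return Counter({w: min(c, cap) for w, c in counts.items()})


# Hypothetical sample rows, for illustration only.
sample = [
    {"title": "Justice for Venezuela", "supporters": 900},
    {"title": "Help the zoo", "supporters": 100},
]
print(weighted_word_counts(sample).most_common())
```

The resulting counts can be pasted into a word cloud service as repeated words, or passed to any library that accepts a word-to-frequency mapping.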

This is the resulting word cloud.

Crime, Court, Legal, Justice, etc. are the most relevant terms in the “Against Maduro” chart. Almost all the words look relevant to me, but I noticed the words “Elephant” and “Ruperta”, which did not make sense at first. On further inspection of the content, they do make sense (read the section “Other”, below).

Against Maduro Word Cloud: A cry for help evolving over time. Words such as Crimes, Help, ICJ, Justice, Court, UN, Legal, Starving stand out.

Pro Maduro Word Cloud: A very authoritative “tone” concentrated in 2015 and 2019. Words such as Immediately, Must, Stop, Surrender, Executive, Congress, War, etc. stand out.

To whom are the petitions addressed?

It looks like the US carries a lot of weight on both the “Pro Maduro” and “Against Maduro” sides, with a classic Pareto distribution over the names below.
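One way to check for a Pareto-like concentration is to sum supporters per addressee, rank the totals, and look at the cumulative share. The sketch below does this with the standard library as a stand-in for a pandas groupby; the function name, the record layout, and the sample numbers are hypothetical and are not taken from the data set.

```python
from collections import defaultdict


def supporters_by_addressee(petitions):
    """Sum supporters per addressee and attach each one's cumulative share."""
    totals = defaultdict(int)
    for p in petitions:
        totals[p["to"]] += p["supporters"]
    grand_total = sum(totals.values())
    # Rank addressees from most to fewest supporters.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for name, supporters in ranked:
        running += supporters
        rows.append((name, supporters, round(running / grand_total, 3)))
    return rows


# Hypothetical sample rows, for illustration only.
sample = [
    {"to": "ICJ", "supporters": 700},
    {"to": "OAS", "supporters": 200},
    {"to": "ICJ", "supporters": 50},
    {"to": "UN", "supporters": 50},
]
for row in supporters_by_addressee(sample):
    print(row)
```

If a handful of addressees reach a cumulative share close to 1.0 very quickly, the distribution is concentrated in the way the paragraph above describes.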

Screenshot of pandas groupby. Left, “Pro Maduro” petitions. Right, “Against Maduro” petitions.

According to my metrics, the “Against Maduro” weighted petitions were addressed to this group: ICJ, OAS, Secretary-General of the Organization of American States, Luis Almagro, President of the United States, Senator Marco Rubio

President of the United States, U.S. House of Representatives, U.S. Senate, U.S. Secretary of State, Australian Parliament

A picture is worth a thousand words

Images used to promote the petitions in social media, blogs, text messages, etc. convey a lot of information that could be used to enhance the knowledge I extracted from text-based information.

Although the use of computer vision to extract knowledge from this corpus is beyond the scope of this article, below are the links to the images used in the top 5 “Pro Maduro” and “Against Maduro” petitions (which can give you an idea of why the “International Court of Justice” is the top entity to which Venezuelans are addressing their issues).

Warning: Some of the images can be very graphic.

“Against Maduro” petitions (mostly written in Spanish):

- Corte Penal de La Haya investigue al gobierno venezolano por crímenes de Lesa Humanidad (International Criminal Court: investigate the Venezuelan government for crimes against humanity)
- Ciudadanos del mundo defendiendo a Venezuela ONU/CPI/OEA (Citizens of the world defending Venezuela UN/ICC/OAS)
- Venezuela-Justicia Por la Violación de Derechos y tratados en la masacre de Oscar Perez (Venezuela-Justice against the violation of human rights and treaties in the massacre of Oscar Perez)
- Investiguen Gobierno de Maduro por crímenes de Lesa Humanidad (Investigate government of Maduro for crimes against humanity)
- Solicitud de #IntervenciónMilitarYa o #InjerenciaHumanitariaYa en Venezuela (Request for military intervention now, or humanitarian intervention now)

“Pro Maduro” petitions (mostly written in English):

- Retiro inmediato de la Orden Ejecutiva en contra de Venezuela (Immediate withdrawal of the Executive Order against Venezuela)
- @SenatorDurbin: Resist Trump’s unconstitutional regime change war on Venezuela
- Congress: Explicitly prohibit war in Venezuela without Congressional authorization
- Tell @CoryBooker & Congress to Block Trump’s Regime Change War on Venezuela
- US Hands off Venezuela

Other

As I mentioned earlier, the words “Ruperta” and “Elephant” showed up as semantic outliers in the word cloud.

This petition, which gathered a lot of momentum when it was launched on April 3, 2017, was about helping Ruperta, a starving elephant in Caracas’ Caricuao Zoo. All zoos in Venezuela are administered by the socialist state, and they have all reached deplorable conditions (what can you expect when people are eating out of the garbage and leftovers of the ruling government leaders?). Mr. Webb Daulis, from Clearwater, Florida, obviously someone who cares about animals and animal rights, wanted to do something about it (you can read his petition here).

Unfortunately, Ruperta died a few days ago.

Ruperta the elephant was not the only animal suffering from starvation in socialist Venezuela.

Many have been eaten by the hungry population.

Below are some desperate calls for action in Venezuela from animal lovers.

You can’t imagine how deeply sad this makes me.

This is just criminal.

Ruperta the elephant, tigers and lions affected by the incompetent regime in Venezuela

My conclusions

Now, I personally feel that “there is something” about mainstream traditional media in the US, when they choose to cover one topic with a lot more emphasis than another. It feels to me that US news outlets have made a bigger deal of the US President meeting the dictator on the right, compared to the US President meeting the dictator on the left. And to leave aside doubts about my political leaning, I voted for Barack Obama in 2008 and again in 2012.

About the Venezuelan issues, well, I am biased: I think Venezuela is a failed socialist experiment. I hope this “Data Science” experiment and its background information allow you to form your own opinion about the subject.

I am curious about what you think. Is the US doing the right thing?

I personally think that the strategies set up by John Bolton and Mike Pompeo are very smart, and that the key to free and transparent elections in Venezuela, with Juan Guaido as interim President, is to exert pressure on Cuba. This might be the key:

There are a lot of things I wanted to cover in this post (from the data science, programming, and political points of view), but I need to keep the content short. As it is, it is already very long and covers two or three different fields. Nevertheless, I would love to hear from you with whatever questions you have.

Please, clap, share and comment below.

I also would like to take the seed search code in my repo and make it my first Open Source (all my repos are private) project around content understanding around political issues like this one.

You can follow activity in my private repos here.

