Python Tutorial: Connect Government Data APIs By Using The Factory Pattern

Source Politician Data from Public Government Data APIs to Increase Allocation Accuracy

Felix Kuestahler, Feb 10

In this tutorial, we do two main things. First, we generate some Plotly diagrams out of our collected data. Second, we connect a public government API to our program to retrieve government members and their party assignments in a reliable way.

As you already know, we are interested in a generalized program that can work with multiple countries.

For that, we will introduce an abstract class and the factory pattern in the second part of our tutorial.

Plotly Chart Generation

But let’s start with the first, quite easy task of generating two charts for our tables.

We create a bar chart as well as a pie chart.

As a base, we take our plotly table CH-tw-party-list.

The code is straightforward. In the bar chart, we visualize the accumulated friends count per party.

In the pie chart, we aggregate the Twitter accounts per party.

As one can see in the code excerpt, various configuration parameters allow you to modify the layout of a chart.

Head over to the Plotly Python Open Source Graphing Library to find out more about the multiple possibilities with charts, pandas and Plotly.
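The original code excerpt is not reproduced here, so the following is only a minimal sketch of the two aggregations and the resulting chart descriptions. The sample rows and the column names `party` and `friends_count` are assumptions, and the figures are built as plain dictionaries in Plotly's data/layout structure so that the sketch runs without Plotly installed.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows standing in for the CH-tw-party-list table;
# the column names "party" and "friends_count" are assumptions.
rows = [
    {"party": "SP", "friends_count": 1200},
    {"party": "FDP", "friends_count": 800},
    {"party": "SP", "friends_count": 300},
    {"party": "unknown", "friends_count": 150},
]


def friends_per_party(rows):
    """Sum the friends count of all accounts belonging to each party."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["party"]] += row["friends_count"]
    return dict(totals)


def accounts_per_party(rows):
    """Count the Twitter accounts assigned to each party."""
    return dict(Counter(row["party"] for row in rows))


def bar_figure(totals, title):
    """Describe a bar chart as a plain dict in Plotly's data/layout structure."""
    parties = sorted(totals)
    return {
        "data": [{"type": "bar", "x": parties, "y": [totals[p] for p in parties]}],
        "layout": {
            "title": {"text": title},
            "yaxis": {"title": {"text": "Friends count"}},
        },
    }


def pie_figure(counts, title):
    """Describe a pie chart as a plain dict in Plotly's data/layout structure."""
    parties = sorted(counts)
    return {
        "data": [{"type": "pie", "labels": parties, "values": [counts[p] for p in parties]}],
        "layout": {"title": {"text": title}},
    }


bar = bar_figure(friends_per_party(rows), "Friends per party")
pie = pie_figure(accounts_per_party(rows), "Accounts per party")
```

If Plotly is installed, either dictionary can be handed to plotly.graph_objects.Figure for rendering; the layout entries are where the configuration parameters mentioned above would go.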

As one can see, we have a lot of “unknowns”, i.e., we couldn’t identify the corresponding party by just analyzing Twitter data elements.

In the second part of this tutorial, we will connect another data source for addressing this issue.

Government Data API Factory

Photo by Rodney Minter-Brown on Unsplash

In recent years, the availability of so-called open government APIs has exploded.

It stems from the idea that data should be open. Wikipedia describes the term Open Data as follows:

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control…

One of the most important forms of open data is open government data (OGD), which is a form of open data created by ruling government institutions.

Open government data’s importance is borne from it being a part of citizens’ everyday lives, down to the most routine/mundane tasks that are seemingly far removed from government.

A good starting point for getting an overview of government data APIs is the ProgrammableWeb directory, which lists over 20,000 different APIs.

The government API category can be found here: https://www.programmableweb.com/category/government/apis?category=20094

Two examples of government APIs:

The data API of the US government: https://www.data.gov/developers/apis
The Swiss Government API of the Swiss Parliament: http://ws-old.parlament.ch/

We will use the Swiss Parliament API to extract personal data of the parliament members (mainly the party allocation) to increase the accuracy of our Twitter matching algorithm.

As we have already explained in our second tutorial, a public API may return a lot of information.

To control the amount of data returned in one request, the concept of a cursor or paging mechanism is used.

The Swiss Parliament API returns about 25 records in one request.

The last record of a response has an attribute attached that tells you whether more data is available (hasMorePages=true).

If it is set to true, you can fetch the next page by adding the query parameter pageNumber=2, and so on.

You will usually find this kind of information about an API in its user documentation. For example, the Swiss Parliament API has some parameters to control the output format, language, etc.
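The paging mechanism described above can be sketched as a small generator. This is only an illustration: the function name `iter_records` and the injected `fetch_page` callable are our own assumptions, and the HTTP call is replaced by fake data so the sketch runs without network access.

```python
def iter_records(fetch_page):
    """Yield every record across all pages.

    fetch_page(page_number) must return the list of records of that page.
    """
    page_number = 1
    while True:
        records = fetch_page(page_number)
        if not records:
            break
        yield from records
        # The last record of a page carries the hasMorePages attribute;
        # a missing attribute is treated here as "no more pages".
        if not records[-1].get("hasMorePages", False):
            break
        page_number += 1


# Stand-in for the HTTP call (hypothetical data, no network access).
fake_pages = {
    1: [{"id": 1}, {"id": 2, "hasMorePages": True}],
    2: [{"id": 3, "hasMorePages": False}],
}


def fake_fetch(page_number):
    return fake_pages.get(page_number, [])


all_ids = [record["id"] for record in iter_records(fake_fetch)]
```

A real `fetch_page` could wrap something like requests.get(url, params={"format": "json", "pageNumber": page_number}).json(), assuming the API returns a JSON list per page.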

With a basic understanding of the API, we can now enhance our program so that it is capable of reading data from country-specific government APIs.

Let’s dig into the code.

Enhancing the Code: the UML Diagram

Introducing the government API in a general way requires some serious design work and enhancement of our program.

The UML class diagram of our enhanced program looks as follows (don’t be overwhelmed by the complexity, all the details will be explained later in this article).

A quick summary of what we have done until now:

We created the GovernmentSocialMediaAnalyzer class in the second tutorial, which is capable of retrieving the Twitter-relevant account data of the politicians of a country.

We used a configuration-driven approach, based on YAML, to abstract the country-specific data into a configuration file.

Several methods were defined that allow us to create pandas data frames as well as Plotly-specific tables and charts.

Now we will introduce three new classes, govAPIFactory, govAPI (an abstract class) and govAPI_CH, which together provide a generalized approach for connecting to any kind of government API.

Software design patterns play an important role in software design, as described by Wikipedia:

In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design.

It is not a finished design that can be transformed directly into source or machine code.

It is a description or template for how to solve a problem that can be used in many different situations.

Design patterns are formalized best practices that the programmer can use to solve common problems when designing an application or system.

In our design, we will use the Factory Method Pattern to generalize the connectivity to a government API, which Wikipedia explains as follows:

In class-based programming, the factory method pattern is a creational pattern that uses factory methods to deal with the problem of creating objects without having to specify the exact class of the object that will be created.

This is done by creating objects by calling a factory method — either specified in an interface and implemented by child classes, or implemented in a base class and optionally overridden by derived classes — rather than by calling a constructor.

Our design is based on the strategy of defining

a base class (parent: GovAPI), which is abstract, and
a derived class (child: GovAPI_CH), which holds the country-specific implementation (i.e., Switzerland).

In the future, we can introduce additional classes. For the United Kingdom, for example, we would construct the implementation class GovAPI_UK.

Abstract Base Class “GovAPI”

govAPI is an abstract class that contains several abstract methods.

An abstract method is a method that is declared but contains no implementation.

In Python, an abstract class is derived (or inherited) from the class ABC and has one or more methods marked with @abstractmethod.

So the abstract class provides you with a build plan for any implementation class that inherits from it, in our case govAPI_CH.

What kind of methods does govAPI_CH have to implement?

First of all, the implementation of the load_government_members() method has to take care of fetching the politicians’ data records from the government API.

Each fetched record, which represents the data of a single politician, must be passed to the method add_person_record (which is already implemented by the govAPI base class).

The question now is: what the heck is the add_person_record method doing?

The method just prepares a target dictionary for our person record, i.e., the attribute names defined (lastName, firstName, council, etc.) are the names we want to use for any GovAPI implementation.

That means the record retrieved from a dedicated government API implementation (i.e., from the Swiss Parliament API) has to be transformed by using a bunch of getter methods.

Each of these getter methods is either abstract or returns an empty string.

It’s the responsibility of the implementer of an inherited class (GovAPI_CH) to provide the correct getter implementation.
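The original listing is not reproduced here, so the following is a minimal sketch of how such an abstract base class might look. The attribute names lastName, firstName and council come from the text; the getter names, the `person_records` list and the `GovAPI_Demo` subclass with its raw field names are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod


class GovAPI(ABC):
    """Abstract parent: defines the target (canonical) person record."""

    def __init__(self):
        self.person_records = []

    @abstractmethod
    def load_government_members(self):
        """Fetch all politician records and pass each to add_person_record."""

    @abstractmethod
    def get_last_name(self, record):
        """Return the last name found in a country-specific record."""

    @abstractmethod
    def get_first_name(self, record):
        """Return the first name found in a country-specific record."""

    def get_council(self, record):
        # Optional getter: subclasses may override it; default is an empty string.
        return ""

    def add_person_record(self, record):
        """Translate a country-specific record into the target dictionary."""
        person = {
            "lastName": self.get_last_name(record),
            "firstName": self.get_first_name(record),
            "council": self.get_council(record),
        }
        self.person_records.append(person)
        return person


class GovAPI_Demo(GovAPI):
    """Hypothetical implementation used only to exercise the base class."""

    def load_government_members(self):
        # A real implementation would fetch these records from an API.
        for raw in [{"surname": "Muster", "given": "Anna"}]:
            self.add_person_record(raw)

    def get_last_name(self, record):
        return record["surname"]

    def get_first_name(self, record):
        return record["given"]


api = GovAPI_Demo()
api.load_government_members()
```

Trying to instantiate GovAPI directly raises a TypeError, which is how Python enforces that every abstract method is implemented by the child class.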

Implementation Class “GovAPI_CH”

The implementation of GovAPI_CH consists of a bunch of getter methods, which return the required attribute values out of the record.

Let’s drill down into the method load_government_members.

Our implementation uses the Python module requests, which is “an elegant and simple HTTP library for human-beings.” In the introduction section of this article, we provided an overview of the Swiss Parliament API.

The code fetches the data using the paging mechanism.

We placed the URL and its parameters in our YAML configuration file:

govAPIUrl: "http://ws-old.parlament.ch/"
govAPICouncillorsRes: "councillors"
govAPIParams:
  - format: "json"

Line 8: The first requests.get fetches the councillors overview pages, starting with http://ws-old.parlament.ch/councillors?format=json&pageNumber=1.

Line 15: If a data record is marked as active, its detail record is fetched.

Line 17: The second request uses the id attribute of the record to construct the URL for the detail record. In this example, we fetch the politician record with the id ‘1358’: http://ws-old.parlament.ch/councillors/1358?format=json

Line 19: We pass the retrieved detail record to the method addPerson, which transforms the provided data record to the target one (by using the getters we have implemented).

Line 20: Finally, we check the hasMorePages attribute, and if we have reached the last page, we break the loop.
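Since the original listing is not reproduced here, the steps above can be sketched roughly as follows. This is not the author's code: the helper `build_url`, the injected `fetch_json` and `add_person` callables, and the record key `active` are assumptions, and the HTTP responses are faked so the sketch runs offline.

```python
from urllib.parse import urlencode, urljoin

# Values taken from the YAML configuration, held here as a plain dict.
config = {
    "govAPIUrl": "http://ws-old.parlament.ch/",
    "govAPICouncillorsRes": "councillors",
    "govAPIParams": [{"format": "json"}],
}


def build_url(config, record_id=None, page_number=None):
    """Build an overview URL (with pageNumber) or a detail URL (with id)."""
    path = config["govAPICouncillorsRes"]
    if record_id is not None:
        path = f"{path}/{record_id}"
    params = {}
    for entry in config["govAPIParams"]:
        params.update(entry)
    if page_number is not None:
        params["pageNumber"] = page_number
    return urljoin(config["govAPIUrl"], path) + "?" + urlencode(params)


def load_government_members(config, fetch_json, add_person):
    """Walk all overview pages and pass each active member's detail record on."""
    page_number = 1
    while True:
        overview = fetch_json(build_url(config, page_number=page_number))
        for record in overview:
            # "active" is an assumed key name for the active marker.
            if record.get("active"):
                detail = fetch_json(build_url(config, record_id=record["id"]))
                add_person(detail)
        if not overview or not overview[-1].get("hasMorePages", False):
            break
        page_number += 1


# Fake responses keyed by URL (hypothetical data, no network access).
fake_responses = {
    "http://ws-old.parlament.ch/councillors?format=json&pageNumber=1": [
        {"id": 1358, "active": True},
        {"id": 2, "active": False, "hasMorePages": False},
    ],
    "http://ws-old.parlament.ch/councillors/1358?format=json": {
        "id": 1358,
        "lastName": "Muster",
    },
}

collected = []
load_government_members(config, fake_responses.__getitem__, collected.append)
```

In a real run, `fetch_json` might be as simple as a function that returns requests.get(url).json(), and `add_person` would be the method of the GovAPI class that applies the getters.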

The above method is called within the govAPI function create_politican_from_govapi_table (already implemented by the govAPI parent class), which transforms the list of politician records into a pandas DataFrame.

Canonical Model

It’s important to realize here that the structure of this pandas DataFrame will be the same for any kind of government API, as long as we implement a specific class based on the govAPI abstract class.

We have thus normalized our data so that we can work with it and process it afterward in a standardized way.

Again, we have touched on an important design pattern: our target structure (or model) is known as the Canonical Model.

As Wikipedia describes:

A canonical model is a design pattern used to communicate between different data formats.

Essentially: create a data model which is a superset of all the others (“canonical”), and create a “translator” module or layer to/from which all existing modules exchange data with other modules.

Conceptually we have built a mini-data pipeline.

For each government API, we have to implement a data record fetching function and transformation rules (the getters) which will transform the data to our standardized one.
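To make the idea of per-API transformation rules concrete, here is a small sketch in which two differently shaped raw records end up in the same target structure. The raw field names of both records, including the entire UK example, are invented for illustration; only the target attribute names come from the text.

```python
# Target attribute names shared by every implementation (from the article);
# the raw field names used by the rules below are hypothetical.
CANONICAL_KEYS = ("lastName", "firstName", "council")


def translate(record, rules):
    """Apply one transform rule (a getter function) per canonical attribute."""
    return {key: rules[key](record) for key in CANONICAL_KEYS}


# Transform rules for a Swiss-style record with a nested council object.
ch_rules = {
    "lastName": lambda r: r["lastName"],
    "firstName": lambda r: r["firstName"],
    "council": lambda r: r["council"]["abbreviation"],
}

# Transform rules for an imagined UK-style record with a combined name field.
uk_rules = {
    "lastName": lambda r: r["name"].split(", ")[0],
    "firstName": lambda r: r["name"].split(", ")[1],
    "council": lambda r: r["house"],
}

ch_raw = {"lastName": "Muster", "firstName": "Anna", "council": {"abbreviation": "NR"}}
uk_raw = {"name": "Smith, John", "house": "Commons"}

table = [translate(ch_raw, ch_rules), translate(uk_raw, uk_rules)]
```

Because every row has the same keys, a list like `table` can be passed directly to pandas.DataFrame to obtain the standardized data frame.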

The whole pattern is visualized in a UML sequence diagram:

The “consume” operation is represented by step 60.
The “transform rules” operations are represented by steps 80–120.
The “storeAs” operation is represented by step 130.

It’s essential that you understand the responsibility of the various classes.

govAPI and govAPI_CH (red dots) are visible to the outside world (govAPIFactory, gsma) as one class instance.

For the caller it is irrelevant who is implementing which method.

govAPIFactory class

One final thing is missing: the govAPIFactory class, which is quite straightforward.

Depending on the country_code, a corresponding implementation class instance is created and returned to the caller.

The class has just one class method, i.e., GovAPIFactory doesn’t support object instances.

A factory exists only once in a program, which is known as the Singleton Pattern:

In software engineering, the singleton pattern is a software design pattern that restricts the instantiation of a class to one.

This is useful when exactly one object is needed to coordinate actions across the system.

We used a variation here and ensure the singleton by having only a static class method.
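A factory of this kind might look like the following sketch. The method name `create`, the lookup dictionary and the stand-in classes are assumptions; the original listing may well use a different method name or a chain of if statements.

```python
class GovAPI:
    """Minimal stand-in for the abstract parent class."""

    country_code = None


class GovAPI_CH(GovAPI):
    """Stand-in for the Swiss implementation class."""

    country_code = "CH"


class GovAPIFactory:
    """Never instantiated: the single class method is the whole interface."""

    # Supporting another country means adding one entry, e.g. "UK": GovAPI_UK.
    _implementations = {"CH": GovAPI_CH}

    @classmethod
    def create(cls, country_code):
        """Return an instance of the implementation class for country_code."""
        try:
            implementation = cls._implementations[country_code]
        except KeyError:
            raise ValueError(f"No government API for {country_code!r}") from None
        return implementation()


api = GovAPIFactory.create("CH")
```

The caller only ever sees the parent type and the country code, so it never needs to know which concrete class was instantiated.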

Uff, that was quite some content to absorb. We introduced two significant design patterns, the factory method and the canonical data model, and showed how to generate a first pair of charts.

The lesson3.py program will generate the table within Plotly under the name CH-govapi-member-list.

That’s it for today. In the next article, we will start doing some data analysis by combining the two data sources.

Exercise

You can find the exercise to the tutorial here: Link

Source Code

The source code can be found here (lesson three directory): https://github.com/talfco/clb-sentiment

Originally published at dev.cloudburo.net on February 11, 2019.
