Managing geographical data: ISO3166, UN/LOCODE and GeoNames

Managing geographical data: ISO3166, UN/LOCODE and GeoNamesM.

EmmanuelBlockedUnblockFollowFollowingJan 10Recently I had the need to define a data model for handling geographical data at the international level, — i.

e.

, how to properly manage data about postal addresses when your potential address is any place around the globe — , here are the challenges and the outcome of the alternatives I found when dealing with incorporating geographical data into applications and databases.

The challenge of the regional messWhen you examine how countries are organised, you find diverse criteria, sometimes regions are quite intangible and difficult to justify, especially in the case of European countries.

Just to give an example of how different the approach between two countries can be:Let’s focus first on the United States of America, with a population of 325 million, an extension of 9,525,067 square km, and a GDP of $20.

513 trillion.

US uses a simple and efficient schema:One nation, 27 states.

There is a clear division between the first and the second administrative order.

Now, let’s have a look at Spain, the fourth economy in the Euro Zone, neither one of the main countries, not a small one either.

With a population of 46 million, an extension of 505,990 square km, and a GDP of $1.

864 trillion.

This is an example of overcomplicated organisation:One country, 17 autonomous communities and 2 autonomous cities, 50 provinces and 8124 municipalities.

As some people could immediately state that both countries are, by no means, comparable (although they are countries, so they shall be comparable), we will analyse just one US State.

We will choose Texas: it is slightly bigger than Spain in extension, it has a 26 million population, and a GDP similar to Spain ($1.

639 trillion).

Texas is administratively divided into 254 counties.

Again, there is no complexity, it is a basic and simple administration.

Postal addresses in the US do not require secondary administrative levels.

They just use the State, because the State is considered the basic main administrative unit, which would be the equivalent to not using autonomous communities (not to mention provinces) in Spain.

This scenario of overcomplicated — and expensive — regional administrations is common in certain European countries.

The example analysed is not the only one, and you can find other countries such as France with similar situation.

Harmonising postal dataIf you ever face the need of handling international data, it might be helpful to know that there are some useful international standards in place.

In general, everytime that you have to deal with large reference data, it is always a good idea to check first if there are national or international standards in place.

You will not only find the best way to harmonise data (and I said best, not perfect) but it will be also easier to find master data to populate and keep the information updated.

If someone complains about data you can additionally argue that an international standard was used.

This way you will just have to provide a way for the user to mess including custom data with his/her own ideas of how their important local regions shall be put or named on the map.

The take away here is that handling postal data is a scenario where you will never have everyone happy.

Considering this, follow an internationally accepted standard and implement an interface to allow user making changes, and you are done.

The user is now the owner of the data (as it should be).

With this premise, we will examine the standards you can use and review the pros and cons of each one.

ISO 3166–1 CountriesIf you need a coding system and a list of countries look no further: ISO 3166–1 is your friend.

Alpha-2 Codes ISO3166–1As there are not so many countries in the world, and except for few names in dispute, this is a unique and deterministic list.

I like to use the Alpha-2 codes for lookup and populating lists.

Bear in mind that once you use the country in a consolidated record (like an issued invoice), you shall include both code and name, as there might be changes in the future.

Tip: use always the codes for table lookouts in user interfaces instead of storing the code itself.

Storing the permanent values in the final documents will reflect the current names when the documents or registers were issued; this will make your database robust to future changes and will free you from dealing with historical data.

Another option would be to use GUIDs/UUIDs/IDs for the entries and having an active column reflecting if that specific record is discontinued or not.

It is relatively easy to both maintain potential updates on this list and keep an accurate track on past changes.

However, I do not like this approach as at the end you are keeping old information and making master data maintenance and migrations harder.

For the information itself, the actual standard can be bought here, although the information is freely available from different sources, such as Wikipedia.

Tip: if you are working on a data analysis project which involves historical data sets (such as macroeconomics), and by some reason you have to include information about extinct countries, keep in mind that ISO3166–3 provides you a list of the now extinct countries, such as USSR or Yugoslavia.

ISO 3166–2 States/ProvincesAs mentioned in the introduction, the situation is not that easy for the second level of administration (some countries do have a third one, even when there is no entity or extension requiring so).

The best option here is again ISO3166–2.

You will find here the second and third administrative levels.

ISO3166–2:FRAgain, you can purchase the official standard or you can use the information freely available in Wikipedia.

In order to properly process the information, you can easily scrap the information using this PHP parser.

Support of ISO-3166 in Python and JavaIf you use Python, there is a very nice package named PyCountry which already contains parsed information for you.

In Java I have not found anything similar, although I personally think that this kind of information shall be managed at the database level, so you can create pertinent CSVs with above’s Python package and import them into your Java application.

UN/LOCODEAn alternative is to process the information from UNECE.

This includes not only the subdivisions and countries as per ISO3166 but also geolocation (including coordinates) of many cities in the world.

Although it is not complete (you will not find a one to one assignment for every city) it gives additional information.

It focuses on transportation as it includes information about the availability of transportation facilities such as docks, railway stations and IATA codes.

UN/LOCODE Provides information about trading/transportation locationsUN/LOCODE Country Subdivisions ISO 3166-2 – Trade – UNECEThe table displays the list of country names (official short name in English as in ISO 3166) in alphabetical order, of…www.

unece.

orgCities and postal codesStates and provinces were slightly challenging (try to figure out what to use in France), so hold your breath and think now about dealing with cities and postal codes.

The number of available cities explodes, but before you start panicking or thinking in using empty text fields for cities and postal codes be aware that there is a quite decent and straightforward solution: the GeoNames project.

GeoNamesThe GeoNames geographical database covers all countries and contains over eleven million placenames that are available…www.

geonames.

orgGeoNames is a widely used, completely free of charge database of worldwide locations, including their relationship with the second/third administrative levels of ISO3166–2 and the postal codes.

This is the furthest you can go without integrating other external services.

Additional options to exploreAfter my research, I was wondering if Google Maps API could be an option to integrate address search capabilities.

It is probably a good option if you have to deal at street level.

In my particular scenario, that was not a need, so I decided to stay with GeoNames (which can provide the functionality without online integration with external systems).

A good option, in this case, will be probably to check with national databases.

Most developed and semi developed countries will count on free information available, either as online services or downloadable databases.

.. More details

Leave a Reply