Nature, and the data it generates, is deeply irregular and cannot be mapped exactly onto a mathematical relation. Even if every element of nature is ultimately driven by some underlying algorithm, those algorithms are far too complex, irregular, and noisy for a machine learning model to comprehend reliably. Every well-trained machine learning model tries to map the feature set to the target in the simplest way possible. We built a deep neural network to predict earthquakes, but its performance is still not as good as we hoped.
Let’s visualize the relationship between the Root Mean Square (RMS) speed of the earthquake’s energy waves and the magnitude. We use the Seaborn package, built on top of Matplotlib, to generate linear regression plots along with their confidence intervals. A great feature of these plots is that they show how far the points deviate from the regression line.
In the case of RMS vs. Magnitude, we see a linear relationship between the two, with only small deviations on both sides of the line. These deviations can be put down to natural irregularities and safely ignored.
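Before drawing the plot, it can help to put a number on that claim. A minimal sketch using SciPy’s linregress (assuming earth_quake_encoded is the encoded data-frame built earlier, with 'rms' and 'mag' columns):

```python
from scipy import stats

# Quantify the RMS-magnitude trend: slope of the least-squares line
# plus Pearson's r. A high |r| supports the visual impression of
# linearity; drop NaNs first so the fit is well-defined.
xy = earth_quake_encoded[['rms', 'mag']].dropna()
slope, intercept, r, p, stderr = stats.linregress(xy['rms'], xy['mag'])
print(f"slope = {slope:.3f}, r = {r:.3f}, p = {p:.3g}")
```

With the trend quantified, here is the regression plot itself.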
```python
sns.regplot(x='rms', y='mag', data=earth_quake_encoded, color='purple')
```
Regression plot between Root Mean Square Speed and Magnitude

Now, let us visualize the relationship between the depth at which the earthquake ruptured and the magnitude. Again, the regression line shows only a negligible amount of deviation due to natural irregularities.
```python
import seaborn as sns

reg_plot_depth_mag = pd.DataFrame(columns=['Depth', 'Magnitude'])
reg_plot_depth_mag['Depth'] = earth_quake_encoded['depth'].values
reg_plot_depth_mag['Magnitude'] = earth_quake_encoded['mag'].values
sns.regplot(x='Depth', y='Magnitude', data=reg_plot_depth_mag, color='blue')
```
Regression plot between Magnitude and Depth
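To inspect those deviations directly rather than reading them off the regression band, Seaborn also offers a residual plot; a quick sketch under the same data-frame assumptions:

```python
# Residuals of a linear fit of magnitude on depth; points scattered
# evenly around zero suggest the linear trend captures the relationship.
sns.residplot(x='depth', y='mag', data=earth_quake_encoded, color='blue')
plt.show()
```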
It’s time now to visualize the relationship between the horizontal distance to the epicenter from the recorded event and the magnitude. The regression plot shows a linear regression line with large deviations, implying that the relationship between the two variables is essentially non-linear.
```python
sns.regplot(x='dmin', y='mag', data=earth_quake_encoded, color='crimson')
```
Regression plot between Horizontal Distance and Magnitude

Let’s now validate the data-set through visualization.
Let’s take a look at the scatter plot of the error in measuring magnitude versus the magnitude; to gain confidence in the data-set, we plot the regression line as well. The result is what we expected: the regression line has a negative slope with minimal deviation, implying that as the error in measuring the magnitude increases, the recorded magnitude decreases.
```python
sns.regplot(x='magError', y='mag', data=earth_quake_encoded, color='orange')
```
Negative regression plot between Magnitude and the error in measuring magnitude

We now want to look at the time-frame, plotting the magnitudes continuously with respect to time.
We use pandas’ Series.from_csv to create a Series object and plot it directly through the pyplot scripting layer of Matplotlib. Before plotting, we make sure the datatype of each column is correctly interpreted. The scripting layer is smart enough to plot the magnitudes on a smaller scale (for the sake of better visualization) over the selected period of time.
```python
# Columns given as a list (not a set) so their order is preserved.
time_vs_mag_bar = pd.DataFrame(columns=['time', 'mag'])
time_vs_mag_bar['time'] = usa_earthquakes['time']
time_vs_mag_bar['mag'] = usa_earthquakes['mag']
time_vs_mag_bar.to_csv('time_vs_mag.csv')

# Parse the timestamps first so date2num receives datetimes,
# then convert them to Matplotlib's internal date format.
dates = mpl.dates.date2num(pd.to_datetime(time_vs_mag_bar['time']).values)
plt.plot_date(dates, usa_earthquakes['mag'])
```
Time-frame representation of magnitude vs. time

```python
from pandas import Series

series = Series.from_csv('usa_earth_quakes.csv', header=0)
series.plot(style='-')
plt.show()
```
Pandas Series plot of magnitude (on a smaller scale) vs. continuous time
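A note for readers on current pandas: Series.from_csv was deprecated and later removed, so on modern versions an equivalent (a sketch, assuming the same two-column CSV layout the original call expected) would be:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Read the first column as the index and squeeze the single data
# column into a Series, mirroring the old Series.from_csv behaviour.
series = pd.read_csv('usa_earth_quakes.csv', header=0,
                     index_col=0).squeeze('columns')
series.plot(style='-')
plt.show()
```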
We now want to visualize the data-set in three-dimensional space to get an even better feel for the variation along each axis. So we use the Axes3D support in Matplotlib’s mpl_toolkits.mplot3d package to plot three-dimensional representations of Depth vs. RMS vs. Magnitude, and of the angular gap between recording stations vs. the number of stations required to record the event vs. Magnitude.
```python
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('Depth')
ax.set_ylabel('RMS')
ax.set_zlabel('Magnitude')
ax.scatter(usa_earthquakes['depth'], usa_earthquakes['rms'], usa_earthquakes['mag'])
plt.show()
```
3-dimensional representation of Depth vs. RMS vs. Magnitude
```python
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(usa_earthquakes['gap'], usa_earthquakes['nst'], usa_earthquakes['mag'])
ax.set_xlabel('Gap')
ax.set_ylabel('Nst')
ax.set_zlabel('Magnitude')
plt.show()
```
3-dimensional representation of angular gap vs. number of stations vs. magnitude

Taking a look at the different maps representing the earthquake-vulnerable places in the USA: from the retrieved earthquake data, we fetch the latitude, longitude, and the common names of the places (obtained using a regex pattern).
We use a Python mapping library called Folium. Folium creates highly interactive maps using Leaflet.js. So, at this point we plot the latitudes and longitudes on a map centered on the United States and use a Folium plugin called HeatMap to generate a heat map representing the vulnerability frequency across the USA. Folium being interactive, the heat map changes with the zoom level. We can also use another plugin, FastMarkerCluster, to visualize the earthquake-heavy regions in an uncluttered way. FastMarkerCluster is interactive too: the clusters collapse and expand with the zoom level.
```python
from folium.plugins import HeatMap

heat_quake_map = folium.Map(location=[usa_earthquakes['latitude'].mean(),
                                      usa_earthquakes['longitude'].mean()],
                            zoom_start=4)
latlons = usa_earthquakes[['latitude', 'longitude']].values.tolist()
HeatMap(latlons).add_to(heat_quake_map)
heat_quake_map
```

```python
from folium.plugins import FastMarkerCluster

usa_quake_map = folium.Map(location=[usa_earthquakes['latitude'].mean(),
                                     usa_earthquakes['longitude'].mean()],
                           zoom_start=4)
usa_quake_map.add_child(FastMarkerCluster(latlons))
usa_quake_map
```
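In a Jupyter notebook these map objects render inline when left as the last expression of a cell; outside a notebook, you can write them to standalone HTML instead (a quick usage sketch):

```python
# Save the interactive maps as self-contained HTML files that open
# in any browser.
heat_quake_map.save('heat_quake_map.html')
usa_quake_map.save('usa_quake_map.html')
```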
Analyzing the Hotels Data: Earlier, we saw how we can fetch street-level data using Foursquare. Now that we have finished fetching the hotels data, it is time to analyze the nearby-hotels data-set returned by the Foursquare API. We drop the null values and group the records by unique hotel name, simply by using the ‘.unique()’ method on the data-frame’s hotel-name column. We obtain each hotel’s frequency of occurrence as a nearby hotel per recorded location (‘Vulnerable Occurrence’) by using the ‘.value_counts()’ method. Finally, we return a data-frame displaying the hotels with their Vulnerable Occurrence.
```python
num = fx['Venue'].value_counts()
vulnerables = num.rename_axis('business_name').reset_index(name='Vulnerable Occurrence')
vulnerables
```
Pandas data-frame showing the hotel names returned by the Foursquare API that frequently appear near the recorded earthquake events

Next, we plot the hotels with their respective Vulnerable Occurrences in a horizontal bar chart.
```python
needs_attention = vulnerables
needs_attention.set_index('business_name', inplace=True)
# Size the figure via the plot call itself; creating a new figure
# afterwards would only produce an empty canvas.
needs_attention.plot(kind='barh', figsize=(20, 10), width=0.8,
                     edgecolor='black', color='tomato')
plt.show()
```

```python
from folium.plugins import FastMarkerCluster, HeatMap

vulnerable_business_map = folium.Map(location=[usa_earthquakes['latitude'].mean(),
                                               usa_earthquakes['longitude'].mean()],
                                     tiles='OpenStreetMap', zoom_start=4)
vulnerable_business_map.add_child(
    FastMarkerCluster(fd_grouped[['Venue Latitude', 'Venue Longitude']].values.tolist()))

# Concentric reference circles (radii in meters) around the mean epicenter.
for radius in (1500000, 2000000, 2500000):
    folium.Circle([usa_earthquakes['latitude'].mean(),
                   usa_earthquakes['longitude'].mean()],
                  radius=radius, color='red', opacity=0.6,
                  fill=False).add_to(vulnerable_business_map)

latlons = usa_earthquakes[['latitude', 'longitude']].values.tolist()
HeatMap(latlons).add_to(vulnerable_business_map)
vulnerable_business_map
```

Clustering the Hotels: Clustering is an unsupervised learning process that segments or aggregates an unlabeled data-set based on certain similarities.
So, here we will use KMeans as the clustering algorithm, which is powerful yet simple to use. It picks k centroids from the data-set and iteratively assigns the unlabeled points to clusters based on their squared Euclidean distance to the nearest centroid, updating the centroids until the algorithm converges. Here, we specify 5 as the number of clusters to segment the data into. We use scikit-learn like this:

```python
from sklearn.cluster import KMeans

cluster_list = fx.set_index('Venue')
cluster_list

num_clusters = 5
clusters = cluster_list.drop('Address', axis=1)
kmeans_cls = KMeans(n_clusters=num_clusters, random_state=0).fit(clusters)
kmeans_cls.labels_
```

We drop the labels of the hotels data-set and give it to KMeans as input to get the cluster labels.
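The choice of 5 clusters here is a heuristic; a quick way to sanity-check it is the elbow method, plotting KMeans inertia over a range of k values (a sketch, reusing the clusters frame from above):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Inertia (within-cluster sum of squares) for k = 1..9; the "elbow"
# where the curve flattens suggests a reasonable cluster count.
inertias = [KMeans(n_clusters=k, random_state=0).fit(clusters).inertia_
            for k in range(1, 10)]
plt.plot(range(1, 10), inertias, marker='o')
plt.xlabel('k')
plt.ylabel('Inertia')
plt.show()
```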
We finally join the cluster labels with their respective Hotel names and plot them over the Folium map with color indexes to better visualize the clusters.
```python
import matplotlib.cm as cm
import matplotlib.colors as colors

cluster_list['Cluster Labels'] = kmeans_cls.labels_
cluster_list.reset_index(inplace=True)
cluster_list

lat_usa = 37.09
lng_usa = -95.71
mapping_earthquake_cluster = folium.Map(location=[lat_usa, lng_usa], zoom_start=4)

# Build one hex color per cluster from the rainbow colormap.
cluster_in = [int(i) for i in cluster_list['Cluster Labels']]
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Draw one colored circle marker per hotel, with a popup naming its cluster.
for lat, lon, poi, cluster in zip(cluster_list['Venue Latitude'],
                                  cluster_list['Venue Longitude'],
                                  cluster_list['Venue'], cluster_in):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon], radius=10, popup=label,
                        color=rainbow[cluster - 1], fill=True,
                        fill_color=rainbow[cluster - 1],
                        fill_opacity=0.7).add_to(mapping_earthquake_cluster)

mapping_earthquake_cluster
```
Folium map showing the clustered hotels
Business approach: From a business standpoint, this project can be useful to government city-management departments, such as municipal corporations, and to various surveyors. It can equally help stakeholders at hotels and other businesses determine whether their premises need attention, whether that means upgrading to anti-quake equipment or evaluating structural strength.
To see the full code and how it was implemented, view my Google Colaboratory Notebook here.
Signing off.
Peace!

“We are what our thoughts have made us; so take care about what you think. Words are secondary. Thoughts live; they travel far.” ~ Swami Vivekananda