A Kernel for Mapping with Gaussian Processes

I ended up using this function because computing the geopy distance takes too long for Gaussian process regression. Here are the results from one of the best models obtained using this kernel.

[Figure: Map generated using Gaussian processes and the modified kernel.]

The metric used with this kernel is the great-circle distance, which returns the distance between two points given their longitude and latitude. Did you notice how well the model is retrieved below the North Sea, the Norwegian Sea and the Mediterranean Sea? The kernel finds consistency in the broad structure of the map. This consistency shows up in the a posteriori error, and the residuals decrease drastically.

[Figure: Map of the a posteriori error and map of the residuals for the model built with the corrected kernel and the great-circle distance.]

To finish, I calculated a score on the residuals using different setups. The score is the mean squared error of the residuals calculated over the whole map:

$$\mathrm{MSE} = \frac{1}{n_\Omega} \sum_{s \in \Omega} \left( y(s) - \hat{y}(s) \right)^2$$

where n_Ω is the number of points with coordinates s on the map Ω, y is the real model and ŷ is the model obtained with Gaussian processes.

I ran the optimization using four different kernels:

- the RBF kernel from sklearn with the default distance (Euclidean distance),
- the rational quadratic kernel from sklearn with the default distance (still Euclidean distance),
- the modified RBF kernel with the geopy distance,
- the modified RBF kernel with the great-circle distance.

I also tested different values for a, which gives the expected noise level in the observations.

[Figure: Mean squared error of the residuals modeled with the RBF kernel, with the rational quadratic kernel from sklearn, with the modified RBF kernel using the geopy distance and with the modified RBF kernel using the great-circle distance. Colors of the dots correspond to the setup used for the level of noise.]

The best scores are obtained when the distances on the sphere are taken into account. When the noise level is set too high, the model comes out flat and the score degrades. When the noise level is 0, the sampled data are retrieved exactly using either of the distances calculated over the sphere, whereas the rational quadratic kernel gives worse results as the noise decreases.

It makes perfect sense to use this adapted distance when mapping a large area of the Earth. Of course, for a smaller region, the rational quadratic kernel is going to give neat results. But isn't it amazing how easy it is to obtain an accurate map of the Earth using this simple setup?

To be continued

Here are some leads on how I want to continue this work. First, I need to modify the distance function to decrease the computing time. For now, it may still be more practical to use the rational quadratic kernel because the modified kernel is too time-consuming. Next, the two metrics I presented only allow modeling from two features: latitude and longitude.
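
For readers who want to experiment with the pieces discussed above, here is a minimal sketch of a great-circle distance (using the haversine formula), an RBF-style covariance built on that distance, and the mean squared error score. The function names, the Earth radius of 6371 km, the sample coordinates, and the exact form of the covariance are assumptions made for illustration; this is not the code that produced the maps in this post.

```python
import math

# Assumed mean Earth radius in kilometres (illustrative choice).
EARTH_RADIUS_KM = 6371.0


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against tiny floating-point excursions outside [0, 1].
    a = min(1.0, max(0.0, a))
    return 2 * radius * math.asin(math.sqrt(a))


def rbf_covariance(distance, length_scale):
    """RBF-style covariance evaluated on a precomputed distance.

    Hypothetical stand-in for the modified kernel: the usual squared
    exponential form with the Euclidean distance swapped for `distance`.
    """
    return math.exp(-0.5 * (distance / length_scale) ** 2)


def mse_score(y_true, y_pred):
    """Mean squared error of the residuals over all map points."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Approximate coordinates of two European cities, for illustration only.
    d = great_circle_distance(48.85, 2.35, 59.91, 10.75)
    print("distance (km):", d)
    print("covariance:", rbf_covariance(d, length_scale=1000.0))
    print("score:", mse_score([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The distance is written with plain `math` calls so that it has no dependencies. For a full Gaussian process fit, the same formula would typically be vectorized over all pairs of points, since the pairwise distance computation is where most of the time goes.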
