Data Science Using Unsupervised Learning & Visualization of Astronomy Data

A simple visualization of complicated data can make the science behind it seem obvious.

Above is the plot published by Edwin Hubble in 1929, showing that the farther away a galaxy is, the faster it is moving away from us, a recession we observe as redshift.

As we have mapped more of the known universe, we have discovered astounding structures on the largest scales. Visualizing this structure in two- or three-dimensional maps gives us an intuitive grasp of the composition and properties of the galaxies within the universe and of the forces behind the creation of that structure.

A huge amount of public data is available for scientific research, such as the Spitzer S4G data, a survey of stellar structure within galaxies. Here is a snippet of the Spitzer dataset for some galaxies.

SPITZER S4G Data of Galaxy Clusters

- mstar (solMass): log10(stellar mass), using Mabs1 and Mabs2 in the calibration of Eskew et al. (2012)
- c31_1: r75/r25 concentration index at 3.6 microns
- c42_1: 5*log10(r80/r20) concentration index at 3.6 microns
- phys_size: physical size in kpc (kiloparsecs); 1 parsec = 3.26 light years
- mabs1 and mabs2: absolute magnitudes at 3.6 and 4.5 microns

Note: r75 and r25 are the radii within which the enclosed luminosity is 75% and 25% of the total, respectively.

For more references, please look up:

S4G Catalog Definitions (irsa.ipac.caltech.edu): The S4G Catalog provides photometry and model parameters derived from the IRAC images, as well as a link to the summary…

Sample descriptions of these galaxies can be found on Wikipedia:

NGC 4725 (en.wikipedia.org): NGC 4725 is an intermediate barred spiral galaxy with a prominent ring structure about 40 million light-years away in…

NGC 4707 (en.wikipedia.org): NGC 4707 has a morphological type of Sm or Im, meaning that it is mostly irregular or has very weak spiral arms.
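To make the column definitions above concrete, here is a minimal sketch of how the concentration indices and the unit conversions could be computed. The function names and the example radii are hypothetical and chosen only for illustration; they are not taken from the S4G catalog or its pipeline.

```python
import math

LY_PER_PARSEC = 3.26  # conversion factor quoted in the column notes


def c31(r75: float, r25: float) -> float:
    """Concentration index c31: ratio of the radii enclosing 75% and 25% of the light."""
    return r75 / r25


def c42(r80: float, r20: float) -> float:
    """Concentration index c42: 5 * log10 of the ratio of the 80% and 20% radii."""
    return 5.0 * math.log10(r80 / r20)


def kpc_to_light_years(size_kpc: float) -> float:
    """Convert a physical size in kiloparsecs to light years."""
    return size_kpc * 1000.0 * LY_PER_PARSEC


def stellar_mass_solar(log_mstar: float) -> float:
    """Undo the log10 in the mstar column to get a mass in solar masses."""
    return 10.0 ** log_mstar


if __name__ == "__main__":
    # Made-up radii (same units for both), not real measurements.
    print("c31:", c31(r75=30.0, r25=10.0))
    print("c42:", c42(r80=100.0, r20=10.0))
    print("10 kpc in light years:", kpc_to_light_years(10.0))
    print("mstar = 10 means", stellar_mass_solar(10.0), "solar masses")
```

A larger c31 or c42 means the light is more centrally concentrated, which is why these columns are useful when comparing galaxy types.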
Doing a scatter matrix plot gives a quick view of the relationships between the above parameters, such as the physical size and stellar mass of galaxies.

Unsupervised learning methods such as PCA and t-SNE can further help in evaluating this data. Here is the PCA plot of this data with 6 parameters.

The results are bimodal: in the PCA projection, the galaxies fall into two groups by type, elliptical (red) and spiral (blue).

Another approach is to run the t-SNE unsupervised learning algorithm with various perplexity values. The values tried here are 5, 10, 15, 30, 40 and 50.

These plots corroborate the PCA analysis. We can zoom in further and select a pocket within this data for further analysis.

Plotting mstar and phys_size for this selection of 30 galaxies against their morphological type code “t” (ref: https://en.wikipedia.org/wiki/Galaxy_morphological_classification) shows that the galaxies we identified in the pocket of 30 are highly concentrated and of low stellar mass.

On April 25th this year, GAIA published its DR2 archive. I was going through this archive and stumbled upon this video.

Some quick plotting based on the above learning gave the visualizations below.

One visualizes Cepheid variables based on the limited arc data. Cepheid variables are standard candles used to gauge distances in space. Because the intrinsic luminosity of a Cepheid is tied to its pulsation period, it is easier to infer its distance from Earth.

Right Ascension and Declination are placement coordinates.
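For readers who want to reproduce the PCA and t-SNE projections described earlier, the following is a minimal sketch using scikit-learn. It assumes the six parameters are the columns listed in the S4G snippet, and it runs on synthetic, made-up data rather than the real catalog; the helper names are hypothetical and this is not necessarily how the original plots were produced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumed feature set: the six columns described in the S4G snippet.
FEATURES = ["mstar", "phys_size", "c31_1", "c42_1", "mabs1", "mabs2"]


def make_synthetic_galaxies(n_per_group=50, seed=0):
    """Return (X, labels): two made-up groups of galaxies, NOT real S4G values."""
    rng = np.random.default_rng(seed)
    centre_a = np.array([10.8, 25.0, 4.5, 4.0, -21.5, -21.4])
    centre_b = np.array([9.5, 12.0, 2.8, 2.9, -18.5, -18.4])
    shape = (n_per_group, len(FEATURES))
    group_a = centre_a + rng.normal(scale=0.3, size=shape)
    group_b = centre_b + rng.normal(scale=0.3, size=shape)
    X = np.vstack([group_a, group_b])
    labels = np.array([0] * n_per_group + [1] * n_per_group)
    return X, labels


def pca_projection(X, n_components=2):
    """Standardize each column, then project onto the leading principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    return PCA(n_components=n_components).fit_transform(X_scaled)


def tsne_projections(X, perplexities=(5, 10, 15, 30, 40, 50), seed=0):
    """Return {perplexity: 2-D embedding} for each requested perplexity."""
    X_scaled = StandardScaler().fit_transform(X)
    return {
        p: TSNE(n_components=2, perplexity=p, random_state=seed).fit_transform(X_scaled)
        for p in perplexities
    }


if __name__ == "__main__":
    X, labels = make_synthetic_galaxies()
    pcs = pca_projection(X)
    print("PCA projection shape:", pcs.shape)
    for p, emb in tsne_projections(X).items():
        print(f"t-SNE perplexity={p}: embedding shape {emb.shape}")
```

Standardizing first matters because the columns have very different scales (magnitudes, kiloparsecs, log masses). Each returned array can be passed to a scatter plot colored by `labels` to compare how the grouping changes with perplexity; note that t-SNE requires the perplexity to be smaller than the number of samples.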
Right Ascension is the angular distance of a point measured eastward along the celestial equator from the Sun's position at the March equinox. Declination is the angular distance of a point north or south of the celestial equator. The above visualization shows the placement of various Cepheids between 73 and 80 RA and between -65 and -67 declination.

Another uses a Bokeh plot in Python to show GAIA exoplanet data, with radius and mass compared to Earth's.

There are many other parameters, such as luminosity and temperature, that can be visualized from this data. In my next article, I am planning to pay tribute to KEPLER by creating some visualizations and inferences from that data, and to welcome TESS in 2019.

A more consistent and deeper initiative can create a boom in collaboration between astronomers, statisticians, data scientists, and information and computer professionals, thereby helping to accelerate our understanding of the SPACE around us.

#datascience #astronomy #GAIA

Dhaval Mandalia enjoys data science, project management, training executives, and writing about management strategies. He’s also a contributing member of the management association community in Gujarat. Follow him on Twitter and Facebook.

References:

GAIA Archives: https://gea.esac.esa.int/archive/
SPITZER Data: https://irsa.ipac.caltech.edu
