Baffled by Covariance and Correlation??? Get the Math and the Application in Analytics for both the terms..

The values from PCA done using the correlation matrix are closer to each other and more uniform as compared to the analysis done using the covariance matrix.This analysis with the correlation matrix definitely, uncovers some better structure in the data and relationships between variables..The above example can be used to conclude that the results significantly differ when one tries to define variable relationships using covariance and correlation..This in turn, affects the importance of the variables computed for any further analyses..Selection of predictors and independent variables is one prominent application of such exercises.Now, let us take another example to check if standardizing the data-set before performing PCA actually gives us the same results..To showcase agility of implementation across technologies, I shall execute this example in Python..We will consider the ‘iris’ data-set for the same.This data-set will now be standardized using the inbuilt function.I have then computed 3 matrices:Covariance matrix on standardized dataCorrelation matrix on standardized dataCorrelation matrix on unstandardized dataLet us look at the results:Here, it looks like the results are similar..The similarity of results (fractional differences) reinforces the understanding that correlation matrix is just a scaled derivative of the covariance matrix..Any computation on these matrices should now yield the same or similar results.To perform a PCA on these matrices from scratch, we will have to compute the eigenvectors and eigenvalues:As expected, the results from all three relationship matrices are the same..We shall now plot the explained variance charts from the eigenvalues so obtained:The charts resemble each other..Both the charts show that PC1 has the maximum contribution of around 71%..This analysis establishes the fact that standardizing the data-set and then computing the covariance and correlation matrices will yield the same results.One can use this learning to judiciously apply the concept of variable relationships before using any predictive algorithm.. More details

Leave a Reply