# Unpacking PCA

Starting with sample data as usual:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
X_raw = np.dot(rng.rand(2, 2), rng.randn(2, 200)).T
X_mean = X_raw.mean(axis=0)
X = X_raw - X_mean
```

Having defined the 200 × 2 matrix X, we use two components for the plot's sake:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X)  # Apply PCA

ax = plt.gca(); ax.set_xlabel('X'); ax.set_ylabel('Y')
plt.scatter(X[:, 0], X[:, 1], alpha=0.3, color="#191970")
plt.scatter(pca.mean_[0], pca.mean_[1], c='red', s=50)
plt.axis('equal')
```

The rest of the process is the same as before:

```python
for length, vector in zip(pca.explained_variance_, pca.components_):
    dir_ = vector * 3 * np.sqrt(length)
    start = pca.mean_; end = start + dir_
    arrowprops = dict(arrowstyle='->', linewidth=2,
                      shrinkA=0, shrinkB=0, color='red', alpha=0.5)
    ax.annotate('', xy=end, xytext=start, arrowprops=arrowprops)
```

*PCA, scikit-learn version*

Needless to say, the result is the same. As the code above shows, we can access the largest eigenvectors via `components_` and the eigenvalues via `explained_variance_`.

## explained_variance_

Let's take a look at `explained_variance_`. The major principal component explains 97.6% [0.7625315 / (0.7625315 + 0.0184779)] of the sample data, and the second principal component explains the rest. That means we can describe the original data almost entirely without the second principal component. Scikit-learn already calculates the explained variance ratio, so we can access it via `explained_variance_ratio_`.

For a better understanding, let's use the 1797 × 64 digits dataset and reduce its dimensions from 64 to 10:

```python
from sklearn.datasets import load_digits

digits = load_digits()
pca = PCA(n_components=10).fit(digits.data)
```

Now we can see how much these 10 new components describe the original sample data:

```python
plt.plot(np.cumsum(pca.explained_variance_ratio_), 'o-', c='#663399', alpha=.5)
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
```

*Accuracy vs. number of components*

As it turns out, they describe 72~73% of the sample, as opposed to the 100% accuracy we would get by keeping all 64 dimensions. Notice that in the graph the first component has index 0, which is why the curve begins at 14~15% of the variance.

So how many components are needed to reach 90% accuracy? We can leave the PCA constructor blank and plot the graph first:

```python
pca = PCA().fit(digits.data)  # Leave the parentheses empty

plt.plot(np.cumsum(pca.explained_variance_ratio_), 'o-', c='#663399', alpha=.5)
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
```

Reading off the plot, around 20 components should be enough for 90% of the variance. The simpler way is to pass the ratio into the constructor directly:

```python
pca = PCA(.9)
```

For further understanding, let's inverse-transform the transformed sample data back into the original space and draw heatmaps of the result. We should observe that the more components we keep, the more precisely the sample data is reconstructed.

Heatmap 1: original data

```python
# Heatmap 1
import seaborn as sns

sns.heatmap(digits.data, cbar=False)
```

Heatmap 2: two principal components

```python
# Heatmap 2
pca = PCA(2)
PC = pca.fit_transform(digits.data)
inversed = pca.inverse_transform(PC)
sns.heatmap(inversed, cbar=False)
```

Heatmap 3: twenty principal components

```python
# Heatmap 3
pca = PCA(20)
PC = pca.fit_transform(digits.data)
inversed = pca.inverse_transform(PC)
sns.heatmap(inversed, cbar=False)
```

Heatmap 4: forty principal components

```python
# Heatmap 4
pca = PCA(40)
PC = pca.fit_transform(digits.data)
inversed = pca.inverse_transform(PC)
sns.heatmap(inversed, cbar=False)
```

Clearly the original has far more detail than the data inversed from two principal components. With forty principal components, the inverse transform reproduces the data much more precisely.

We can also display the digits data we have been dealing with so far:

```python
_, axes = plt.subplots(2, 8, figsize=(8, 2),
                       subplot_kw={'xticks': [], 'yticks': []},
                       gridspec_kw=dict(hspace=0.1, wspace=0.1))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.data[i].reshape(8, 8), cmap='binary',
              interpolation='nearest', clim=(0, 16))
```

*Digits from the original sample*
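As a quick check of the variance-threshold constructor used earlier (`PCA(.9)`) — a minimal sketch on the same digits data — passing a float between 0 and 1 makes `fit` keep just enough components to reach that fraction of the total variance; the chosen count is exposed afterwards as `n_components_`:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# A float in (0, 1) tells PCA to keep just enough components
# to reach that fraction of the total variance
pca = PCA(.9).fit(digits.data)
print(pca.n_components_)  # roughly 20, matching what we read off the graph

cum = np.cumsum(pca.explained_variance_ratio_)
print(cum[-1] >= 0.9)  # True: the kept components cover at least 90%
```

This is often handier than eyeballing the cumulative-variance plot, since the threshold travels with the estimator (e.g. inside a pipeline).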
To make these digit images rougher, we perform dimensionality reduction once and then inverse-transform them back to the original size:

```python
pca = PCA(2)
PC = pca.fit_transform(digits.data)
inversed = pca.inverse_transform(PC)

_, axes = plt.subplots(4, 10, figsize=(10, 4),
                       subplot_kw={'xticks': [], 'yticks': []},
                       gridspec_kw=dict(hspace=0.1, wspace=0.1))
for i, ax in enumerate(axes.flat):
    ax.imshow(inversed[i].reshape(8, 8), cmap='binary',
              interpolation='nearest', clim=(0, 16))
```

*Reconstructed digit images*

We can also apply PCA to eigenfaces, since their features are plain brightness values, just like the digit images above. Reconstructing the original data takes a slightly different path than the digits reconstruction, because we need an average face first: after inverting the transformed data, we add the average face back.
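That "add the average back" step is what scikit-learn's `inverse_transform` already does when whitening is off: project the reduced data back with the components, then re-add the fitted mean. A minimal sketch, illustrated on the digits data (used here so nothing needs downloading — for a face dataset, `pca.mean_` would be the average face):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
pca = PCA(n_components=40).fit(digits.data)
PC = pca.transform(digits.data)

# Project back into pixel space, then add the mean image --
# for a face dataset, pca.mean_ is the "average face"
manual = PC @ pca.components_ + pca.mean_
print(np.allclose(manual, pca.inverse_transform(PC)))  # True
```

So the manual eigenface recipe (invert, then add the average face) and `inverse_transform` agree; doing it by hand just makes the role of the mean explicit.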