A Simple Exercise with Cluster Analysis Using the factoextra R Package

We even created Small Business Saturday to counteract the big-box propaganda of Black Friday and steer us towards the doors of our local mom-and-pops. “Small” firms make up 99% of all businesses in the United States (US) and 48% of all employment.

I did shorten the column names in R.

[Figure: R script loading data and changing names]

Several of the features contained asterisks, with the explanation that reporting the value would expose a particular business. I chose not to do this since I wanted all 50 states to appear in the visualization.

[Figure: R script converting data types]

Initially, I ran a clustering tendency algorithm on the data. Clustering tendency refers to the likelihood that any real clusters exist in a data set. It matters because a clustering algorithm will find clusters no matter what, even in totally random data. Clustering tendency gives us an idea of how much weight we should give to the resulting clusters.

Implicitly, I had assumed states in proximity would have similar economic conditions, so these observations stood out. The other two cluster estimations suggested two clusters and one cluster (boring).

Printing a summary of the principal component analysis reveals that the percentage of employees who work for small businesses carried the most weight in principal component one.

[Figure: R script summarizing principal component analysis]

A few features tied for second place, including both export-related variables, which probably correlate with each other. I suspect the percent of employees employed by small business and the average number of employees per small business are also correlated (I know; just run a correlation analysis… not this time.)

[Figure: R script with principal component loadings]

I enjoyed making visuals with this package set but only envision a couple of practical applications for this data set in real life. Assuming this analysis could identify states experiencing similar economic conditions for small businesses, one could add features related to federal legislation to determine whether any relationship exists between specific policy and small business outcomes.
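
For readers who want to try the clustering tendency check and the principal component loadings described above, here is a minimal sketch in base R. It is not the script from this post: the small-business data is not reproduced here, so the built-in `USArrests` data set (one row per state) stands in for it, and the `hopkins()` helper is a hypothetical hand-rolled function. factoextra's `get_clust_tendency()` also reports a Hopkins statistic, but its convention may differ from the one assumed below, so the numbers are not expected to match.

```r
# Stand-in data: USArrests ships with base R and has one row per US state.
# It is NOT the small-business data set used in the post.
x <- scale(USArrests)

# Hopkins statistic computed by hand (hypothetical helper, one common convention):
# values near 0.5 suggest spatially random data, values closer to 1 suggest clusters.
hopkins <- function(x, m = 10, seed = 123) {
  set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)

  # w: distance from m sampled real points to their nearest *other* real point
  idx <- sample(n, m)
  d_real <- as.matrix(dist(x))
  diag(d_real) <- Inf
  w <- apply(d_real[idx, , drop = FALSE], 1, min)

  # u: distance from m uniform random points (drawn within each column's range)
  # to their nearest real point
  rand <- sapply(seq_len(p), function(j) runif(m, min(x[, j]), max(x[, j])))
  u <- apply(rand, 1, function(r) min(sqrt(colSums((t(x) - r)^2))))

  sum(u) / (sum(u) + sum(w))
}

h <- hopkins(x)
print(h)

# Principal component analysis on the same stand-in data
pca <- prcomp(USArrests, scale. = TRUE)
print(summary(pca))                 # proportion of variance per component

loadings <- pca$rotation            # variable loadings for each component
pc1_weights <- sort(abs(loadings[, "PC1"]), decreasing = TRUE)
print(pc1_weights)                  # which variables weigh most on PC1

# The correlation check the post skipped, for the stand-in variables
print(round(cor(USArrests), 2))
```

Sorting the absolute PC1 loadings is one simple way to see which feature "wields the most weight" in the first component, and the correlation matrix gives a quick read on whether two highly ranked features are largely telling the same story.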
