A ‘Spike’ of Inactive Voters in Georgia

Analyzing inactive and purged voters in the Georgia Voter File with R

Introduction

The most recent midterm election prompted discussions and debates about how, when, and why voters should be purged from voter registration files.

I did not end up using this attribute for this project, but it may be useful in future analyses.

I used all the same functions and the “readcolumns” vector on the November 2017 Georgia voter file as well.

You can find all the code for this project on my GitHub.

Analysis

To begin my analysis, I looked at inactive voters and their “date changed” status in the September 2017 Georgia voter file.

Georgia purges voters using a method similar to Ohio's.

Throughout the analysis, I will refer to August 9th, 2017 as “the spike”.

Who is in the spike?

First, I wanted to see what percentage of voters in the September 2017 voter file were in the spike.

Next, I wanted to see how many of the inactive voters in the September 2017 dataset were in the spike.

```r
library(dplyr)  # filter(), group_by(), summarise(), arrange(), tibble()
library(tidyr)  # gather()

# Inactive voters in the spike
inactive.spike <- inactive.sept17 %>%
  filter(DATE_CHANGED == "2017-08-09")

# voters in spike / all inactive voters
spike_by_inactive <- nrow(inactive.spike) / nrow(inactive.sept17) * 100

# voters in spike / all voters
spike_by_total <- nrow(inactive.spike) / nrow(ga_voter.sept17) * 100

spike <- tibble(Voters = c('Of Total', 'Of Inactive'),
                Percent = c(spike_by_total, spike_by_inactive))
```

Figure 2: Bar chart showing percent of voters in spike among total voters and total inactive voters.

The spike contains 2.1% of all voters and 22.6% of all inactive voters in the September 2017 Georgia voter file.

Because white people make up the majority in the total voter file, their representation in the spike and total inactive voters is not concerning.

For this part of the analysis, I decided to repeat the analysis above on three different groups:

- People who voted in the 2016 General Election
- People purged between the spike and November 2017
- People who voted in the 2016 General Election and were removed between the spike and November 2017

For each group (and as above), I examined each of the three subgroups I created earlier: (1) voters in the spike, (2) voters who are inactive, and (3) total voters in the entire September 2017 voter file.

Group 1: People who voted in the 2016 General Election

To look at everyone in the spike who voted, I took my original subset of inactive voters and filtered it by the spike date for their date changed and by election day 2016 for their date last voted.

We should also remember that the spike accounts for 22.6% of all the inactive voters in the September 2017 voter file, which was released a year after the presidential election.

Looking at the racial breakdown of each population, I repeated the process of grouping by race, creating tables of the percentage of each race within its respective population, and combining the results.

```r
# Voted by race / all voted
voted_overall.race <- all.voted %>%
  group_by(RACE) %>%
  summarise(Total = n() / nrow(all.voted) * 100) %>%
  arrange(desc(Total))
voted_overall.race

# Voted in spike by race / all voted in spike
voted_in_spike.race <- spike.voted %>%
  group_by(RACE) %>%
  summarise(Spike = n() / nrow(spike.voted) * 100) %>%
  arrange(desc(Spike))
voted_in_spike.race

# Voted inactive by race / all voted inactive
voted_by_inactive.race <- inactive.voted %>%
  group_by(RACE) %>%
  summarise(Inactive = n() / nrow(inactive.voted) * 100) %>%
  arrange(desc(Inactive))
voted_by_inactive.race

# Make a table for all results to compare proportions
overall_inactive <- merge(x = voted_overall.race, y = voted_by_inactive.race, by = "RACE")
overall_inactive_spike.1 <- merge(x = overall_inactive, y = voted_in_spike.race, by = "RACE") %>%
  arrange(desc(Total))
format(overall_inactive_spike.1, digits = 1, nsmall = 2)

# Using gather, we can make the data friendlier to work with in a graph
overall_inactive_spike.2 <- overall_inactive_spike.1 %>%
  gather(Total, Inactive, Spike, key = "Voters", value = "Percent") %>%
  arrange(RACE)
format(overall_inactive_spike.2, digits = 1, nsmall = 2)
```

Figure 5: Bar graph representing the racial breakdown of 2016 General Election day voters in the spike, total inactive voters, and total voters.

The racial makeup of the spike and of inactive voters seems noticeably different from that of total voters.

This result indicates that white people’s representation in the spike and inactive voter groups is smaller than their overall representation, and the opposite is true for black people, whose representation in those groups appears disproportionately large.

Group 2: People purged between spike and November 2017

After looking at people in the spike who voted, the next question I had was: how soon are people removed once they become inactive?

I used the November 2017 Georgia voter file to look for people who were in the spike but not in the November 2017 voter file.

This would mean that the voter had been purged sometime between becoming inactive and the release of the November 2017 voter file.

```r
# How many were purged from the entire voter file?
purged.all <- ga_voter.sept17 %>%
  filter(!(REGISTRATION_NUMBER %in% ga_voter.nov17$REGISTRATION_NUMBER))

# How many were purged from the spike?
purged.spike <- inactive.spike %>%
  filter(!(REGISTRATION_NUMBER %in% ga_voter.nov17$REGISTRATION_NUMBER))

# How many were purged that were inactive?
purged.inactive <- inactive.sept17 %>%
  filter(!(REGISTRATION_NUMBER %in% ga_voter.nov17$REGISTRATION_NUMBER))

# Purged spike voters as a percent of the spike, of all inactive voters, and of all voters
purged_by_spike <- nrow(purged.spike) / nrow(inactive.spike) * 100
purged_by_inactive <- nrow(purged.spike) / nrow(inactive.sept17) * 100
purged_by_total <- nrow(purged.spike) / nrow(ga_voter.sept17) * 100

purged <- tibble(Voters = c('Total', 'Inactive', 'In Spike'),
                 Percent = c(purged_by_total, purged_by_inactive, purged_by_spike))
```

Figure 6: Bar graph of spike voters who were removed between September and November 2017 in the spike, total inactive voters, and total voters.

Less than 1% of the
spike was purged from the voter file. Although that is a small percentage, it is still over 900 people.

Next, I looked at the racial breakdown of this group: purged voters from the spike.

```r
# Purged by race / all purged
purged_total.race <- purged.all %>%
  group_by(RACE) %>%
  summarise(Total = n() / nrow(purged.all) * 100) %>%
  arrange(desc(Total))
purged_total.race

# Purged by race in spike / purged in spike
purged_by_spike.race <- purged.spike %>%
  group_by(RACE) %>%
  summarise(Spike = n() / nrow(purged.spike) * 100) %>%
  arrange(desc(Spike))
purged_by_spike.race

# Purged inactive by race / all purged inactive
purged_by_inactive.race <- purged.inactive %>%
  group_by(RACE) %>%
  summarise(Inactive = n() / nrow(purged.inactive) * 100) %>%
  arrange(desc(Inactive))
purged_by_inactive.race

# Make a table for all results to compare proportions
purged_inactive <- merge(x = purged_total.race, y = purged_by_inactive.race, by = "RACE")
purged_inactive_spike.1 <- merge(x = purged_inactive, y = purged_by_spike.race, by = "RACE") %>%
  arrange(desc(Total))
format(purged_inactive_spike.1, digits = 1, nsmall = 2)

# Using gather, we can make the data friendlier to work with in a graph
purged_inactive_spike.2 <- purged_inactive_spike.1 %>%
  gather(Total, Inactive, Spike, key = "Voters", value = "Percent") %>%
  arrange(RACE)
format(purged_inactive_spike.2, digits = 1, nsmall = 2)
```

Figure 7: Bar graph representation of racial breakdown of purged voters between September and November 2017 in the spike, total inactive voters, and total voters.

It seems that, among the spike, white people are the most disproportionately represented.

I used my previous subset of purged voters and filtered it for a date last voted of November 8th, 2016.

```r
# Purged voters whose date last voted is election day 2016
purged_all.voted <- purged.all %>% filter(DATE_LAST_VOTED == "2016-11-08")
purged_inactive.voted <- purged.inactive %>% filter(DATE_LAST_VOTED == "2016-11-08")
purged_spike.voted <- purged.spike %>% filter(DATE_LAST_VOTED == "2016-11-08")

# Purged 2016 voters from the spike as a percent of the spike, of all inactive voters, and of all voters
voted_purged_by_spike <- nrow(purged_spike.voted) / nrow(inactive.spike) * 100
voted_purged_by_inactive <- nrow(purged_spike.voted) / nrow(inactive.sept17) * 100
voted_purged_by_total <- nrow(purged_spike.voted) / nrow(ga_voter.sept17) * 100

voted_and_purged <- tibble(Voters = c('Total', 'Inactive', 'In Spike'),
                           Percent = c(voted_purged_by_total, voted_purged_by_inactive, voted_purged_by_spike))
```

Figure 8: Bar graph of spike voters who were purged and voted on 2016 election day in the spike, total inactive voters, and total voters.

We’ve reduced the data quite a bit.

For now, let’s check out the racial breakdown.

```r
# Purged 2016 voters by race / all purged 2016 voters
voted_purged_all.race <- purged_all.voted %>%
  group_by(RACE) %>%
  summarise(Total = n() / nrow(purged_all.voted) * 100) %>%
  arrange(desc(Total))
voted_purged_all.race

# Purged inactive 2016 voters by race / all purged inactive 2016 voters
voted_purged_inactive.race <- purged_inactive.voted %>%
  group_by(RACE) %>%
  summarise(Inactive = n() / nrow(purged_inactive.voted) * 100) %>%
  arrange(desc(Inactive))
voted_purged_inactive.race

# Purged 2016 voters in spike by race / all purged 2016 voters in spike
voted_purged_spike.race <- purged_spike.voted %>%
  group_by(RACE) %>%
  summarise(Spike = n() / nrow(purged_spike.voted) * 100) %>%
  arrange(desc(Spike))
voted_purged_spike.race

# Make a table for all results to compare proportions
voted_purged_inactive <- merge(x = voted_purged_all.race, y = voted_purged_inactive.race, by = "RACE")
voted_purged_inactive.1 <- merge(x = voted_purged_inactive, y = voted_purged_spike.race, by = "RACE") %>%
  arrange(desc(Total))
format(voted_purged_inactive.1, digits = 2, nsmall = 2)

# Using gather, we can make the data friendlier to work with in a graph
voted_purged_inactive.2 <- voted_purged_inactive.1 %>%
  gather(Total, Inactive, Spike, key = "Voters", value = "Percent") %>%
  arrange(RACE)
format(voted_purged_inactive.2, digits = 2, nsmall = 2)
```

Figure 9: Bar graph representing the racial breakdown of purged 2016 election day voters between September and November 2017 in the spike, total inactive voters, and total voters.

The disproportion still seems to affect white people the most.

Maybe if I looked at purged voters across multiple months and voter files, I might find
disproportions.

To wrap things up, I decided to create a horizontal bar graph displaying all total populations among the groups we have observed: (1) everyone in the voter file, (2) people who voted in the 2016 General Election, (3) people purged between the spike and November 2017, and (4) people who voted in the 2016 General Election and were removed between the spike and November 2017.

```r
# Compare total populations of each group and overall.
# Join on RACE instead of pairing rows by position: the group tables were
# sorted by their own Total column, so a fixed label vector can mislabel rows.
total_populations.1 <- total_inactive_spike.1 %>%
  select(Race = RACE, Overall = Total) %>%
  inner_join(select(total_inactive_spike.voted.1, Race = RACE, Voted = Total), by = "Race") %>%
  inner_join(select(purged_inactive_spike.1, Race = RACE, Purged = Total), by = "Race") %>%
  inner_join(select(voted_purged_inactive.1, Race = RACE, Voted_And_Purged = Total), by = "Race")
total_populations.1

total_populations.2 <- total_populations.1 %>%
  gather(Overall, Voted, Purged, Voted_And_Purged, key = "Total_Type", value = "Percent")
total_populations.2
```

Figure 10: Horizontal bar graph of total proportions of each subset: everyone in the voter file, people who voted on 2016 General Election Day, people who were purged between September 2017 and November 2017, and people who were purged and voted.

Figure 11: Table representation of horizontal bar graph above.

Challenges

My greatest overall challenge in this project was trying not to get lost in the data.
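The same race-breakdown steps are repeated for every group in this post: count voters by RACE, divide by the size of the population, merge the Total, Inactive, and Spike tables on RACE, and sort. This repetition is part of what makes it easy to get lost. One option is to wrap those steps in a helper function. The sketch below is a minimal base R version rather than the dplyr/tidyr code used in the post. The function names `race_percent` and `race_breakdown` and the toy data frame are hypothetical and made up for illustration. The sketch only assumes that each data frame has a RACE column like the Georgia voter file.

```r
# Hypothetical helpers (not from the original project), written in base R.

# Percent of each RACE value within one population, stored in a column named `label`.
race_percent <- function(df, label) {
  counts <- table(df$RACE)
  out <- data.frame(RACE = names(counts),
                    value = as.numeric(counts) / nrow(df) * 100,
                    stringsAsFactors = FALSE)
  names(out)[2] <- label
  out
}

# Combine Total / Inactive / Spike percentages, matched on RACE (full outer join),
# so no row depends on how each individual table happened to be sorted.
race_breakdown <- function(total, inactive, spike) {
  tables <- list(race_percent(total, "Total"),
                 race_percent(inactive, "Inactive"),
                 race_percent(spike, "Spike"))
  wide <- Reduce(function(x, y) merge(x, y, by = "RACE", all = TRUE), tables)
  wide[is.na(wide)] <- 0            # a race missing from a subset counts as 0%
  wide <- wide[order(-wide$Total), ]
  rownames(wide) <- NULL
  wide
}

# Made-up toy data standing in for the voter file (illustration only)
voters <- data.frame(REGISTRATION_NUMBER = 1:10,
                     RACE = c("WH", "WH", "WH", "WH", "BH", "BH", "BH", "HP", "AP", "WH"),
                     stringsAsFactors = FALSE)
inactive <- voters[voters$REGISTRATION_NUMBER %in% c(2, 5, 6, 8), ]
spike <- voters[voters$REGISTRATION_NUMBER %in% c(5, 6), ]

result <- race_breakdown(voters, inactive, spike)
print(result)
```

Under these assumptions, a call such as `race_breakdown(ga_voter.sept17, inactive.sept17, inactive.spike)` would produce the Total / Inactive / Spike comparison for the whole file. Passing the voted or purged subsets instead would cover the other groups. The full outer join (`all = TRUE`) keeps races that are missing from a smaller subset. By contrast, the `merge()` calls in the post use the default inner join, which drops those races.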