Using R to analyze movies shot in San Francisco

The Golden Gate Bridge stands tall with the city of San Francisco on the horizonUsing R to analyze movies shot in San FranciscoTanmayee WBlockedUnblockFollowFollowingMar 24The city of San Francisco is pure magic and I had an amazing time visiting this Golden City a few months back.

So when I chanced upon a very interesting data set of movies and series shot across several locations in San Francisco, I couldn’t stop myself from digging into it.

Lo and behold, I found 4 interesting insights!23 movies shot in San Francisco were released in just one year 2015!Golden Gate Bridge is the hottest film-shooting location among all the locations in the city (No surprises there!)Director Andrew Haigh has shot a whopping 11 movies in the City by the Bay!Warner Bros.

Pictures (among entertainment biggies) and Hulu (among streaming service companies) have the most number of movies/series shot in San Francisco.

In this article, I am going to show how I analyzed this data through 6 steps.

So fire up your R Studio and follow along!Step 1: Importing Librarieslibrary(dplyr)library(readxl)library(writexl) library(tidyquant)library(tidyverse)library(forcats)library(magrittr)Step 2: Loading the dataThe data for this exercise can be found on this url .

csv('Film_Locations_in_San_Francisco.

csv')Let us now eyeball the data and understand how its dimensions.

dim(film)We have 1622 rows and 11 columns.

Since the number of columns is large, this is a wide dataset and so, we will use glimpse() function to take a look at the columns all at once.

glimpse(film)We see NAs in the data but we will leave them as it is.

Step 3: Wrangling the dataI wanted to make the column names more standard by replacing “.

” in the names with “_”.

Also, I decided to re-order a few columns by putting ‘Title’, ‘Release_Year’, ‘Locations’, ‘Director’ and ‘Writer’ at the top of the list.

#Replacing ".

" in the column names with "_"film_wrangled <- film %>% set_names(names(.

)%>% str_replace_all(".

", "_")) %>%#Re-ordering the columns select(Title, Release_Year, Locations, Director, Writer, everything()) %>% glimpse()We see that the desired changes have been made.

Step 4: Data ManipulationTo deep dive into the analysis, I decided to focus on only a few columns namely ‘Release_Year’, ‘Title’, ‘Locations’, ‘Director’ and ‘Distributor’.

film_tbl <- film_wrangled %>% #Choosing the columns to focus on select(Release_Year, Title, Locations, Director, Distributor) %>% glimpse()So, the tibble ‘film_tbl’ is our go-to dataset for any analysis that would follow after this point.

Question 1: Which release years had the most number of movies shot in San Franciscopopular_by_year <- film_tbl %>% group_by(Release_Year) %>% summarize(Number_of_movies = n_distinct(Title)) %>% filter(Number_of_movies > 5) %>% arrange(desc(Number_of_movies))This can be easily understood if it is visualized.

popular_by_year %>% ggplot(aes(x = Release_Year, y = Number_of_movies)) + #Adding geometries geom_col(fill = "#18BC9C") + #Formatting geom_text(aes(label= Number_of_movies), vjust = 0 ) + theme_tq() + labs( title = "Number of movies shot in San Francisco by release year", subtitle = "The City was the clear favourite of movie-makers in 2015 ", x = "Release Year", y = "Number of Movies" )It seems like the year 2015 saw most movies shot in SF followed by the year 2016.

Before these two years, most movie shootings in SF took place in the year 1996.

Question 2: Which are the most famous locations in San Francisco to shoot a movie?popular_locations <- film_tbl %>% group_by(Locations) %>% summarize(Number_of_movies = n_distinct(Title)) %>% filter(Number_of_movies > 5) %>% filter(Number_of_movies < 54) %>% arrange(desc(Number_of_movies))Visualizing this…popular_locations %>% ggplot(aes(x = reorder(Locations, -Number_of_movies), y = Number_of_movies )) + #Adding Geometries geom_col(fill = "#C40003" ) + #Formatting theme_tq() + geom_text(aes(label= Number_of_movies), hjust = 0 ) + labs ( title = "San Francisco is a top choice of the Directors", subtitle = "Andrew Haigh directed the most number of movies shot in the City", y = "Number of Movies" ) + coord_flip()The data tells us that the most famous location to shoot a movie is Golden Gate Bridge (No surprises here!).

It is hotly contested with other popular locations such as City Hall, Fairmont Hotel at 950 Mason Street, Treasure Island, and Coit Tower.

Question 3: Which directors shot the most number of movies in San Francisco?directors <- film_tbl %>% group_by(Director) %>% summarize(Number_of_movies = n_distinct(Title)) %>% filter(Number_of_movies > 3) %>% arrange(desc(Number_of_movies))Visualizing this …directors %>% ggplot(aes(x = Director, y = Number_of_movies )) + #Adding Geometries geom_col(fill = "#0055AA" ) + #Formatting theme_tq() + geom_text(aes(label= Number_of_movies), vjust = 0 ) + labs ( title = "San Francisco is a top choice of the Directors", subtitle = "Andrew Haigh directed the most number of movies shot in the City", y = "Number of Movies" )Director Andrew Haigh has shot a whopping 11 films in this City by the Bay followed by Alfred Hitchcock, Chris Columbus, Garry Marshall and Philip Kaufman who have shot 4 films each there.

Question 4: Which distribution companies have most of their movies shot in San Francisco?film_tbl %>% select(Title, Distributor) %>% group_by(Distributor) %>% summarise(Number_of_movies = n_distinct(Title)) %>% arrange(desc(Number_of_movies))We see that Warner Bros.

Pictures is the clear winner here.

Also, it is noteworthy that popular streaming service Hulu seems to have a lot of series/movies shot in San Francisco.

Question 5: What are the names of the movies shot at the Golden Gate Bridge?Now that we have figured out that most movies usually end up including a scene or two at the famous Golden Gate Bridge, let us see their names.

film_tbl %>% select(Title, Locations, Release_Year) %>% filter(grepl("Golden Gate Bridge", Locations)) %>% arrange(desc(Release_Year))We see the movies released in recent years i.

e.

Looking (2014), Milk (2008) and the Bridge (2006), among others, shot at this very famous location.

Fantastic, isn’t it?A quick tip — Out of these movies, I found only ‘Milk’ available on Netflix! So if you want to take in the scenic beauty of the Golden Gate Bridge, you know where to head now! :)Thanks for reading this article! If you liked it, do send me claps :)The source code is hosted on my GitHub page.