Quantifying "Low-Brow" and "High-Brow" Films

I went and saw Certain Women a few months ago. I was pretty excited to see it; a blurb in the trailer calls it “Triumphant… an indelible portrait of independent women,” which sounded pretty good to me. The film made a strong point in exposing the mundane, everyday ways in which women have to confront sexism. It isn't always a huge dramatic thing that is obvious to everyone; most of the time, sexism is commonplace and woven into the routine of our society.

The only problem is that I found the movie, well, pretty boring. Showing how quotidian sexism is makes for a slow-paced, quotidian plot. A few days ago, I happened upon the Rotten Tomatoes entry for the movie. It scored very well with critics (92% liked it), but rather poorly with audiences (52%). That got me thinking about the divide between critics and audiences: the biggest differences between audience and critic scores could be an interesting way to quantify what counts as “high-brow” and what counts as “low-brow” film. So I gathered critic and audience scores for films released in 2016, plotted them against one another, and looked at where they differed most.

Method

The movies I chose to examine were all listed on the 2016 in film Wikipedia page. The problem was that I needed links to Rotten Tomatoes pages, not just the names of movies. So I scraped that table, took the film names, and turned them into Google search URLs by using paste0 to append each name to "https://google.com/search?q=rottentomatoes+2016+". Then I wrote a little function (using rvest and magrittr) that takes a Google search URL and returns the URL of the first search result:

# function for getting the first hit from a Google results page
library(rvest)
library(magrittr)

getGoogleFirst <- function(url) {
  url %>% 
    read_html() %>% 
    html_node(".g:nth-child(1) .r a") %>%  # anchor of the first organic result
    html_attr("href") %>%                  # looks like "/url?q=<target>&sa=..."
    strsplit(split = "=") %>% 
    getElement(1) %>%                      # unwrap strsplit's one-element list
    strsplit(split = "&") %>% 
    getElement(2) %>%                      # the piece that holds the target URL
    getElement(1)                          # drop the trailing "&sa" fragment
}
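The URL-building step described above can be sketched like this (the film titles here are placeholders; the real list came from the scraped Wikipedia table):

```r
# hypothetical film titles standing in for the scraped Wikipedia table
films <- c("Certain Women", "The Witch", "Hail, Caesar!")

# turn each title into a Google search URL, as described above;
# spaces become "+" so the query string is URL-friendly
searchURLs <- paste0("https://google.com/search?q=rottentomatoes+2016+",
                     gsub(" ", "+", films))
```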

After running this through a loop, I got a long vector of Rotten Tomatoes links. Then I fed them into two functions that get the critic and audience scores:

# get rotten tomatoes critic score
rtCritic <- function(url) {
  url %>% 
    read_html() %>% 
    html_node("#tomato_meter_link .superPageFontColor") %>% 
    html_text() %>%             # e.g. "92%"
    strsplit(split = "%") %>% 
    unlist() %>%                # strsplit returns a list; flatten before coercing
    as.numeric()
}
# get rotten tomatoes audience score
rtAudience <- function(url) {
  url %>% 
    read_html() %>% 
    html_node(".meter-value .superPageFontColor") %>% 
    html_text() %>%             # e.g. "52%"
    strsplit(split = "%") %>% 
    unlist() %>% 
    as.numeric()
}

The film names and scores were all put into a data frame.
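Assembling the results might look something like this; the scores below are a few real values from the tables further down, standing in for the full scraped set, and the column names are my own:

```r
# in the real run the scores came from looping the scrapers over the link
# vector, e.g. critic <- sapply(rtLinks, rtCritic) -- here we use stand-ins
film     <- c("Hail, Caesar!", "The Witch", "Meet the Blacks")
critic   <- c(86, 91, 13)
audience <- c(44, 56, 74)

# one row per film, one column per score
scores <- data.frame(film, critic, audience, stringsAsFactors = FALSE)
```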

Results

Overall, I collected data on 224 films. The average critic score was 56.74, while the average audience score was 58.67. Audiences tended to be slightly more positive, but the difference (1.93 points) was not statistically significant, t(223) = 1.34, p = .181. Audiences and critics tended to agree; scores between the two groups correlated strongly, r = .68.
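The comparison above boils down to a paired t-test and a correlation. A minimal sketch with made-up scores (the real values came from the scrape; note the degrees of freedom work out to n - 1 = 223 for 224 films):

```r
# placeholder scores for 224 films, standing in for the scraped data
set.seed(42)
critic   <- round(runif(224, min = 0, max = 100))
audience <- round(runif(224, min = 0, max = 100))

# paired t-test: do audiences rate films differently than critics?
tt <- t.test(audience, critic, paired = TRUE)  # df = 224 - 1 = 223

# how strongly do the two sets of scores agree?
r <- cor(critic, audience)
```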

But where do audiences and critics disagree most? I calculated a difference score by subtracting the audience score from the critic score, such that positive scores mean critics liked the film more than audiences did. The biggest difference scores in each direction are listed in the tables below.

“High-Brow” Films

Film Critic Audience Difference
The Monkey King 2 100 49 51
Hail, Caesar! 86 44 42
Little Sister 96 54 42
The Monster 78 39 39
The Witch 91 56 35
Into the Forest 77 42 35

“Low-Brow” Films

Film Critic Audience Difference
Hillary's America: The Secret History of the Democratic Party 4 81 -77
The River Thief 0 69 -69
I'm Not Ashamed 22 84 -62
Meet the Blacks 13 74 -61
God's Not Dead 2 9 63 -54

Interactive Plot

Below is a scatterplot of the two scores with a regression line plotted. The dots in blue are those films in the tables above. You can hover over any dot to see the film it represents as well as the audience and critic scores:



I won't do too much interpreting of the results; you can see for yourself where the movies fall by hovering over the dots. But I would be remiss if I didn't point out that the largest difference score belongs to an anti-Hillary Clinton movie: 4% of critics liked it, but somehow 81% of the audience did. Given all the evidence that pro-Trump bots were active across the Internet in the run-up to the 2016 U.S. presidential election, I would not be surprised if many of those audience votes came from bots.

Apparently I'm a low-brow plebeian; I did not see any of the most “high-brow” movies by this metric. Both critics and audiences seemed to love Hidden Figures (saw it, and it was awesome) and Zootopia (still haven't seen it).

Let me know what you think of this “low-brow/high-brow” metric, or whether there are better ways to quantify the construct.