Data Scraping the Stars

Custom Function in R to Web Scrape Michelin Star Data

The Project

While working on a project about the Michelin Guide and its impact on underrepresentation in fine cuisine, I wrote a function in R to scrape data from the Michelin Guide website. The function pulls data by distinction category (1, 2, or 3 Michelin stars) and returns each restaurant's name, location, price tier, and cuisine.

The Challenge

I was first inspired to work on this project after listening to an episode of the podcast The Economics of Everyday Things about Michelin stars. I wanted to know: where are Michelin restaurants most prevalent, and which cuisines does the Michelin Guide recognize most often? Since I couldn't find an up-to-date data set, I decided to scrape the data directly from the Michelin Guide website.

The Solution

Using the R package ‘rvest’, which makes scraping web pages convenient, and identifying the HTML elements associated with each variable I wanted, I was able to extract the data and produce a tidy data frame.

Making sure I was pulling the correct HTML elements was tricky at first, and it pushed me to learn more about HTML structure.
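A minimal sketch of that extraction step, assuming a results page where each restaurant sits in its own "card" element. The HTML snippet and class names (.card, .card__title, and so on) are invented for illustration and are not the Michelin Guide's real markup, and parse_page() is a hypothetical helper name.

```r
library(rvest)

# Hypothetical markup: class names are ASSUMPTIONS for illustration and
# are not the Michelin Guide's real HTML.
sample_html <- '
<div class="card">
  <h3 class="card__title">Restaurant A</h3>
  <div class="card__location">Paris, France</div>
  <span class="card__price">$$$$</span>
  <span class="card__cuisine">French</span>
</div>
<div class="card">
  <h3 class="card__title">Restaurant B</h3>
  <div class="card__location">Tokyo, Japan</div>
  <span class="card__price">$$$</span>
  <span class="card__cuisine">Japanese</span>
</div>'

# Turn one parsed results page into a data frame, one row per restaurant card.
parse_page <- function(page) {
  cards <- html_elements(page, ".card")
  # html_element() returns one result per card (a missing node, read as NA,
  # where nothing matches), so every column has one entry per restaurant.
  field <- function(css) html_text(html_element(cards, css), trim = TRUE)
  data.frame(
    name     = field(".card__title"),
    location = field(".card__location"),
    price    = field(".card__price"),
    cuisine  = field(".card__cuisine"),
    stringsAsFactors = FALSE
  )
}

restaurants <- parse_page(read_html(sample_html))
restaurants
```

Selecting the cards first and then pulling one field per card keeps the rows aligned. If a card lacks a field, that cell becomes NA. Pulling each field page-wide with html_elements() would instead let later values slide into the wrong row.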

The Function in R

Below is the function. Its input arguments are the numeric distinction level (1, 2, or 3) and the number of pages in the Michelin Guide website's search results.

Note: the "rename_location" function used in lines 73 and 74 is a prerequisite for this function and can be found in the R script in the GitHub repository.
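The sketch below is hypothetical and is not the function described above. It shows one way the two arguments could drive a scrape: visit pages 1 through n_pages for a single distinction level and stack the results. The URL template is a placeholder rather than the site's real URL scheme. The fetch and parse steps are passed in as functions (parse_page is an assumed, caller-supplied parser) so the loop can be exercised without network access.

```r
# Hypothetical sketch (not the original function): scrape every results page
# for one distinction level and stack the per-page data frames.
# ASSUMPTIONS: url_template is a placeholder, not the Michelin Guide's real
# URL scheme; parse_page is a caller-supplied function that turns one fetched
# page into a data frame.
scrape_distinction <- function(distinction, n_pages, parse_page,
                               fetch_page = function(url) rvest::read_html(url),
                               url_template = "https://example.com/michelin/%d-star/page/%d",
                               delay = 1) {
  stopifnot(distinction %in% 1:3, n_pages >= 1)
  pages <- lapply(seq_len(n_pages), function(page) {
    if (page > 1) Sys.sleep(delay)  # pause between requests to go easy on the server
    url <- sprintf(url_template, distinction, page)
    df <- parse_page(fetch_page(url))
    # Tag each row with its star level; rep() also handles an empty page.
    df$distinction <- rep(distinction, nrow(df))
    df
  })
  do.call(rbind, pages)
}
```

For example, `scrape_distinction(3, 5, parse_page = my_parser)` would request five pages of three-star results, where `my_parser` is a hypothetical function that extracts the name, location, price tier, and cuisine. Taking fetch_page as an argument keeps the loop testable offline. In normal use, the default `rvest::read_html()` call does the download.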

The following "rename_location" function is used in lines 73 and 74 above; define it before running the data scraping function.