Data Scraping the Stars
Custom Function in R to Web Scrape Michelin Star Data
The Project
While working on a project about the Michelin Guide and its impact on underrepresentation in fine cuisine, I wrote a function in R to scrape data from the Michelin website. The function pulls data by distinction category (1, 2, or 3 Michelin stars) and returns each restaurant's name, location, price tier, and cuisine.
The Challenge
I was first inspired to work on this project after listening to a podcast episode from The Economics of Everyday Things about Michelin stars. I wanted to know: where are Michelin restaurants most prevalent, and which cuisines are most acknowledged by the Michelin Guide? I could not find a sufficiently up-to-date data set, so I decided to scrape the data directly from the Michelin Guide website.


The Solution
Using the R package ‘rvest’, which simplifies web page scraping, and identifying the HTML elements associated with each variable I wanted, I could extract the data and produce a tidy data frame.

Ensuring that I was pulling the correct HTML elements was tricky at first, and it pushed me to learn a bit more about HTML structure.

Identifying the HTML elements in the Michelin website, in order to call them in the R function.
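
To make the idea concrete, here is a minimal sketch of how HTML elements map to rvest calls. It runs against a small inline HTML snippet rather than the live site, and the class names (".card", ".card-title", and so on) and restaurant values are hypothetical placeholders, not the Michelin Guide's actual markup or data.

```r
library(rvest)

# Stand-in HTML for one results page. The class names are hypothetical,
# not the Michelin Guide's real markup.
page <- minimal_html('
  <div class="card">
    <h3 class="card-title">Restaurant A</h3>
    <div class="card-location">Paris, France</div>
    <div class="card-price">$$$$</div>
    <div class="card-cuisine">Modern Cuisine</div>
  </div>
  <div class="card">
    <h3 class="card-title">Restaurant B</h3>
    <div class="card-location">Tokyo, Japan</div>
    <div class="card-price">$$$</div>
    <div class="card-cuisine">Sushi</div>
  </div>
')

# One node per restaurant card
cards <- html_elements(page, ".card")

# html_element() returns exactly one match per card (NA if missing),
# so every column has the same length and rows stay aligned.
restaurants <- data.frame(
  name     = html_text2(html_element(cards, ".card-title")),
  location = html_text2(html_element(cards, ".card-location")),
  price    = html_text2(html_element(cards, ".card-price")),
  cuisine  = html_text2(html_element(cards, ".card-cuisine")),
  stringsAsFactors = FALSE
)

print(restaurants)
```

Selecting the card first and then each field inside it keeps every row tied to a single restaurant, even when one field is missing from a card.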

The Function in R
Below is the function. Its input arguments are the numeric distinction level (1, 2, or 3) and the number of pages in the Michelin Guide website's search results.
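
As a rough illustration of that interface (this is not the original script, which lives in the GitHub repository), a function taking those two arguments could be sketched as follows. The URL pattern, the CSS selectors, and the names `results_url`, `parse_page`, `scrape_stars`, and `read_page` are all assumptions made for this sketch and should be checked against the live site.

```r
library(rvest)

# Assumed URL pattern for one page of search results; verify on the live site.
results_url <- function(distinction, page) {
  slug <- if (distinction == 1) {
    "1-star-michelin"
  } else {
    paste0(distinction, "-stars-michelin")
  }
  paste0("https://guide.michelin.com/en/restaurants/", slug, "/page/", page)
}

# Hypothetical parser: extracts restaurant names from one parsed page.
# The selectors are placeholders, not the site's actual class names.
parse_page <- function(page) {
  cards <- html_elements(page, ".card")
  data.frame(
    name = html_text2(html_element(cards, ".card-title")),
    stringsAsFactors = FALSE
  )
}

# Loop over the result pages for one distinction level and stack the rows.
# `read_page` defaults to rvest::read_html but can be swapped for testing.
scrape_stars <- function(distinction, n_pages, read_page = read_html) {
  stopifnot(distinction %in% 1:3, n_pages >= 1)
  pages <- lapply(seq_len(n_pages), function(i) {
    out <- parse_page(read_page(results_url(distinction, i)))
    out$distinction <- rep(distinction, nrow(out))
    out
  })
  do.call(rbind, pages)
}
```

Under these assumptions, `scrape_stars(1, n_pages = 5)` would request the first five pages of one-star results and return one row per restaurant. Passing the page reader as an argument makes it possible to try the function on saved or mock HTML without contacting the website.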

Note: the "rename_location" function called on lines 73 and 74 of the full script is a prerequisite for the scraping function, so define it before running the scraper. It can be found in the R script on the GitHub repository.