Retrieving Restaurant Ratings

This text describes the different ways of gathering the restaurant ratings used in the post Revisiting the Michelin Stars. That post analyzes whether Michelin-starred restaurants actually get good user reviews on community-based platforms such as Google Places, TripAdvisor and Verema.

Although the rating gathering described here is part of the restaurant study, it has been written so that it can be read on its own. Nevertheless, I encourage you to read the original post as well to get the full context.

The link with the Michelin Stars analysis is the source of the restaurant ratings. These can be compiled from three different websites that provide both (1) the restaurant rating and (2) the number of user ratings per restaurant:

  • Google Places: 5-point rating scale & number of user ratings
  • TripAdvisor: 5-point rating scale & number of user ratings
  • Verema: 10-point rating scale & number of user ratings

Python Scripts are used to extract the ratings from the different websites

 


Information Diversity

The three sources were selected because each is a representative website from a different scope, which provides high diversity across user communities. This makes it possible, for example, to consider rating information from different geographic areas (e.g. Google Places and TripAdvisor are worldwide databases, whereas Verema contributions mainly come from Spain), or to take into account communities with different interests (e.g. Google Places is generalist, TripAdvisor is focused on travellers, and Verema is about gastronomy).

With diversity in mind, I performed a small correlation analysis to confirm the distinctiveness of the three data sources. Since each restaurant has a rating from each website (which yields 3 arrays of 169 restaurant ratings), the correlation can be obtained by computing the Pearson coefficient for each pair of websites:

Correlation values between the different websites ratings

First, the fact that all values are positive indicates that the users’ opinions on the three platforms point in the same direction (i.e. a good restaurant is rated well on all three websites). Second, the values do not indicate a very strong correlation between the different websites (they stay far from 1). This is good news, because it can be seen as an indicator of the variety of the data sources. Third, the highest correlation is between TripAdvisor and Verema. This makes sense since, intuitively, the travel and gastronomy communities have similar interests. In conclusion, using these three websites as data sources seems a good decision.
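The pairwise computation described above can be sketched as follows. This is a minimal illustration written for Python 3 with the standard library only, not the code used for the original analysis; the function names and the four sample ratings per site are made up for the example.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(ratings_by_site):
    """Map each pair of site names to the Pearson coefficient of their ratings."""
    return {
        (a, b): pearson(ratings_by_site[a], ratings_by_site[b])
        for a, b in combinations(sorted(ratings_by_site), 2)
    }


# Made-up ratings for four restaurants, listed in the same restaurant order per site.
example = {
    "google": [4.5, 4.0, 4.8, 3.9],
    "tripadvisor": [4.5, 4.5, 5.0, 4.0],
    "verema": [8.9, 8.1, 9.4, 7.5],
}
for pair, r in pairwise_correlations(example).items():
    print(pair, round(r, 3))
```

Because the Pearson coefficient is invariant to linear rescaling, the 10-point Verema ratings can be compared directly with the 5-point ratings of the other two sites.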

 


Python Implementation

Below is a brief description of the Python scripts used to retrieve the ratings from each website. They were implemented the quick way, so some refactoring might be needed to improve them… :)

For the Google Places case, the script just queries the Google Places API. As shown in the code snippet below, the argument restaurants is an array of dicts where each entry represents a restaurant, described by the keys name and location. After configuring the Google Places API (lines 5-8), the code queries the API for each restaurant in the array (line 11). Since a query result is a list of places (line 16), that list is looped over until the right restaurant is found (line 19). When matching the restaurant against the places in the results, the code checks different combinations of the restaurant name, assuming that the word ‘restaurant’ (or ‘restaurante’) might come at the beginning or at the end (lines 20-29). If the restaurant is found, a string is printed with (1) the name of the restaurant, (2) the average user rating, and (3) the number of ratings (line 35). If the restaurant is not found, a string with empty values is printed instead, followed by the names of the places returned by the query for manual inspection (lines 40-44).

##### function to search google places ratings #####
def GetGooglePlacesRatings(restaurants):
	# 'restaurants': array of dicts = [{u"name":u"El Celler de Can Roca",u"location":u"Girona"},{u"name":u"Koy Shunka",u"location":u"Barcelona"}]

	# set google places api
	from googleplaces import GooglePlaces, types, lang
	google_places_api_key = 'GOOGLE-PLACES-API-KEY'
	google_places = GooglePlaces(google_places_api_key)
	
	# search restaurants
	for restaurant in restaurants:
		restaurantFound = 0
		restaurantName = restaurant["name"].lower()
		restaurantLocation = restaurant["location"].lower()

		query_result = google_places.nearby_search(keyword=restaurantName, location=restaurantLocation, radius="50000", types=[types.TYPE_FOOD], sensor="false")
		
		# loop query results
		for place in query_result.places:
			if (restaurantName == place.name.lower() or 
				'restaurant ' + restaurantName  == place.name.lower() or
				'restaurante ' + restaurantName == place.name.lower() or
				restaurantName + ' restaurant'  == place.name.lower() or
				restaurantName + ' restaurante'  == place.name.lower() or
				'restaurant ' + place.name.lower()  == restaurantName or
				'restaurante ' + place.name.lower() == restaurantName or
				place.name.lower() + ' restaurant'  == restaurantName or
				place.name.lower() + ' restaurante'  == restaurantName
				):
				place.get_details()
				if 'user_ratings_total' in place.details:
					restaurantNumRatings=place.details['user_ratings_total']
				else:
					restaurantNumRatings='-'
				print place.name, '\t', place.rating, '\t', restaurantNumRatings
				restaurantFound = 1
				break
		
		# restaurant not found, print query results for further manual inspection
		if(restaurantFound==0):
			print restaurant["name"], '\t-\t-\t(',
			for place in query_result.places:
				print place.name + ' - ',
			print ')'
	return
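The nine-way name comparison in the snippet above could be factored into a small helper, which is one of the refactorings the scripts would benefit from. The sketch below is written for Python 3 and is not part of the original scripts; the names AFFIXES, name_variants and names_match are hypothetical.

```python
# Words that may be prepended or appended to a restaurant name.
AFFIXES = ("restaurant", "restaurante")


def name_variants(name):
    """Return the lowercased name plus versions with each affix as prefix and suffix."""
    base = name.lower()
    variants = {base}
    for affix in AFFIXES:
        variants.add(affix + " " + base)
        variants.add(base + " " + affix)
    return variants


def names_match(wanted, found):
    """True if either name equals the other, optionally extended by one affix."""
    wanted, found = wanted.lower(), found.lower()
    return found in name_variants(wanted) or wanted in name_variants(found)


print(names_match("Koy Shunka", "Restaurante Koy Shunka"))
print(names_match("Koy Shunka", "Koy Shunka Bar"))
```

The helper accepts the same combinations as the chained comparison, so adding another affix (for example a word in a different language) only requires extending the AFFIXES tuple.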

While using the Google Places API was straightforward, a different approach was needed to obtain the ratings from TripAdvisor and Verema. The TripAdvisor API does not allow free access for ‘research’ studies, and Verema does not provide any API-based access. Luckily, when you search for a restaurant in Google, the results page includes the ratings for both websites. It is then just a matter of parsing the HTML code of the Google search results.

Note that the restaurants in the list are again looped over one by one (line 10). For each one we send a search query to Google (line 15) and then use the BeautifulSoup library to parse the results page: jumping to the right div tags, we reach the average rating and the number of ratings (lines 19-34). Observe that we check the same name combinations as before (lines 36-47). The Verema function follows the same structure.

### function to search tripadvisor ratings parsing the google search results page ###
def GetTripadvisorRatingsFromGoogleSearch(restaurants):
	# 'restaurants': array of dicts = [{u"name":u"El Celler de Can Roca",u"location":u"Girona"},{u"name":u"Koy Shunka",u"location":u"Barcelona"}]

	import requests # for url downloading
	import bs4 # for html parsing
	from unidecode import unidecode

	# search restaurants
	for restaurant in restaurants:
		restaurantFound = 0
		restaurantName = restaurant["name"].lower()

		# get google search page
		response = requests.get('https://www.google.es/search', params={'q': restaurantName+' tripadvisor'})
		soup = bs4.BeautifulSoup(response.text, 'html.parser')
		
		searchResults = soup.select('li.g')
		for result in searchResults:
			# parse title; skip results without bold title text so stale values are never reused
			titleString = result.h3.findAll('b') if result.h3 is not None else []
			if(len(titleString)==0):
				continue
			title, tripadvisorInTitle = titleString[0].text, titleString[-1].text
			
			# parse ratings
			ratingsDivs=result.findAll('div',attrs={'class':'f'})
			if(len(ratingsDivs) > 0):
				ratingsStringA=ratingsDivs[0].text
				ratingsStringB=ratingsStringA[14:] # strip out beginning
				ratingsStringC=ratingsStringB[:-8] # strip out ending
				ratingsString=unidecode(ratingsStringC).split(' - ')
			else:
				ratingsString=''
			
			if((unidecode(restaurantName) == title.lower() or 
				'restaurant ' + unidecode(restaurantName)  == title.lower() or
				'restaurante ' + unidecode(restaurantName) == title.lower() or
				unidecode(restaurantName) + ' restaurant'  == title.lower() or
				unidecode(restaurantName) + ' restaurante'  == title.lower() or
				'restaurant ' + title.lower()  == unidecode(restaurantName) or
				'restaurante ' + title.lower() == unidecode(restaurantName) or
				title.lower() + ' restaurant'  == unidecode(restaurantName) or
				title.lower() + ' restaurante'  == unidecode(restaurantName) ) and
				tripadvisorInTitle.lower() == 'tripadvisor' and
				len(ratingsString) == 2
				):
				print restaurant["name"], '\t', ratingsString[0], '\t', ratingsString[1] 
				restaurantFound = 1
				break

		if(restaurantFound==0):
			print restaurant["name"], '\t-\t-'
			
	return
### function to search verema ratings parsing the google search results page ###
def GetVeremaRatingsFromGoogleSearch(restaurants):
	# 'restaurants': array of dicts = [{u"name":u"El Celler de Can Roca",u"location":u"Girona"},{u"name":u"Koy Shunka",u"location":u"Barcelona"}]

	import requests # for url downloading
	import bs4 # for html parsing
	from unidecode import unidecode

	# search restaurants
	for restaurant in restaurants:
		restaurantFound = 0
		restaurantName = restaurant["name"].lower()

		# get google search page
		response = requests.get('https://www.google.es/search', params={'q': restaurantName+' verema'})
		soup = bs4.BeautifulSoup(response.text, 'html.parser')
		
		searchResults = soup.select('li.g')
		for result in searchResults:
			# parse title; skip results without bold title text so stale values are never reused
			titleString = result.h3.findAll('b') if result.h3 is not None else []
			if(len(titleString)==0):
				continue
			title, veremaInTitle = titleString[0].text, titleString[-1].text
			
			# parse ratings
			ratingsDivs=result.findAll('div',attrs={'class':'f'})
			if(len(ratingsDivs) > 0):
				ratingsStringA=ratingsDivs[0].text
				ratingsStringB=ratingsStringA[14:] # strip out beginning
				ratingsStringC=ratingsStringB[:-6] # strip out ending
				ratingsString=unidecode(ratingsStringC).split(' - ')
			else:
				ratingsString=''
			
			if((unidecode(restaurantName) == title.lower() or 
				'restaurant ' + unidecode(restaurantName)  == title.lower() or
				'restaurante ' + unidecode(restaurantName) == title.lower() or
				unidecode(restaurantName) + ' restaurant'  == title.lower() or
				unidecode(restaurantName) + ' restaurante'  == title.lower() or
				'restaurant ' + title.lower()  == unidecode(restaurantName) or
				'restaurante ' + title.lower() == unidecode(restaurantName) or
				title.lower() + ' restaurant'  == unidecode(restaurantName) or
				title.lower() + ' restaurante'  == unidecode(restaurantName) ) and
				veremaInTitle.lower() == 'verema' and
				len(ratingsString) == 2
				):
				print restaurant["name"], '\t', ratingsString[0].split('/')[0], '\t', ratingsString[1] 
				restaurantFound = 1
				break

		if(restaurantFound==0):
			print restaurant["name"], '\t-\t-'
			
	return
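All three functions print one tab-separated line per restaurant (name, rating, number of ratings), using '-' for missing values. A minimal sketch of reading such lines back for further analysis is shown below. It is written for Python 3 and is not part of the original scripts: parse_rating_line is a hypothetical helper, and the handling of decimal commas and thousands separators is an assumption about what the scraped strings may contain.

```python
import re


def _clean(field):
    """Strip whitespace and map the '-' or empty placeholders to None."""
    field = field.strip()
    return None if field in ("", "-") else field


def parse_rating_line(line):
    """Parse 'name<TAB>rating<TAB>count' into a dict; extra fields are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated fields: %r" % line)
    name, rating, count = (_clean(f) for f in fields[:3])
    return {
        "name": name,
        # Assumption: a decimal comma may appear instead of a decimal point.
        "rating": float(rating.replace(",", ".")) if rating else None,
        # Assumption: counts may carry thousands separators such as '1.234'.
        "num_ratings": int(re.sub(r"[.,\s]", "", count)) if count else None,
    }


print(parse_rating_line("El Celler de Can Roca \t4.7 \t1234"))
print(parse_rating_line("Koy Shunka \t-\t-"))
```

Restaurants that come back with None values are the ones to check by hand, which is why the Google Places function also prints the candidate place names on its not-found lines.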
