Are Sydney’s Top Restaurants Really Worth It?

Let’s have a look at one: /cities. This function retrieves details about each city.

To search for Sydney we input the following: user-key as your API key, q = ‘Sydney’, count = 1.

{
  "location_suggestions": [
    {
      "id": 260,
      "name": "Sydney, NSW",
      "country_id": 14,
      "country_name": "Australia",
      "country_flag_url": "https://b.zmtcdn.com/images/countries/flags/country_14.png",
      "should_experiment_with": 0,
      "discovery_enabled": 0,
      "has_new_ad_format": 1,
      "is_state": 0,
      "state_id": 128,
      "state_name": "New South Wales",
      "state_code": "NSW"
    }
  ],
  "status": "success",
  "has_more": 0,
  "has_total": 0
}

It’s returned us JSON content, which looks to be a dict.

We’ll visit how to extract the data points we want later in the article.

Now let’s get back to answering the question.

Breaking it down… Are Sydney’s Top Restaurants Really Worth It?

This is a fairly general question and can be interpreted in many ways, so as we begin sourcing our data points it’s important we think about:

- What do we classify as Top Restaurants? By what means are we ranking them?
- Are we considering all types of restaurants? Dinner? Cafe?
- What is worth it? Do we need to consider the price of the restaurant? And how that compares to its rating?

Properly defining our problem prior to designing the solution will help to save plenty of time.

(Trust me, I’ve run into that problem before.) Without further ado, let’s begin our first step!

1. Extract data using the Zomato API

We will primarily be using the search function, which returns details on restaurants. It allows us to retrieve a maximum of 100 restaurants, with 20 results per call.

Below is a Python script that pulls Zomato restaurant data and dumps it into a file for use later on.

The parameters entered for search were found using other functions such as:

- /locations: to source the latitude and longitude coordinates for “Sydney”
- /categories: to source the Dinner category, id = 10

In addition to that, I chose to sort the restaurants by rating in descending order.
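The original script was embedded as a gist and isn’t reproduced here, but a minimal sketch of the extraction loop might look like the following. The coordinates, category id and sort parameters mirror those described above; the function and variable names are my own, and the exact query parameters should be checked against the API docs.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://developers.zomato.com/api/v2.1/search"

def page_offsets(total=100, per_page=20):
    """Offsets needed to page through `total` results, `per_page` at a time."""
    return list(range(0, total, per_page))

def fetch_top_restaurants(api_key, out_file="Top100SydneyDinners.json"):
    """Call /search once per page and dump every response dict into one file.

    Query values below (Sydney's coordinates from /locations, category 10
    for Dinner, rating sorted descending) follow the article; names are
    illustrative.
    """
    responses = []
    for start in page_offsets():
        params = urllib.parse.urlencode({
            "lat": -33.8688, "lon": 151.2093,  # "Sydney" via /locations
            "category": 10,                    # Dinner
            "sort": "rating", "order": "desc",
            "start": start, "count": 20,
        })
        req = urllib.request.Request(API_URL + "?" + params,
                                     headers={"user-key": api_key})
        with urllib.request.urlopen(req) as resp:
            responses.append(json.load(resp))
    with open(out_file, "w") as f:
        json.dump(responses, f)
```

Five calls of 20 results each gives us our 100 restaurants.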

Based on my parameters you can see that I’ve answered some of the questions in breaking it down (earlier on in the article, in case you missed it), i.e.:

- Top Restaurants will be based on their rating
- We’ll consider only Dinner restaurant types

The return value of search is the same as cities in that it returns a dict.

Given we’d like to extract information on 100 restaurants but are limited to 20 per call, what can we do? We grab each dict returned and append it to a list until we have 100 restaurants.

Once we have the list, we write it into a file so we can use it later on.

Our JSON file Top100SydneyDinners is now ready to get transformed!

2. Transform the data

Before we do this, we should first know what our data looks like.

To start let’s return to our cities output earlier on and understand how that’s structured.

Do you want to have a guess?

It’s a dict, where one of the keys, i.e. “location_suggestions”, has a value of type list, and that list contains a dict with multiple key-value pairs. A bit confusing, I know! But what’s good about it is that it’s a combination of dicts and lists, and we can use their corresponding methods to get the data we want.
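As a quick sketch of what that chaining looks like in practice, here is a trimmed-down copy of the /cities response from earlier:

```python
# A trimmed version of the /cities output: a dict whose
# "location_suggestions" key holds a list, which in turn holds
# a dict of key-value pairs.
cities_response = {
    "location_suggestions": [
        {"id": 260, "name": "Sydney, NSW", "country_name": "Australia"}
    ],
    "status": "success",
}

# Chain dict and list indexing to reach the inner values
city = cities_response["location_suggestions"][0]
print(city["name"])  # Sydney, NSW
```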

Now…copying the Top100SydneyDinners over here would blow out this article, so here is a link to that file on my GitHub account.

It might also be helpful to copy its contents into a JSON Formatter so it’s more readable.

See any interesting data points there?

Our goal here is to get the data we want out of JSON format and into a lovely, clean table for analysis.

To start, let’s load in our JSON file, define the columns we want in our DataFrame and then create it!

I did create more columns than needed (sorry, got a bit greedy with my data), but to answer our main question, the focal points should be:

- average_cost_for_two
- aggregate_rating: this is the restaurant rating

Next, we need a way to populate our currently empty DataFrame.
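A minimal sketch of that step, assuming pandas and the file written during extraction (the column list here is a small illustrative subset, not my full greedy set):

```python
import json
import pandas as pd

# Columns we want in the final table (illustrative subset; the two
# focal points for the main question are cost and rating)
COLUMNS = ["name", "cuisines", "average_cost_for_two", "aggregate_rating"]

def load_restaurants(path="Top100SydneyDinners.json"):
    """Read back the list of /search response dicts saved earlier."""
    with open(path) as f:
        return json.load(f)

# Create the (currently empty) DataFrame to be populated row by row
df = pd.DataFrame(columns=COLUMNS)
```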

To do so, we select the pieces of data we’re interested in from each restaurant (e.g. name, cuisines, etc.) and form them into a single list. Each restaurant will then have its own list, and that will be appended as a row to our DataFrame.

Let’s have a look at the code below… Remember, the JSON file consists of a list of dicts that were appended via the Python script earlier on when extracting the data.
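The original gist isn’t reproduced here; a sketch of the nested loop it describes, run against a tiny stand-in for the file’s structure, could look like this. The restaurant names and values are made up, and note that in this sketch the rating sits inside a nested dict, as some values do.

```python
import pandas as pd

# A tiny stand-in for the JSON file: a list of response dicts, each holding
# a "restaurants" list of wrapper dicts keyed by "restaurant".
allRestaurants = [
    {"restaurants": [
        {"restaurant": {"name": "Cafe A", "cuisines": "Italian",
                        "average_cost_for_two": 80,
                        "user_rating": {"aggregate_rating": "4.9"}}},
        {"restaurant": {"name": "Cafe B", "cuisines": "Thai",
                        "average_cost_for_two": 60,
                        "user_rating": {"aggregate_rating": "4.8"}}},
    ]},
]

rows = []
for restaurantSet in allRestaurants:                  # each dict in the file's list
    for restaurants in restaurantSet["restaurants"]:  # the "restaurants" list
        restaurant = restaurants["restaurant"]        # the inner "restaurant" dict
        rows.append([
            restaurant["name"],
            restaurant["cuisines"],
            restaurant["average_cost_for_two"],
            float(restaurant["user_rating"]["aggregate_rating"]),  # nested dict
        ])

df = pd.DataFrame(rows, columns=["name", "cuisines",
                                 "average_cost_for_two", "aggregate_rating"])
```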

Hopefully the code above is clear enough to understand, but if not, I’ll try to explain the nested for loop below. It may help to have the JSON-formatted output open as you walk through these steps.

As the file is a list of dicts (allRestaurants), we need to iterate through the list to get each dict (restaurantSet).

In each dict (restaurantSet), we want the value for the key "restaurants". This value is also a list of dicts (restaurants), so again we need to iterate through the list to get each dict (restaurant).

From this dict (restaurant), we want the value for the key "restaurant". This value is also a dict, and finally holds the data we can now grab, e.g. name, cuisines, etc. (although some values are also dicts, e.g. “location”).

Voila! We have our Zomato restaurant data set!

3. Make sense of our data

Quick question recap: Are Sydney’s Top Restaurants Really Worth It?

We’ve retrieved the Top 100 restaurants for dinner based on their rating, but what we haven’t done is look at their cost.

So let’s analyse average_cost_for_two. We start by calculating the mean.

Average Cost for Two = $107.90
Percentage of restaurants below the mean = 66.0%

We get an average of $107.90 for two people, where 66% of the total restaurants cost less than this. Does that seem right to you? Does the mean appropriately describe the centre of our average costs? Personally, it doesn’t seem so to me.
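Both figures are one-liners with pandas. The series below is a small illustrative one, not the real Top 100 data, so the numbers differ from those above:

```python
import pandas as pd

# Illustrative costs only; on the real data this would be
# df["average_cost_for_two"]
costs = pd.Series([20, 60, 80, 80, 100, 150, 440, 480])

mean_cost = costs.mean()
pct_below_mean = (costs < mean_cost).mean() * 100  # share of values under the mean
```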

Let’s visualise our data in a histogram to get a better idea.
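A sketch of the histogram, again on illustrative numbers (the Agg backend keeps it runnable without a display):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative costs only; plot df["average_cost_for_two"] on the real data
costs = pd.Series([20, 60, 80, 80, 100, 150, 440, 480])

ax = costs.plot.hist(bins=10)
ax.set_xlabel("average_cost_for_two ($)")
ax.set_ylabel("Number of restaurants")
plt.savefig("cost_histogram.png")
```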

You see the data is skewed to the right (i.e. the tail on the right side is longer than the left side), which indicates it would be more appropriate to use the median as our measure of central tendency rather than the mean.

This is because the mean is sensitive to outliers, such as those values near 440 and 480, whereas the median is not.

So, shall we calculate the median?

The median cost is = $80.00
The percentage of values below the median is 48.0%

Cool! So now we have the median cost as $80.00 and only 48% of restaurants lie below this value. That seems more reasonable, as our chosen measure of central tendency is now getting closer to 50% of values being above and below it.
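The median version follows the same pattern as the mean calculation, again shown on an illustrative series rather than the real data:

```python
import pandas as pd

# Illustrative costs only
costs = pd.Series([20, 60, 80, 80, 100, 150, 440, 480])

median_cost = costs.median()
pct_below_median = (costs < median_cost).mean() * 100  # share under the median
```

Unlike the mean, the median is unaffected by how extreme the tail values are, which is why it holds up better on skewed data.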

However…if you return to the histogram there are restaurants with an average cost for two around the $20 mark.

Unless they’re doing some sort of crazy special, I find this hard to believe, so let’s do some more digging.

3.1 Investigate outliers

We grab the 10 lowest average_cost_for_two restaurants and view them in the DataFrame.
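One way to pull those rows, shown here on a toy DataFrame (the real one has 100 rows):

```python
import pandas as pd

# Toy data; on the real data set this would surface the suspect $20 entries
df = pd.DataFrame({
    "name": ["Cafe A", "Cafe B", "Cafe C", "Cafe D"],
    "average_cost_for_two": [20, 440, 80, 25],
})

# The 10 cheapest rows (or all rows, if there are fewer than 10)
cheapest = df.nsmallest(10, "average_cost_for_two")
```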

Oops!
