NYC-Tourism, Web Scraping


Posted by Evin

Updated: Jul 23, 2020


New York City is one of the world’s most visited cities, making tourism one of its major income sources. As someone who moved to New York City over ten years ago, I thought it would be interesting to run a short analysis to see how tourism has grown over the past years. Let’s dive in…

Scraping data

The two sites used for scraping are Tripadvisor and NYCData. Tripadvisor is a great resource for looking up travel information, which makes it one of the most used sites in this category. The dataset I scraped contains four columns: attraction name, type of attraction, number of reviews, and rating. I wanted to find out which attractions were the most popular among the hundreds or thousands in the city, and this dataset gives an overall sense of how popular each attraction is. The following is a peek at the dataset:

Dataset 1 - Tripadvisor
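The scraping code itself is not shown in this post, so the following is only a minimal sketch of how rows with these four columns could be extracted from a listing page. The HTML snippet, its class names, the placeholder values, and the `AttractionParser` and `parse_attractions` helpers are all assumptions made for illustration; they do not reflect Tripadvisor's actual markup or data.

```python
from html.parser import HTMLParser

# Hypothetical markup and placeholder values, for illustration only.
SAMPLE_HTML = """
<div class="attraction">
  <span class="name">Attraction A</span>
  <span class="category">Parks</span>
  <span class="reviews">1,200</span>
  <span class="rating">4.5</span>
</div>
<div class="attraction">
  <span class="name">Attraction B</span>
  <span class="category">Museums</span>
  <span class="reviews">850</span>
  <span class="rating">4.0</span>
</div>
"""

FIELDS = ("name", "category", "reviews", "rating")


class AttractionParser(HTMLParser):
    """Collects one dict per <div class="attraction"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # row currently being filled, if any
        self._field = None    # field whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "attraction":
            self._current = {}
        elif tag == "span" and css_class in FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def parse_attractions(html):
    """Return a list of rows with numeric reviews and rating columns."""
    parser = AttractionParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "name": row["name"],
            "category": row["category"],
            "reviews": int(row["reviews"].replace(",", "")),  # "1,200" -> 1200
            "rating": float(row["rating"]),
        }
        for row in parser.rows
    ]


rows = parse_attractions(SAMPLE_HTML)
for row in rows:
    print(row)
```

Converting the review count and rating to numbers at parse time keeps the later sorting and grouping steps simple.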

The second site I scraped, NYCData, is affiliated with Baruch College. It hosts data and information mostly for educational purposes, including the data I scraped on New York City’s tourism over the past 13 years. It covers four categories: Domestic and International Visitors, Economic Impact of Tourism on New York City's Economy, New York City (NYC) Hotel Market, and International Visitors to New York City By Major Countries and Regions.

Dataset 2 - NYCData, Baruch College


While analyzing the Tripadvisor dataset, I wanted to find out what the top ten attractions in the city are and how popular they actually are. I simply grouped the data and looked for the highest values in the number of ratings or reviews. The results are shown in the captures below.

Top ten attractions in NYC
Top ten attraction categories
Top ten attraction categories - Graph
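The grouping and ranking step described above can be sketched roughly as follows. This is a dependency-free illustration, not the original analysis code: the rows are made-up placeholders, the function names are hypothetical, and summing review counts per category is an assumption about how the category ranking was produced.

```python
from collections import defaultdict

# Placeholder rows with the same four columns as the scraped dataset.
attractions = [
    {"name": "Attraction A", "category": "Parks", "reviews": 1200, "rating": 4.5},
    {"name": "Attraction B", "category": "Museums", "reviews": 850, "rating": 4.0},
    {"name": "Attraction C", "category": "Parks", "reviews": 400, "rating": 4.5},
    {"name": "Attraction D", "category": "Landmarks", "reviews": 2000, "rating": 5.0},
]


def top_attractions(rows, n=10):
    """Return the n rows with the most reviews, highest first."""
    return sorted(rows, key=lambda row: row["reviews"], reverse=True)[:n]


def top_categories(rows, n=10):
    """Sum reviews per category and return the n largest (category, total) pairs."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["reviews"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


for row in top_attractions(attractions, n=2):
    print(row["name"], row["reviews"])

for category, total in top_categories(attractions):
    print(category, total)
```

With a real dataset, the same two functions called with the default `n=10` would give the top ten attractions and the top ten categories.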

After analyzing the first dataset, I went on to inspect the second, whose four categories were introduced above. The Domestic and International Visitors category gives the total number of visitors, both domestic and international. The graphs below show that there were over 710 million visitors from 2004 to 2017. Of these, 569 million were domestic and the rest were international. NYC is clearly popular with both domestic and international visitors, and visitor numbers have been growing.
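As a quick check on these figures, the international count and the split between the two groups can be worked out directly. The sketch below treats "over 710 million" as exactly 710 million, so the results are approximations.

```python
# Figures quoted in the text, in millions of visitors (2004 to 2017).
total_visitors = 710     # the text says "over 710", so this is a lower bound
domestic_visitors = 569

# Whatever is not domestic is international.
international_visitors = total_visitors - domestic_visitors
domestic_share = domestic_visitors / total_visitors
international_share = international_visitors / total_visitors

print(f"International visitors: about {international_visitors} million")
print(f"Domestic share: {domestic_share:.1%}")
print(f"International share: {international_share:.1%}")
```

Under that assumption, roughly 141 million visitors were international, which is about one visitor in five.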

The next category I inspected was Economic Impact of Tourism on New York City's Economy. It contains data on total visitor spending, taxes, wages for local workers, and jobs created by tourism. On average, from 2004 to 2017, domestic and international visitors together generated over 32 billion dollars in spending each year, 8 billion dollars in taxes, and 18 billion dollars in wages, and created over 34,000 jobs each year. Tourism thus helps NYC locals by creating more job opportunities and more financial resources. The relationship between total visitor spending and wages is positive, suggesting that more people visiting NYC helps generate more job opportunities and dollars for the city.
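The post does not say how the relationship between spending and wages was measured. One common way to check for a positive relationship is the Pearson correlation coefficient, sketched below. The `pearson` helper and the yearly values are placeholders chosen for illustration, not the NYCData figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder yearly values in billions of dollars (not real data).
spending = [30.0, 32.0, 31.0, 35.0]
wages = [16.0, 17.5, 17.0, 19.0]

r = pearson(spending, wages)
print(f"Correlation between spending and wages: {r:.2f}")
```

A value close to 1 indicates that the two series rise and fall together. It does not by itself show that one causes the other.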

The next category is the hotel market, where I looked at two variables: daily room rate and average hotel occupancy. As one of the most popular cities for tourism, NYC has one of the biggest and most profitable hotel markets. As the first graph shows, the daily room rate in NYC has always been above $200, and the occupancy rate has been stable above 0.8 (80%) on average.


Future work

I hope to get more data and provide a more comprehensive analysis of the NYC tourism business.




Wei (Evin) Lin is a certified data scientist with a bachelor’s in Finance and a bachelor’s in Statistics. He has 3+ years of finance and accounting internship experience across sales and trading, accounting, and general finance fields. He...


