Natural Language Processing of Wine Reviews
Posted by Ira Villar
Updated: Jun 29, 2020
If there's one thing that offers a wide variety of granular flavors, subtle intricacies, and nuanced notes, it's wine. Each person has their own taste and preferences, and with that, their own perception of the many wines available throughout the world. This field seemed like a prime playground and battlefield for natural language processing, or NLP for short.
To start off, I found a data set with over 130,000 wine reviews from around the world. In addition to the word clouds I created in a Jupyter Notebook, I also built an interactive dashboard for exploring word clouds by region and location.
The first thing I did with the data set was isolate the descriptions and strip out extra spaces and punctuation. I then removed stop words such as 'if', 'the', and 'from', since they added no value to the word clouds I was about to make. The next steps were lemmatization and stemming, which reduce each word to its root form so that variants of the same word are counted together and the reviews' real meaning comes through. For example, lemmatization maps 'feet' to 'foot'.
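The cleaning steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code used for the project: the stop-word set and the lemma table are tiny hand-written stand-ins (a real pipeline would typically rely on an NLP library for both), and the function name `clean_description` and the sample sentence are made up for the example.

```python
import string

# Illustrative stop-word set (an assumption); a real project would use a
# much fuller list from an NLP library.
STOP_WORDS = {"if", "the", "and", "from", "a", "of", "with", "is", "this", "it", "to", "in"}

# Tiny hand-written lemma table standing in for a real lemmatizer.
LEMMAS = {"feet": "foot", "cherries": "cherry", "tannins": "tannin", "notes": "note"}


def clean_description(text):
    """Lowercase, strip punctuation, drop stop words, and map words to root forms."""
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also collapses runs of extra whitespace.
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Fall back to the token itself when it is not in the lemma table.
    return [LEMMAS.get(t, t) for t in tokens]


tokens = clean_description("Ripe  cherries and firm tannins, with notes of oak from the barrel.")
print(tokens)
```

The resulting token list is what would then be counted to size the words in a word cloud.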
Here are just a few of the word clouds I was able to create, separated by wine variety.
[Word cloud images by variety, including Cabernet Sauvignon]
I also wanted to make an interactive tool for browsing word clouds, to give an idea of the wines per location and region.
To load the data set into Tableau properly, I had to make a column for each word in the descriptions (yes, per row), and only then could I pivot the information to create an outrageously larger data set. I think it was worth it in the end.
A Tableau dashboard that shows unique word clouds per location, region, or even the specific wine chosen.
NLP looks a bit complicated at first, but surprisingly, if you follow the methodology, it all just makes sense. Hopefully this tool can help others choose wines they would enjoy.
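To make the reshaping concrete, here is a rough sketch of what the pivot produces: one row per word instead of one row per review. The pivot itself was done in Tableau; this Python version only mimics the result. The field names (`country`, `variety`, `tokens`), the helper `to_long_rows`, and the three sample reviews are all made up for illustration and are not taken from the actual data set.

```python
from collections import Counter

# Made-up sample reviews, already cleaned into word lists (illustrative only).
reviews = [
    {"country": "US", "variety": "Cabernet Sauvignon", "tokens": ["cherry", "oak", "tannin"]},
    {"country": "US", "variety": "Cabernet Sauvignon", "tokens": ["cherry", "plum"]},
    {"country": "France", "variety": "Chardonnay", "tokens": ["apple", "oak"]},
]


def to_long_rows(reviews):
    """Return one (country, variety, word) row per word, like a wide-to-long pivot."""
    return [(r["country"], r["variety"], w) for r in reviews for w in r["tokens"]]


long_rows = to_long_rows(reviews)

# Counting identical rows gives the word sizes for each location and variety.
word_counts = Counter(long_rows)
print(len(long_rows), word_counts.most_common(1))
```

Three short reviews already become seven rows, which hints at why the pivoted data set grows so much when the same thing is done to more than 130,000 full-length descriptions.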
Ira is currently a Data Science Fellow at the NYC Data Science Academy. He has nearly a decade of experience in film directing and production. This gives him a unique insight and perspective when it comes to data analysis and interpretation.
Topics from this blog: Student Works