Twitter Analysis of Presidential Candidates 2016

Introduction

The 2016 Presidential Election is fast approaching, and President Barack Obama’s second term is about to come to an end. The Republican and Democratic parties have nominated their candidates for president, with Donald Trump facing Hillary Clinton. Public sentiment plays an important role in influencing who becomes the future leader of the United States. Both candidates have strong followings on Twitter: Clinton has 9.21 million followers to Trump’s 11.9 million. For this Shiny project, I developed an application that analyzes, in real time, tweets made by the presidential candidates themselves and tweets by the general public directed at the candidates. The application reports the sentiment of the tweets and the words used most frequently in them. Sentiment towards the candidates fluctuates quickly as interviews, debates, responses to global events, and other issues occur.

This application is divided into two sections. The first focuses on the tweets made by the candidates, while the second focuses on tweets directed at the candidates. The app allows a user to search the tweets by date and to choose the number of tweets to view. It analyzes the sentiment of the retrieved tweets using the NRC Emotion Lexicon, which is explained later. It also displays the words most commonly used by the candidates and the general public in a Word Cloud, and shows the number of tweets posted by the candidates on a particular calendar date. Similarly, the app shows which users tweeted the most about the candidates. On September 26th, 2016, the first Presidential Debate was held at Hofstra University in New York, and the app was used to analyze the tweets and people’s reactions after the debate.

 Fig: Table displaying all the retrieved tweets

Section 1: Tweets made by the Presidential Candidates (Trump vs Clinton)

Both candidates have a Twitter account: Trump’s handle is “realDonaldTrump” and Clinton’s is “HillaryClinton”. Due to the limits of Twitter’s API, the maximum number of recent tweets the application was able to access was only 1084 for Trump and 158 for Clinton. Therefore, to keep the analysis fair, the 150 most recent tweets from each candidate were analyzed. They were gathered on September 28, 2016, two days after the first Presidential Debate was held.

1.1 Word Cloud

A Word Cloud is a text mining visualization that highlights the most frequently used keywords in a body of text. In our case, it highlights the most frequently used words in the tweets, which makes the commonly used keywords stand out. The tweet texts are loaded using the Corpus function, and they then need to be cleaned and transformed. Only the 200 most frequent words are displayed, and each word must occur at least three times.
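The filtering described above can be sketched in a few lines. This is an illustrative Python sketch, not the R code the app uses: the cleaning steps, the tiny stop-word list, the function names, and the sample tweets are all assumptions made for the example. Only the two thresholds (at most 200 words, each occurring at least three times) come from the description above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on", "we", "will"}


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return text


def word_cloud_terms(tweets, max_words=200, min_freq=3):
    """Return (word, count) pairs eligible for display, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        words = clean_tweet(tweet).split()
        counts.update(w for w in words if w not in STOP_WORDS)
    # Keep only words that occur at least min_freq times, then cap the list.
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    return frequent[:max_words]


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Great debate tonight! https://example.com",
    "The debate was great, thank you @someone",
    "We will win the debate",
    "Great crowd, great people",
]
print(word_cloud_terms(sample_tweets))
```

In a word cloud, the returned counts would then drive the font size of each word.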

Fig: Word Cloud for Donald Trump (left) and Hillary Clinton (right)

Comparing the two word clouds, we can see the words each candidate used most often. The higher the frequency of a word, the larger it appears in the word cloud.

1.2 Sentiment Analysis

For sentiment analysis, I used the “get_nrc_sentiment” function from the Syuzhet package in R. This function implements the NRC Emotion Lexicon, which was developed by Dr. Saif Mohammad and his team. The NRC Emotion Lexicon is very popular and has been widely used for sentiment analysis. It consists of a list of words, each associated with one or more of eight emotions ("anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise" and "trust") and two additional sentiments (“negative” and “positive”).
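To make the idea of a word-to-emotion lexicon concrete, here is a minimal Python sketch of lexicon-based scoring. It is not the Syuzhet implementation and does not use the real NRC data: the five-word lexicon below is a hypothetical stand-in invented for this example, and the function name is likewise an assumption. Only the ten category names come from the description above.

```python
import re
from collections import Counter

CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "negative", "positive"]

# Hypothetical toy lexicon for illustration only; NOT the real NRC entries.
TOY_LEXICON = {
    "win": {"joy", "positive"},
    "great": {"joy", "positive"},
    "disaster": {"fear", "sadness", "negative"},
    "terrible": {"anger", "negative"},
    "together": {"trust", "positive"},
}


def score_tweets(tweets, lexicon=TOY_LEXICON):
    """Count, per category, how many lexicon words appear across all tweets."""
    totals = Counter({category: 0 for category in CATEGORIES})
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            for category in lexicon.get(word, ()):
                totals[category] += 1
    return dict(totals)


# Made-up sample tweets, for illustration only.
scores = score_tweets([
    "We will win together",
    "What a terrible disaster",
    "Great night",
])
for category in CATEGORIES:
    print(f"{category:12s} {scores[category]}")
```

Summing these per-category counts over a set of tweets yields the kind of totals that a sentiment bar chart displays.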

Fig: Sentiment Analysis of Tweets by Donald Trump

Fig: Sentiment Analysis of Tweets by Hillary Clinton

Analyzing the tweets of the candidates, we can see that positive sentiment is the most frequent for both. However, the frequency of negative sentiment is much higher for Trump than for Clinton. This suggests that tweets made by Donald Trump were more negative than those from Hillary Clinton.

1.3 Tweet Calendar

The tweet calendar shows how many of the retrieved tweets were posted on each day of the calendar. Shades of blue indicate the number of tweets posted on a given day, with darker shades depicting higher numbers. In the figure below, we can see that both candidates tweeted the most on the day after the first presidential debate, which was held on September 26th.
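The counting behind such a calendar can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the app's R code: the timestamp format, the four shade levels, the function names, and the sample timestamps are all made up for the example.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count tweets per calendar day from 'YYYY-MM-DD HH:MM:SS' strings (assumed format)."""
    days = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts in timestamps)
    return Counter(days)


def shade_level(count, max_count, levels=4):
    """Map a daily count to a shade index from 0 (lightest) to levels - 1 (darkest)."""
    if max_count == 0:
        return 0
    # Integer ceiling of count * (levels - 1) / max_count.
    return (count * (levels - 1) + max_count - 1) // max_count


# Made-up sample timestamps, for illustration only.
sample_timestamps = [
    "2016-09-26 21:05:00",
    "2016-09-27 08:15:00",
    "2016-09-27 12:30:00",
    "2016-09-27 19:45:00",
    "2016-09-28 09:00:00",
]
daily = tweets_per_day(sample_timestamps)
busiest = max(daily.values())
for day in sorted(daily):
    print(day, daily[day], "shade", shade_level(daily[day], busiest))
```

The day with the highest count gets the darkest shade, and days with no tweets get the lightest.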

Fig: Tweet Calendar of Donald Trump

Fig: Tweet Calendar of Hillary Clinton

Section 2: People’s tweets about the Presidential Candidates (Trump vs Clinton)

In this section, we analyze the tweets directed at the presidential candidates by the general public. Donald Trump and Hillary Clinton faced off for the first time in the first Presidential Debate, clashing over policies and attacking each other on the issues. The tweets were analyzed after the debate to see how people reacted towards each candidate. For the analysis, more than 3000 tweets mentioning Trump and Clinton were taken into consideration.

2.1 Word Cloud

Looking at both word clouds, we can see that “debate” was the most frequently used word, since this was the first debate between Trump and Clinton. Other commonly used words also stand out: they are larger and appear in colors other than dark green.

Fig: Word Cloud for Donald Trump (left) and Hillary Clinton (right)

2.2 Sentiment Analysis

Looking at the sentiment analysis plot, it was surprising to see that people had more negative than positive reactions towards the candidates. Hillary Clinton and Donald Trump are both historically unpopular, but large numbers of Americans who can't stand them will likely vote for one of them anyway by choosing what they consider to be the lesser of two evils. In this light, the sentiment analysis plot is simply a reflection of their low popularity numbers among the left, right and neutral groups.

Fig: Sentiment Analysis of Tweets about Donald Trump

Fig: Sentiment Analysis of Tweets about Hillary Clinton

2.3 Users who tweeted most about the candidates

The application also displays the top ten users who tweeted most often about the presidential candidates in the retrieved tweets.
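A ranking like this amounts to counting tweets per account and keeping the ten largest counts. The Python sketch below illustrates the idea; it is not the app's R code, and the "screen_name" field, the function name, and the sample data are assumptions made for the example.

```python
from collections import Counter


def top_users(tweets, n=10):
    """Return the n accounts with the most tweets as (screen_name, count) pairs."""
    counts = Counter(tweet["screen_name"] for tweet in tweets)
    return counts.most_common(n)


# Made-up sample data, for illustration only.
sample = [
    {"screen_name": "user_a", "text": "..."},
    {"screen_name": "user_b", "text": "..."},
    {"screen_name": "user_a", "text": "..."},
    {"screen_name": "user_c", "text": "..."},
    {"screen_name": "user_a", "text": "..."},
    {"screen_name": "user_b", "text": "..."},
]
print(top_users(sample, n=2))
```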

Fig: Users who tweeted most about Donald Trump

Fig: Users who tweeted most about Hillary Clinton

Conclusion

Twitter has played an important role in this election campaign. It has given a platform not only to the presidential candidates but also to the people to express their views. It is a new form of personal marketing that has allowed the candidates to engage with followers and make emotional connections with voters, especially younger audiences. The app revealed that the sentiment of the tweets changed over time. Hillary Clinton consistently seemed to be more positive in her tweets than Donald Trump. After analyzing tweets from the general public following the first Presidential Debate, it was surprising to see that the sentiment toward both Hillary Clinton and Donald Trump was more negative than positive. However, sentiment tends to fluctuate quickly as the candidates move along the road to November 8th. This app is still under construction and more features will be added in the future. Stay tuned.

GitHub link: https://github.com/sam648/TwitterAnalysis

Samriddhi Shakya
Samriddhi comes from a Remote Sensing and Geographic Information Systems (GIS) background. He has a Master’s degree in Geography from Auburn University and a Bachelor of Engineering degree in Geomatics from Kathmandu University. During his Master’s at Auburn University, he was involved in two separate projects for the Environmental Protection Agency (EPA) and the Office of Water Resources (OWR) in Alabama. For the first project, he classified isolated wetlands from aerial imagery using Object Based Image Analysis methods, and for the second, he estimated irrigation rates on agricultural lands from satellite images. He also built the website Alabamaview.org for a consortium that promotes research and education. The website allows users to download GIS files and imagery for different counties of Alabama. After graduating from Auburn University, he joined Sun IT solutions, where he underwent training in Big Data (Hadoop, Spark) and Data Warehousing Technologies (Informatica, Teradata). He recently graduated from NYCDSA, where he enhanced his data science skills in data manipulation, data analysis, machine learning, and visualization. Moreover, he completed five projects, which included machine learning and building web applications. He built interactive Shiny apps, one of which focused on analyzing the sentiment of Twitter feeds and another of which helped potential renters search for apartments in Manhattan. For his Capstone project, he built an app that displayed Topic Models created from reviews in the Yelp dataset using Natural Language Processing (NLP) to help restaurant owners improve their business.
