Sentiment analysis is a natural language processing (NLP) technique used to determine the sentiment or opinion expressed in a piece of text. It involves analyzing the text to classify its polarity as positive, negative, or neutral. This analysis can be applied to many kinds of text, such as product reviews, social media posts, customer feedback, and user experience comments.
Python Sentiment Library - VADER
VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool specifically designed for analyzing sentiments in social media texts.
Lexicon-based: VADER uses a lexicon (dictionary) that maps words to sentiment scores. Each entry carries a valence rating from -4 (most negative) to +4 (most positive), so the lexicon records how strong a word's sentiment is as well as its polarity.
Rule-based: It incorporates rules to handle intensifiers, degree modifiers, negations, and punctuation in sentiment scoring.
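To make the two ideas concrete, here is a minimal toy sketch of a lexicon lookup combined with an intensifier rule and a negation rule. It is not VADER's code: the word list, the valence values, the rule constants, and the function name `toy_valence` are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: these words, values, and constants are hypothetical,
# not entries from VADER's real lexicon or its actual rules.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "bad": -2.5, "terrible": -3.0}
INTENSIFIERS = {"very": 0.5, "extremely": 1.0}  # strengthen a nearby sentiment word
NEGATIONS = {"not", "never", "no"}              # flip a nearby sentiment word
NEGATION_SCALE = -0.75


def toy_valence(text: str) -> float:
    """Sum word valences, applying simple intensifier and negation rules."""
    tokens = text.lower().replace("!", " ").replace(".", " ").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue  # words outside the lexicon contribute nothing
        valence = TOY_LEXICON[token]
        previous = tokens[max(0, i - 2):i]  # look back up to two tokens
        for word in previous:
            if word in INTENSIFIERS:
                # Push the score further from zero, keeping its sign.
                boost = INTENSIFIERS[word]
                valence += boost if valence > 0 else -boost
        if any(word in NEGATIONS for word in previous):
            valence *= NEGATION_SCALE
        total += valence
    return total


print(toy_valence("The movie was good"))       # 2.0
print(toy_valence("The movie was very good"))  # 2.5
print(toy_valence("The movie was not good"))   # -1.5
```

The same word ("good") gets three different scores depending on its context, which is the effect the rules are meant to capture.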
VADER - Sentiment Scores
VADER produces four sentiment scores for each text input. The pos, neu, and neg scores are proportions, not probabilities, and together they sum to approximately 1:
pos: the proportion of the text that is rated positive
neu: the proportion of the text that is rated neutral
neg: the proportion of the text that is rated negative
compound: the sum of the valence scores of all words in the text, adjusted by the rules and normalized to a value between -1 (most negative) and 1 (most positive)
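The normalization behind the compound score can be sketched as follows. This assumes the form used in the reference VADER implementation, x / sqrt(x^2 + alpha) with alpha = 15; the function below is a standalone illustration, not a call into the library.

```python
import math


def normalize(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of valence scores into the range [-1, 1]."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clip defensively so the result never leaves [-1, 1].
    return max(-1.0, min(1.0, score))


# Larger raw sums move the score toward +1 or -1 but never past it.
for raw in (0.0, 1.0, 4.0, -4.0, 20.0):
    print(raw, round(normalize(raw), 4))
```

Because the function flattens out for large inputs, a long text with many positive words approaches 1 instead of growing without bound.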
The compound score is particularly useful when we need a single measure of sentiment. The typical threshold values for the compound score are as follows:
positive: compound score >= 0.05
neutral: compound score > -0.05 and < 0.05
negative: compound score <= -0.05
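These thresholds translate directly into a small helper. The function name `label_sentiment` and the `threshold` parameter are our own choices for this sketch, not part of VADER.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to 'positive', 'neutral', or 'negative'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.05, 0.01, -0.05, -0.48):
    print(score, label_sentiment(score))
```

Scores exactly at 0.05 or -0.05 fall into the positive and negative classes, respectively; only values strictly between the two count as neutral.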
For the purpose of dividing the sentiment into only two categories, the threshold values for the compound score can also be as follows:
positive: compound score > 0
negative: compound score <= 0
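A sketch of the two-category rule, using a hypothetical helper name `label_binary`:

```python
def label_binary(compound: float) -> str:
    """Map a compound score to 'positive' or 'negative' only."""
    return "positive" if compound > 0 else "negative"


# Hypothetical compound scores, for illustration.
for score in (0.6, 0.03, 0.0, -0.4):
    print(score, label_binary(score))
```

Note the consequences of this choice: a weak score such as 0.03 is labeled positive, and a score of exactly 0 (for example, a text with no words from the lexicon) is labeled negative.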
Perform Sentiment Analysis with VADER
Get Sentiment Values
!pip install nltk pandas matplotlib
import pandas as pd
import nltk
import matplotlib.pyplot as plt
# Import SentimentIntensityAnalyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Ensure vader_lexicon is downloaded
nltk.download('vader_lexicon')

# Initialize VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# Example text
text = "VADER is a great tool for sentiment analysis!"

# Get sentiment scores
scores = analyzer.polarity_scores(text)

# Display sentiment scores
print(scores)
Replace the example text with a text you are interested in, and use VADER to get its sentiment scores.
# YOUR CODE IS HERE
Sentiment Over Time
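How can we display sentiment over time? One approach is to score each text, average the compound scores per time unit, and plot the averages.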
# Import needed libraries
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt

# Example dataset
data = {
    'timestamp': ['2021-02-10', '2021-02-11', '2021-02-12'],
    'text': [
        "What common core equation do they use to arrive at this prediction. Can't wait to rub it in their face when that are wrong AGAIN!",
        "I can't help but think that the dumbing down of the American demographic is a direct result of 'common core'; 'no child left behind' as ways of allowing 'good enough for govt work' to be passed off to the next person 'responsible.' Now the passed off own false entitlement",
        "I just got an 'explainer' on how to read my child's report card. Now, if I can get an explainer on how to do second-grade common core math, I'll really be in good shape. #hopeful."
    ]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert timestamp to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Get sentiment values
analyzer = SentimentIntensityAnalyzer()

# Define a function to get sentiment scores
def get_sentiment_scores(text):
    scores = analyzer.polarity_scores(text)
    return scores['compound']  # Using compound score for overall sentiment

# Apply sentiment analysis to each row
df['compound_score'] = df['text'].apply(get_sentiment_scores)

# Resample by time unit and calculate the mean sentiment score (example: daily sentiment average)
sentiment_daily = df.set_index('timestamp').resample('D')['compound_score'].mean()

# Plot sentiment over time
plt.figure(figsize=(10, 6))
plt.plot(sentiment_daily.index, sentiment_daily, marker='o', linestyle='-', color='b')
plt.title('Sentiment Analysis Over Day')
plt.xlabel('Date')
plt.ylabel('Average Sentiment Score')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
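The resampling step is worth looking at in isolation. The sketch below uses hypothetical, hardcoded compound scores (so it runs without NLTK) to show two behaviors of daily resampling: several texts on the same day are averaged into one value, and a day with no texts appears as NaN.

```python
import pandas as pd

# Hypothetical compound scores, hardcoded so the example runs without NLTK.
posts = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-02-10", "2021-02-10", "2021-02-12"]),
    "compound_score": [0.5, -0.25, 0.75],
})

# One row per calendar day between the first and last timestamp.
daily = posts.set_index("timestamp").resample("D")["compound_score"].mean()
print(daily)

# 2021-02-11 has no posts, so its mean is NaN. Drop such days if the
# gap should not appear in a plot or a summary.
daily_no_gaps = daily.dropna()
print(daily_no_gaps)
```

Here the two posts on 2021-02-10 average to 0.125, and 2021-02-11 is kept as an empty day. Whether to drop, keep, or fill such gaps depends on what the chart is meant to show.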
Your Turn
Instead of averaging sentiment by day, display the average sentiment by month.