Sentiment Analysis with Python

Lab 2: Code-Along

Agenda

  1. Sentiment Analysis

  2. Python Sentiment Library - VADER

  3. Perform Sentiment Analysis with VADER Library

    • Import and Preprocess Data (Lab 1.2; Lab 1.3)

    • Get Sentiment Values

    • Sentiment Value Summaries and Visualization

      • Sentiment Counts (Lab 1.4)

      • Single Sentiment Value (Lab 1.4)

      • Sentiment Over Time

Sentiment Analysis

What is sentiment analysis?

Sentiment analysis is a natural language processing (NLP) technique for determining the sentiment or opinion expressed in a piece of text. It analyzes the text and classifies its polarity as positive, negative, or neutral. It can be applied to many kinds of text, such as product reviews, social media posts, customer feedback, and user-experience comments.

Python Sentiment Library - VADER

VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool specifically designed for analyzing sentiments in social media texts.

  • Lexicon-based: VADER uses a lexicon (dictionary) that maps words to sentiment ratings. Each entry is rated for both polarity and intensity, on a scale from -4 (most negative) to +4 (most positive).

  • Rule-based: It applies rules that adjust those ratings for degree modifiers (e.g., "very"), negations (e.g., "not"), punctuation, and capitalization.
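To make the lexicon-plus-rules idea concrete, here is a toy sketch. It is not VADER's actual lexicon or rule set: the words, ratings, and multipliers in `TOY_LEXICON`, `INTENSIFIERS`, and `NEGATIONS` are made up for illustration, and the function `toy_score` is hypothetical.

```python
# Toy illustration only: the words, ratings, and multipliers below are made up
# and are far simpler than VADER's real lexicon and rules.
TOY_LEXICON = {'good': 2.0, 'great': 3.0, 'bad': -2.0, 'terrible': -3.0}
INTENSIFIERS = {'very': 1.5, 'extremely': 2.0}  # hypothetical boost factors
NEGATIONS = {'not', 'never'}

def toy_score(text):
    """Sum word ratings, adjusting each for a preceding intensifier or negation."""
    tokens = text.lower().replace('!', ' ').replace('.', ' ').split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue  # words outside the lexicon carry no sentiment
        rating = TOY_LEXICON[token]
        previous = tokens[i - 1] if i > 0 else ''
        if previous in INTENSIFIERS:
            rating *= INTENSIFIERS[previous]  # "very good" is stronger than "good"
        elif previous in NEGATIONS:
            rating *= -1                      # "not good" flips the polarity
        total += rating
    return total

print(toy_score('The food was good'))       # 2.0
print(toy_score('The food was very good'))  # 3.0
print(toy_score('The food was not good'))   # -2.0
```

The lexicon supplies the base rating for each word, and the rules change that rating depending on the words around it. VADER follows the same general pattern with a much larger lexicon and more careful rules.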

VADER - Sentiment Scores

VADER produces sentiment scores with 4 values for each text input:

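  • pos: the proportion of the text that is rated positive

  • neu: the proportion of the text that is rated neutral

  • neg: the proportion of the text that is rated negative

  • compound: a single overall score, computed by summing the rule-adjusted lexicon ratings of the words in the text and normalizing the sum to a value between -1 (most negative) and +1 (most positive)

The pos, neu, and neg values are ratios rather than probabilities, and they add up to approximately 1.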
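The normalization step can be sketched as follows. This assumes the formula used in the VADER reference implementation, x / sqrt(x² + alpha) with alpha = 15; treat both the formula and the constant as assumptions to check against the version you have installed. The function name `normalize` and the input sums are chosen for illustration.

```python
import math

def normalize(total_valence, alpha=15):
    """Squash an unbounded sum of word ratings into the range (-1, 1).

    Assumption: mirrors the normalization in the VADER reference code,
    where alpha approximates the largest expected sum.
    """
    return total_valence / math.sqrt(total_valence ** 2 + alpha)

# Hypothetical sums of word ratings
print(normalize(0))   # 0.0
print(normalize(1))   # 0.25
print(normalize(7))   # 0.875
print(normalize(-7))  # -0.875
```

Because of this squashing, the compound score grows more slowly as more sentiment-bearing words are added, and it never leaves the interval from -1 to 1.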

The compound score is particularly useful when we need a single measure of sentiment. The typical threshold values for the compound score are as follows:

  • positive: compound score >=0.05

  • neutral: -0.05 < compound score < 0.05

  • negative: compound score <=-0.05

For the purpose of dividing the sentiment into only two categories, the threshold values for the compound score can also be as follows:

  • positive: compound score > 0

  • negative: compound score <= 0
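Both threshold schemes above can be written as small helper functions. The names `label_three_way` and `label_two_way` are hypothetical and are used here only for illustration.

```python
def label_three_way(compound):
    """Map a compound score to positive / neutral / negative."""
    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'

def label_two_way(compound):
    """Map a compound score to positive / negative only."""
    return 'positive' if compound > 0 else 'negative'

for score in (0.6588, 0.0, -0.3):
    print(score, label_three_way(score), label_two_way(score))
```

Note that a score of exactly 0 is neutral under the three-way scheme but negative under the two-way scheme, so the choice of scheme matters for texts with no detectable sentiment.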

Perform Sentiment Analysis with VADER

Get Sentiment Values

!pip install nltk matplotlib pandas
import pandas as pd
import nltk
import matplotlib.pyplot as plt

# Import SentimentIntensityAnalyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Ensure vader_lexicon is downloaded
nltk.download('vader_lexicon')

# Initialize VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# Example text
text = "VADER is a great tool for sentiment analysis!"

# Get sentiment scores
scores = analyzer.polarity_scores(text)
# Display sentiment scores
print(scores)
{'neg': 0.0, 'neu': 0.577, 'pos': 0.423, 'compound': 0.6588}

Your Turn

Replace the example text with a sentence of your own, and use VADER to get its sentiment scores.

# YOUR CODE IS HERE

Sentiment Over Time

How can we display sentiment over time?

# Import needed libraries
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt

# Example dataset
data = {
    'timestamp': ['2021-02-10', '2021-02-11', '2021-02-12'],
    'text': [
        "What common core equation do they use to arrive at this prediction. Can't wait to rub it in their face when that are wrong AGAIN!",
        "I can't help but think that the dumbing down of the American demographic is a direct result of 'common core'; 'no child left behind' as ways of allowing 'good enough for govt work' to be passed off to the next person 'responsible.' Now the passed off own false entitlement",
        "I just got an 'explainer' on how to read my child's report card. Now, if I can get an explainer on how to do second-grade common core math, I'll really be in good shape. #hopeful."
    ]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert timestamp to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Get sentiment values
analyzer = SentimentIntensityAnalyzer()

# Define a function to get sentiment scores
def get_sentiment_scores(text):
    scores = analyzer.polarity_scores(text)
    return scores['compound']  # Using compound score for overall sentiment

# Apply sentiment analysis to each row
df['compound_score'] = df['text'].apply(get_sentiment_scores)

# Resample by time unit and calculate the mean sentiment score (example: daily sentiment average)
sentiment_daily = df.set_index('timestamp').resample('D')['compound_score'].mean()

# Plot sentiment over time 
plt.figure(figsize=(10, 6))
plt.plot(sentiment_daily.index, sentiment_daily, marker='o', linestyle='-', color='b')
plt.title('Average Daily Sentiment')
plt.xlabel('Date')
plt.ylabel('Average Sentiment Score')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
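One detail worth knowing about resampling: if a time period contains no rows, its mean is NaN, which leaves a gap in the plotted line. The sketch below uses made-up compound scores (not VADER output) with one missing day to show this behavior and one possible way to handle it.

```python
import pandas as pd

# Hypothetical compound scores with a gap: nothing was posted on 2021-02-12.
scores = pd.Series(
    [0.5, -0.3, 0.2],
    index=pd.to_datetime(['2021-02-10', '2021-02-11', '2021-02-13']),
    name='compound_score',
)

# Daily resampling creates one row per calendar day, including empty days.
daily = scores.resample('D').mean()

# Drop the empty days so that only days with data are plotted.
daily_observed = daily.dropna()

print(daily)
print(daily_observed)
```

Whether to drop empty periods, fill them, or leave the gap visible depends on what the chart is meant to show; dropping them is only one option.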

Your Turn

Instead of the daily average, display the average sentiment per month.

# YOUR CODE IS HERE