
Natural Language Processing II. Sentiment Analysis. Spelling check. TextBlob

Sentiment Analysis.

Sentiment analysis is a subfield of NLP that tries to identify and extract the opinions and attitudes expressed in a text (blog posts, reviews, tweets, etc.) about an event, a product, or any other topic.

TextBlob is a Python library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. Installation:

pip install -U textblob
python -m textblob.download_corpora

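TextBlob's default sentiment analyzer returns two properties for a given input sentence: polarity, a float in the range [-1.0, 1.0] where -1.0 is very negative and 1.0 is very positive, and subjectivity, a float in the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.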

from textblob import TextBlob

Our input sentence is taken from the review of The Matrix Revolutions at https://www.rogerebert.com/reviews/the-matrix-revolutions-2003

sentence = '''Still, in a basic and undeniable sense, this is a good movie, and fans who have earned their credit hours with the first two will want to see this one. "The Matrix Revolutions" is a terrific action achievement.'''

# Firstly, we create a TextBlob object.
analysis = TextBlob(sentence)
# Secondly, we show its sentiment property.
print(analysis.sentiment)

Sentiment(polarity=0.21..., subjectivity=0.43...)

Let’s see other examples:

sentence = '''The movie was awful and crappy but on such a painful level, so we walked out before it was over.'''
Sentiment(polarity=-0.56..., subjectivity=0.79...)

sentence = '''The boiling point of water is 212 degrees Fahrenheit or 100 degrees Celsius. Water has three states solid as ice, liquid as water, and gas as vapor.'''
Sentiment(polarity=0.0, subjectivity=0.1)
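
The two numbers are easier to read once they are mapped to labels. The following is a minimal sketch, not part of TextBlob: the helper name describe_sentiment and both thresholds (0.1 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration, and the inputs are the truncated scores shown above.

```python
def describe_sentiment(polarity, subjectivity, pol_threshold=0.1, subj_threshold=0.5):
    """Map TextBlob-style scores to human-readable labels.

    polarity is expected in [-1.0, 1.0] and subjectivity in [0.0, 1.0].
    Both thresholds are illustrative assumptions, not TextBlob defaults.
    """
    if polarity > pol_threshold:
        tone = "positive"
    elif polarity < -pol_threshold:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "subjective" if subjectivity >= subj_threshold else "objective"
    return tone, kind


# Truncated scores taken from the three examples above.
for polarity, subjectivity in [(0.21, 0.43), (-0.56, 0.79), (0.0, 0.1)]:
    print(polarity, subjectivity, describe_sentiment(polarity, subjectivity))
```

With these thresholds, the Matrix review reads as positive, the second review as negative and subjective, and the text about water as neutral and objective. Different thresholds may suit other kinds of text better.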

Sentiment Analysis with Twitter

from textblob import TextBlob
# The textblob.sentiments module contains two sentiment analysis implementations: PatternAnalyzer and NaiveBayesAnalyzer (an NLTK classifier trained on a movie reviews corpus).
from textblob.sentiments import NaiveBayesAnalyzer
# Tweepy is an easy-to-use Python library for accessing the Twitter API. 
import tweepy, credentials

credentials.py is a file that stores our API keys, passwords, URLs, etc. You should add credentials.py to your .gitignore file so that it is never committed to the repository.

api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXX' 
api_secret = 'XXXXXXXXXXXXXXXXXXXXXXXXX' 
access_token = 'XXXXXXXXXXXXXXXXXXXXXXXXX' 
access_secret = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
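
An alternative to keeping the keys in a file is to read them from environment variables, so that nothing secret lives in the project folder. This is a sketch under assumptions: the variable names (TWITTER_API_KEY and so on) and the helper load_credentials are hypothetical, not something Tweepy requires.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED_VARS = ("TWITTER_API_KEY", "TWITTER_API_SECRET",
                 "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")


def load_credentials(environ=os.environ):
    """Return the four keys as a dict, failing early if any of them is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to Tweepy in place of credentials.api_key, credentials.api_secret, and so on.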

The first step of authentication is to create an OAuthHandler instance. We pass our consumer key and consumer secret to its constructor, and then we set the access token and its secret.

auth = tweepy.auth.OAuthHandler(credentials.api_key, credentials.api_secret)
auth.set_access_token(credentials.access_token, credentials.access_secret)
# auth is the authentication handler to be used. Setting wait_on_rate_limit to True makes Tweepy wait
# until the rate limit resets instead of failing, which helps you avoid being blocked by Twitter.
api = tweepy.API(auth, wait_on_rate_limit=True)

positive_tweets = 0
positive_total = 0
negative_tweets = 0
negative_total = 0

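api.search_tweets returns a collection of relevant tweets matching a specified query. Parameters: q is the search query, lang restricts the results to tweets in the given language, and count is the number of tweets to return per page.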

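# The analyzer is created once and reused, so that the classifier is not retrained for every tweet.
analyzer = NaiveBayesAnalyzer()
for tweet in api.search_tweets(q="covid restrictions", lang="en", count=50):
    print(tweet.text)
    # We apply the NaiveBayesAnalyzer sentiment analysis implementation.
    blob_object = TextBlob(tweet.text, analyzer=analyzer)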
    # The NaiveBayesAnalyzer returns its result as a namedtuple of the form: Sentiment(classification, p_pos, p_neg).
    analysis = blob_object.sentiment
    print(analysis)
    if analysis.classification=='pos':
        positive_tweets += 1
        positive_total += analysis.p_pos
    else:
        negative_tweets += 1
        negative_total += analysis.p_neg

# max(..., 1) avoids a ZeroDivisionError when one of the two classes has no tweets.
print(f"Positive tweets: {positive_tweets}. Average positive sentiment: {positive_total/max(positive_tweets, 1)}")
print(f"Negative tweets: {negative_tweets}. Average negative sentiment: {negative_total/max(negative_tweets, 1)}")

RT @BBCSport: Festive sport in Wales will be held behind closed doors as the Welsh government continues its fight against Covid-19’s Omicro…
Sentiment(classification='pos', p_pos=0.99…, p_neg=0.00…)
RT @VanIslandHealth: To help keep people safe over the holidays, new provincial health officer orders are in effect as of December 20 until…
Sentiment(classification='pos', p_pos=0.96…, p_neg=0.034…)
[…]
Positive tweets: 42. Average positive sentiment: 0.7959131642223822
Negative tweets: 8. Average negative sentiment: 0.8266023849542835

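Observe that most of the tweets (42 out of 50) are classified as positive. Each average is the mean probability that the classifier assigned to its chosen class: roughly 0.80 for the positive tweets and 0.83 for the negative ones. Keep in mind that NaiveBayesAnalyzer is trained on movie reviews, so its verdict on tweets about other topics should be read with caution.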
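
The counting logic of the loop can be tried out without calling the Twitter API. The sketch below is a rewrite under assumptions: Sentiment is a local namedtuple that only mimics the shape of the NaiveBayesAnalyzer result, summarize is a hypothetical helper, and the sample values are made up.

```python
from collections import namedtuple

# Local stand-in mirroring the fields of the NaiveBayesAnalyzer result.
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])


def summarize(results):
    """Count positive/negative results and average the probability of the chosen class."""
    pos = [r.p_pos for r in results if r.classification == "pos"]
    neg = [r.p_neg for r in results if r.classification != "pos"]
    # An empty class yields None instead of raising ZeroDivisionError.
    avg_pos = sum(pos) / len(pos) if pos else None
    avg_neg = sum(neg) / len(neg) if neg else None
    return len(pos), avg_pos, len(neg), avg_neg


# Made-up sample data, only to exercise the function.
sample = [Sentiment("pos", 0.9, 0.1), Sentiment("pos", 0.7, 0.3), Sentiment("neg", 0.2, 0.8)]
print(summarize(sample))
```

Keeping the aggregation in its own function makes it possible to check the arithmetic on known inputs before spending API calls on real tweets.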

Sentiment Analysis on Movie Reviews

We are going to assess the sentiment of movie reviews. The Large Movie Review Dataset v1.0 and the IMDB Movie Reviews Dataset each consist of thousands of movie reviews, every one of which is labeled as positive or negative.

import pandas as pd
from textblob import TextBlob

def sentiment(reviews, positive):

sentiment has two parameters: reviews (a list of movie reviews) and positive (a boolean that indicates whether the reviews are labeled positive or negative). It returns the average polarity of the reviews that TextBlob’s sentiment analysis classified correctly, and the percentage of correctly classified reviews.

    sumPolarity = correct = 0
    for review in reviews: # It loops through the reviews.
        analysis = TextBlob(review) # It creates a TextBlob object. The constructor takes as a parameter a review.
        
        if positive and analysis.sentiment.polarity > 0:
            correct += 1
            sumPolarity += analysis.sentiment.polarity
        elif not positive and analysis.sentiment.polarity <= 0:
            correct += 1
            sumPolarity += analysis.sentiment.polarity
        
    # Avoid dividing by zero when no review is classified correctly.
    return (sumPolarity/correct if correct else 0.0), correct/len(reviews)*100.0

def sentimentSubjectivity(reviews, positive):

sentimentSubjectivity has the same two parameters: reviews (a list of movie reviews) and positive (a boolean that indicates whether the reviews are labeled positive or negative). It returns the same two values, the average polarity of the correctly classified reviews and the percentage of correctly classified reviews, but it only considers reviews that TextBlob rates as very subjective or opinionated (analysis.sentiment.subjectivity > 0.7).

    count = correct = sumPolarity = 0
    for review in reviews:
        analysis = TextBlob(review)
        if analysis.sentiment.subjectivity > 0.7:
            if positive and analysis.sentiment.polarity > 0:
                correct += 1
                sumPolarity += analysis.sentiment.polarity
            elif not positive and analysis.sentiment.polarity < 0:
                correct += 1
                sumPolarity += analysis.sentiment.polarity
            count += 1
        
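    # Avoid dividing by zero when no review passes the subjectivity filter or none is classified correctly.
    return (sumPolarity/correct if correct else 0.0), (correct/count*100.0 if count else 0.0)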

# pd.read_csv reads a comma-separated values (csv) file, our dataset, into a DataFrame.
data = pd.read_csv('dataset.csv')
print(data.head())
print(len(data))

# We initialize our positive and negative reviews lists.
positiveReviews=[]
negativeReviews=[]

# The dataset.csv file has two columns: text which consists of movie reviews and label which consists of the values 1 or 0. 1 stands for positive sentiment and 0 stands for negative sentiment. We are going to loop through the DataFrame to separate positive and negative reviews.
for i in range(len(data)): # range(len(data)) visits every row, including the last one.
    if data.label[i]==1:
        positiveReviews.append(data.text[i])
        
    else:
        negativeReviews.append(data.text[i])
        
avgPolarity, correct = sentiment(positiveReviews, True)
avgPolarity2, correct2 = sentimentSubjectivity(positiveReviews, True)

print(f"Total positive reviews {len(positiveReviews)}. Positive accuracy: {correct:.2f}. Average sentiment: {avgPolarity:.2f}. Accuracy considering subjectivity: {correct2:.2f}. Average sentiment considering subjectivity: {avgPolarity2:.2f}.")

avgPolarity, correct = sentiment(negativeReviews, False)
avgPolarity2, correct2 = sentimentSubjectivity(negativeReviews, False)      

print(f"Total negative reviews {len(negativeReviews)}. Negative accuracy: {correct:.2f}. Average sentiment: {avgPolarity:.2f}. Accuracy considering subjectivity: {correct2:.2f}. Average sentiment considering subjectivity: {avgPolarity2:.2f}.")

Total positive reviews 2505. Positive accuracy: 94.05. Average sentiment: 0.21. Accuracy considering subjectivity: 95.54. Average sentiment considering subjectivity: 0.36.
Total negative reviews 2494. Negative accuracy: 42.82. Average sentiment: -0.11. Accuracy considering subjectivity: 72.60. Average sentiment considering subjectivity: -0.24.
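
The two lines above report accuracy per class. A single overall figure can be derived by weighting each class by its size. The sketch below uses only the numbers printed above; the helper name overall_accuracy is hypothetical.

```python
def overall_accuracy(classes):
    """Combine per-class accuracies (in percent) into one weighted percentage.

    classes is a list of (number_of_reviews, accuracy_percent) pairs.
    """
    total = sum(n for n, _ in classes)
    correct = sum(n * acc / 100.0 for n, acc in classes)
    return correct / total * 100.0


# (reviews, accuracy %) for the positive and negative reviews, as printed above.
without_filter = overall_accuracy([(2505, 94.05), (2494, 42.82)])
print(f"Overall accuracy: {without_filter:.2f}%")
```

The result lands between the two class accuracies, which shows that the default analyzer recognizes positive reviews far more reliably than negative ones on this dataset. The subjectivity-filtered accuracies cannot be combined in the same way from the printed output, because the number of reviews that pass the filter is not shown.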

Spelling check using Python

There are various ways and libraries to perform spelling and grammar checking using Python.

from textblob import TextBlob 
from gingerit.gingerit import GingerIt # It corrects spelling and grammar mistakes based on the context of complete sentences. It is a wrapper around the gingersoftware.com API. Installation: pip install gingerit.

def spellcheck(text):
    txt = TextBlob(text) # First, we create a TextBlob object.
    correct_txt = txt.correct() # Then, we use the correct() method to attempt spelling correction. 
    parser = GingerIt()

Next, we use the GingerIt library to parse the original text. The Python library pyspellchecker is another easy-to-use option for basic spell checking.

    return correct_txt, parser.parse(text)['result']
 
def main():    
    text = "The smelt of fliwers bring back memories. He love playing fotball."
    print("Original Text: " + text)  
    sc1, sc2 = spellcheck(text)
    print("Spell Check results:")
    print(sc1)
    print(sc2)

if __name__ == "__main__":
    main()

Original Text: The smelt of fliwers bring back memories. He love playing fotball.
Spell Check results:
The smelt of flowers bring back memories. He love playing football.
The smell of flowers brings back memories. He loves playing football.
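
The TextBlob result illustrates a limit of word-level correction: "smelt" is itself a valid English word, so it is left alone, and agreement errors such as "He love" are out of its scope. According to TextBlob's documentation, correct() is based on Peter Norvig's spelling corrector. The following is a much-simplified sketch of that idea, not TextBlob's actual code; the vocabulary and its counts are made up for illustration.

```python
import string

# Toy vocabulary with made-up frequency counts (an assumption for illustration).
WORD_COUNTS = {"the": 100, "of": 80, "smell": 20, "playing": 12,
               "flowers": 10, "football": 8, "smelt": 2}


def edits1(word):
    """Return every string that is one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


for word in ["fliwers", "fotball", "smelt"]:
    print(word, "->", correct_word(word))
```

Because "smelt" is in the vocabulary, the corrector never considers "smell", no matter how much more frequent it is. Fixing that kind of mistake requires looking at the surrounding words, which is what a context-aware service such as GingerIt attempts.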

Translation with Python.

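from textblob import TextBlob # TextBlob relies on Google Translate's API. In other words, it requires an active internet connection for performing translations. Note that translate() was deprecated and has been removed from recent TextBlob releases (0.18 and later), so this example needs an older version.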
blob = TextBlob("I swear I couldn’t love you more than I do right now, and yet I know I will tomorrow.") # We create a TextBlob object.
print(blob.translate(to='th')) # We use the TextBlob translate() method. It accepts two arguments from_lang and to. The from_lang is automatically set depending on the language it detects.
print(blob.translate(to='es'))
print(blob.translate(to='hi'))
print(blob.translate(to='ja'))

ฉันสาบานว่าไม่สามารถรักคุณมากไปกว่าตอนนี้ แต่ฉันรู้ว่าฉันจะทำในวันพรุ่งนี้
Te juro que no podría amarte más de lo que te amo en este momento y, sin embargo, sé que lo haré mañana.
मैं कसम खाता हूं कि मैं अभी जितना प्यार करता हूं उससे ज्यादा प्यार नहीं कर सकता, और फिर भी मुझे पता है कि मैं कल करूंगा।
私は今よりもあなたを愛することができなかったと誓いますが、それでも私は明日になることを知っています。

An alternative is the translate package. It is a simple but powerful translation tool written in Python with support for several translation providers, such as the Microsoft Translation API and the Translated MyMemory API.

from translate import Translator # pip install translate

translator = Translator(to_lang="th")
print(translator.translate("You know you’re in love when you can’t fall asleep because reality is finally better than your dreams."))

คุณรู้ว่าคุณกำลังมีความรักเมื่อคุณนอนไม่หลับเพราะในที่สุดความจริงก็ดีกว่าความฝันของคุณ
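
Both approaches depend on a remote service, so a call can fail when the network or the provider is unavailable. A minimal sketch of a defensive wrapper follows. The name safe_translate and the two stand-in translator functions are hypothetical; any callable that takes a string and returns a string (for example translator.translate) could be passed in instead.

```python
def safe_translate(translate_fn, text, fallback=None):
    """Call translate_fn(text); on failure return fallback, or the original text."""
    try:
        return translate_fn(text)
    except Exception as error:  # Broad on purpose: providers raise different error types.
        print(f"Translation failed: {error}")
        return text if fallback is None else fallback


# Stand-in translators used only to demonstrate both outcomes.
def fake_ok(text):
    return text.upper()


def fake_offline(text):
    raise ConnectionError("no network")


print(safe_translate(fake_ok, "hello"))
print(safe_translate(fake_offline, "hello"))
```

Passing the translation function as a parameter keeps the wrapper independent of any particular library, so the same code works whichever provider is in use.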


JustToThePoint Copyright © 2011 - 2024 Anawim. ALL RIGHTS RESERVED. Bilingual e-books, articles, and videos to help your child and your entire family succeed, develop a healthy lifestyle, and have a lot of fun. Social Issues, Join us.
