
Classifying Amazon Reviews: Sentiment Analysis with NLTK and 🤗 Transformers




Motivation

Customer reviews have always played an instrumental part in the development of businesses. With the help of Machine Learning and Large Language Models (LLMs), companies can now identify customer sentiment from these reviews at scale and act on it appropriately.


In this article, we will explore three different approaches to sentiment analysis using Amazon's Fine Food Reviews dataset. We will also learn more about Hugging Face and the renowned open-source Transformers library it offers.




Project Overview

  • Analyzed sentiment of Amazon reviews using three approaches: NLTK's VADER Sentiment Scoring, a roBERTa pretrained model, and Hugging Face's pipeline

  • Reviewed and compared results between approaches



Resources and Tools

Language: Python 3.8

Packages: pandas, numpy, matplotlib, seaborn, nltk, transformers

Platform: Kaggle


 

Table of Contents:

I. Background on Hugging Face

a. Transformers Library

b. Pipeline

II. Sentiment Analysis

a. NLTK's VADER Sentiment Scoring

b. roBERTa Pre-trained Model

c. Hugging Face's Pipeline

III. Compare Results


 

I. Background on Hugging Face


Hugging Face is a prominent platform and company in the field of Machine Learning and Deep Learning. The company is widely recognized for its contributions to the development and democratization of state-of-the-art Generative AI models and tools.

Some innovative contributions that Hugging Face has made to the ML community are:

  • Transformers Library: A comprehensive collection of pre-trained models for various NLP tasks. These models are based on the transformer architecture and include popular ones like BERT, GPT, and more.

  • Model Hub: The Hugging Face Model Hub serves as a central repository for sharing, discovering, and using pre-trained models. It enables researchers and developers to access a diverse range of models for tasks such as text classification, language translation, and sentiment analysis.

  • Tokenizers: Hugging Face provides efficient tokenization tools for processing text data, essential for preparing input for NLP models. The tokenizers are designed to work seamlessly with the models available in the Transformers library.

... and others.



a. Transformers Library:

Hugging Face's Transformers Library is specifically designed to facilitate the use of transformer-based models, which have proven to be highly effective in a variety of NLP tasks.

The library provides APIs and tools to download, use, and fine-tune pre-trained models, ranging from popular architectures like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) to newer innovations. These models are pre-trained on large datasets and can then be fine-tuned for specific NLP tasks.

The tasks can include text classification, named entity recognition, question answering, summarization, translation, and more. This allows users to leverage state-of-the-art models without the need for extensive training on task-specific datasets.


b. Pipeline:

The pipeline() function from the Transformers library makes it very simple to use any model from the Model Hub for inference on language, computer vision, speech, and multimodal tasks.


Examples of pipeline usage from Hugging Face's official documentation:

  • Specify inference task:


  • Specify model:
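The original snippets did not survive on this page. A minimal sketch of the two usages listed above, following the pattern shown in the Transformers documentation, might look like this. The helper name build_pipelines is hypothetical, and the model name is simply the one used later in this article. Both calls download weights from the Model Hub the first time they run.

```python
TASK = "sentiment-analysis"
MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"  # model used later in this article


def build_pipelines(task=TASK, model_name=MODEL_NAME):
    """Return (pipeline chosen by task, pipeline chosen by model name)."""
    # Imported here so nothing is loaded until the function is called.
    from transformers import pipeline

    # Specify the inference task: the library falls back to a default model for it.
    by_task = pipeline(task=task)

    # Specify the model: the task is looked up from the model's Hub metadata.
    by_model = pipeline(model=model_name)

    return by_task, by_model
```

Either object returned by build_pipelines() can then be called directly on a string or a list of strings.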




II. Sentiment Analysis


Now that we know how Hugging Face and its tools can help us, let's analyze some customer reviews!

As mentioned in the motivation, we are going to use three methods to analyze the sentiment: (1) NLTK's VADER Sentiment Scoring, (2) a roBERTa pretrained model, and (3) Hugging Face's pipeline() function.



a. Basic NLTK & VADER Sentiment Scoring


Before using VADER Sentiment Scoring, we will use the NLTK library to do some basic processing. We will follow these steps:

  • Tokenize the text

  • Perform parts-of-speech (POS) tagging

  • Group POS into meaningful chunks


The dataset has 9 columns, with the 'Text' column being the reviews.


We will extract an example from the column and tokenize it:
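As a rough sketch of the tokenization step, assuming NLTK is installed: the example sentence below is a hypothetical stand-in for one row of the 'Text' column, and tokenize_review is an illustrative helper name rather than anything from the original notebook.

```python
def tokenize_review(text):
    """Split a review into word and punctuation tokens with NLTK."""
    import nltk

    # The tokenizer data is named "punkt" in older NLTK releases and
    # "punkt_tab" in newer ones; requesting both keeps the sketch portable.
    for resource in ("punkt", "punkt_tab"):
        nltk.download(resource, quiet=True)

    return nltk.word_tokenize(text)


# Hypothetical stand-in for one row of the 'Text' column.
example = "This oatmeal is not good. It is mushy and I will not buy it again."
```

Calling tokenize_review(example) returns a plain Python list of strings, with punctuation marks kept as separate tokens.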


Next, we will perform parts-of-speech tagging. Essentially, we will assign a part of speech (noun, verb, adjective, and so on) to each word based on its definition and context.

To understand the POS acronyms, check out this documentation.
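The tagging step could be sketched as follows, assuming NLTK's default English tagger. The small PENN_TAGS glossary covers only a handful of Penn Treebank tags and is included purely for illustration; tag_tokens and describe are hypothetical helper names.

```python
# A few common Penn Treebank tags returned by nltk.pos_tag (not the full tag set).
PENN_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "RB": "adverb",
    "VBZ": "verb, 3rd person singular present",
}


def tag_tokens(tokens):
    """Attach a part-of-speech tag to every token."""
    import nltk

    # Tagger data: the resource name differs between NLTK versions.
    for resource in ("averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)

    return nltk.pos_tag(tokens)


def describe(tagged):
    """Replace each tag with a readable description where one is known."""
    return [(word, PENN_TAGS.get(tag, tag)) for word, tag in tagged]
```

tag_tokens returns a list of (word, tag) tuples; passing that list to describe makes the acronyms easier to read while exploring.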


Next, we will group these POS parts into chunks.
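The original chunking snippet is not preserved here, so the sketch below swaps in one simple option: a regular-expression grammar that groups tagged words into noun phrases with nltk.RegexpParser. NLTK also offers nltk.chunk.ne_chunk for named-entity chunks, which needs extra downloaded data. The grammar and the hand-tagged example are assumptions made for illustration.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into noun-phrase chunks; returns an nltk.Tree."""
    import nltk

    # NP = optional determiner, any number of adjectives, one or more nouns.
    grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
    return nltk.RegexpParser(grammar).parse(tagged)


# Hand-tagged stand-in for the output of the tagging step.
tagged_example = [
    ("This", "DT"), ("oatmeal", "NN"), ("is", "VBZ"),
    ("not", "RB"), ("good", "JJ"),
]
```

The returned tree can be printed directly or inspected with its subtrees() method to list the chunks that were found.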


VADER Sentiment Scoring


Now we can move on to analyzing sentiment using VADER (Valence Aware Dictionary and sEntiment Reasoner). The key feature of VADER is that it captures both the polarity of the sentiment (positive, negative, or neutral) and its intensity.


VADER looks up a sentiment score for each word in a given text. The word scores are then aggregated into a single compound score between -1 (most negative) and +1 (most positive). The text also receives separate positive, negative, and neutral scores: the higher a score, the more strongly that sentiment is expressed.
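To make the aggregation step concrete: in the reference VADER implementation, the summed word valences x are squashed into the range -1 to 1 with x / sqrt(x² + α), where α defaults to 15. The sketch below reproduces only that normalization. It ignores VADER's extra rules for negation, intensifiers, and punctuation, and the valences are made-up numbers rather than real lexicon entries.

```python
import math


def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded sum of word valences into the range [-1, 1]."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


# Hypothetical valences for three words in a review (not real lexicon entries).
valences = [1.9, 0.0, -0.5]
compound = normalize_compound(sum(valences))
```

Because of the square root in the denominator, a handful of strongly positive words pushes the compound score close to 1 without it ever exceeding that bound.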


Without further ado, let's dive into the code:
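A minimal sketch of the scoring loop, assuming NLTK's SentimentIntensityAnalyzer. The 'Id' key and the helper names are assumptions; the vader_ prefix matches the column names used in the comparison section.

```python
def prefix_keys(scores, prefix):
    """Rename keys, e.g. 'pos' -> 'vader_pos', so columns from different models do not clash."""
    return {f"{prefix}{key}": value for key, value in scores.items()}


def vader_scores(reviews):
    """Score (id, text) pairs with VADER and return one dict per review."""
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # lexicon used by the analyzer
    sia = SentimentIntensityAnalyzer()

    rows = []
    for review_id, text in reviews:
        # polarity_scores returns the keys 'neg', 'neu', 'pos' and 'compound'.
        scores = sia.polarity_scores(text)
        rows.append({"Id": review_id, **prefix_keys(scores, "vader_")})
    return rows
```

With the reviews in a pandas DataFrame df, the rows from vader_scores(zip(df["Id"], df["Text"])) can be turned into a DataFrame and merged back onto the reviews by 'Id'.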





Now we will plot the results:
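One possible way to draw the three bar charts with seaborn is sketched below. It assumes a DataFrame that holds the star rating in a 'Score' column alongside vader_pos, vader_neu, and vader_neg columns; those column names and the function name are assumptions, not taken from the original notebook.

```python
VADER_COLUMNS = {"vader_pos": "Positive", "vader_neu": "Neutral", "vader_neg": "Negative"}


def plot_vader_by_rating(df, rating_col="Score"):
    """Draw one bar chart per VADER polarity, averaged over each star rating."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, (column, title) in zip(axes, VADER_COLUMNS.items()):
        # barplot shows the mean of `column` for each star rating.
        sns.barplot(data=df, x=rating_col, y=column, ax=ax)
        ax.set_title(title)
    fig.tight_layout()
    return fig
```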

In the three bar charts, the positive score is highest for 4- and 5-star reviews, while the negative score is highest for 1- and 2-star reviews. In other words, VADER's scores move in the same direction as the star ratings, which suggests that the model has done a decent job of classifying the reviews.

b. roBERTa Pretrained Model


Let's move on to the second approach. We will be using the pre-trained model cardiffnlp/twitter-roberta-base-sentiment to classify the sentiment. This model has been trained on a huge amount of data (about 58 million tweets, according to its model card) and accounts not only for the sentiment of individual words but also for the context of the entire sentence.


We can easily load the model from Hugging Face:
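Loading could look like the sketch below, which uses the Auto classes from Transformers. The function name load_roberta is hypothetical, and the weights are downloaded from the Model Hub on the first call.

```python
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"


def load_roberta(model_name=MODEL):
    """Download (or load from cache) the tokenizer and classification model."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return tokenizer, model
```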


We will run roBERTa on our previously extracted example and then on the entire dataset:
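A sketch of scoring a single review, assuming PyTorch as the backend and a tokenizer and model loaded with the Auto classes. The label order (0 = negative, 1 = neutral, 2 = positive) follows the model card; the helper names and the roberta_ prefix are illustrative choices.

```python
LABELS = ("neg", "neu", "pos")  # index order stated on the model card: 0, 1, 2


def label_scores(probabilities, prefix="roberta_"):
    """Pair each class probability with its label, e.g. 'roberta_pos'."""
    return {f"{prefix}{label}": float(p) for label, p in zip(LABELS, probabilities)}


def roberta_scores(text, tokenizer, model):
    """Return negative/neutral/positive probabilities for one review."""
    import torch

    # Cap the input at 512 tokens; longer reviews would exceed the model's limit.
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**encoded).logits[0]
    # Softmax turns the three raw logits into probabilities that sum to 1.
    probabilities = torch.softmax(logits, dim=-1).tolist()
    return label_scores(probabilities)
```

To cover the entire dataset, the same function can be called in a loop over the review texts, collecting one dictionary per review.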



Below are the results:



c. pipeline()


As mentioned in the background section, pipeline() is part of the Transformers library. It is a quick and easy way to run sentiment predictions. You can specify the task or the model you want, and pipeline() will take care of the rest.


Let's give it a try:
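A minimal sketch, assuming the default model that Transformers selects for the "sentiment-analysis" task. The helper names are hypothetical, and count_labels is only a small convenience for summarizing the output.

```python
from collections import Counter


def classify_with_pipeline(texts):
    """Run the default sentiment-analysis pipeline on a list of strings."""
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    # Each result is a dict with a 'label' and a confidence 'score'.
    return classifier(texts)


def count_labels(results):
    """Tally how many texts received each label."""
    return dict(Counter(result["label"] for result in results))
```

Passing a list of review strings to classify_with_pipeline returns one result per review, in the same order as the input.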


And just like that, we were able to predict the sentiment in just a few lines of code.


III. Compare Results

Let's now compare the results of the VADER sentiment scorer and the roBERTa model. We can visualize the results using seaborn's pairplot.
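The pairplot call might be sketched as follows, assuming a combined DataFrame with one column per model score and the star rating in a 'Score' column. Only vader_pos and roberta_pos are named in this article; the other column names, the palette, and the function name are assumptions.

```python
PAIRPLOT_COLUMNS = [
    "vader_neg", "vader_neu", "vader_pos",
    "roberta_neg", "roberta_neu", "roberta_pos",
]


def plot_comparison(results_df, rating_col="Score"):
    """Plot every pair of model scores, colored by star rating."""
    import seaborn as sns

    # The diagonal shows each score's distribution per star rating.
    return sns.pairplot(
        data=results_df,
        vars=PAIRPLOT_COLUMNS,
        hue=rating_col,
        palette="tab10",
    )
```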




Focusing on the plots along the diagonal line, we can see the performance of the two models:

  • VADER: For vader_pos, the distribution of 5-star reviews (in purple) is centered around the middle of the range.

  • roBERTa: For roberta_pos, the distribution of 5-star reviews is clearly concentrated on the right.

Thus, roBERTa assigns high positive scores to 5-star reviews far more consistently than VADER does, so it did a better job of identifying the sentiment of the reviews.



In addition, we also want to look at some examples where the two models perform poorly, i.e. where a review is classified as positive when it is actually negative, or vice versa.
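One way to surface such cases is to filter by star rating and sort by the score that contradicts it, for example 1-star reviews with the highest roberta_pos. The sketch below assumes a pandas DataFrame with 'Score' and 'Text' columns plus the model score columns; the function name is hypothetical.

```python
def most_confident_mistakes(df, star, score_col, n=1, rating_col="Score", text_col="Text"):
    """Return the n reviews with the given star rating that score highest on score_col."""
    # Keep only reviews with the requested star rating.
    subset = df[df[rating_col] == star]
    # Highest contradicting score first, then take the review text.
    return subset.sort_values(score_col, ascending=False)[text_col].head(n).tolist()
```

For instance, most_confident_mistakes(results_df, star=1, score_col="roberta_pos") lists the 1-star review that roBERTa rated most positive, and star=5 with score_col="vader_neg" does the reverse for VADER.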


What these examples have in common is that both negative and positive words appear in the same review. The models could not tell which sentiment was more prominent, which led to the errors.



Conclusion


In conclusion, we have analyzed Amazon customer reviews using three approaches. Of these, the roBERTa pretrained model performed best in our comparison, most likely thanks to the large amount of data it was trained on.


The code shown throughout this article is available in this notebook!

 
