Evaluating Sentiment Analysis Mechanism for Labelled Amazon Reviews






Sentiment analysis has become increasingly important for understanding customer opinions, feedback, and preferences regarding products and services, particularly on marketplaces such as Amazon. Researchers have proposed various techniques and algorithms for sentiment analysis. However, although a few efforts have been made, there is still a lack of systematic guidance to help data scientists select appropriate algorithms and models. This thesis aims to fill this gap by presenting a comprehensive evaluation of different sentiment analysis mechanisms for labeled Amazon reviews. To this end, we first prepare an accurately labeled Amazon review dataset through manual labeling, which provides a solid foundation for the evaluation. We then evaluate the effectiveness of popular sentiment analysis mechanisms, including data preprocessing techniques such as Bag of Words (BOW), Term Frequency-Inverse Document Frequency (TF-IDF) weighting, spell correction, stemming, and lemmatization, and sentiment analysis models such as K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). These mechanisms were selected for their prominence in the field of sentiment analysis, their potential to yield high accuracy, and their coverage of different designs. We conducted five experiments using combinations of these preprocessing techniques and analysis models, with the aim of identifying the combinations that perform best in sentiment analysis of labeled Amazon reviews. The experimental results show that BERT combined with BOW, TF-IDF, spell correction, and lemmatization achieved the highest accuracy, 98.99%, outperforming all other combinations.
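The BOW and TF-IDF steps named above can be sketched in plain Python. This is a minimal illustration only, assuming whitespace tokenization and the unsmoothed variant idf = log(N / df); the thesis does not specify its tokenizer, weighting variant, or libraries here, and the helper names `bag_of_words` and `tf_idf` and the three toy reviews are hypothetical. Spell correction, stemming, and lemmatization are omitted because they normally rely on external dictionaries or language resources.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Build a sorted vocabulary and raw term-count vectors (Bag of Words).

    Hypothetical helper: tokenizes by lowercasing and splitting on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight BOW counts with tf = count / doc length and idf = log(N / df).

    Assumed (unsmoothed) variant; libraries often use smoothed idf instead.
    """
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: how many reviews contain each term at least once.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in vectors:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([
            (vec[j] / total) * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Hypothetical toy reviews, not taken from the thesis dataset.
reviews = [
    "great product works great",
    "terrible product broke fast",
    "great value fast shipping",
]
vocab, bow = bag_of_words(reviews)
weights = tf_idf(bow)
```

In the first review, "great" has the highest raw count, but TF-IDF gives "works" a larger weight because "works" appears in only one review while "great" appears in two. This down-weighting of common terms is the usual motivation for applying TF-IDF on top of BOW.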
Adding TF-IDF weighting, spell correction, stemming, and lemmatization improves the accuracy of four of the analysis models by roughly 5 to 8 percentage points (about 6 points on average): from 87.34% to 93.4% for KNN, from 86.6% to 94.22% for SVM, from 90.68% to 96.87% for ANN, and from 92.87% to 97.95% for LSTM. In contrast, LR achieves comparatively low accuracy, between 74.32% and 81.09%, regardless of the preprocessing techniques used. We attribute this to its limitations as a linear model, which may struggle to capture complex patterns and non-linear relationships in the sentiment data. This work provides insights into the effectiveness of different data preprocessing and analysis mechanisms for sentiment analysis of labeled Amazon reviews. The findings can be applied to improve customer review analysis and help achieve a higher level of customer satisfaction, which can be essential in areas such as product and business strategy development.
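The size of the improvement can be checked directly from the reported accuracies. The short sketch below computes the absolute gain in percentage points for each of the four models; the dictionary layout and variable names are illustrative only, and the numbers are exactly those stated above.

```python
# Accuracies (%) as reported above: (baseline, with TF-IDF, spell correction,
# stemming, and lemmatization added).
reported = {
    "KNN": (87.34, 93.40),
    "SVM": (86.60, 94.22),
    "ANN": (90.68, 96.87),
    "LSTM": (92.87, 97.95),
}

# Absolute gain in percentage points for each model, rounded to 2 decimals.
gains = {model: round(after - before, 2) for model, (before, after) in reported.items()}

# Mean gain across the four models, rounded to 1 decimal.
mean_gain = round(sum(gains.values()) / len(gains), 1)

for model, gain in gains.items():
    print(f"{model}: +{gain} points")
print(f"mean: +{mean_gain} points")
```

The gains range from about 5.1 points (LSTM) to about 7.6 points (SVM), with a mean of about 6.2 points. This is why the improvement is best described in percentage points rather than as a relative percentage.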



Sentiment Analysis, Amazon Review Dataset, Evaluation, Manual Labelling, Text Preprocessing