Sentiment Analysis Project

Author: Chunduri Aditya

Project Overview

This project involves building a text classification model to analyze the sentiment of movie reviews. The goal is to classify the reviews as either positive or negative using different deep learning architectures. The models are developed using Keras and Python, and their performance is evaluated based on training and testing accuracy.

Project Structure

The project is divided into the following sections:

Data Exploration and Pre-processing
Word Embeddings
Modeling:
- Multi-Layer Perceptron (MLP)
- Convolutional Neural Network (CNN)
- Long Short-Term Memory (LSTM)
Evaluation and Results

Dataset

The dataset consists of movie reviews divided into two categories:

Positive Reviews: Located in the Datasets/Project_data/pos/ directory.
Negative Reviews: Located in the Datasets/Project_data/neg/ directory.

Each review is stored in its own text file. Files numbered 0-699 are used for training, and files numbered 700-999 are used for testing.
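
The split described above could be sketched roughly as follows. The file naming scheme is not stated in this README, so the helper `review_number` (a hypothetical name) simply assumes that the first run of digits in each file name is the review number, and that the files use a `.txt` extension.

```python
import re
from pathlib import Path


def review_number(path):
    """Return the first run of digits in the file name (assumed naming scheme)."""
    match = re.search(r"\d+", path.stem)
    if match is None:
        raise ValueError(f"no review number found in {path.name}")
    return int(match.group())


def split_reviews(directory, last_train_number=699):
    """Read every .txt file and split the texts into train/test by review number."""
    train, test = [], []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if review_number(path) <= last_train_number:
            train.append(text)
        else:
            test.append(text)
    return train, test
```

Calling `split_reviews("Datasets/Project_data/pos")` and `split_reviews("Datasets/Project_data/neg")` would then give the training and testing texts for each class.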

Data Exploration and Pre-processing

Text Cleaning: Removal of punctuation, numbers, and stopwords.
Tokenization: Each word is mapped to an integer index assigned by word frequency, so every review becomes a sequence of integers.
Review Length Analysis: Calculation of average and standard deviation of review lengths.
Padding/Truncating: Reviews are truncated or padded to a fixed length (based on the 90th percentile of review lengths).
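
As an illustration of the four steps above, here is a minimal sketch in plain Python and NumPy. It is not the project's code: the stopword set is a tiny placeholder, the index conventions (0 for padding, 1 for out-of-vocabulary words) and padding at the end of each sequence are assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter

import numpy as np

# Tiny illustrative stopword set (assumption; a real list would be much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "this", "was"}


def clean(text):
    """Lowercase, drop punctuation and numbers, then remove stopwords."""
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [w for w in words if w not in STOPWORDS]


def build_vocab(token_lists, vocab_size):
    """Map the most frequent words to indices 2, 3, ... (0 = padding, 1 = unknown)."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    most_common = counts.most_common(vocab_size - 2)
    return {word: index + 2 for index, (word, _) in enumerate(most_common)}


def encode(tokens, vocab):
    """Replace each word by its index, using 1 for words outside the vocabulary."""
    return [vocab.get(w, 1) for w in tokens]


def pad_or_truncate(sequences, length):
    """Cut long sequences and right-pad short ones with zeros."""
    out = np.zeros((len(sequences), length), dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = seq[:length]
        out[row, :len(trimmed)] = trimmed
    return out


reviews = ["This movie was great, great fun!", "Boring plot and 2 dull hours."]
tokens = [clean(r) for r in reviews]
vocab = build_vocab(tokens, vocab_size=2500)
sequences = [encode(t, vocab) for t in tokens]

lengths = [len(s) for s in sequences]
mean_length, std_length = np.mean(lengths), np.std(lengths)
max_len = int(np.percentile(lengths, 90))  # fixed length from the 90th percentile
padded = pad_or_truncate(sequences, max_len)
```

Keras ships its own utilities for tokenizing and padding; the hand-written versions here are only meant to make each step visible.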

Word Embeddings

Vocabulary Size: Limited to the top 2500 words.
Embedding Dimension: Set to 32.
Embedding Layer: Converts integer sequences into dense word vectors.
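
Conceptually, an embedding layer is a lookup table with one row per vocabulary index. The NumPy sketch below shows only that lookup, using the vocabulary size and dimension listed above. The table is filled with random values here, whereas in the actual models the rows are weights learned during training.

```python
import numpy as np

VOCAB_SIZE = 2500     # top 2500 words
EMBEDDING_DIM = 32    # size of each word vector

# Random stand-in for the trainable embedding weights (assumption for illustration).
rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.05, size=(VOCAB_SIZE, EMBEDDING_DIM))


def embed(sequences):
    """Look up one dense vector per integer index."""
    return embedding_table[np.asarray(sequences)]


# Two padded reviews of length 4 (the indices are made up).
batch = np.array([[12, 7, 0, 0], [3, 2499, 45, 1]])
vectors = embed(batch)
print(vectors.shape)  # (2, 4, 32)
```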

Modeling

1. Multi-Layer Perceptron (MLP)

Architecture:
- Embedding Layer
- Flatten Layer
- Three Dense Layers with 50 ReLU units each
- Dropout layers to prevent overfitting
- Sigmoid output layer
Optimizer: Adam
Loss Function: Custom Binary Cross-Entropy
Accuracy:
- Training: 49%
- Testing: 53.5%
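
The MLP architecture listed above might look roughly like this in Keras. Several details are assumptions because this README does not state them: the dropout rate, the placement of a dropout layer after each hidden layer, and the sequence length `max_len`. The built-in `binary_crossentropy` loss stands in for the project's custom loss, and `build_mlp` is a hypothetical name.

```python
import keras
from keras import layers


def build_mlp(max_len, vocab_size=2500, embedding_dim=32, dropout_rate=0.5):
    """Embedding -> Flatten -> 3 x (Dense 50 ReLU + Dropout) -> sigmoid output."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, embedding_dim),
        layers.Flatten(),
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),   # rate is an assumption
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    # Built-in loss used as a stand-in for the custom binary cross-entropy.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```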

2. Convolutional Neural Network (CNN)

Architecture:
- Embedding Layer
- Conv1D Layer with 32 filters and kernel size of 3
- MaxPooling1D Layer
- Flatten Layer
- Three Dense Layers with 50 ReLU units each
- Dropout layers to prevent overfitting
- Sigmoid output layer
Optimizer: Adam
Loss Function: Custom Binary Cross-Entropy
Accuracy:
- Training: 55.8%
- Testing: 67.6%
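
A rough Keras sketch of the CNN architecture listed above follows. The ReLU activation on the convolution, the pooling size, the dropout rate and placement, and the sequence length `max_len` are assumptions. The built-in `binary_crossentropy` loss again stands in for the custom loss, and `build_cnn` is a hypothetical name.

```python
import keras
from keras import layers


def build_cnn(max_len, vocab_size=2500, embedding_dim=32, dropout_rate=0.5):
    """Embedding -> Conv1D -> MaxPooling1D -> Flatten -> 3 x Dense 50 -> sigmoid."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, embedding_dim),
        layers.Conv1D(filters=32, kernel_size=3, activation="relu"),  # activation assumed
        layers.MaxPooling1D(pool_size=2),                             # pool size assumed
        layers.Flatten(),
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(50, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    # Built-in loss used as a stand-in for the custom binary cross-entropy.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```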

3. Long Short-Term Memory (LSTM)

Architecture:
- Embedding Layer
- LSTM Layer with 32 units
- Dense Layer with 256 ReLU units
- Dropout layers to prevent overfitting
- Sigmoid output layer
Optimizer: Adam
Loss Function: Custom Binary Cross-Entropy
Accuracy:
- Training: 80.2%
- Testing: 79.5%
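
The LSTM architecture listed above could be sketched in Keras as shown here. The dropout rate, the positions of the two dropout layers, and the sequence length `max_len` are assumptions. The built-in `binary_crossentropy` loss stands in for the custom loss, and `build_lstm` is a hypothetical name.

```python
import keras
from keras import layers


def build_lstm(max_len, vocab_size=2500, embedding_dim=32, dropout_rate=0.5):
    """Embedding -> LSTM(32) -> Dense 256 ReLU -> sigmoid, with dropout in between."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, embedding_dim),
        layers.LSTM(32),                 # returns only the final hidden state
        layers.Dropout(dropout_rate),    # placement and rate are assumptions
        layers.Dense(256, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    # Built-in loss used as a stand-in for the custom binary cross-entropy.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```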

Evaluation and Results

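The project compares three models (MLP, CNN, LSTM) by training and testing accuracy. The LSTM achieved the highest accuracy (80.2% training, 79.5% testing), followed by the CNN (55.8% training, 67.6% testing). The MLP (49% training, 53.5% testing) scored close to what random guessing yields on a balanced two-class problem. The LSTM result is consistent with recurrent layers being well suited to sequential data such as review text.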
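
For reference, accuracy for a sigmoid output is usually computed by thresholding the predicted probabilities. The sketch below assumes a threshold of 0.5, which this README does not state; `accuracy` is a hypothetical helper, not the project's evaluation code.

```python
import numpy as np


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob).reshape(-1) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true).reshape(-1)))
```

With a trained Keras model, `y_prob` would come from `model.predict` on the padded test sequences.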

Usage

To run the project, install Keras and the other libraries the script imports, then execute the script in a Python environment.
