This repository contains my coursework for CSE849, a graduate-level Deep Learning course completed as part of my Master’s in Computer Science and Engineering. It includes four projects and one theoretical assignment, demonstrating my proficiency in designing, implementing, and analyzing deep learning models. The projects emphasize PyTorch, Convolutional Neural Networks (CNNs), Transformers, Natural Language Processing (NLP), and NumPy, showcasing my readiness for machine learning engineering roles.
Description: Developed a PyTorch pipeline from scratch for a regression task, generating a synthetic dataset and training a linear model, then applied a multi-layer perceptron (MLP) to a provided dataset (hw0/CSE_849___Project_0.pdf).
Approach: Created a custom Dataset class to generate the synthetic data from a formula and trained a linear model on it. For the provided dataset, trained an MLP (1D -> 10D -> 10D -> 1D) using AdamW, visualizing predictions across multiple seeds.
Tools: PyTorch, NumPy, Matplotlib.
Results: Achieved low training and validation MSE for the synthetic dataset, with visualized loss curves (hw0/results/q2_plot.png). For the provided dataset, tuned hyperparameters (batch size, learning rate) to optimize validation performance, with predictions plotted for seeds 1-5 (hw0/results/q3_plot.png).
Key Skills: PyTorch fundamentals, PyTorch pipeline development, custom dataset creation, NumPy data generation, MLP implementation.
Description: Implemented gradient descent to optimize a 2D spiral function and built a multi-layer perceptron (MLP) with backpropagation to predict annual rainfall in Michigan from 2D coordinates (hw1/cse849_hw1.pdf).
Approach: For gradient descent, optimized a 2D tensor to minimize the spiral function. For the MLP, built a network (2D → 100D → 100D → 1D) with ReLU activations, implementing custom Linear, ReLU, and MSELoss classes in PyTorch. Trained the MLP to predict rainfall, minimizing mean squared error via backpropagation.
Tools: PyTorch, NumPy, Matplotlib.
Results: Produced trajectory plots showing convergence to the origin for the spiral function across learning rates (hw1/results/plots/). For the MLP, achieved low training and validation MSE, with predictions saved (hw1/results/q2_ytest.txt) and loss curves plotted.
Key Skills: Backpropagation, gradient descent, PyTorch module implementation, NumPy for gradient computation.
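The custom Linear, ReLU, and MSELoss classes described above all follow the same pattern: cache what the forward pass saw, then apply the chain rule in the backward pass. The block below is a minimal NumPy sketch of that pattern, not the course implementation. The class names mirror the text, but the method names (`forward`, `backward`), the weight initialization, and the tiny 2D → 4D → 1D network are assumptions chosen for illustration. It ends with a finite-difference check on one weight.

```python
import numpy as np


class Linear:
    """Affine layer y = x @ W + b that stores its gradients after backward()."""

    def __init__(self, in_dim, out_dim, rng):
        # Illustrative initialization; the original scheme is not specified.
        self.W = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.dW = self.x.T @ grad_out      # dL/dW
        self.db = grad_out.sum(axis=0)     # dL/db
        return grad_out @ self.W.T         # dL/dx, passed to the previous layer


class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask


class MSELoss:
    def forward(self, pred, target):
        self.diff = pred - target
        return float(np.mean(self.diff ** 2))

    def backward(self):
        return 2.0 * self.diff / self.diff.size


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))   # stand-in for 2D coordinates
y = rng.normal(size=(8, 1))   # stand-in for rainfall targets
fc1, act, fc2, loss_fn = Linear(2, 4, rng), ReLU(), Linear(4, 1, rng), MSELoss()


def compute_loss():
    return loss_fn.forward(fc2.forward(act.forward(fc1.forward(x))), y)


# Forward pass, then backward pass in reverse layer order.
loss = compute_loss()
fc1.backward(act.backward(fc2.backward(loss_fn.backward())))

# Central finite-difference check on a single weight of the first layer.
h = 1e-6
analytic = fc1.dW[0, 0]
fc1.W[0, 0] += h
loss_plus = compute_loss()
fc1.W[0, 0] -= 2 * h
loss_minus = compute_loss()
fc1.W[0, 0] += h  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * h)
```

If the backward passes are correct, `analytic` and `numeric` agree to within finite-difference error, which is a cheap sanity check before training.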
Description: Derived gradients for batch normalization to understand its role in stabilizing deep learning models (hw2/CSE 849 Deep Learning HW2.pdf).
Approach: Provided mathematical proofs for the forward and backward passes of batch normalization by deriving each component.
Tools: None (theoretical assignment).
Results: Successfully derived gradients, enhancing understanding of normalization techniques critical for CNNs and other architectures.
Key Skills: Mathematical foundations of deep learning, batch normalization theory.
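A derivation like this can be checked numerically. The block below is a sketch of the standard batch normalization backward pass in NumPy, compared against finite differences. It is not the derivation from the assignment PDF: the function names, the biased batch variance, and the `eps` value are assumptions made for illustration.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance over the batch
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std
    return gamma * x_hat + beta, (x_hat, inv_std, gamma)


def batchnorm_backward(dy, cache):
    """Return gradients with respect to x, gamma, and beta."""
    x_hat, inv_std, gamma = cache
    n = dy.shape[0]
    dgamma = (dy * x_hat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    dx_hat = dy * gamma
    # Compact form obtained by combining the paths through mean and variance.
    dx = (inv_std / n) * (
        n * dx_hat
        - dx_hat.sum(axis=0)
        - x_hat * (dx_hat * x_hat).sum(axis=0)
    )
    return dx, dgamma, dbeta


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
gamma = rng.normal(size=3)
beta = rng.normal(size=3)
w = rng.normal(size=(6, 3))  # fixed weights defining a scalar loss L = sum(w * y)


def scalar_loss(x_in):
    y, _ = batchnorm_forward(x_in, gamma, beta)
    return float((w * y).sum())


_, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(w, cache)  # dL/dy equals w

# Central finite differences over every entry of x.
h = 1e-6
dx_num = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x_plus = x.copy()
        x_plus[i, j] += h
        x_minus = x.copy()
        x_minus[i, j] -= h
        dx_num[i, j] = (scalar_loss(x_plus) - scalar_loss(x_minus)) / (2 * h)

max_err = float(np.abs(dx - dx_num).max())
```

A small `max_err` indicates that the closed-form gradient matches the numerical one. The `dx` term is the one most prone to sign and scaling mistakes, because each input affects the output through the batch mean and variance as well as directly.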
Description: Implemented a 5-layer CNN to classify composite images (CIFAR-10 sub-image top-left, MNIST sub-image bottom-right) into 10 classes based on the CIFAR-10 label, avoiding shortcut learning from MNIST label correlations in the training set (hw3/CSE_849___Project_2.pdf).
Approach: Designed a CNN in PyTorch with five convolutional layers (16, 32, 48, 64, 80 output channels), each followed by batch normalization, ReLU, and max pooling (except the last), plus adaptive average pooling and a linear layer (128D → 10D). Used torchvision’s ImageFolder for train/validation and a custom dataset for test. Applied preprocessing (random flips, normalization) and augmentations (GaussianNoise, RandomErasing). Trained with cross-entropy loss, tuning hyperparameters to focus on CIFAR-10 features. Visualized first-layer filters and computed classwise activation norms for the first and fifth layers to analyze filter behavior.
Tools: PyTorch, NumPy, torchvision, Matplotlib.
Results: Achieved robust validation accuracy by mitigating shortcut learning, with test predictions saved (hw3/results/q1_test.txt). Visualized 16 first-layer filters as RGB images (hw3/results/q2_filters/) and plotted 96 bar plots of classwise activations (hw3/results/q3_filters/), revealing low-level edge detection in early layers and class-specific patterns in later layers.
Key Skills: CNN architecture design, PyTorch implementation, data augmentation, filter visualization, model analysis.
Description: Developed sequence models for two NLP tasks: predicting Yelp review ratings (1-5 stars) using an RNN and translating English to Pig Latin using a Transformer (hw4/CSE_849___Project_3.pdf).
Approach: For review rating prediction, implemented a 2-layer RNN in PyTorch with 50D hidden vectors, using fine-tuned 50D GloVe embeddings (glove/modified_glove_50d.pt). Processed variable-length reviews with a custom collate function to create packed sequences of embeddings. Fed RNN outputs to a linear classifier, trained with cross-entropy loss. For Pig Latin translation, built a Transformer with 2 encoder and 2 decoder layers, 2 attention heads, and 100D embeddings for a 30-token character vocabulary (26 letters, space, <SOS>, <EOS>, <PAD>). Added positional encodings and trained with cross-entropy and MSE losses, using autoregressive decoding for inference. Saved checkpoints for both tasks.
Tools: PyTorch, NumPy, Matplotlib, Seaborn.
Results: For review rating, achieved high validation accuracy with clear loss curves and confusion matrices (hw4/results/plots/), and saved test predictions (hw4/results/q1_test.txt). For Pig Latin, generated accurate translations, with test outputs saved (hw4/results/q2_test.txt) and strong validation performance reported (>99.0%).
Key Skills: RNNs, Transformers, NLP, GloVe embeddings, PyTorch sequence modeling.
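The 30-token character vocabulary for the Pig Latin task (26 letters, space, `<SOS>`, `<EOS>`, `<PAD>`) can be sketched in plain Python. This is an illustration only: the token order, the ID assignment, and the helper names `encode`, `pad_batch`, and `decode` are assumptions, not taken from the project code.

```python
import string

# Hypothetical token ordering: special tokens first, then letters, then space.
SPECIALS = ["<PAD>", "<SOS>", "<EOS>"]
VOCAB = SPECIALS + list(string.ascii_lowercase) + [" "]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
PAD_ID, SOS_ID, EOS_ID = (TOKEN_TO_ID[t] for t in SPECIALS)


def encode(text):
    """Map a lowercase phrase to token IDs wrapped in <SOS> ... <EOS>."""
    return [SOS_ID] + [TOKEN_TO_ID[ch] for ch in text.lower()] + [EOS_ID]


def pad_batch(sequences):
    """Right-pad every sequence with <PAD> to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]


def decode(ids):
    """Invert encode(): stop at <EOS> and skip <SOS> and <PAD>."""
    chars = []
    for i in ids:
        if i == EOS_ID:
            break
        if i not in (PAD_ID, SOS_ID):
            chars.append(VOCAB[i])
    return "".join(chars)


batch = pad_batch([encode("hello world"), encode("pig latin")])
```

During autoregressive decoding, generation would start from `<SOS>` and stop once the model emits `<EOS>`, which is why `decode` truncates at that token.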
Implemented diffusion models for unconditional and conditional sample generation on the "States" dataset, a 2D synthetic dataset of 5,000 points forming outlines of five U.S. states (Ohio, Wisconsin, Oklahoma, Idaho, Michigan). The project included three tasks: unconditional generation, training a classifier for state labels, and conditional generation guided by the classifier.
- Task 1: Unconditional Generation:
  - Developed a denoising MLP (`MLP`, 4 layers, 256 units each) to estimate noise `ε̂` from noisy samples `xt` and timestep `t`, using PyTorch.
  - Implemented forward diffusion with a linear `β` schedule (`β0=1e-4`, `βT=0.02`, `T=500`), computing `α`, `α_bar`, and noisy samples `xt = √α_bar_t * x0 + √(1-α_bar_t) * ε`.
  - Trained the denoiser with MSE loss (`||ε - ε̂||2^2`), batch size 10,000, 300 epochs, AdamW optimizer (`lr=1e-3`, `weight_decay=1e-7`), and StepLR scheduler (`step=2`, `γ=0.99`).
  - Sampled 5,000 points from `pinit = N(0,I)`, denoising over 500 steps to generate `pdata` samples, evaluating negative log-likelihood (NLL) with Gaussian KDE.
- Task 2: Classifier Training:
  - Built a classifier MLP (`MLP`, 3 layers: 100, 200, 500 units) to predict state labels (5 classes) from noisy samples `xt` and timestep `t`.
  - Trained with cross-entropy loss, batch size 10,000, 50 epochs, Adam optimizer (`lr=1e-3`, `weight_decay=1e-4`), and ReduceLROnPlateau scheduler (`factor=0.5`, `patience=3`).
  - Generated a prediction map visualizing classifier outputs on a grid, using `ListedColormap` for state-specific colors.
- Task 3: Conditional Generation:
  - Reused the unconditional denoiser and trained classifier, loading weights from `denoiser.pt` and `classifier.pt`.
  - Implemented guided sampling by computing gradients of the classifier’s log-softmax output for a target label, adjusting the denoising step with `eps_hat = eps - √(1-α_bar_t) * cls_grad`.
  - Generated 5,000 samples per state (5 batches of 1,000), evaluating NLL and saving scatter plots.
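
The forward diffusion step from Task 1 and the guidance adjustment from Task 3 can be written out in a few lines of NumPy. This is a sketch under stated assumptions rather than the project code: the function names `q_sample` and `guided_eps` are hypothetical, timesteps are zero-indexed, and random arrays stand in for the dataset and the classifier gradient.

```python
import numpy as np

T = 500
betas = np.linspace(1e-4, 0.02, T)   # linear schedule from beta_0 to beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative product over timesteps


def q_sample(x0, t, eps):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t][:, None]   # one value per sample, broadcast over 2D points
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps


def guided_eps(eps_pred, cls_grad, t):
    """Classifier guidance: eps_hat = eps - sqrt(1 - a_bar_t) * cls_grad."""
    a_bar = alpha_bars[t][:, None]
    return eps_pred - np.sqrt(1.0 - a_bar) * cls_grad


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 2))   # placeholder for States points
t = rng.integers(0, T, size=4)             # assumed zero-indexed timesteps
eps = rng.normal(size=x0.shape)
xt = q_sample(x0, t, eps)

# Placeholder for the gradient of log p(label | x_t, t) with respect to x_t.
cls_grad = rng.normal(size=x0.shape)
eps_hat = guided_eps(eps, cls_grad, t)
```

Because `alpha_bars` decays toward zero as `t` grows, late timesteps are almost pure noise, and the factor `√(1-α_bar_t)` gives the classifier gradient the most weight exactly there.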
- PyTorch: Built and trained MLPs for denoising and classification, managed GPU operations.
- NumPy: Handled data preprocessing and sampling.
- Matplotlib: Visualized scatter plots, training curves, and classifier prediction maps.
- SciPy: Computed NLL using `gaussian_kde`.
- Python: Integrated data loading, model training, and sampling pipelines.
- Unconditional Generation:
  - Generated 5,000 samples resembling the States dataset’s distribution, saved as `uncond_gen_samples.pt`.
  - Produced training loss and NLL curves, saved as `train_logs.png`, showing convergence.
  - Saved per-epoch scatter plots in `outputs/plots/unconditional_generation/steps/`.
- Classifier Training:
  - Achieved low cross-entropy loss, with training curve saved as `train_logs.png`.
  - Generated a prediction map (`classifier_predictions.png`), color-coding state classifications with decision boundaries.
- Conditional Generation:
  - Generated 5,000 samples per state, saved as `cond_gen_samples_{label}.npy`.
  - Produced state-specific scatter plots (`label_{0-4}.png`), visually matching state outlines.
  - Reported NLL for each state, indicating quality of conditional samples.
- Output: Saved models (`denoiser.pt`, `classifier.pt`), plots, and samples in `outputs/plots/` and `checkpoints/`.
- Diffusion model implementation.
- Probabilistic generative modeling.
- MLP design and training.
- Classifier-guided sampling.
- Visualization of 2D data distributions.
Deep Learning:
- Designed and trained advanced architectures, including Convolutional Neural Networks (CNNs) for robust image classification, Transformers for sequence-to-sequence translation, RNNs for text classification, MLPs for regression tasks, and Diffusion models for unconditional and conditional generative tasks.
- Implemented gradient descent and backpropagation manually, and explored batch normalization theoretically, ensuring a strong foundation in neural network mechanics.
PyTorch Proficiency:
- Leveraged PyTorch extensively to build, train, and evaluate models across all projects.
- Used PyTorch’s tensor operations and autograd to implement custom datasets and linear models in Project 0, developed custom Linear, ReLU, and MSELoss modules for gradient descent and backpropagation in Project 1, constructed CNNs with convolutional, batch normalization, and pooling layers in Project 2, implemented RNNs with packed sequences and Transformers with multi-head attention and positional encodings in Project 3, and implemented MLPs for denoising and classification components of a generative diffusion model in Project 4.
- Utilized PyTorch’s optimizers (SGD, Adam, AdamW) and loss functions (MSE, cross-entropy) to optimize model performance, achieving low errors and high accuracy.
Libraries and Tools:
- NumPy: Applied for data preprocessing and computation, including synthetic data generation with noise in Project 0, gradient calculations for the spiral function in Project 1, image preprocessing in Project 2, text indexing in Project 3, and processing 2D data and performing sampling operations in Project 4.
- torchvision: Employed for dataset loading (e.g., ImageFolder for composite images) and image transformations (e.g., normalization, augmentations) in Project 2, and data utilities in Project 0 and Project 3.
- Matplotlib: Created visualizations like loss curves and prediction plots in Project 0, trajectory plots for gradient descent in Project 1, filter visualizations and activation bar plots in Project 2, loss curves and confusion matrices in Project 3, and prediction maps for model evaluation in Project 4.
- SciPy: Evaluated sample quality with KDE-based NLL in Project 4.
NLP Capabilities:
- Developed RNNs for review rating prediction using fine-tuned GloVe embeddings and Transformers for Pig Latin translation with learned character embeddings and positional encodings.
- Handled variable-length sequences with custom collation and autoregressive decoding, achieving strong performance in classification and translation tasks.
Technical Proficiency:
- Combined theoretical insights (e.g., batch normalization derivations, analytical gradients) with practical implementation, building robust ML pipelines.
- Demonstrated ability to preprocess diverse data types (synthetic, spatial coordinates, composite images, text), mitigate biases like shortcut learning, and analyze models via visualizations, aligning with machine learning engineering demands.
- Applied probabilistic modeling concepts (e.g., forward/reverse diffusion, score-matching) to practical tasks.
- Managed large-scale training (e.g., 2.5M samples, batch size of 10K) with memory-efficient preprocessing.
- Delivered well-documented code and visualizations, suitable for research and engineering roles.