Hannah Lee lee-H1208

🦆 Hi, I'm Hannah!

About Me

🏫 Junior double-majoring in Computer Science and Applied Mathematics and Statistics in the Honors College at SBU
🌠 Passionate about all things machine learning
🗽 Based in Queens, New York

I'm driven and detail-oriented, and I care deeply about the quality of my work, especially in research and machine learning projects. I'm grateful to be part of Break Through Tech AI, a program that supports women and other underrepresented groups in STEM; it has helped me build confidence in my technical skills and grow as a collaborator. Through the program, I was matched with Snowflake as a fellow on a team building a real-world ML project. At the McKinnon-Rosati Lab, I'm currently leading a small project group developing a doublet detection model. These experiences have been both challenging and rewarding, and they've deepened my interest in applying machine learning to real-world problems!

🧪 Featured Project: Doublet Detection

What We Did: Developed a lightweight machine learning pipeline combining unsupervised clustering and supervised classification (XGBoost) to detect doublets in single-cell RNA sequencing data. Preprocessed biological datasets, generated artificial doublets, extracted co-expression features, and trained models to distinguish singlets from doublets.

Tools: Python, scikit-learn, XGBoost, Leiden clustering, PCA, Jupyter notebooks

Result: Achieved an overall accuracy of 86.5%, with a precision of 57% and a recall of 49.5% across multiple benchmark datasets, indicating competitive performance and more reliable doublet detection.
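The artificial-doublet step can be sketched roughly as below. This is a minimal illustration, not the project's code: it assumes a dense cells × genes matrix of raw counts and builds each doublet by summing the counts of two distinct, randomly chosen cells. That is similar in spirit to the simulation used by tools like Scrublet. The function name `simulate_doublets` and its parameters are hypothetical.

```python
import numpy as np


def simulate_doublets(counts, n_doublets, seed=0):
    """Append artificial doublets to a (n_cells, n_genes) raw-count matrix.

    Each doublet is the sum of two distinct observed cells (an assumption;
    some tools average or downsample instead). Returns the combined matrix,
    labels (0 = observed cell, 1 = artificial doublet), and the parent indices.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    first = rng.integers(0, n_cells, size=n_doublets)
    # Offset by 1..n_cells-1 (mod n_cells) so the second parent always differs from the first.
    second = (first + rng.integers(1, n_cells, size=n_doublets)) % n_cells
    doublets = counts[first] + counts[second]
    combined = np.vstack([counts, doublets])
    labels = np.concatenate([np.zeros(n_cells, dtype=int), np.ones(n_doublets, dtype=int)])
    return combined, labels, np.column_stack([first, second])


# Toy example: 100 cells x 50 genes of Poisson counts, plus 20 simulated doublets.
toy_counts = np.random.default_rng(1).poisson(1.0, size=(100, 50))
X, y, parents = simulate_doublets(toy_counts, n_doublets=20)
print(X.shape, y.sum())  # (120, 50) 20
```

The labeled matrix can then go through normalization, PCA, clustering, and feature extraction before a classifier such as XGBoost is trained to separate the two classes.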

🚌 Featured Project: Brooklyn Transit Demand Dashboard

What We Did: Developed a machine learning pipeline to forecast transit demand by ZIP code in Brooklyn based on projected population increases. Preprocessed real-world datasets (OSM, MTA, population data), calculated demand scores, and implemented a Streamlit dashboard to visualize demand patterns and transportation deserts.

Tools: Python, Pandas, scikit-learn, HistGradientBoostingRegressor, Snowflake, PyDeck, Streamlit, Jupyter notebooks

Result: Achieved a cross-validated R² of 0.739; the model's predictions highlight ZIP codes with high transit demand. The dashboard gives urban planners and local authorities actionable insights for identifying underserved neighborhoods and allocating resources effectively.
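For readers unfamiliar with the metric, the sketch below shows how a k-fold cross-validated R² is computed. It is an illustration only. To stay dependency-light it swaps in ordinary least squares for HistGradientBoostingRegressor, and it uses made-up data in place of the real Brooklyn features. The fold count of 5 and all function names are assumptions. In scikit-learn the equivalent call would be `cross_val_score(model, X, y, cv=5, scoring="r2")`.

```python
import numpy as np


def r2_score(y_true, y_pred):
    # R² = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    # Least squares with an intercept column; a stand-in for the gradient-boosted regressor.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def cross_val_r2(X, y, k=5, seed=0):
    """Mean R² over k shuffled folds: fit on k-1 folds, score on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_ols(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], predict_ols(coef, X[test_idx])))
    return float(np.mean(scores))


# Toy data: a hypothetical "demand" target driven by two synthetic features plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(round(cross_val_r2(X, y), 3))
```

Averaging over held-out folds, rather than scoring on the training data, is what makes the reported R² an estimate of performance on unseen ZIP codes.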

Tech Stack

Other Projects

🤖 Image Classification on CIFAR-10 Dataset

What We Did: Built and trained a convolutional neural network (CNN) using TensorFlow on the CIFAR-10 dataset for image classification. Preprocessed data by normalizing pixel values, designed a 4-layer CNN with batch normalization and dropout, and optimized hyperparameters using grid search to improve model accuracy.

Tools: Python, TensorFlow, Keras, CIFAR-10 dataset

Result: Achieved ~81.5% training accuracy and ~80.5% test accuracy; the small gap between the two suggests good generalization with minimal overfitting on this multi-class image classification task.
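The grid-search idea can be sketched in plain Python. This is a hypothetical illustration, not the project's code. The grid values are made up, and `toy_evaluate` is an invented scoring surface standing in for "train the CNN with these settings and return validation accuracy." It is not a real CIFAR-10 result.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one.

    param_grid: dict mapping a hyperparameter name to a list of candidate values.
    evaluate: callable taking a dict of hyperparameters and returning a score
              (higher is better). Ties keep the first combination seen.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: 3 x 3 x 2 = 18 combinations.
grid = {"learning_rate": [1e-2, 1e-3, 1e-4], "dropout": [0.2, 0.3, 0.5], "batch_size": [32, 64]}


def toy_evaluate(params):
    # Made-up surface that peaks at learning_rate=1e-3, dropout=0.3 (ignores batch_size).
    return -abs(params["learning_rate"] - 1e-3) * 100 - abs(params["dropout"] - 0.3)


best, score = grid_search(grid, toy_evaluate)
print(best)
```

Because every combination is trained and evaluated, the cost grows multiplicatively with each hyperparameter added, which is why grids are usually kept small.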

📊 Doublet Detection Analysis

What We Did: Compared and evaluated six doublet detection methods on single-cell RNA sequencing datasets. Oversaw integration of code for multiple tools, designed experiments to generate artificial doublets, and conducted re-analyses to assess method biases and robustness.

Tools: Python, R, Jupyter notebooks, scDblFinder, Scrublet, COMPOSITE, DoubletDetection, data visualization libraries

Result: Identified strengths and limitations of current doublet detection methods and highlighted areas for improvement. These findings are intended to help guide users and inform the development of more accurate and robust doublet detection tools.
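One way to compare methods on data with known labels, such as datasets with injected artificial doublets, is to score every method's calls against the same ground truth. The sketch below is an assumption about how such a comparison could be set up, not the analysis code. `method_a` and `method_b` are hypothetical placeholders rather than the real tools. Precision and recall are reported alongside accuracy because doublets are usually a minority class, so accuracy alone can look high even when many doublets are missed.

```python
import numpy as np


def doublet_metrics(y_true, y_pred):
    """Accuracy, precision, and recall with doublets (label 1) as the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # doublets correctly flagged
    fp = np.sum(~y_true & y_pred)   # singlets wrongly flagged
    fn = np.sum(y_true & ~y_pred)   # doublets missed
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": float(accuracy), "precision": float(precision), "recall": float(recall)}


def compare_methods(y_true, predictions):
    """Score each method's binary calls against the same ground-truth labels."""
    return {name: doublet_metrics(y_true, calls) for name, calls in predictions.items()}


# Toy labels and calls from two hypothetical methods.
truth = [0, 0, 0, 1, 1, 0, 1, 0]
calls = {"method_a": [0, 0, 1, 1, 1, 0, 0, 0], "method_b": [0, 0, 0, 1, 0, 0, 0, 0]}
for name, m in compare_methods(truth, calls).items():
    print(name, m)
```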
