
🦆 Hi, I'm Hannah!

About Me

  • 🏫 Junior majoring in Computer Science and Applied Mathematics and Statistics at the Honors College @ SBU
  • 🌠 Passionate about all things machine learning
  • 🗽 Based in Queens, New York

I'm driven and detail-oriented, and I care deeply about the quality of my work, especially in research and machine learning projects. I'm grateful to be part of Break Through Tech AI, a program that supports women and other underrepresented groups in STEM. It has helped me build confidence in my technical skills and grow as a collaborator. Through the program, I was matched with Snowflake as a fellow to work on a real-world ML group project. At the McKinnon-Rosati Lab, I currently lead a small project group building a doublet detection model. These experiences have been both challenging and rewarding, and they've deepened my interest in applying machine learning to real-world problems!

🧪 Featured Project: Doublet Detection

What We Did: Developed a lightweight machine learning pipeline combining unsupervised clustering and supervised classification (XGBoost) to detect doublets in single-cell RNA sequencing data. Preprocessed biological datasets, generated artificial doublets, extracted co-expression features, and trained models to distinguish singlets from doublets.
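A common way to generate artificial doublets is to add together the counts of two randomly chosen cells, which gives the classifier labeled positives to train on. The sketch below illustrates that idea in plain Python. The function name `simulate_doublets`, the toy count matrix, and the pairing scheme are assumptions for illustration and are not taken from the project's code.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Create artificial doublets by summing the counts of two distinct cells.

    counts: list of per-cell gene-count lists (cells x genes).
    Hypothetical helper for illustration; not the repository's implementation.
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        # Pick two different cells and add their counts gene by gene.
        i, j = rng.sample(range(len(counts)), 2)
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


# Toy data: 4 cells x 3 genes (made-up numbers).
cells = [[5, 0, 1], [0, 7, 2], [3, 3, 0], [1, 0, 9]]
fake = simulate_doublets(cells, n_doublets=3)

# Label observed cells 0 (singlet) and simulated ones 1 (doublet) for training.
features = cells + fake
labels = [0] * len(cells) + [1] * len(fake)
```

Treating every observed cell as a singlet is itself an approximation, since real data already contains some doublets.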

Tools: Python, scikit-learn, XGBoost, Leiden clustering, PCA, Jupyter notebooks

Result: Across multiple benchmark datasets, achieved 86.5% overall accuracy with 57% precision and 49.5% recall, demonstrating competitive performance.
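Because doublets are usually the minority class, accuracy can be high while precision and recall stay moderate. The sketch below computes all three from confusion-matrix counts. The counts are made up for illustration and are not the project's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 1,000 cells, of which 100 are true doublets.
acc, prec, rec = classification_metrics(tp=50, fp=40, fn=50, tn=860)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# prints: accuracy=0.910 precision=0.556 recall=0.500
```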

🚌 Featured Project: Brooklyn Transit Demand Dashboard

What We Did: Developed a machine learning pipeline to forecast transit demand by ZIP code in Brooklyn based on projected population increases. Preprocessed real-world datasets (OSM, MTA, population data), calculated demand scores, and implemented a Streamlit dashboard to visualize demand patterns and transportation deserts.

Tools: Python, Pandas, scikit-learn, HistGradientBoostingRegressor, Snowflake, PyDeck, Streamlit, Jupyter notebooks

Result: Achieved a cross-validated R² of 0.739. The dashboard highlights areas of high transit demand and gives urban planners and local authorities actionable insights for identifying underserved neighborhoods and allocating resources effectively.
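For readers unfamiliar with the metric, a cross-validated R² is the coefficient of determination computed on each held-out fold and then averaged. The sketch below shows the arithmetic in plain Python. The helper names and the toy fold data are assumptions for illustration, and the project itself may have used a library routine instead.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(fold_results):
    """Average R² over folds; each item is (y_true, y_pred) for one held-out fold."""
    scores = [r2_score(t, p) for t, p in fold_results]
    return sum(scores) / len(scores)


# Made-up held-out targets and predictions for two folds.
folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect fit, R² = 1
    ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # always predicts the mean, R² = 0
]
mean_r2 = cross_val_r2(folds)  # 0.5
```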

Tech Stack

Other Projects

🤖 Image Classification on CIFAR-10 Dataset

What We Did: Built and trained a convolutional neural network (CNN) using TensorFlow on the CIFAR-10 dataset for image classification. Preprocessed data by normalizing pixel values, designed a 4-layer CNN with batch normalization and dropout, and optimized hyperparameters using grid search to improve model accuracy.
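Grid search simply tries every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows the loop in plain Python. The search space and the `toy_score` function are hypothetical stand-ins, since the README does not list the values that were tuned, and in practice the scoring step would train the CNN and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score).

    evaluate: callable that takes a dict of hyperparameters and returns a
    validation score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space (not the project's actual values).
grid = {"learning_rate": [1e-2, 1e-3], "dropout": [0.2, 0.5]}


def toy_score(params):
    # Stand-in for "train the model and return validation accuracy".
    return -abs(params["learning_rate"] - 1e-3) - abs(params["dropout"] - 0.2)


best, score = grid_search(grid, toy_score)
```

The number of training runs grows multiplicatively with each added hyperparameter, so the grid is usually kept small.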

Tools: Python, TensorFlow, Keras, CIFAR-10 dataset

Result: Achieved ~81.5% training accuracy and ~80.5% testing accuracy, demonstrating effective generalization with minimal overfitting on a multi-class image classification task.


📊 Doublet Detection Analysis

What We Did: Compared and evaluated six doublet detection methods on single-cell RNA sequencing datasets. Oversaw integration of code for multiple tools, designed experiments to generate artificial doublets, and conducted re-analyses to assess method biases and robustness.

Tools: Python, R, Jupyter notebooks, scDblFinder, Scrublet, COMPOSITE, DoubletDetection, data visualization libraries

Result: Identified strengths and limitations of current doublet detection methods, highlighting areas for improvement. The findings will help guide users and inform future development of more accurate and robust doublet detection tools.

Contact Me

LinkedIn Gmail

Pinned

  1. doublet-detection (Public)

    Interpretable and efficient machine learning algorithm for doublet detection in single-cell RNA sequencing, including preprocessing, clustering, and XGBoost classification.

    Jupyter Notebook

  2. Snowflake-1A-BreakThroughTech/AI-Studio-Project (Public)

    Predicts transit demand by Brooklyn ZIP code using ML and visualizes transportation deserts with an interactive Streamlit dashboard.

    Python 2

  3. image-classification (Public)

    Simple CNN for CIFAR-10 with data preprocessing and hyperparameter tuning.

    Jupyter Notebook

  4. doublet-detection-analysis (Public)

    URECA Symposium Poster Source Code

    Jupyter Notebook

  5. The-Den (Public)

    Study / Productivity Webpage Project for HopperHacks 2024

    CSS