- Junior majoring in Computer Science and Applied Mathematics and Statistics at the Honors College @ SBU
- Passionate about all things machine learning
- Based in Queens, New York
I'm driven, detail-oriented, and care deeply about the quality of my work, especially when it comes to research and machine learning projects. I'm grateful to be part of Break Through Tech AI, a program that supports women and other underrepresented groups in STEM. It has helped me build confidence in my technical skills and grow as a collaborator. Through this program, I was matched with Snowflake as a fellow on a real-world ML group project. At the McKinnon-Rosati Lab, I'm currently leading a small project group working on a doublet detection model. These experiences have been both challenging and rewarding, and they've deepened my interest in applying machine learning to real-world problems!
What We Did: Developed a lightweight machine learning pipeline combining unsupervised clustering and supervised classification (XGBoost) to detect doublets in single-cell RNA sequencing data. Preprocessed biological datasets, generated artificial doublets, extracted co-expression features, and trained models to distinguish singlets from doublets.
Tools: Python, scikit-learn, XGBoost, Leiden clustering, PCA, Jupyter notebooks
Result: Achieved an overall accuracy of 86.5%, with precision of 57% and recall of 49.5%, across multiple benchmark datasets, demonstrating competitive performance and improved doublet detection reliability.
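As a rough illustration of the artificial-doublet step, here is a minimal sketch that assumes a doublet can be approximated by summing the raw counts of two distinct, randomly chosen cells. The helper name `simulate_doublets` and the toy Poisson count matrix are hypothetical and are not taken from the project code.

```python
import numpy as np

def simulate_doublets(counts, n_doublets, seed=0):
    """Approximate doublets by summing the counts of two distinct random cells."""
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    first = rng.integers(0, n_cells, size=n_doublets)
    # Shift by 1..n_cells-1 (mod n_cells) so the second parent is never the same cell.
    second = (first + rng.integers(1, n_cells, size=n_doublets)) % n_cells
    return counts[first] + counts[second], first, second

# Toy cells-by-genes count matrix (hypothetical data: 100 cells, 50 genes).
rng = np.random.default_rng(42)
counts = rng.poisson(lam=2.0, size=(100, 50))

doublets, first, second = simulate_doublets(counts, n_doublets=20)

# Stack observed cells (label 0, assumed singlets) and simulated doublets (label 1)
# to form a labeled training set for a downstream classifier.
X = np.vstack([counts, doublets])
y = np.concatenate([
    np.zeros(len(counts), dtype=int),
    np.ones(len(doublets), dtype=int),
])
```

The labeled matrix `X`, `y` could then be reduced with PCA and passed to a classifier such as XGBoost. Treating every observed cell as a singlet is a simplifying assumption, since real data already contains some doublets.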
What We Did: Developed a machine learning pipeline to forecast transit demand by ZIP code in Brooklyn based on projected population increases. Preprocessed real-world datasets (OSM, MTA, population data), calculated demand scores, and implemented a Streamlit dashboard to visualize demand patterns and transportation deserts.
Tools: Python, Pandas, scikit-learn, HistGradientBoostingRegressor, Snowflake, PyDeck, Streamlit, Jupyter notebooks
Result: Achieved a cross-validation R² of 0.739, highlighting areas of high transit demand. The dashboard provides actionable insights for urban planners and local authorities to identify underserved neighborhoods and allocate resources effectively.
What We Did: Built and trained a convolutional neural network (CNN) using TensorFlow on the CIFAR-10 dataset for image classification. Preprocessed data by normalizing pixel values, designed a 4-layer CNN with batch normalization and dropout, and optimized hyperparameters using grid search to improve model accuracy.
Tools: Python, TensorFlow, Keras, CIFAR-10 dataset
Result: Achieved ~81.5% training accuracy and ~80.5% testing accuracy, demonstrating effective generalization with minimal overfitting on a multi-class image classification task.
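A network of this kind might look roughly like the sketch below. It reads "4-layer" as four convolutional layers, and the filter counts, dropout rate, and optimizer are assumptions chosen for illustration, not the tuned values from the project.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(dropout_rate=0.3, num_classes=10):
    """Small CNN for 32x32 RGB images: four conv layers with batch norm and dropout."""
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        # Block 1: two 32-filter convolutions
        layers.Conv2D(32, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Conv2D(32, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
        layers.Dropout(dropout_rate),
        # Block 2: two 64-filter convolutions
        layers.Conv2D(64, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Conv2D(64, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
        layers.Dropout(dropout_rate),
        # Classifier head
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels 0..9
        metrics=["accuracy"],
    )
    return model

model = build_cnn()
model.summary()
```

To train it, the images would first be scaled to the [0, 1] range (for example, `x_train.astype("float32") / 255.0`) before calling `model.fit`. A grid search can then loop over values such as `dropout_rate` and compare validation accuracy.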
What We Did: Compared and evaluated six doublet detection methods on single-cell RNA sequencing datasets. Oversaw integration of code for multiple tools, designed experiments to generate artificial doublets, and conducted re-analyses to assess method biases and robustness.
Tools: Python, R, Jupyter Notebooks, scDblFinder, Scrublet, COMPOSITE, DoubletDetection, and data visualization libraries.
Results: Identified strengths and limitations of current doublet detection methods, highlighting areas for improvement. Findings will help guide users and inform future development of more accurate and robust doublet detection tools.
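When doublet detectors are compared against known labels, such as artificially generated doublets, accuracy alone can look strong simply because singlets dominate the data. The sketch below computes accuracy, precision, and recall on a small made-up set of labels. The function name `binary_metrics` and the toy predictions are illustrative only and are not results from the benchmark.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = doublet, 0 = singlet)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy example: 20 cells, 4 of which are true doublets.
y_true = [1, 1, 1, 1] + [0] * 16
# A hypothetical detector finds 2 of the 4 doublets and raises 1 false alarm.
y_pred = [1, 1, 0, 0] + [1] + [0] * 15

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the detector misses half of the doublets yet still reaches 85% accuracy, which is why precision and recall are worth reporting alongside accuracy for imbalanced doublet data.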