About Me
Hi, I’m Mia. I am currently a PhD student at Columbia University. I hold an MS degree in Applied Statistics from New York University and a BS degree in Applied Statistics from Qingdao University. Before joining Columbia, I was an adjunct instructor at New York University and worked as a research assistant for Professor Marc Scott. My research interests include:
- Statistical machine learning and its applications in biology and bioimage analysis.
- Sequence analysis.
- Bootstrap methods and fiducial methods.
- I’m also open to exploring other research areas.
In my spare time, I enjoy cycling and playing tennis. A fun fact about me: I survived a summer in the Bay Area relying solely on my bicycle for transportation! I also bring my passion for machine learning and computer vision into my daily life: Alan Cha and I are building a wigglegram camera using a Raspberry Pi. Some wigglegrams can be accessed here.
Research Experience
New York University
Research Assistant
September 2023 - February 2024
- Assisted Professor Marc Scott in improving frequency plots for sequence analysis by using Hamming distance to cluster similar sequences
Qingdao University
Research Assistant
September 2020 - August 2021
- Developed four interval estimation methods for zero-inflated gamma distributions based on fiducial inference, the Box-Cox transformation, the parametric bootstrap, and the method of variance estimates recovery (MOVER); designed and conducted computer experiments to compare the performance of the four methods (code can be found here)
- Constructed confidence intervals based on a pivotal quantity for semiparametric single-index models; ran simulations of these new methods in R
- Developed a novel additive ordinary differential equation model with the group MCP penalty for reconstructing gene regulatory networks, using data from the DREAM Systems Biology Challenges
Teaching and Professional Experience
Columbia University
Teaching Assistant
- Fall 2024: UN2103 Applied Linear Regression Analysis
- Spring 2025: UN2104 Applied Categorical Data Analysis
Red Snapper Camera LLC
Co-Founder and Software Engineer
February 2024 - September 2024
- Built a four-lens camera driven by a Raspberry Pi that captures four pictures simultaneously and processes them into a GIF-format wiggle image
- Used Haar cascade classifiers in OpenCV to detect faces in images, then repositioned the pictures around the detected faces to generate wiggle images with the Pillow package in Python
- Built a graphical user interface (GUI) with Tkinter in Python, making image processing easier and more intuitive
New York University
Adjunct Instructor
August 2023 - February 2024
- Course assistant for Multilevel Modeling for 60 Master’s and PhD students; explained hierarchical linear model concepts and LMER syntax in mini lab sessions during office hours
- Lab instructor for the Statistical Computing class; prepared lab activities for practicing statistical analysis, graphical representation, and statistical simulation in R
Coluxa Inc.
Algorithm Engineering Intern
June 2022 - August 2022
Coluxa Inc. is developing an innovative microscope system to support life science research and disease diagnosis.
- Developed deep learning algorithms to analyze fluorescent markers in human cells, replacing an otherwise labor-intensive manual process and contributing to the development of a new AI-assisted microscope
- Automated bioimage processing and analysis with TensorFlow, tuning algorithm parameters along the way
- Processed image data with OpenCV and sped up model training by 40% by optimizing data delivery using CUDA
- Built visualizations with Seaborn to evaluate and enhance algorithm performance and analyze scanned objects
- Slides
Keyanshe
Data Analyst Intern
April 2021 - June 2021
- Automated data collection, cutting collection time by 70%, by building web scrapers with Python urllib and parsing real estate market data with Beautiful Soup
- Conducted market research and collected 500+ real estate records in collaboration with stakeholders
- Cleaned and synthesized messy data using Pandas; developed internal PostgreSQL databases by collaborating with the data and consulting departments
Education
Columbia University
Ph.D. in Statistics
September 2024 - Present
New York University
M.S. in Statistics, GPA: 3.97/4
September 2021 - May 2023
- Coursework: Algorithms, Machine Learning, Causal Inference, Multilevel Modeling, Experimental Design
- Leadership Activity: Society for Statistics Club Board Member (American Statistical Association)
Qingdao University
B.S. in Statistics, GPA: 3.87/4
September 2017 - June 2021
- Math Coursework: Real Analysis, Algebra, Mathematical Analysis, Mathematical Optimization, Differential Equations
- Statistics Coursework: Probability, Time Series, Stochastic Processes
- Computer Science Coursework: Algorithms, C++, MATLAB, Statistical Computing
- Finance Coursework: Microeconomics, Macroeconomics, Life Insurance Actuarial Science
Projects
Python: GeoPandas, Pandas, scikit-learn, GeoJSON
- Processed 2 million geographic collision and traffic records from NYC Open Data, using GeoPandas to merge data sets and geoplot to plot geographic data; standardized and modularized the Python code
- Applied supervised machine learning algorithms (e.g., random forest, KNN) to the merged tables to predict injury and mortality risks, achieving a recall of 0.8
R; Python: Tweepy, JSON
- Built Twitter bots with Tweepy in Python and improved efficiency through exception handling and object-oriented design; collected and processed tweets from several influential accounts as JSON objects
- Performed text analysis with tidytext to identify influential cryptocurrency-related tweets and built time series models to predict the daily returns of Ethereum from related tweets and several other assets, achieving an MSE of 1.9
Student Record Management System
C++, MySQL
- Designed a login and registration system in C++ for students and administrators to maintain school records
- Developed a database using MySQL and connected the management system to the MySQL database