Xinyue (Camellia) Rui

PhD student in Biostatistics

University of Southern California

Biography

I am a PhD student in Biostatistics at the University of Southern California, where I develop machine learning and statistical methods to elucidate the genetic architecture of complex diseases with Prof. Nick Mancuso and Prof. Steven Gazal. As an undergraduate at USC, I was a research assistant at the Center for Genetic Epidemiology, where I worked with Prof. Charleston Chiang on a statistical analysis pipeline to assess genotype imputation quality across diverse populations worldwide.

I’m passionate about applying cutting-edge machine learning and statistical methods in industry. I am actively looking for summer 2025 internship opportunities! Happy to connect via LinkedIn; I can also be reached at crui@usc.edu.

Interests

  • Statistical Genetics
  • Population Genetics
  • Machine Learning

Education

  • Biostatistics Ph.D., 2022-Present

    University of Southern California

  • Biostatistics M.S., 2020-2022

    University of Southern California

  • Mathematics B.A., 2019-2022

    University of Southern California

Skills

Python

R

Statistics

Experience

Research Assistant - PerturbVI

with Prof. Nicholas Mancuso

Mar 2024 – Present Los Angeles, California
  • Helped develop PerturbVI, a machine learning method that discovers gene regulatory networks from CRISPR perturbation and single-cell RNA-seq data using variational inference and JAX, in a team of three
  • Simulated latent-variable model misspecification in Python and improved sensitivity by 6.5% over existing methods
  • Enabled ultra-fast inference, converging on average 70x faster than the existing method on the largest-scale perturbation matrix (310,385 x 8,563)

Research Assistant - Single Cell Fine Mapping

with Prof. Nicholas Mancuso & Prof. Steven Gazal

Aug 2022 – Present Los Angeles, California
  • Developed SCFM, a machine learning method that identifies gene-to-disease associations in the largest-scale single-cell RNA-seq data (4.1 GB) using coordinate ascent variational inference
  • Achieved an average 32% improvement in sensitivity and discovered an average of 15% more genetic variants than the existing method in extensive simulations
  • Built a new Python package implementing the SCFM framework in JAX, with an average inference time 15x faster than the existing method (1.3 s vs. 20 s)
  • Demonstrated calibration and robustness to model misspecification across 4,000+ simulation scenarios, benchmarking the method against baseline and other published models
  • Had a first-author abstract accepted at the top-tier American Society of Human Genetics conference

Research Assistant - Worldwide Imputation Analysis

with Prof. Charleston Chiang

May 2020 – May 2022 Los Angeles, California

Responsibilities include:

  • Built a statistical analysis pipeline in Python and R and ran experiments to assess genotype imputation quality across 123 populations

  • Discovered that imputation quality, measured by imputation R-squared, was 6.5%–42% lower in minority populations than in European controls

  • Presented the results at the undergraduate poster session to raise awareness of these disparities, and was awarded the Provost’s Research Fellowship twice (fall 2020 and fall 2021)

  • Findings were published in a top-tier journal (AJHG, IF=12.6)

Accomplishments

Keck School of Medicine/Graduate School Fellowship

For incoming PhD students whose combination of background and training will make a substantive, documentable, and unique contribution to the program as assessed by faculty

Jennifer Battat Scholarship

Recognizes exceptional transfer students majoring in Economics or Mathematics, honoring academic and personal achievements and continued contributions as outstanding members of the USC Dornsife student community

Provost’s Research Fellowship

Awarded to 100 USC students for excellent independent research projects conducted with a faculty member

Teaching Experience

Graduate Teaching Assistant

PM 520: Advanced Statistical Computing
University of Southern California, Spring 2024
Professor: Nicholas Mancuso

  • This course introduces students to theory and hands-on programming underlying advanced statistical computing. Topics include numerical stability (e.g., solving linear systems, “logsumexp” trick), optimization techniques (e.g., gradient descent, natural gradient descent), automatic differentiation, and scalability for large datasets (e.g., variational inference).

Teaching Assistant

LA’s Best: Los Angeles Biostatistics and Data Science Summer Training Program
University of Southern California, Summer 2024
Professor: Kelly Street

  • Assisted with research projects in biostatistics and data science for prospective graduate students, helping them explore their research interests.

Recent Publications

Imputation accuracy across global human populations

Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of …

Estimating heritability explained by local ancestry and evaluating stratification bias in admixture mapping from summary statistics

The heritability explained by local ancestry markers in an admixed population provides crucial insight into the genetic architecture of …