I am a Ph.D. student in the Human–Computer Interaction Institute (HCII) within the School of Computer Science at Carnegie Mellon University. I am fortunate to be advised by Ken Holstein and Steven Wu.
I develop tools for measuring the capabilities, risks, and limitations of AI systems. I study statistical approaches for evaluating AI systems themselves, as well as frameworks for understanding the broader sociotechnical contexts in which humans and AI systems operate and interact. My work bridges ideas from machine learning, statistics, human–computer interaction, and the quantitative social sciences to advance an emerging interdisciplinary science of AI evaluation.
My work is generously supported by an NSF Graduate Research Fellowship, the Center for Advancing Safety of Machine Intelligence, and the National Institute of Standards and Technology (NIST).
Recent News
| Date | News |
|---|---|
| Jan 2026 | Our paper Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas was accepted at ICLR 2026. See you in Rio! |
| Jan 2026 | I presented at the Bridging Prediction and Intervention Problems in Social Systems Workshop at the Simons Institute for the Theory of Computing. |
| Dec 2025 | I am excited to present Validating LLM-as-a-Judge Systems under Rating Indeterminacy at NeurIPS. Check out the code release, tutorial, and blog post about the work to learn more. |
| Dec 2025 | I presented at the Evaluating Evaluations and FAR.AI Alignment workshops in San Diego. |
| Aug 2025 | I am organizing a webinar series on AI Evaluation Science, hosted through the AI Measurement Science & Engineering Research Center at CMU. Watch the recordings here! |
| Jun 2025 | Our paper Measurement as Bricolage: How Data Scientists Construct Target Variables for Predictive Modeling Tasks was accepted at CSCW 2025. |
Selected Work
(*) Co-first Author; (**) Co-senior Author

- Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas. International Conference on Learning Representations (ICLR), 2026. [arXiv]
- Measurement as Bricolage: How Data Scientists Construct Target Variables for Predictive Modeling Tasks. Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), 2025. [arXiv]
- Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge. Conference on Collective Intelligence, 2023.
- Counterfactual Prediction Under Outcome Measurement Error. Conference on Fairness, Accountability, and Transparency (FAccT), 2023. [PDF] [Video] [Code] Best Paper Award