I am a Ph.D. student in the Human–Computer Interaction Institute (HCII) within the School of Computer Science at Carnegie Mellon University. I am fortunate to be advised by Ken Holstein and Steven Wu.

I develop tools for measuring the capabilities, risks, and limitations of AI systems. I study statistical approaches for evaluating AI systems directly, as well as frameworks for understanding the broader sociotechnical contexts in which humans and AI systems interact. My work bridges ideas from Machine Learning, Statistics, Human–Computer Interaction, and the Quantitative Social Sciences to advance an emerging interdisciplinary science of AI evaluation.

My work is generously supported by an NSF Graduate Research Fellowship, the Center for Advancing Safety of Machine Intelligence, and the National Institute of Standards and Technology (NIST).

Luke Guerdan
lguerdan [at] andrew.cmu.edu

Recent News

Jan 2026: Our paper Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas was accepted at ICLR 2026. See you in Rio!
Jan 2026: I presented at the Bridging Prediction and Intervention Problems in Social Systems Workshop at the Simons Institute for the Theory of Computing.
Dec 2025: I am excited to present Validating LLM-as-a-Judge Systems under Rating Indeterminacy at NeurIPS. Check out the code release, tutorial, and blog post about the work to learn more.
Dec 2025: I presented at the Evaluating Evaluations and FAR.AI Alignment workshops in San Diego.
Aug 2025: I am organizing a webinar series on AI Evaluation Science, hosted through the AI Measurement Science & Engineering Research Center at CMU. Watch the recordings here!
Jun 2025: Our paper Measurement as Bricolage: How Data Scientists Construct Target Variables for Predictive Modeling Tasks was accepted at CSCW 2025.

Selected Work

(*) Co-first Author; (**) Co-senior Author
  1. Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas. Luke Guerdan*, Justin Whitehouse*, Kimberly Truong*, Kenneth Holstein, and Zhiwei Steven Wu. International Conference on Learning Representations (ICLR), 2026. [arXiv]
  2. Validating LLM-as-a-Judge Systems under Rating Indeterminacy. Luke Guerdan, Solon Barocas, Kenneth Holstein, Hanna Wallach, Zhiwei Steven Wu, and Alexandra Chouldechova. Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv] [Blog] [Code]
  3. Bridging Prediction and Intervention Problems in Social Systems. Lydia T. Liu*, Inioluwa Deborah Raji*, Angela Zhou*, Luke Guerdan, Jessica Hullman, Daniel Malinsky, and 30 additional authors. arXiv, 2025. [arXiv]
  4. Measurement as Bricolage: How Data Scientists Construct Target Variables for Predictive Modeling Tasks. Luke Guerdan*, Devansh Saxena*, Stevie Chancellor**, Zhiwei Steven Wu**, and Kenneth Holstein**. Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), 2025. [arXiv]
  5. Predictive Performance Comparison of Decision Policies Under Confounding. Luke Guerdan, Amanda Coston, Kenneth Holstein, and Zhiwei Steven Wu. International Conference on Machine Learning (ICML), 2024. [arXiv] [Code]
  6. Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge. Anna Kawakami, Luke Guerdan, Yang Cheng, Kate Glazko, Matthew Lee, Scott Carter, Nikos Arechiga, Haiyi Zhu, and Kenneth Holstein. Conference on Collective Intelligence, 2023.
  7. Counterfactual Prediction Under Outcome Measurement Error. Luke Guerdan, Amanda Coston, Kenneth Holstein, and Zhiwei Steven Wu. Conference on Fairness, Accountability, and Transparency (FAccT), 2023. [PDF] [Video] [Code] Best Paper Award
  8. Ground(less) Truth: A Causal Framework for Proxy Labels in Human-Algorithm Decision-Making. Luke Guerdan, Amanda Coston, Zhiwei Steven Wu, and Kenneth Holstein. Conference on Fairness, Accountability, and Transparency (FAccT), 2023. [PDF] [Video]