I am a Ph.D. student in the Human–Computer Interaction Institute (HCII) within the School of Computer Science at Carnegie Mellon University. I am fortunate to be advised by Ken Holstein and Steven Wu.
I develop tools for measuring the capabilities, risks, and limitations of AI systems. I study statistical approaches for evaluating AI systems directly, as well as frameworks for understanding the broader sociotechnical contexts in which humans operate and interact with them. My work bridges ideas from ML, Statistics, Human–Computer Interaction, and the Quantitative Social Sciences to advance an emerging interdisciplinary science of AI evaluation.
My work is generously supported by an NSF Graduate Research Fellowship, the Center for Advancing Safety of Machine Intelligence, and the National Institute for Standards and Technology (NIST).
Selected Work
(*) Co-first Author; (**) Co-senior Author
- Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas. Under Review, 2025. [arXiv]
- Measurement as Bricolage: How Data Scientists Construct Target Variables for Predictive Modeling Tasks. Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), 2025. [arXiv]
- Counterfactual Prediction Under Outcome Measurement Error. Conference on Fairness, Accountability, and Transparency (FAccT), 2023. [PDF] [Video] [Code]. Best Paper Award.