Alexander Koujianos Goldberg

Improving Decision-Making from Distributed Human Evaluations

Abstract

Many consequential decisions rely on aggregating noisy judgments from distributed human evaluators, often without access to an objective ground truth. Scientific peer review and grant funding are canonical examples, but similar challenges arise in hiring, admissions, and evaluation of AI models. The goal of this thesis is to understand and mitigate errors in distributed human evaluation to make better decisions. Toward this end, we conduct controlled experiments in real review processes and develop principled algorithms with provable guarantees.

In the first part of the thesis, we present experimental evidence on sources of error and the effectiveness of interventions in scientific peer review, drawn from two human-subjects experiments that we conducted at large-scale ML/AI conferences. The first studies whether meta-evaluation can improve review quality; the second studies a live deployment of a large language model assistant for authors. Together, these experiments demonstrate both the promise and the limits of interventions aimed at improving review quality.

In the second part of the thesis, we develop algorithms for making selection decisions from error-prone evaluations. Randomized selection has been gaining adoption in scientific funding. However, existing designs of these peer review "lotteries" are often ad hoc. We formalize the desiderata motivating the use of randomized decisions in peer review and show that prior methods fail to meet their intended goals. We then develop efficient algorithms with provable guarantees that better address the motivations for randomization.

Finally, in the third part of the thesis, we study how to release data about evaluation processes without compromising participant privacy. Outside scrutiny of high-stakes evaluation systems requires data about reviews and outcomes, but releasing such data can re-identify participants or leak sensitive personal information. We examine this tension in pseudonymous time-series data and fraud-detection graph data, demonstrating practical privacy attacks and developing provably sound mechanisms for sharing useful data while protecting anonymity.
