Benchmarking LLMs’ Judgments with No Gold Standard

Published in Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025)

We introduce an evaluation metric for LLM-generated judgments in settings without gold-standard references.

Recommended citation: Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, and Yuqing Kong. (2025). "Benchmarking LLMs' Judgments with No Gold Standard." Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025).