Benchmarking LLMs’ Judgments with No Gold Standard

Published in Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), 2025

We introduce an evaluation metric for LLM-generated judgments in settings without gold-standard references.

Recommended citation: Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, and Yuqing Kong. (2025). "Benchmarking LLMs' Judgments with No Gold Standard." Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025).