Benchmarking LLMs’ Judgments with No Gold Standard
Published in Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), 2025
We introduce an evaluation metric for LLM-generated judgments in settings without gold-standard references.
Recommended citation: Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, and Yuqing Kong. (2025). "Benchmarking LLMs' Judgments with No Gold Standard." Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025).
