The Amazing Agent Race: Strong Tool Users, Weak Navigators • Paper • 2604.10261 • Published 6 days ago • 4
Benchmarking Cognitive Biases in Large Language Models as Evaluators • Paper • 2309.17012 • Published Sep 29, 2023 • 3
Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks • Paper • 2212.01350 • Published Dec 2, 2022 • 1
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models • Paper • 2504.20157 • Published Apr 28, 2025 • 37