Adaptive Evaluations Collection Datasets for our paper, Adaptively profiling models with task elicitation (EMNLP 2025). • 6 items • Updated about 13 hours ago
BrachioLab/dist-defense-traces-taskname-split-augmented-plus-synth-v15 Viewer • Updated 4 days ago • 367k • 16
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Paper • 2510.02418 • Published Oct 2, 2025 • 2