Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 7 days ago • 414
Learning POMDP World Models from Observations with Language-Model Priors Paper • 2605.13740 • Published 21 days ago • 6
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 21 days ago • 269
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels Paper • 2605.04956 • Published 28 days ago • 7
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published about 1 month ago • 166
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models Paper • 2604.08064 • Published Apr 9 • 8