Submitted by red-fox-yj 65 ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development OpenMOSS 24 4
Submitted by unilm 39 Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Microsoft Research 109 4
Submitted by JiayuJeff 30 NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems · 10 authors 24 2
Submitted by ChongCong 15 Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation · 10 authors 99 4
Submitted by taesiri 12 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models · 5 authors 91 2
Submitted by BiaoGong 8 CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation · 8 authors 4
Submitted by BounharAbdelaziz 8 YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation MBZUAI-IFM Paris Lab 10 3
Submitted by rzdiversity 7 Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs · 8 authors 10 3
Submitted by wanng 7 SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature Tsinghua IIGroup 4 3
Submitted by ralfroemer 4 CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion Technical University of Munich 25 3