Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST 12 days ago • 16
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output 22 days ago • 21
Article Community Evals: Because we're done trusting black-box leaderboards over the community 26 days ago • 82
Article Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model 26 days ago • 28
Article Training Design for Text-to-Image Models: Lessons from Ablations 27 days ago • 65
Article Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness Nov 5, 2025 • 12
Nemotron ColEmbed V2 Collection State-of-the-Art Late Interaction Vision-Language Embedding Models • 3 items • Updated 6 days ago • 10
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 31
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 85
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 11 days ago • 22