Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 6 days ago • 432
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents Paper • 2602.07274 • Published 19 days ago • 204
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 22 days ago • 80
Article Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks +2 Nov 21, 2025 • 26
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published about 1 month ago • 27
OpenEnv Environment Hub v0.2.0 Collection A collection of OpenEnv-spec Environments • 11 items • Updated 3 days ago • 24
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3, 2025 • 21
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 31
deployed-models Collection Models that are currently deployed by the hf-inference provider • 1470 items • Updated 4 days ago • 34
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 Dec 18, 2025 • 120
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Dec 17, 2025 • 47
Article Phare LLM benchmark V2: Reasoning models don't guarantee better security Dec 16, 2025 • 10
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 65
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 17
Article Transformers v5: Simple model definitions powering the AI ecosystem +2 Dec 1, 2025 • 302
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents Paper • 2404.10774 • Published Apr 16, 2024 • 6
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Paper • 2402.13249 • Published Feb 20, 2024 • 15