Building on HF

Nathan Habib PRO

SaylorTwift

huggingface

AI & ML interests

Evals

Recent Activity

liked a model about 13 hours ago

LiquidAI/LFM2-24B-A2B

new activity about 15 hours ago

Qwen/Qwen3.5-27B:Add evaluation results

liked a model about 15 hours ago

Qwen/Qwen3.5-27B

upvoted a collection about 17 hours ago

Qwen3.5

9 items • Updated about 11 hours ago • 375

upvoted an article 5 days ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

6 days ago • 432

upvoted a paper 13 days ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published 23 days ago • 244

upvoted a paper 14 days ago

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

Paper • 2602.07274 • Published 19 days ago • 204

upvoted an article 20 days ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

22 days ago • 80

upvoted an article 27 days ago

Article

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

Nov 21, 2025 • 26

upvoted a paper 30 days ago

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Paper • 2601.18137 • Published about 1 month ago • 27

upvoted a collection about 1 month ago

OpenEnv Environment Hub v0.2.0

A collection of OpenEnv-spec Environments • 11 items • Updated 3 days ago • 24

upvoted a paper about 1 month ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 21

upvoted an article about 1 month ago

Article

AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality

Jan 21 • 31

upvoted a collection about 2 months ago

deployed-models

Models that are currently deployed by the hf-inference provider • 1470 items • Updated 4 days ago • 34

upvoted 3 articles 2 months ago

Article

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Dec 18, 2025 • 120

Article

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Dec 17, 2025 • 47

Article

Phare LLM benchmark V2: Reasoning models don't guarantee better security

Dec 16, 2025 • 10

upvoted 2 articles 3 months ago

Article

Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand

Dec 4, 2025 • 65

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 603

upvoted a paper 3 months ago

SciCode: A Research Coding Benchmark Curated by Scientists

Paper • 2407.13168 • Published Jul 18, 2024 • 17

upvoted an article 3 months ago

Article

Transformers v5: Simple model definitions powering the AI ecosystem

Dec 1, 2025 • 302

upvoted 2 papers 3 months ago

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Paper • 2404.10774 • Published Apr 16, 2024 • 6

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 15