Abstract
Large-scale semantic theorem retrieval system demonstrates superior performance over existing baselines using a 9.2 million theorem corpus with systematic analysis of representation context, language model choice, and embedding strategies.
Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of 9.2 million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at https://huggingface.co/spaces/uw-math-ai/theorem-search{this link}, and the dataset is available at https://huggingface.co/datasets/uw-math-ai/TheoremSearch{this link}.
Community
Mathematicians and math prover agents need fast and efficient theorem search. We release Theorem Search over all of arXiv, the Stacks Project, and six other sources. Our search is 2x more accurate than frontier LLMs, with only 4 second latency. Feedback is welcome!
| Model | Hit@10 |
|---|---|
| Google Search | 0.378 |
| Chat-GPT 5.2 | 0.180 |
| Gemini 3 Pro | 0.252 |
| Ours | 0.432 / 0.505 |
Blue: theorem-level results
Red: paper-level results
Mathematicians and math prover agents need fast and efficient theorem search. We release Theorem Search over all of arXiv, the Stacks Project, and six other sources. Our search is 2x more accurate than frontier LLMs, with only 4 second latency. Feedback is welcome!
| Model | Hit@10 |
|---|---|
| Google Search | 0.378 |
| Chat-GPT 5.2 | 0.180 |
| Gemini 3 Pro | 0.252 |
| Ours | 0.432 / 0.505 |
Blue: theorem-level results
Red: paper-level results
arXivLens breakdown of this paper ๐ https://arxivlens.com/PaperView/Details/semantic-search-over-9-million-mathematical-theorems-1483-4291560d
- Executive Summary
- Detailed Breakdown
- Practical Applications
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- What Should I Cite? A RAG Benchmark for Academic Citation Prediction (2026)
- ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval (2026)
- SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science (2026)
- SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents (2025)
- LLM-based Semantic Search for Conversational Queries in E-commerce (2026)
- Autonomous Knowledge Graph Exploration with Adaptive Breadth-Depth Retrieval (2026)
- A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 1
Collections including this paper 0
No Collection including this paper

