LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
Iterative Value Function Optimization for Guided Decoding Paper • 2503.02368 • Published Mar 4, 2025 • 15
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7, 2025 • 8
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published Mar 21, 2025 • 1
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Paper • 2406.11256 • Published Jun 17, 2024
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13, 2025 • 53
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published 4 days ago • 20
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 111
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 219
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17, 2025 • 42
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix • Published Nov 3, 2025 • 53
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 82
Spotlight on Token Perception for Multimodal Reinforcement Learning Paper • 2510.09285 • Published Oct 10, 2025 • 36
Native Hybrid Attention for Efficient Sequence Modeling Paper • 2510.07019 • Published Oct 8, 2025 • 16