-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.69k • 1.24k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 358 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
Collections
Discover the best community collections!
Collections including paper arxiv:2510.12693
-
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Paper • 2510.09608 • Published • 50 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27 -
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Paper • 2510.20579 • Published • 55 -
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Paper • 2510.15870 • Published • 89
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 35 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
Fast and Simplex: 2-Simplicial Attention in Triton
Paper • 2507.02754 • Published • 25 -
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Paper • 2507.02025 • Published • 35 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 23
-
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 14 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 58 -
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 41 -
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27 -
Dr.LLM: Dynamic Layer Routing in LLMs
Paper • 2510.12773 • Published • 31 -
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Paper • 2510.05684 • Published • 141 -
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
Paper • 2510.08759 • Published • 46
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 44 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 31
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.69k • 1.24k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 358 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Paper • 2510.09608 • Published • 50 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27 -
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Paper • 2510.20579 • Published • 55 -
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Paper • 2510.15870 • Published • 89
-
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27 -
Dr.LLM: Dynamic Layer Routing in LLMs
Paper • 2510.12773 • Published • 31 -
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Paper • 2510.05684 • Published • 141 -
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
Paper • 2510.08759 • Published • 46
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 35 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 44 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 31
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
Fast and Simplex: 2-Simplicial Attention in Triton
Paper • 2507.02754 • Published • 25 -
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Paper • 2507.02025 • Published • 35 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 23
-
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 14 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 58 -
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 41 -
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7