DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 5 days ago • 46
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models Paper • 2512.21337 • Published 16 days ago • 29
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 17 days ago • 53
Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day Dec 8, 2025 • 48
Article Generative AI for Recommendation Systems: A Guide to Tokenizing User Interaction Data Mar 26, 2025 • 9
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Paper • 2510.20803 • Published Oct 23, 2025 • 9
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published Oct 22, 2025 • 30
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Paper • 2509.12203 • Published Sep 15, 2025 • 19
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 250
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Chat with Kimi-VL-A3B-Thinking-2506 🤔: Chat with images, videos, or PDFs to generate text Space • Running on Zero • Featured • 181