Glance: Accelerating Diffusion Models with 1 Sample Paper • 2512.02899 • Published Dec 2, 2025 • 29
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published Nov 14, 2025 • 44
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 101
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published Nov 3, 2025 • 35
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20, 2025 • 7
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 118
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8, 2025 • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published Mar 26, 2025 • 4
TPDiff: Temporal Pyramid Video Diffusion Model Paper • 2503.09566 • Published Mar 12, 2025 • 45
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published Mar 10, 2025 • 44
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles Paper • 2503.03651 • Published Mar 5, 2025 • 16
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3, 2025 • 44
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20, 2025 • 41
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12, 2025 • 28
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11, 2025 • 45