[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
Ye Liu
yeliudev
AI & ML interests
Vision & Language
Recent Activity
Updated a model 2 days ago: yeliudev/VideoMind-2B-FT-QVHighlights
Updated a dataset 2 days ago: yeliudev/VideoMind-Dataset
Updated a model 2 days ago: yeliudev/VideoMind-7B
Organizations
UniPixel
[NeurIPS 2025] UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
- UniPixel (Space, running on Zero): An MLLM for Unified Object Referring and Segmentation
- PolyU-ChenLab/UniPixel-3B: Video-Text-to-Text • 4B • Updated • 152 • 3
- PolyU-ChenLab/UniPixel-7B: Video-Text-to-Text • 8B • Updated • 342 • 1
- PolyU-ChenLab/UniPixel-SFT-1M: Preview • Updated • 988 • 2
R2-Tuning
[ECCV 2024] R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
VideoMind
[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
E.T. Bench
[NeurIPS 2024] E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding