Pu Fanyi

pufanyi

https://pufanyi.github.io

Recent Activity

liked a Space about 23 hours ago

Video Frame Extractor

🎬

Extract frames from video files

updated a Space about 24 hours ago

Video Frame Extractor

🎬

Extract frames from video files

published a Space about 24 hours ago

Video Frame Extractor

🎬

Extract frames from video files

upvoted an article 3 days ago

Article

NEO-unify: Building Native Multimodal Unified Models End to End

6 days ago

upvoted a paper 7 days ago

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Paper • 2603.04338 • Published 7 days ago • 21

liked a dataset 8 days ago

Video-Reason/VBVR-Bench-Data

Viewer • Updated 16 days ago • 500 • 2.69k • 8

upvoted 2 papers 8 days ago

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published 16 days ago • 513

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published 8 days ago • 81

updated a dataset 13 days ago

LMMs-Lab-Speedrun/Data_NanoVLM

Viewer • Updated 13 days ago • 1.62M • 26

upvoted a paper 22 days ago

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Paper • 2602.12279 • Published 27 days ago • 20

upvoted a paper 24 days ago

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published about 1 month ago • 50

upvoted a collection about 2 months ago

NEO1_0

Collection

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • 7 items • Updated Jan 27 • 9

upvoted a paper 2 months ago

Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16, 2024 • 4

liked a Space 2 months ago

Chinese Open Source Heatmap

🔥

Explore open source AI projects and their release activity over time

liked 2 models 2 months ago

Qwen/Qwen3-VL-8B-Thinking

Image-Text-to-Text • 9B • Updated Nov 26, 2025 • 186k • 191

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 564k • 382

liked a Space 3 months ago

DINOv3 Web

🦖

156

Visualize rich, dense image features locally in your browser

liked 2 models 3 months ago

facebook/dinov2-large

Image Feature Extraction • 0.3B • Updated Sep 6, 2023 • 918k • 102

Qwen/Qwen3-VL-2B-Instruct

Image-Text-to-Text • Updated Oct 23, 2025 • 13M • 344

upvoted a paper 3 months ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 66