Deqing Fu PRO
deqing
AI & ML interests
None yet
Recent Activity
liked a model about 5 hours ago
google/tabfm-1.0.0-pytorch
updated a model 26 days ago
deqing/convergent-llama-300M-muon-6digit-addition_6digit_custom6
upvoted a paper 28 days ago
Value-Aware Stochastic KV Cache Eviction for Reasoning Models
Organizations
Convergent Evolution (Addition)
- deqing/convergent-llama-300M-muon-addition_3digit
  Text Generation • 0.3B • Updated • 10
- deqing/convergent-llama-300M-muon-addition_3digit_seed123
  0.3B • Updated • 3
- deqing/convergent-llama-300M-muon-addition
  Text Generation • 0.3B • Updated • 9
- deqing/convergent-llama-300M-adamw-addition_3digit
  Text Generation • 0.3B • Updated • 10
Convergent Evolution (Data)
- deqing/convergent-llama-300M-muon-original
  Text Generation • 0.3B • Updated • 19
- deqing/convergent-llama-300M-muon-unigram
  Text Generation • 0.3B • Updated • 9
- deqing/convergent-llama-300M-muon-isolate-1
  Text Generation • 0.3B • Updated • 12
- deqing/convergent-llama-300M-muon-swap_numbers
  Text Generation • 0.3B • Updated • 8
Convergent Evolution
Convergent Evolution (Architecture and Optimizer)
- deqing/convergent-llama-300M-muon-original
  Text Generation • 0.3B • Updated • 19
- deqing/convergent-gdn-300M-muon-original
  Text Generation • 0.3B • Updated • 21
- deqing/convergent-mamba2-300M-muon-original
  Text Generation • 0.3B • Updated • 11
- deqing/convergent-lstm-4layer-muon-original
  Text Generation • 0.2B • Updated • 8
Fourier Language Model