ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published Jan 30
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js4 Updated Sep 18, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js2 Updated Sep 18, 2025