GenPRM/GenPRM-7B

RyanLiu112 committed on Apr 6, 2025

Commit 0ea3fa7 (verified) · 1 Parent(s): 5a703df

Update README.md

Files changed (1): README.md

@@ -4,13 +4,15 @@ datasets:
 - GenPRM/GenPRM-MATH-Data
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+language:
+- en
 ---
 # Introduction
 We propose **GenPRM**, a strong generative process reward model with the following features:
-- reasoning with explicit **CoT reasoning** and **code verfication** before providing the process judgment;
+- performing explicit **CoT reasoning** and **code verfication** before providing the process judgment;
 - improving Monte Carlo estimation and hard label with **Relative Progress Estimation (RPE)**;
 - supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
 - supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.
@@ -106,4 +108,4 @@ Our recent work on LLM test-time scaling with PRMs:
     journal = {arXiv preprint arXiv:2502.06703},
     year    = {2025}
 }
-```
+```