Finetuning Question

#1
by tikeape - opened
TeichAI org

Hello, I was wondering if you could share the notebook or script you use with your datasets to distill the reasoning into the CoT and not just the responses. When I fine-tune on your reasoning datasets, I tend to get a hybrid reasoning model with no CoT, though it will sometimes reason within its response for certain prompts.

TeichAI org

Interesting. We don't use any sort of 'reasoning-only loss' or anything like that; we just train until the loss is somewhere between 0.1 and 0.04, using slightly altered settings from the Unsloth notebook. Please add me on Discord (@armand0e) and I can share the scripts we've been using and help you debug your issue.
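For readers following along, the "train until loss is around 0.1-0.04" rule of thumb can be sketched as a small stopping check. This is only an illustration, not the TeichAI script: the class name `LossTargetStopper`, the moving-average window, and the example loss values are all hypothetical, and the thread does not say how the threshold is actually monitored.

```python
from collections import deque


class LossTargetStopper:
    """Hypothetical helper: signal a stop once the moving-average
    training loss falls to a target value (0.1 here, per the thread)."""

    def __init__(self, target: float = 0.1, window: int = 10):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.target = target
        # Keep only the most recent `window` loss values.
        self.recent = deque(maxlen=window)

    def update(self, loss: float) -> bool:
        """Record one logged loss; return True when training should stop."""
        self.recent.append(loss)
        # Wait for a full window so a single lucky batch does not stop training.
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) <= self.target


if __name__ == "__main__":
    # Made-up loss values, only to show how the check would be called.
    stopper = LossTargetStopper(target=0.1, window=3)
    for step, loss in enumerate([1.2, 0.6, 0.3, 0.12, 0.09, 0.08, 0.07], start=1):
        if stopper.update(loss):
            print(f"target reached at step {step}")
            break
```

Averaging over a window rather than checking a single step is a design choice made here because per-step loss is noisy; in a real run the same check would be fed from whatever logging hook the training framework provides.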

TeichAI org

I added you and received the script. I believe I was just using far too few training steps, as well as using the wrong notebook to train the thinking models. Anyway, it is working better now, so thank you!

tikeape changed discussion status to closed
