Finetuning Question
Hello, I was wondering if you could share the notebook or script you use with your datasets to distill the reasoning into the CoT, and not just the responses. When I fine-tune using your reasoning datasets, it tends to create a hybrid reasoning model with no CoT, one that will only sometimes reason through its response for certain prompts.
Interesting. We don't do any sort of 'reasoning-only loss' or anything like that; we just train until the loss is around 0.1-0.04 using slightly altered settings from the Unsloth notebook. Please add me on Discord (@armand0e) and I can share the scripts we've been using and help you debug your issue.
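For anyone else hitting this: the usual reason a model loses its CoT is that the reasoning trace isn't in the training target at all. A minimal sketch of what keeping it there looks like, with the field names (`question`, `reasoning`, `answer`) and the `<think>` tag convention being assumptions rather than the actual script:

```python
def format_example(ex: dict) -> dict:
    """Put the reasoning trace inside <think> tags ahead of the answer,
    so the training loss covers the chain of thought, not just the
    final response. Field names here are placeholders; adapt them to
    your dataset's schema."""
    target = f"<think>\n{ex['reasoning']}\n</think>\n{ex['answer']}"
    return {"text": f"### Question:\n{ex['question']}\n\n### Response:\n{target}"}

# Example record (hypothetical):
sample = {
    "question": "What is 6 * 7?",
    "reasoning": "Multiplying: 6 * 7 = 42.",
    "answer": "42",
}
formatted = format_example(sample)
print(formatted["text"])
```

If your formatting function only emits the `answer` field, the model never sees the reasoning during training, which matches the "sometimes reasons, sometimes doesn't" behavior described above.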
I added you and received the script. I believe I was using far too few training steps, as well as the wrong notebook for training the thinking models. Anyway, it is working better now, so thank you!