Why is the file size 2 GB?

#1
by phi0112358 - opened

I wonder why the file is so large when the underlying model is a 0.5B Qwen. I had expected at most 1 GB for the model in 8-bit plus the backend/runtime.

Mozilla.ai org

Hi @phi0112358! Thanks for your comment. I will verify this in the next few days, but my guess is that the data type was silently upcast from bf16 (which is what Qwen 0.5B/DuoGuard was trained in) to fp32 during the ONNX export step. That would explain the nearly 2x size increase (the original safetensors file is 988 MB). bf16 support is spotty among accelerator/CPU providers, so I'm hesitant to make the canonical encoderfile anything but fp32, but I will look into whether we can decrease the size. At the very least, weight compression is already on the docket ;) Hope to have a response for you soon!
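
For a rough sanity check of that guess, here is a back-of-the-envelope sketch in Python. It assumes on-disk size scales linearly with bytes per parameter (4 for fp32, 2 for bf16, 1 for int8) and ignores the runtime, tokenizer, and file-format overhead. The helper name `estimate_size_mb` is made up for illustration; the only real input is the 988 MB safetensors figure.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def estimate_size_mb(size_mb: float, src_dtype: str, dst_dtype: str) -> float:
    """Scale a known checkpoint size from one dtype to another.

    Assumption: the file is dominated by weights, so size is proportional
    to bytes per parameter. Metadata and runtime overhead are ignored.
    """
    return size_mb * BYTES_PER_PARAM[dst_dtype] / BYTES_PER_PARAM[src_dtype]


safetensors_mb = 988  # size of the original bf16 safetensors file

for dtype in ("int8", "bf16", "fp32"):
    size = estimate_size_mb(safetensors_mb, "bf16", dtype)
    print(f"{dtype}: ~{size:.0f} MB")
```

Under those assumptions the fp32 estimate comes out at roughly 1976 MB, which is in line with the 2 GB file, while an 8-bit version of the weights would be about 494 MB.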
