high-quality Chinese training datasets
Collection
A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, together with the models trained on them. • 13 items
Using opencsg/csg-wukong-2b-chinese-fineweb-edu as the base model, we fine-tune on smoltalk-chinese for 2 epochs.