whisper-large-v3-turbo fine-tuned - CTranslate2 format for use with faster-whisper / whisperX
Statement:
This repository is a copy of the CTranslate2 conversion from daniel_whisper_finetune_large_v3_turbo_v2 and is a fully functional subset of that repository. It was copied so the converted model can be used directly in projects with minimal code changes, without downloading the entire source repository, which also includes safetensors and GGML models. The original creator, Daniel, did not publish the CTranslate2 model as a separate repository but placed it in a subdirectory of the source model. Thanks to Daniel for creating this fine-tuned model.
This model is a fine-tuned version of openai/whisper-large-v3-turbo on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2212
Model description
This is a personal fine-tune of the Whisper large-v3-turbo model, trained on approximately 1 hour of audio featuring Daniel Rosehill's voice. The training data includes domain-specific vocabulary focused on:
- Technology and software development terminology
- A few Hebrew words and phrases
This model was created as a proof of concept for fine-tuning Whisper models for personal use and improved transcription accuracy on domain-specific content.
Training Infrastructure
Fine-tuning was performed on Modal GPU infrastructure.
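For context, a Modal training job is often structured along the lines of the sketch below. This is purely illustrative: the image contents, GPU type, timeout, and function body are assumptions, not the author's actual setup.

```python
import modal

# Illustrative only: package list, GPU type, and timeout are assumptions.
image = modal.Image.debian_slim().pip_install("torch", "transformers", "datasets")
app = modal.App("whisper-finetune", image=image)

@app.function(gpu="A100", timeout=2 * 60 * 60)
def finetune():
    # Dataset loading and Seq2SeqTrainer setup would run here on the remote GPU.
    ...
```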
Converted Formats
This repository includes the following converted model format:
- CTranslate2 format (converted/ctranslate2/): for use with faster-whisper
  - Highly optimized inference engine (4x faster than OpenAI Whisper)
  - Excellent CPU and GPU (CUDA) support
  - Lower memory usage with 8-bit and 16-bit quantization
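Because this repository contains the CTranslate2 weights, it can be loaded directly with faster-whisper. A minimal sketch, assuming faster-whisper is installed; the Hub ID below is taken from this repository's name, and a local path to the downloaded folder works as well:

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 model from the Hub (or pass a local directory path).
model = WhisperModel(
    "iBoostAI/daniel-whisper-large-v3-turbo-finetune-ct2",
    device="cuda",           # or "cpu"
    compute_type="float16",  # "int8" lowers memory use, e.g. on CPU
)

segments, info = model.transcribe("recording.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```

whisperX wraps faster-whisper, so passing the same Hub ID or local CTranslate2 directory to whisperx.load_model(..., device="cuda", compute_type="float16") in place of a model size name should also work.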
Intended uses & limitations
This model is optimized for:
- Transcribing Daniel Rosehill's voice
- Technical and software development content
- Mixed English with occasional Hebrew terms
Limitations:
- Performance may degrade on voices significantly different from the training data
- Limited to the vocabulary and accent patterns in the training set
- Best suited for personal use rather than general-purpose transcription
Training and evaluation data
Training dataset consisted of approximately 1 hour of recorded audio featuring:
- Technical discussions and software development content
- Mixed English with occasional Hebrew vocabulary
- Single speaker (Daniel Rosehill)
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 400
- mixed_precision_training: Native AMP
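For readers who want to reproduce a similar run, the hyperparameters above roughly map onto a transformers Seq2SeqTrainingArguments configuration like the one below. This is a sketch, not the author's actual training script; the output directory name is an assumption, and "Native AMP" is expressed here as fp16.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-finetune",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size 16
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=400,
    fp16=True,                       # native AMP mixed precision
)
```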
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.1955 | 1.3158 | 50 | 0.2107 |
| 0.0622 | 2.6316 | 100 | 0.1896 |
| 0.0332 | 3.9474 | 150 | 0.1602 |
| 0.0202 | 5.2632 | 200 | 0.1994 |
| 0.0063 | 6.5789 | 250 | 0.2209 |
| 0.0022 | 7.8947 | 300 | 0.2114 |
| 0.001 | 9.2105 | 350 | 0.2216 |
| 0.0015 | 10.5263 | 400 | 0.2212 |
Framework versions
- Transformers 4.57.1
- PyTorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
Model tree for iBoostAI/daniel-whisper-large-v3-turbo-finetune-ct2
- Base model: openai/whisper-large-v3