whisper-large-v3-turbo fine-tuned - CTranslate2 format for use with faster-whisper / whisperX
Statement:
This repository is a copy of the CTranslate2 conversion from daniel_whisper_finetune_large_v3_turbo_v2 and is a fully functional subset of that repository. It was copied so the converted model can be used directly in projects with minimal code changes, without downloading the entire source repository, which also includes safetensors and GGML models. The original creator, Daniel, did not publish the CTranslate2 model as a separate repository but placed it in a subdirectory of the source model. Thanks to Daniel for creating this fine-tuned model.
This model is a fine-tuned version of openai/whisper-large-v3-turbo on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2212
Model description
This is a personal fine-tune of the Whisper large-v3-turbo model, trained on approximately 1 hour of audio featuring Daniel Rosehill's voice. The training data includes domain-specific vocabulary focused on:
- Technology and software development terminology
- A few Hebrew words and phrases
This model was created as a proof of concept for fine-tuning Whisper models for personal use and improved transcription accuracy on domain-specific content.
Training Infrastructure
Fine-tuning was performed on Modal GPU infrastructure.
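For context, a Modal training job is often structured along the lines of the sketch below. This is purely illustrative: the image contents, GPU type, timeout, and function body are assumptions, not the author's actual setup.

```python
import modal

# Illustrative only: package list, GPU type, and timeout are assumptions.
image = modal.Image.debian_slim().pip_install("torch", "transformers", "datasets")
app = modal.App("whisper-finetune", image=image)

@app.function(gpu="A100", timeout=2 * 60 * 60)
def finetune():
    # Dataset loading and Seq2SeqTrainer setup would run here on the remote GPU.
    ...
```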
Converted Formats
This repository includes the following converted model format:
- CTranslate2 format (converted/ctranslate2/): for use with faster-whisper
  - Highly optimized inference engine (4x faster than OpenAI Whisper)
  - Excellent CPU and GPU (CUDA) support
  - Lower memory usage with 8-bit and 16-bit quantization
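Because this repository contains the CTranslate2 weights, it can be loaded directly with faster-whisper. A minimal sketch, assuming faster-whisper is installed; the Hub ID below is taken from this repository's name, and a local path to the downloaded folder works as well:

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 model from the Hub (or pass a local directory path).
model = WhisperModel(
    "iBoostAI/daniel-whisper-large-v3-turbo-finetune-ct2",
    device="cuda",           # or "cpu"
    compute_type="float16",  # "int8" lowers memory use, e.g. on CPU
)

segments, info = model.transcribe("recording.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```

whisperX wraps faster-whisper, so passing the same Hub ID or local CTranslate2 directory to whisperx.load_model(..., device="cuda", compute_type="float16") in place of a model size name should also work.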
Intended uses & limitations
This model is optimized for:
- Transcribing Daniel Rosehill's voice
- Technical and software development content
- Mixed English with occasional Hebrew terms
Limitations:
- Performance may degrade on voices significantly different from the training data
- Limited to the vocabulary and accent patterns in the training set
- Best suited for personal use rather than general-purpose transcription
Training and evaluation data
Training dataset consisted of approximately 1 hour of recorded audio featuring:
- Technical discussions and software development content
- Mixed English with occasional Hebrew vocabulary
- Single speaker (Daniel Rosehill)
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 400
- mixed_precision_training: Native AMP
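For readers who want to reproduce a similar run, the hyperparameters above roughly map onto a transformers Seq2SeqTrainingArguments configuration like the one below. This is a sketch, not the author's actual training script; the output directory name is an assumption, and "Native AMP" is expressed here as fp16.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-finetune",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size 16
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=400,
    fp16=True,                       # native AMP mixed precision
)
```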
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.1955 | 1.3158 | 50 | 0.2107 |
| 0.0622 | 2.6316 | 100 | 0.1896 |
| 0.0332 | 3.9474 | 150 | 0.1602 |
| 0.0202 | 5.2632 | 200 | 0.1994 |
| 0.0063 | 6.5789 | 250 | 0.2209 |
| 0.0022 | 7.8947 | 300 | 0.2114 |
| 0.001 | 9.2105 | 350 | 0.2216 |
| 0.0015 | 10.5263 | 400 | 0.2212 |
Framework versions
- Transformers 4.57.1
- PyTorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
Model tree for iBoostAI/daniel-whisper-large-v3-turbo-finetune-ct2
- Base model: openai/whisper-large-v3