Whisper Large V3 Turbo Slim by CompactifAI

Model ID

cai-whisper-large-v3-turbo-slim

Base Architecture

Whisper Large V3 Turbo

  • Quantum-inspired compression reduces model size by ~50% while maintaining strong accuracy. Versus the full turbo baseline, word error rate (WER) changes by -0.40 percentage points (pp) overall, with +0.68 pp in English and -1.83 pp in Spanish; a negative value means fewer errors than the baseline. The result is strong accuracy per dollar for production speech workloads.
  • Significantly faster inference: the slim model is ~2.5× faster than the full turbo baseline as measured by Real-Time Factor (RTF), enabling quicker transcription and higher throughput.
  • Excellent performance in English and Spanish, delivering high-quality ASR results.
  • Optimized for long-form audio and robust to background noise and accents.
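The accuracy and speed figures above rest on two standard metrics, WER and RTF. As a rough illustration of how such numbers are typically derived, here is a minimal Python sketch. The function names are hypothetical, the evaluation pipeline behind the published figures is not described in this page, and RTF is assumed to mean processing time divided by audio duration (lower is faster).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def wer_change_pp(baseline_wer: float, candidate_wer: float) -> float:
    """WER change in percentage points; negative means fewer errors."""
    return (candidate_wer - baseline_wer) * 100.0


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: seconds of compute per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(baseline_rtf: float, candidate_rtf: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_rtf / candidate_rtf
```

With made-up inputs (not measurements from this model), `wer_change_pp(0.10, 0.096)` is about -0.4 pp, and a baseline RTF of 0.25 against a candidate RTF of 0.10 corresponds to a speedup of roughly 2.5×.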
Specification | Value
Parameters | ~0.4B
Input features | Log-Mel spectrogram with 128 Mel bins, computed at a 16 kHz sampling rate, using an FFT window of 400 samples (25 ms) and a hop length of 160 samples (10 ms)
Max Audio Duration | ~30 seconds per chunk (sliding window supported)
Supported file types | flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
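The arithmetic behind these specifications can be checked directly. The sketch below, in Python, derives the window and hop durations and the frame count per chunk from the stated values, then shows one possible way to split long audio into overlapping 30-second windows. The 5-second overlap, the helper names, and the one-frame-per-hop assumption are illustrative choices, not documented behavior of the service.

```python
from pathlib import Path

# Values taken from the specification table.
SAMPLE_RATE_HZ = 16_000
N_FFT = 400          # FFT window, in samples
HOP_LENGTH = 160     # hop between frames, in samples
N_MELS = 128
CHUNK_SECONDS = 30
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}

window_ms = 1000 * N_FFT / SAMPLE_RATE_HZ            # 400 / 16000 s = 25 ms
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE_HZ          # 160 / 16000 s = 10 ms
samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_SECONDS   # 480,000 samples
# Assumes one frame per hop, i.e. the signal is padded at the edges.
frames_per_chunk = samples_per_chunk // HOP_LENGTH   # 3,000 frames of 128 bins


def chunk_bounds(total_seconds, chunk_seconds=CHUNK_SECONDS, overlap_seconds=5.0):
    """Return (start, end) times of sliding windows covering the audio.

    overlap_seconds is a hypothetical setting chosen for illustration.
    """
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap must be non-negative and shorter than a chunk")
    step = chunk_seconds - overlap_seconds
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the audio
        start += step
    return bounds


def is_supported(filename: str) -> bool:
    """Check the file extension against the supported types listed above."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```

Under these assumptions, a 70-second recording is covered by three windows: 0 to 30 s, 25 to 55 s, and 50 to 70 s. The extension check only inspects the file name; it does not verify the actual container or codec.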
  • High-fidelity meeting and podcast transcription.
  • Real-time captioning with batch segmentation.
  • Multilingual customer-support call analytics.