Whisper Large V3 Turbo Slim by CompactifAI

Model Overview

Model ID

cai-whisper-large-v3-turbo-slim

Base Architecture

Whisper Large V3 Turbo

Quantum-inspired compression reduces model size by ~50% while maintaining strong accuracy. Relative to the full turbo baseline, the average word error rate (WER) changes by -0.40 percentage points (pp) overall, +0.68 pp in English, and -1.83 pp in Spanish, giving high accuracy per dollar for production speech workloads.
Significantly faster inference: the slim model runs ~2.5× faster than the full turbo baseline as measured by Real Time Factor (RTF), enabling quicker transcription and higher throughput.
Excellent performance in English and Spanish, delivering high-quality automatic speech recognition (ASR) results.
Optimized for long-form audio and robust to background noise and accents.
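The accuracy and speed figures above are stated in terms of WER (in percentage points) and RTF. As an illustration of how those quantities are usually computed, here is a minimal sketch assuming WER = word-level edit distance divided by the number of reference words, and RTF = processing time divided by audio duration. The sign convention for the pp change (slim minus baseline) is an assumption, and the sample sentences and timings are hypothetical, not CompactifAI benchmark data.

```python
"""Minimal sketch of WER (in percentage points) and Real Time Factor (RTF).

Assumptions: standard word-level WER, RTF = processing time / audio duration,
and pp change = slim WER minus baseline WER. All sample data is hypothetical.
"""


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1] / len(ref)


def pp_change(baseline_wer: float, slim_wer: float) -> float:
    """WER difference in percentage points (assumed convention: slim - baseline)."""
    return (slim_wer - baseline_wer) * 100


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real Time Factor: lower is faster; RTF < 1 means faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {wer(ref, hyp):.3f}")
    # Hypothetical timings for a 60 s clip, chosen to illustrate a 2.5x speedup.
    baseline_rtf, slim_rtf = rtf(12.0, 60.0), rtf(4.8, 60.0)
    print(f"speedup: {baseline_rtf / slim_rtf:.1f}x")
```

Because RTF is a ratio of times, a "2.5× faster" claim corresponds to the slim model's RTF being 2.5 times smaller than the baseline's on the same audio.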

Specification	Value
Parameters	~0.4B
Input features	Log-Mel spectrogram with 128 Mel bins, computed at a 16 kHz sampling rate with an FFT window of 400 samples (~25 ms) and a hop length of 160 samples (~10 ms)
Max audio duration	~30 seconds per chunk (longer audio supported via a sliding window)
Supported file types	flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
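The input-feature parameters in the table determine the size of each chunk's spectrogram: 30 s × 16,000 samples/s = 480,000 samples, and with a 160-sample hop that gives 3,000 frames, i.e. a 128 × 3000 log-Mel input per chunk (ignoring edge padding). The sketch below works through that arithmetic and shows one way to split long audio into overlapping ~30 s windows and check file extensions. The helper names and the 5-second overlap are hypothetical assumptions; the card only states that a sliding window is supported, not how it is configured.

```python
"""Sketch of the feature arithmetic, 30 s sliding-window chunking, and file-type check.

Assumptions: helper names and the 5 s overlap are illustrative only; the model
card does not document the sliding-window stride.
"""
from pathlib import Path

SAMPLE_RATE = 16_000   # Hz
N_FFT = 400            # FFT window: 400 / 16000 s = 25 ms
HOP_LENGTH = 160       # hop length: 160 / 16000 s = 10 ms
N_MELS = 128           # Mel bins per frame
CHUNK_SECONDS = 30     # maximum audio per chunk

SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def frames_per_chunk(seconds: float = CHUNK_SECONDS) -> int:
    """Spectrogram frames for one chunk, ignoring edge padding."""
    return int(seconds * SAMPLE_RATE) // HOP_LENGTH


def sliding_windows(total_seconds, chunk_seconds=CHUNK_SECONDS, overlap_seconds=5.0):
    """Yield (start, end) times of overlapping chunks that cover the whole file.

    overlap_seconds=5.0 is a hypothetical choice, not a documented default.
    """
    if overlap_seconds >= chunk_seconds:
        raise ValueError("overlap must be shorter than the chunk")
    step = chunk_seconds - overlap_seconds
    start = 0.0
    while True:
        end = min(start + chunk_seconds, total_seconds)
        yield (start, end)
        if end >= total_seconds:
            break
        start += step


def is_supported(path: str) -> bool:
    """Check the file extension against the supported file types listed above."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    print("feature shape per chunk:", (N_MELS, frames_per_chunk()))
    print("windows for a 70 s file:", list(sliding_windows(70)))
    print("meeting.M4A supported:", is_supported("meeting.M4A"))
```

Overlapping windows give the model context on both sides of each chunk boundary; how transcripts from overlapping regions are merged is left out of this sketch.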