
Whisper Large V3

Model ID

whisper-large-v3

Base Architecture

Whisper Large V3

  • Handles automatic speech recognition across 50+ languages with improved accuracy over earlier Whisper releases.
  • Supports speech translation from other languages into English, plus zero-shot transcription for low-resource languages.
  • Optimized for long-form audio; robust to accents, background noise, and code-switching.
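Long-form audio is usually handled by cutting it into overlapping windows that fit the model's fixed input length, then transcribing each window in turn. The sketch below only computes the window boundaries. It assumes a 30-second window and a 5-second overlap; the function name `sliding_windows` and the overlap value are hypothetical choices for illustration, not part of any Whisper API.

```python
def sliding_windows(duration_s: float, window_s: float = 30.0,
                    overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Hypothetical helper: split [0, duration_s] into windows of at most
    window_s seconds, each starting overlap_s seconds before the previous
    window ends. Returns (start, end) pairs in seconds."""
    if window_s <= 0 or not 0 <= overlap_s < window_s:
        raise ValueError("need window_s > 0 and 0 <= overlap_s < window_s")
    if duration_s <= 0:
        return []
    step = window_s - overlap_s  # how far each new window advances
    spans = []
    start = 0.0
    while True:
        end = min(start + window_s, duration_s)  # last window may be shorter
        spans.append((start, end))
        if end >= duration_s:
            break
        start += step
    return spans


print(sliding_windows(70.0))  # [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```

The overlap gives each window some context from its predecessor, so words cut at a boundary can be recovered when the per-window transcripts are merged.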
| Specification | Value |
| --- | --- |
| Parameters | ~1.55B |
| Encoder / decoder layers | 32 encoder / 32 decoder |
| Audio frontend | 128-channel log-Mel spectrogram (25 ms window, 10 ms stride) |
| Max audio duration | 30 seconds per chunk (longer audio via sliding window) |
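The frontend figures above translate into concrete sample and frame counts. A small worked example, assuming audio is resampled to 16 kHz before feature extraction (the rate Whisper's preprocessing uses; the table itself does not state it):

```python
SAMPLE_RATE_HZ = 16_000  # assumed input sample rate (16 kHz)


def ms_to_samples(ms: float, sr: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(sr * ms / 1000)


WINDOW = ms_to_samples(25)    # 25 ms analysis window -> 400 samples
HOP = ms_to_samples(10)       # 10 ms stride between frames -> 160 samples
CHUNK = 30 * SAMPLE_RATE_HZ   # one 30-second chunk -> 480,000 samples
N_FRAMES = CHUNK // HOP       # 10 ms steps in 30 s -> 3,000 frames

print(WINDOW, HOP, CHUNK, N_FRAMES)  # 400 160 480000 3000
```

So each 30-second chunk becomes a spectrogram with 3,000 time frames (100 frames per second), one column of Mel energies per 10 ms step.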
  • High-fidelity meeting and podcast transcription.
  • Live captioning, with incoming audio buffered and transcribed in short segments.
  • Multilingual customer-support call analytics.