Speech to Text

CompactifAI’s speech-to-text capability delivers fast, reliable transcripts from common audio formats while staying fully compatible with the OpenAI Whisper API surface. The endpoint accepts multipart form uploads, normalizes outputs to JSON, and is ideal for meeting notes, support calls, or media captioning.

import requests

API_URL = "https://api.compactif.ai/v1/audio/transcriptions"
API_KEY = "your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}"
}

# Form fields sent alongside the audio file
payload = {
    "model": "whisper-large-v3",
    "language": "en",
    "temperature": 0
}

file_name = "meeting_minutes.mp3"
file_content_type = "audio/mpeg"

# Upload the audio as multipart form data
with open(file_name, "rb") as audio_file:
    response = requests.post(
        API_URL,
        headers=headers,
        data=payload,
        files={"file": (file_name, audio_file, file_content_type)},
    )

response.raise_for_status()  # surface HTTP errors before reading the body
print(response.json()["text"])

| Field | Type | Description |
| --- | --- | --- |
| file | file upload | Required audio file (.mp3, .mp4, .mpeg, .mpga, .wav, .webm) |
| model | string | Required model alias such as whisper-large-v3 |
| prompt | string | Optional text primer to bias the transcription |
| temperature | number | Optional float between 0 and 1 (defaults to provider setting) |
| language | string | Optional ISO language code hint (en by default) |
| response_format | string | Accepted for compatibility; always returns JSON |
| stream | boolean | Accepted for compatibility but ignored; responses are non-streaming |
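
The request fields can also be assembled programmatically. The following is a minimal sketch, not part of any CompactifAI SDK: `build_transcription_form` is a hypothetical helper, the range check simply mirrors the 0 to 1 bound documented for `temperature`, and the prompt text is invented for illustration. Optional fields left unset are omitted so the service applies its own defaults.

```python
from typing import Optional


def build_transcription_form(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build the non-file form fields for a transcription request.

    Hypothetical helper: optional fields left as None are dropped from
    the returned dict.
    """
    if not model:
        raise ValueError("model is required, e.g. 'whisper-large-v3'")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    optional = {"language": language, "prompt": prompt, "temperature": temperature}
    form = {"model": model}
    # Keep only the optional fields the caller actually set (0 is a valid value).
    form.update({key: value for key, value in optional.items() if value is not None})
    return form


payload = build_transcription_form(
    "whisper-large-v3",
    prompt="Attendees: Dana, Priya. Topics: Q3 roadmap, hiring.",
    temperature=0,
)
print(payload)
```

The resulting dictionary can be passed as the `data` argument of `requests.post`, with the audio file supplied separately through `files`.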

{
  "task": "transcribe",
  "language": "en",
  "duration": 12.6,
  "text": "Welcome to the quarterly planning meeting. Let's review the agenda.",
  "segments": [
    {"id": 0, "start": 0.0, "end": 7.5, "text": "Welcome to the quarterly planning meeting."},
    {"id": 1, "start": 7.5, "end": 12.6, "text": "Let's review the agenda."}
  ]
}

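
For media captioning, the `segments` array in the response carries the timing needed to build caption files. As an illustrative sketch, the code below converts segments into SubRip (SRT) text. The helper names `format_timestamp` and `segments_to_srt` are hypothetical, and the sample data is copied from the response shown above.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render transcription segments as numbered SRT caption blocks."""
    blocks = []
    for index, segment in enumerate(segments, start=1):  # SRT numbering starts at 1
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


result = {
    "segments": [
        {"id": 0, "start": 0.0, "end": 7.5, "text": "Welcome to the quarterly planning meeting."},
        {"id": 1, "start": 7.5, "end": 12.6, "text": "Let's review the agenda."},
    ]
}
print(segments_to_srt(result["segments"]))
```

In a real integration, `result` would be the parsed body from `response.json()` rather than a literal.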
  • Keep uploads under 25 MB for best performance; large files benefit from client-side compression.
  • Provide the language hint when you know the spoken language to reduce warm-up time.
  • Use the same headers and authentication flow as other CompactifAI endpoints; only the form payload differs.