Speech to Text
CompactifAI’s speech-to-text capability delivers fast, reliable transcripts from common audio formats while staying fully compatible with the OpenAI Whisper API surface. The endpoint accepts multipart form uploads, normalizes outputs to JSON, and is ideal for meeting notes, support calls, or media captioning.
Basic Usage (Python)
    import requests

    API_URL = "https://api.compactif.ai/v1/audio/transcriptions"
    API_KEY = "your_api_key_here"

    headers = {"Authorization": f"Bearer {API_KEY}"}

    # Form fields sent alongside the uploaded file.
    payload = {"model": "whisper-large-v3", "language": "en", "temperature": 0}

    file_name = "meeting_minutes.mp3"
    file_content_type = "audio/mpeg"

    # Upload the audio as a multipart form; requests builds the multipart body.
    with open(file_name, "rb") as audio_file:
        response = requests.post(
            API_URL,
            headers=headers,
            data=payload,
            files={"file": (file_name, audio_file, file_content_type)},
        )

    response.raise_for_status()  # fail early on HTTP errors
    print(response.json()["text"])

Accepted Parameters

| Field | Type | Description |
|---|---|---|
| file | file upload | Supported file types: .flac, .mp3, .mp4, .mpga, .m4a, .wav, .webm, .mpeg, and .ogg (for .ogg and .mpeg, only audio files are supported) |
| model | string | Required model alias such as whisper-large-v3 |
| prompt | string | Optional text primer to bias the transcription |
| temperature | number | Optional float between 0 and 1 (defaults to provider setting) |
| language | string | Optional ISO language code hint (en by default) |
| response_format | string | Accepted for compatibility; always returns JSON |
| stream | boolean | Whether to stream back partial progress |
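
The fields in the table can be assembled and checked on the client before any upload is attempted. The following is a minimal sketch, not part of any CompactifAI SDK: `build_transcription_form` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the checks only restate the table above, so the server remains the authority on what it accepts.

```python
from pathlib import Path

# File extensions listed in the parameter table above.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpga", ".m4a", ".wav", ".webm", ".mpeg", ".ogg",
}


def build_transcription_form(file_name, model, prompt=None, temperature=None, language=None):
    """Return the form fields for a transcription request (hypothetical helper).

    Raises ValueError when a value contradicts the parameter table.
    """
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or file_name!r}")
    if not model:
        raise ValueError("model is required")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    form = {"model": model}
    # Optional fields are only sent when the caller sets them, so the
    # service defaults apply otherwise.
    if prompt is not None:
        form["prompt"] = prompt
    if temperature is not None:
        form["temperature"] = temperature
    if language is not None:
        form["language"] = language
    return form


form = build_transcription_form(
    "meeting_minutes.mp3", "whisper-large-v3", language="en", temperature=0
)
print(form)
```

The returned dictionary can be passed as the `data=` argument of `requests.post`, next to the `files=` upload used in Basic Usage.
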
Example Response
    {
      "task": "transcribe",
      "language": "en",
      "duration": 12.6,
      "text": "Welcome to the quarterly planning meeting. Let's review the agenda.",
      "segments": [
        {"id": 0, "start": 0.0, "end": 7.5, "text": "Welcome to the quarterly planning meeting."},
        {"id": 1, "start": 7.5, "end": 12.6, "text": "Let's review the agenda."}
      ]
    }

- Keep uploads under 25 MB for best performance; large files benefit from client-side compression.
- Provide the language hint when you know the spoken language to reduce warm-up time.
- Use the same headers and authentication flow as other CompactifAI endpoints; only the form payload differs.
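
For the media captioning use case, the segments array in the example response already carries the start and end times a caption file needs. The sketch below turns those segments into SubRip (SRT) text. It assumes every segment has start, end, and text fields as in the example. `format_timestamp` and `segments_to_srt` are hypothetical helper names, and SRT is an output format chosen here for illustration, not something the endpoint returns.

```python
def format_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render a list of segment dicts (start, end, text) as SRT captions."""
    blocks = []
    # SRT cues are numbered from 1, regardless of the segment "id" field.
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


# Segments copied from the example response on this page.
example_segments = [
    {"id": 0, "start": 0.0, "end": 7.5, "text": "Welcome to the quarterly planning meeting."},
    {"id": 1, "start": 7.5, "end": 12.6, "text": "Let's review the agenda."},
]
srt = segments_to_srt(example_segments)
print(srt)
```

With a live response, the same function can be called on `response.json()["segments"]`.
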