Changelog
This page lists all notable changes to the CompactifAI API.
2026-04-01
Section titled “2026-04-01”Removed
Section titled “Removed”DeepSeek R1 0528 Slimhas been removed.
2026-03-24
Section titled “2026-03-24”Removed
Section titled “Removed”POST /v1/usage/completions(usage statistics API) – This endpoint is deprecated and removed from the API. Usage and billing visibility is now provided through the CompactifAI Dashboard; the route is no longer served.
Migration
Section titled “Migration”- View usage and billing in the CompactifAI Dashboard instead of calling the API. Use the dashboard to monitor consumption, manage API tokens, and review account settings.
2026-03-10
Section titled “2026-03-10”- Whisper Streaming Support – The Whisper audio transcription endpoint now supports streaming, enabling real-time transcription and lower-latency audio processing.
- GLM-5 (Private Preview) – GLM-5 is now available as a private model in the US region.
- Agentic Tool Calling Enhancements (Beta) – Improved support for agentic workflows and tool-calling capabilities across the following models:
- gpt-oss-20b
- gpt-oss-120b
- blackstar-10b
- hypernova-60b
Improvements
Section titled “Improvements”- Deterministic Tool Usage in Agent Loops – When
tool_choice=requiredortool_choice=<tool_name>is specified, the model consistently returns a tool call, improving reliability in agent-based workflows. - Improved Tool Call Reliability – Reduced tool-call hallucinations and improved adherence to defined tool schemas.
Performance
Section titled “Performance”- Improved inference throughput across several models, delivering ~30% higher throughput and better overall serving efficiency for:
- cai-llama3-1-8b-slim
- hypernova-60b
- gpt-oss-120b
- gpt-oss-20b
- blackstar-10b
Bug Fixes
Section titled “Bug Fixes”- Fixed an issue where streaming with tool calling was not supported for some models. The following models now fully support streaming responses with tool calls:
- gpt-oss-20b
- gpt-oss-120b
- hypernova-60b
2026-02-25
Section titled “2026-02-25”- Added tool calling support for
hypernova-60bmodel
2026-02-16
Section titled “2026-02-16”Bug fixes
Section titled “Bug fixes”- Fixed a bug where Audio Transcriptions endpoint was not working for all the file mime types specified in our API Reference.
- Significantly improved the performance of the Audio Transcriptions endpoint using the
whisper-large-v3model, reducing latency and increasing the speed factor from 15x to 100x on a 10 minutes long audio file (The speed of your network connection might affect the speed factor). - Fixed a bug where the audio transcription endpoint was not working for audio files with a size greater than 1MB. Now, the endpoint can process audio files up to 25MB in size.
2026-01-08
Section titled “2026-01-08”- Added tool calling support for
gpt-oss-20bandgpt-oss-120b
2025-12-23
Section titled “2025-12-23”- Added
hypernova-60b
2025-12-26
Section titled “2025-12-26”- Added
blackstar-10bmodel
2025-10-06
Section titled “2025-10-06”- Speech-to-text transcription endpoint
/v1/audio/transcriptionswith Whisper Large V3 support for multilingual transcription workflows. - Feature and API documentation detailing request parameters, Python examples, and guidance for the new speech-to-text capability.
Models Updates
Section titled “Models Updates”- Removed the
deepseek-r1-0528model from the API.
2025-09-24
Section titled “2025-09-24”- Multi-modality support for chat completions, enabling image-plus-text inputs across the API.
Models Updates
Section titled “Models Updates”- Added
mistral-small-3-1model with full multi-modal understanding and refreshed usage examples.
2025-08-18
Section titled “2025-08-18”- Function tool compatibility has been activated in all models except mistral.
2025-07-01
Section titled “2025-07-01”Models Updates
Section titled “Models Updates”- Added deepseek-ai/DeepSeek-R1-0528 model, accessible via the
deepseek-r1-0528model ID. - Deprecated
deepseek-r1.
2025-06-11
Section titled “2025-06-11”- Initial release of the CompactifAI inference API with the following features:
- Models API endpoint for listing and retrieving available compressed models
- Chat Completions API endpoint for conversational interactions
- Completions API endpoint for text generation
- OpenAI-compatible API design for easy migration and integration
Models Updates
Section titled “Models Updates”- Added the following models:
cai-llama-4-scout-slimcai-llama-3-3-70b-slimcai-mistral-small-3-1-slimcai-llama-3-1-8b-slim
Security
Section titled “Security”- HTTPS encryption for all API requests
- Secure authentication using Bearer token scheme