Skip to content

Changelog

This page lists all notable changes to the CompactifAI API.

  • DeepSeek R1 0528 Slim has been removed.
  • POST /v1/usage/completions (usage statistics API) – This endpoint is deprecated and removed from the API. Usage and billing visibility is now provided through the CompactifAI Dashboard; the route is no longer served.
  • View usage and billing in the CompactifAI Dashboard instead of calling the API. Use the dashboard to monitor consumption, manage API tokens, and review account settings.
  • Whisper Streaming Support – The Whisper audio transcription endpoint now supports streaming, enabling real-time transcription and lower-latency audio processing.
  • GLM-5 (Private Preview) – GLM-5 is now available as a private model in the US region.
  • Agentic Tool Calling Enhancements (Beta) – Improved support for agentic workflows and tool-calling capabilities across the following models:
    • gpt-oss-20b
    • gpt-oss-120b
    • blackstar-10b
    • hypernova-60b
  • Deterministic Tool Usage in Agent Loops – When tool_choice=required or tool_choice=<tool_name> is specified, the model consistently returns a tool call, improving reliability in agent-based workflows.
  • Improved Tool Call Reliability – Reduced tool-call hallucinations and improved adherence to defined tool schemas.
  • Improved inference throughput across several models, delivering ~30% higher throughput and better overall serving efficiency for:
    • cai-llama3-1-8b-slim
    • hypernova-60b
    • gpt-oss-120b
    • gpt-oss-20b
    • blackstar-10b
  • Fixed an issue where streaming with tool calling was not supported for some models. The following models now fully support streaming responses with tool calls:
    • gpt-oss-20b
    • gpt-oss-120b
    • hypernova-60b
  • Added tool calling support for hypernova-60b model
  • Fixed a bug where Audio Transcriptions endpoint was not working for all the file mime types specified in our API Reference.
  • Significantly improved the performance of the Audio Transcriptions endpoint using the whisper-large-v3 model, reducing latency and increasing the speed factor from 15x to 100x on a 10 minutes long audio file (The speed of your network connection might affect the speed factor).
  • Fixed a bug where the audio transcription endpoint was not working for audio files with a size greater than 1MB. Now, the endpoint can process audio files up to 25MB in size.
  • Added tool calling support for gpt-oss-20b and gpt-oss-120b
  • Added hypernova-60b
  • Added blackstar-10b model
  • Speech-to-text transcription endpoint /v1/audio/transcriptions with Whisper Large V3 support for multilingual transcription workflows.
  • Feature and API documentation detailing request parameters, Python examples, and guidance for the new speech-to-text capability.
  • Removed the deepseek-r1-0528 model from the API.
  • Multi-modality support for chat completions, enabling image-plus-text inputs across the API.
  • Added mistral-small-3-1 model with full multi-modal understanding and refreshed usage examples.
  • Function tool compatibility has been activated in all models except mistral.
  • Added deepseek-ai/DeepSeek-R1-0528 model, accessible via the deepseek-r1-0528 model ID.
  • Deprecated deepseek-r1.
  • Initial release of the CompactifAI inference API with the following features:
    • Models API endpoint for listing and retrieving available compressed models
    • Chat Completions API endpoint for conversational interactions
    • Completions API endpoint for text generation
  • OpenAI-compatible API design for easy migration and integration
  • Added the following models:
    • cai-llama-4-scout-slim
    • cai-llama-3-3-70b-slim
    • cai-mistral-small-3-1-slim
    • cai-llama-3-1-8b-slim
  • HTTPS encryption for all API requests
  • Secure authentication using Bearer token scheme