Frequently Asked Questions

General Questions

What is CompactifAI API?

CompactifAI API is an LLM inference service providing access to powerful language models through a simple, standardized API that’s compatible with the OpenAI API standard for unbeatable prices.

Which models are available through the API?

Visit the Models Catalog for detailed information or call the GET /models endpoint (see API Reference).

We offer two categories of models:

Compressed Models (Optimized using our in-house compression technology for efficiency):
- cai-llama-4-scout-slim
- cai-llama-3-3-70b-slim
- cai-llama-3-1-8b-slim
- cai-mistral-small-3-1-slim

Original Models:
- deepseek-r1-0528
- llama-4-scout
- llama-3-3-70b
- llama-3-1-8b
- mistral-small-3-1

How does pricing work?

Our API offers a pay-as-you-go pricing model. Pricing varies by model. For the most current pricing, please visit our pricing page.

Technical Questions

What programming languages can I use with the API?

Our API is language-agnostic and can be used with any programming language that can make HTTP requests. We provide example code in Python, JavaScript, and cURL, but you can use any other language.

What is the difference between the Chat Completions and Completions endpoints?

Chat Completions is designed for conversational interfaces. It accepts an array of messages with roles (system, user, assistant) and generates a contextually appropriate response.
Completions is designed for text completion tasks. It accepts a single text prompt and generates a continuation of that text.

Does the API include security guardrails for content filtering?

Currently, our API does not implement built-in security guardrails or content filtering mechanisms. It is the responsibility of developers to implement appropriate content moderation, safety checks, and guardrails within their own applications when integrating with our API. We recommend implementing client-side filtering and monitoring to ensure compliance with your application’s content policies and your country’s regulations.

Does the API support streaming responses?

Yes, our API supports streaming responses for both Chat Completions and Completions endpoints. Set the stream parameter to true in your request to receive partial results as Server-Sent Events (SSE) as they are generated. This is useful for creating real-time chat interfaces where you want to display the response as it’s being generated rather than waiting for the complete response.

Account and API Keys

For step-by-step instructions on account creation, API key access, see our Authentication page.

What should I do if my API key is compromised?

If you suspect that your API key has been compromised, contact us immediately through this form so that we can take appropriate action.

Support and Troubleshooting

How do I report issues?

For technical issues, please fill in this form with details about the problem you’re experiencing, including any error codes or messages.

How can I rotate my access key?

You can do this on the MultiverseIAM dashboard. You can rotate your API key by clicking the “Refresh Token” button which appears on the welcome page of the dashboard.

Is there a status page for the API?

Yes, you can check the current status of our API at status.compactif.ai.