OpenAI API Compatibility (Beta)

The CompactifAI API provides OpenAI-compatible endpoints, so existing applications and libraries built for OpenAI’s API work with minimal changes. This compatibility layer makes it easy to evaluate CompactifAI models, or to switch from another provider, with only a few code changes.

Getting Started

Replace your OpenAI API key with your CompactifAI API key
Update the base URL to point to your CompactifAI endpoint
Use CompactifAI model names (e.g., “cai-llama-3-1-8b-slim”)

from openai import OpenAI

# Initialize the client with your CompactifAI API endpoint
client = OpenAI(
    api_key="your-compactifai-api-key",  # CompactifAI API key
    base_url="https://your-compactifai-api-endpoint/v1"  # Replace with your endpoint
)

# Chat completions
chat_completion = client.chat.completions.create(
    model="cai-llama-3-1-8b-slim",  # Use any available CompactifAI model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, what can you tell me about CompactifAI?"}
    ],
    temperature=0.7,
    max_tokens=256
)
print(chat_completion.choices[0].message.content)

# Streaming chat completions
stream = client.chat.completions.create(
    model="cai-llama-3-1-8b-slim",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about artificial intelligence"}
    ],
    temperature=0.7,
    max_tokens=256,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Compatible Endpoints

Endpoint	Description
`/v1/chat/completions`	Chat conversations
`/v1/completions`	Text completions
`/v1/models`	List available models
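
The endpoints above can also be called without an SDK. The following is a minimal sketch using only Python's standard library. It assumes the API accepts the key as an `Authorization: Bearer` header, which is what the OpenAI SDKs send; confirm this for your deployment. `build_request` and `send` are hypothetical helper names, and the base URL is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://your-compactifai-api-endpoint/v1"  # placeholder, replace with your endpoint


def build_request(path, api_key, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for an OpenAI-compatible endpoint.

    GET is used when there is no payload (e.g. /models), POST otherwise.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    # Assumption: bearer-token authentication, as sent by the OpenAI SDKs.
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


list_models = build_request("/models", "your-compactifai-api-key")
print(list_models.get_method(), list_models.full_url)
```

Building the request separately from sending it makes the URL and headers easy to inspect before any network call is made. Pass the result to `send` once the placeholders are replaced.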

Request Examples

Chat Completions

{
  "model": "cai-llama-3-1-8b-slim",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, I'm Camillo"},
    {"role": "user", "content": "What is my name? What is the capital of Colombia?"}
  ],
  "temperature": 0.7,
  "max_tokens": 128,
  "stop": ["###"],
  "n": 1,
  "user": "user-123"
}

Chat Completions with Streaming

{
  "model": "cai-llama-3-1-8b-slim",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about artificial intelligence"}
  ],
  "temperature": 0.7,
  "max_tokens": 128,
  "stream": true,
  "user": "user-123"
}
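
When `stream` is true, OpenAI-compatible servers normally deliver the response as server-sent events: one `data: {json}` line per chunk, ending with `data: [DONE]`. This page does not show the wire format, so the sketch below assumes CompactifAI follows that convention. It shows how the text deltas would be reassembled by a client that does not use an SDK. `iter_deltas` is a hypothetical helper, and the sample lines are hand-written rather than captured output.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hand-written sample lines in the assumed format.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Artificial "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "intelligence"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```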

Text Completions

{
  "model": "cai-llama-3-1-8b-slim",
  "prompt": "What is the capital of France?",
  "temperature": 0.7,
  "max_tokens": 128,
  "stop": ["###"],
  "user": "user-123"
}

Response Format

Responses are structured to be compatible with OpenAI’s format:

Chat Completion Response

{
  "id": "6a172d30ce8e4f34b4b830f8347c3911",
  "created": 1749600000,
  "model": "cai-llama-3-1-8b-slim",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello Camillo. It's nice to meet you.\n\nYour name is Camillo.\n\nThe capital of Colombia is Bogotá."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 35,
    "total_tokens": 60
  }
}

Completion Response

{
  "id": "1861edc39ce648e3862a0b6ae9b7687b",
  "object": "text_completion",
  "created": 1749600000,
  "model": "cai-llama-3-1-8b-slim",
  "choices": [
    {
      "text": "The capital of France is Paris.",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
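
The two response shapes differ mainly in where the generated text lives: `choices[0].message.content` for chat completions and `choices[0].text` for text completions. As a rough illustration, the sketch below reads both through one function, using the completion response above as input. `extract_text` is a hypothetical helper name.

```python
import json


def extract_text(response):
    """Return the generated text from a chat or text completion response."""
    choice = response["choices"][0]
    if response.get("object") == "chat.completion":
        return choice["message"]["content"]
    return choice["text"]  # "text_completion" responses


completion = json.loads("""
{
  "id": "1861edc39ce648e3862a0b6ae9b7687b",
  "object": "text_completion",
  "created": 1749600000,
  "model": "cai-llama-3-1-8b-slim",
  "choices": [
    {"text": "The capital of France is Paris.", "index": 0, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}
}
""")

print(extract_text(completion))
print(completion["choices"][0]["finish_reason"], completion["usage"]["total_tokens"])
```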

Supported Endpoints

The CompactifAI OpenAI compatibility layer supports the following endpoints:

Chat Completions

Request Fields

Field	Support Status
model	Full (use CompactifAI model names)
messages	Full (system, user, assistant roles)
max_tokens	Full
temperature	Full
top_p	Full
n	Partial (must be 1)
stream	Full
stop	Full
user	Full
presence_penalty	Full
frequency_penalty	Full
logit_bias	Ignored
logprobs	Ignored
top_logprobs	Ignored
seed	Ignored
tools	Ignored
tool_choice	Ignored
function_call	Ignored (deprecated)
functions	Ignored (deprecated)
parallel_tool_calls	Ignored
response_format	Ignored
max_completion_tokens	Ignored
data_sources	Ignored (Azure-specific)
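
Because ignored fields are accepted silently, a request that relies on them (for example `max_completion_tokens` or `tools`) will not behave as it does against OpenAI. One possible client-side guard, sketched below, drops the ignored fields with a warning, rejects `n` values other than 1, and copies `max_completion_tokens` into `max_tokens` when only the former is set. `prepare_chat_request` is a hypothetical helper, and the copy step is a design choice of this sketch rather than documented API behavior.

```python
import warnings

# Fields marked "Ignored" in the chat completions request table.
IGNORED = {
    "logit_bias", "logprobs", "top_logprobs", "seed", "tools", "tool_choice",
    "function_call", "functions", "parallel_tool_calls", "response_format",
    "max_completion_tokens", "data_sources",
}


def prepare_chat_request(params):
    """Return a copy of params without the fields the API ignores."""
    if params.get("n", 1) != 1:
        raise ValueError("n must be 1 for CompactifAI chat completions")
    request = {key: value for key, value in params.items() if key not in IGNORED}
    # Sketch-level choice: keep the caller's token limit under the supported name.
    if "max_completion_tokens" in params and "max_tokens" not in params:
        request["max_tokens"] = params["max_completion_tokens"]
    ignored = sorted(key for key in params if key in IGNORED)
    if ignored:
        warnings.warn("Dropping fields the API ignores: " + ", ".join(ignored))
    return request


request = prepare_chat_request({
    "model": "cai-llama-3-1-8b-slim",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 128,
    "seed": 42,
})
print(sorted(request))
```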

Response Fields

Field	Support Status
id	Full
created	Full
model	Full
object	Full
choices	Full
usage	Full

For more details, see our API reference or have a look at the OpenAI chat completions documentation.

Text Completions

Request Fields

Field	Support Status
model	Full (use CompactifAI model names)
prompt	Full
max_tokens	Full
temperature	Full
top_p	Full
stop	Full
user	Full
best_of	Ignored
echo	Ignored
logit_bias	Ignored
logprobs	Ignored
seed	Ignored
suffix	Ignored

Response Fields

Field	Support Status
id	Full
created	Full
model	Full
object	Full
choices	Full
usage	Full

For more details, see our API reference or have a look at the OpenAI text completions documentation.

Key Differences from OpenAI API

Authentication: Requests are authenticated with your CompactifAI API key instead of an OpenAI key
Models: Model names differ (e.g., “cai-llama-3-1-8b-slim” instead of “gpt-4”)
Endpoint fields: Some OpenAI request fields are ignored or only partially supported (see the field tables above)
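
In an existing application, the first two differences can usually be isolated in configuration. The sketch below is one way to do that. The environment variable names `COMPACTIFAI_API_KEY` and `COMPACTIFAI_BASE_URL`, the helper names, and the model mapping are all hypothetical and chosen for illustration.

```python
import os

# Hypothetical mapping from model names an application already uses
# to CompactifAI model names.
MODEL_MAP = {"gpt-4": "cai-llama-3-1-8b-slim"}


def client_settings(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables."""
    env = os.environ if env is None else env
    return {
        "api_key": env["COMPACTIFAI_API_KEY"],
        "base_url": env["COMPACTIFAI_BASE_URL"].rstrip("/"),
    }


def to_compactifai_model(name):
    """Translate a model name, leaving unknown names unchanged."""
    return MODEL_MAP.get(name, name)


settings = client_settings({
    "COMPACTIFAI_API_KEY": "your-compactifai-api-key",
    "COMPACTIFAI_BASE_URL": "https://your-compactifai-api-endpoint/v1/",
})
print(settings["base_url"], to_compactifai_model("gpt-4"))
# With the openai package installed: client = OpenAI(**client_settings())
```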

SDK Compatibility

Tested with the OpenAI SDKs for Python and JavaScript, as used in the examples below.

Other SDKs and libraries built for OpenAI’s API should also work once you change the base URL to point to your CompactifAI API endpoint.

from openai import OpenAI

client = OpenAI(
  api_key='your-compactifai-api-key',
  base_url='https://your-compactifai-api-endpoint/v1',
)

def main():
  # Regular completion
  completion = client.chat.completions.create(
      model='cai-llama-3-1-8b-slim',
      messages=[
          {'role': 'system', 'content': 'You are a helpful assistant.'},
          {'role': 'user', 'content': 'Tell me about CompactifAI.'}
      ],
      temperature=0.7,
      max_tokens=256
  )

  print(completion.choices[0].message.content)

  # Streaming completion
  stream = client.chat.completions.create(
      model='cai-llama-3-1-8b-slim',
      messages=[
          {'role': 'system', 'content': 'You are a helpful assistant.'},
          {'role': 'user', 'content': 'Tell me about artificial intelligence.'}
      ],
      temperature=0.7,
      max_tokens=256,
      stream=True
  )

  for chunk in stream:
      if chunk.choices[0].delta.content is not None:
          print(chunk.choices[0].delta.content, end="")

if __name__ == "__main__":
  main()

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-compactifai-api-key',
  baseURL: 'https://your-compactifai-api-endpoint/v1',
});

async function main() {
  // Regular completion
  const completion = await client.chat.completions.create({
    model: 'cai-llama-3-1-8b-slim',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Tell me about CompactifAI.' }
    ],
    temperature: 0.7,
    max_tokens: 256
  });

  console.log(completion.choices[0].message.content);

  // Streaming completion
  const stream = await client.chat.completions.create({
    model: 'cai-llama-3-1-8b-slim',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Tell me about artificial intelligence.' }
    ],
    temperature: 0.7,
    max_tokens: 256,
    stream: true
  });

  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      process.stdout.write(chunk.choices[0].delta.content);
    }
  }
}

main();

Error Handling

The CompactifAI API returns errors in the same format as the OpenAI API, but the detailed error messages may differ. We recommend using the messages for logging and debugging only, not for program logic.
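
In practice this means branching on the HTTP status code and keeping the message for logs. The sketch below assumes the OpenAI error body shape, `{"error": {"message": ..., "type": ..., "param": ..., "code": ...}}`. The set of status codes treated as retryable is an assumption of this sketch, `summarize_error` is a hypothetical helper, and the sample body is hand-written.

```python
import json

# Assumption: rate limits and server-side failures are worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def summarize_error(status, body):
    """Return (retryable, log_message) for an error response."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # body was not JSON
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {"message": body}  # fall back to the raw body for logging
    retryable = status in RETRYABLE_STATUS  # decided by status, never by message text
    log_message = f"HTTP {status} [{error.get('type', 'unknown')}]: {error.get('message')}"
    return retryable, log_message


sample_body = json.dumps({
    "error": {"message": "Rate limit exceeded", "type": "rate_limit_error",
              "param": None, "code": None}
})
print(summarize_error(429, sample_body))
```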