
Responses API

The Responses API is OpenAI’s task-oriented interface: you send a model and input (plus optional generation and formatting fields), and receive a Response object—or a stream of events when stream is true. CompactifAI exposes this at POST /v1/responses for select deployed models.

Use it when your client library or product flow targets the Responses schema (for example the OpenAI SDK’s responses surface). For classic chat turns with messages, continue to use Chat completion (POST /v1/chat/completions).

| | |
| --- | --- |
| Path | POST /v1/responses |
| Base URL | https://api.compactif.ai/v1/responses |

Authenticate with a bearer token as described in Authentication.

Only model configuration ids with the responses capability in your deployment can be used with this route. What you see in GET /v1/models and the models catalog is authoritative for your account.
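Before wiring up a client, you can confirm which of your models are responses-capable by filtering the catalog. A minimal sketch, illustrated on a canned payload shaped like an OpenAI-style model list; the `capabilities` field name is an assumption and may differ in your deployment (in practice you would fetch the payload from GET /v1/models with your bearer token):

```python
def responses_capable(models_payload):
    """Return ids of catalog entries that list the 'responses' capability.

    'capabilities' is an assumed field name; entries without it are skipped.
    """
    return [
        m["id"]
        for m in models_payload.get("data", [])
        if "responses" in m.get("capabilities", [])
    ]

# Canned payload shaped like an OpenAI-style GET /v1/models response:
catalog = {
    "object": "list",
    "data": [
        {"id": "hypernova-60b", "capabilities": ["chat", "responses"]},
        {"id": "blackstar-10b", "capabilities": ["chat"]},
    ],
}
print(responses_capable(catalog))  # ['hypernova-60b']
```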

Common ids include:

| Model | Model ID |
| --- | --- |
| GPT OSS 120B | gpt-oss-120b |
| GPT OSS 20B | gpt-oss-20b |
| Hypernova 60B | hypernova-60b |
| Blackstar 10B | blackstar-10b |
| GLM 5.1 | glm-5-1 |
| Llama 4 Scout | llama-4-scout |
| CAI Llama 4 Scout Slim | cai-llama-4-scout-slim |

If the model is missing, lacks responses support, or has no endpoint configured, the API returns a 400 or 500 error with guidance in the detail field.

  • model — Your CompactifAI model configuration id (mapped to the backend engine name on the wire).
  • input — A string or structured items (per the OpenAI Responses request shape).
| Field | Notes |
| --- | --- |
| stream | true returns SSE (text/event-stream) with data: JSON lines, ending with data: [DONE]. Default: false. |
| max_output_tokens | Caps generated tokens for this response. Use this field on Responses, not max_tokens (that belongs to chat completions). |
| temperature, top_p | Sampling controls. |
| instructions | High-level system-style instructions. |
| text, reasoning, metadata, truncation, parallel_tool_calls | Same semantics as OpenAI Responses where your backend supports them. |
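Putting the fields above together, a non-streaming request body could look like the sketch below; the model id and values are illustrative, not defaults:

```python
# Illustrative request body for POST /v1/responses.
payload = {
    "model": "gpt-oss-20b",                    # your CompactifAI model configuration id
    "input": "Summarize SSE in one sentence.",
    "instructions": "Answer concisely.",       # high-level system-style guidance
    "max_output_tokens": 200,                  # Responses uses max_output_tokens, not max_tokens
    "temperature": 0.7,
    "stream": False,                           # default; set True for SSE streaming
}
print(payload["model"])
```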

See the full parameter table in the API reference → Responses API.

import requests

API_URL = "https://api.compactif.ai/v1/responses"
API_KEY = "your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

data = {
    "model": "hypernova-60b",
    "input": "Say hello in five words.",
    "max_output_tokens": 500,
}

response = requests.post(API_URL, headers=headers, json=data)
response.raise_for_status()  # raise on 4xx/5xx errors
print(response.json())

Set "stream": true. Read the response body as Server-Sent Events: each meaningful line is data: followed by JSON (except the final data: [DONE] sentinel).

curl -N https://api.compactif.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"model":"gpt-oss-20b","input":"Say hello in one short sentence.","stream":true}'
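The same line protocol can be consumed from Python. A minimal parsing sketch, shown here against canned lines rather than a live connection; with requests you would pass stream=True and iterate response.iter_lines() instead. The sample event types are illustrative of the OpenAI Responses streaming shape, not an exhaustive list:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON events from 'data:' lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(body)

# Canned lines in the shape the endpoint emits:
sample = [
    'data: {"type": "response.output_text.delta", "delta": "Hello"}',
    'data: {"type": "response.completed"}',
    "data: [DONE]",
]
for event in parse_sse_lines(sample):
    print(event["type"])
```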

Field-level support appears in OpenAI compatibility under the Responses tables (request and response fields). For exhaustive parameters and streaming details, use API reference → Responses API.