Responses API
The Responses API is OpenAI’s task-oriented interface: you send a model and input (plus optional generation and formatting fields), and receive a Response object—or a stream of events when stream is true. CompactifAI exposes this at POST /v1/responses for select deployed models.
Use it when your client library or product flow targets the Responses schema (for example the OpenAI SDK’s responses surface). For classic chat turns with messages, continue to use Chat completion (POST /v1/chat/completions).
Endpoint
| Path | POST /v1/responses |
|---|---|
| Base URL | https://api.compactif.ai/v1/responses |
Authenticate with a bearer token as described in Authentication.
Eligible models
Only model configuration ids that have the responses capability in your deployment can call this route. What you see in GET /v1/models and the models catalog is authoritative for your account.
Common ids include:
| Model | Model ID |
|---|---|
| GPT OSS 120B | gpt-oss-120b |
| GPT OSS 20B | gpt-oss-20b |
| Hypernova 60B | hypernova-60b |
| Blackstar 10B | blackstar-10b |
| GLM 5.1 | glm-5-1 |
| Llama 4 scout | llama-4-scout |
| CAI Llama 4 scout Slim | cai-llama-4-scout-slim |
If the model is missing, lacks responses support, or has no endpoint configured, the API returns 400 or 500 with guidance in the detail field.
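Since GET /v1/models is authoritative for your account, a client can check model availability before calling the route. The sketch below assumes the list endpoint returns the OpenAI-style shape ({"data": [{"id": ...}, ...]}); the sample payload is illustrative only.

```python
# Check whether a model id is available before calling POST /v1/responses.
# Assumes GET /v1/models returns the OpenAI-style shape: {"data": [{"id": ...}, ...]}.

def available_model_ids(models_payload: dict) -> set:
    """Extract the set of model ids from a /v1/models response body."""
    return {entry["id"] for entry in models_payload.get("data", [])}

# Example payload as GET /v1/models might return it (illustrative, not real output).
payload = {
    "object": "list",
    "data": [
        {"id": "gpt-oss-20b", "object": "model"},
        {"id": "hypernova-60b", "object": "model"},
    ],
}

ids = available_model_ids(payload)
print("hypernova-60b" in ids)   # True for this sample payload
print("blackstar-10b" in ids)   # False for this sample payload
```

In a real client you would fetch the payload with an authenticated GET to /v1/models and fail fast with a clear message instead of waiting for the 400/500 from /v1/responses.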
Required fields
model — Your CompactifAI model configuration id (mapped to the backend engine name on the wire).
input — A string or structured items (per the OpenAI Responses request shape).
Common optional fields
| Field | Notes |
|---|---|
| stream | true returns SSE (text/event-stream) with data: JSON lines, ending with data: [DONE]. Default: false. |
| max_output_tokens | Caps generated tokens for this response. Use this field on Responses, not max_tokens (that belongs to chat completions). |
| temperature, top_p | Sampling controls. |
| instructions | High-level system-style instructions. |
| text, reasoning, metadata, truncation, parallel_tool_calls | Same semantics as OpenAI Responses where your backend supports them. |
See the full parameter table in the API reference → Responses API.
Non-streaming example
```python
import requests

API_URL = "https://api.compactif.ai/v1/responses"
API_KEY = "your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

data = {
    "model": "hypernova-60b",
    "input": "Say hello in five words.",
    "max_output_tokens": 500,
}

response = requests.post(API_URL, headers=headers, json=data)
response.raise_for_status()
print(response.json())
```
Streaming example
Set "stream": true. Read the response body as Server-Sent Events: each meaningful line is data: followed by JSON (except the final data: [DONE] sentinel).
```shell
curl -N https://api.compactif.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model":"gpt-oss-20b","input":"Say hello in one short sentence.","stream":true}'
```
Compatibility and reference
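The data:-line handling above can be sketched in Python. The parser below is a minimal sketch: it skips non-data lines, decodes each data: payload as JSON, and stops at the [DONE] sentinel. The sample event payloads are illustrative, not captured output; with requests you would feed it response.iter_lines(decode_unicode=True) from a request made with "stream": true.

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON events from SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)

# Sample lines shaped like a Responses SSE stream (illustrative payloads).
sample = [
    'data: {"type": "response.output_text.delta", "delta": "Hello"}',
    'data: {"type": "response.output_text.delta", "delta": " there."}',
    "data: [DONE]",
]

for event in parse_sse_lines(sample):
    print(event.get("delta", ""))
```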
Field-level support appears in OpenAI compatibility under the Responses tables (request and response fields). For exhaustive parameters and streaming details, use API reference → Responses API.