Responses API
The Responses API is OpenAI’s task-oriented interface: you send a model and input (plus optional generation and formatting fields), and receive a Response object—or a stream of events when stream is true. CompactifAI exposes this at POST /v1/responses for select deployed models.
Use it when your client library or product flow targets the Responses schema (for example the OpenAI SDK’s responses surface). For classic chat turns with messages, continue to use Chat completion (POST /v1/chat/completions).
Endpoint
| Path | POST /v1/responses |
|---|---|
| Base URL | https://api.compactif.ai/v1/responses |
Authenticate with a bearer token as described in Authentication.
Eligible models
Only model configuration ids that have the responses capability in your deployment can call this route. What you see in GET /v1/models and the models catalog is authoritative for your account.
Common ids include:
| Model | Model ID |
|---|---|
| GPT OSS 120B | gpt-oss-120b |
| GPT OSS 20B | gpt-oss-20b |
| Hypernova 60B | hypernova-60b |
| Blackstar 10B | blackstar-10b |
| GLM 5.1 | glm-5-1 |
| Llama 4 scout | llama-4-scout |
| CAI Llama 4 scout Slim | cai-llama-4-scout-slim |
If the model is missing, lacks responses support, or has no endpoint configured, the API returns 400 or 500 with guidance in the detail field.
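Since GET /v1/models is authoritative for your account, a client can check model availability before calling the route. The sketch below assumes the list endpoint returns the OpenAI-style shape ({"data": [{"id": ...}, ...]}); the sample payload is illustrative only.

```python
# Check whether a model id is available before calling POST /v1/responses.
# Assumes GET /v1/models returns the OpenAI-style shape: {"data": [{"id": ...}, ...]}.

def available_model_ids(models_payload: dict) -> set:
    """Extract the set of model ids from a /v1/models response body."""
    return {entry["id"] for entry in models_payload.get("data", [])}

# Example payload as GET /v1/models might return it (illustrative, not real output).
payload = {
    "object": "list",
    "data": [
        {"id": "gpt-oss-20b", "object": "model"},
        {"id": "hypernova-60b", "object": "model"},
    ],
}

ids = available_model_ids(payload)
print("hypernova-60b" in ids)   # True for this sample payload
print("blackstar-10b" in ids)   # False for this sample payload
```

In a real client you would fetch the payload with an authenticated GET to /v1/models and fail fast with a clear message instead of waiting for the 400/500 from /v1/responses.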
Required fields
model — Your CompactifAI model configuration id (mapped to the backend engine name on the wire).
input — A string or structured items (per the OpenAI Responses request shape).
Common optional fields
| Field | Notes |
|---|---|
| stream | true returns SSE (text/event-stream) with data: JSON lines, ending with data: [DONE]. Default: false. |
| max_output_tokens | Caps generated tokens for this response. Use this field on Responses, not max_tokens (that belongs to chat completions). |
| temperature, top_p | Sampling controls. |
| instructions | High-level system-style instructions. |
| text, reasoning, metadata, truncation, parallel_tool_calls | Same semantics as OpenAI Responses where your backend supports them. |
See the full parameter table in the API reference → Responses API.
Non-streaming example
```python
import requests

API_URL = "https://api.compactif.ai/v1/responses"
API_KEY = "your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

data = {
    "model": "hypernova-60b",
    "input": "Say hello in five words.",
    "max_output_tokens": 500,
}

response = requests.post(API_URL, headers=headers, json=data)
response.raise_for_status()
print(response.json())
```
Streaming example
Set "stream": true. Read the response body as Server-Sent Events: each meaningful line is data: followed by JSON (except the final data: [DONE] sentinel).
```shell
curl -N https://api.compactif.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model":"gpt-oss-20b","input":"Say hello in one short sentence.","stream":true}'
```
Compatibility and reference
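The data:-line handling above can be sketched in Python. The parser below is a minimal sketch: it skips non-data lines, decodes each data: payload as JSON, and stops at the [DONE] sentinel. The sample event payloads are illustrative, not captured output; with requests you would feed it response.iter_lines(decode_unicode=True) from a request made with "stream": true.

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON events from SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)

# Sample lines shaped like a Responses SSE stream (illustrative payloads).
sample = [
    'data: {"type": "response.output_text.delta", "delta": "Hello"}',
    'data: {"type": "response.output_text.delta", "delta": " there."}',
    "data: [DONE]",
]

for event in parse_sse_lines(sample):
    print(event.get("delta", ""))
```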
Field-level support appears in OpenAI compatibility under the Responses tables (request and response fields). For exhaustive parameters and streaming details, use API reference → Responses API.