Introduction
Welcome to the CompactifAI API documentation. The CompactifAI API gives organizations seamless access to ultra-efficient, scalable AI models that cut compute and energy costs, accelerate deployment, and fuel innovation, all without compromising performance or reliability.
Why Compressed Models?
CompactifAI offers highly compressed versions of leading language models, delivering:
- Dramatic Cost Reduction: Up to 70% lower inference costs compared to uncompressed models through optimized resource utilization
- Massive Throughput Gains: Process up to 4x more requests per second with compressed models requiring fewer computational resources
- Low-Latency Inference: Achieve faster response times due to reduced model size and optimized memory usage
- Minimal Quality Loss: Advanced compression techniques preserve model performance, with typically less than a 5% difference on standard benchmarks
- Superior Concurrency: Support significantly more simultaneous users and requests with the same hardware resources
- Resource Efficiency: Reduced memory footprint and computational requirements enable better hardware utilization
API Features
- Completions API: Generate text completions based on provided prompts
- Chat Completions API: Generate conversational responses using chat-based interaction
- Models API: List and get information about available compressed models
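Assuming the API follows the OpenAI-style endpoint layout it advertises compatibility with (the exact paths are an assumption, not confirmed by this page), the three APIs would map onto the base URL like this:

```python
from urllib.parse import urljoin

# Base URL from this documentation; trailing slash lets urljoin append paths.
BASE_URL = "https://api.compactif.ai/v1/"

# Endpoint paths assumed from the OpenAI-compatible convention.
ENDPOINTS = {
    "completions": urljoin(BASE_URL, "completions"),
    "chat_completions": urljoin(BASE_URL, "chat/completions"),
    "models": urljoin(BASE_URL, "models"),
}
```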
Base URL
All API requests should be made to:
https://api.compactif.ai/v1
API Compatibility
The CompactifAI API is designed to be compatible with the OpenAI standard, allowing for straightforward migration and integration with existing systems. Our endpoints follow similar patterns and accept compatible parameters.
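As a sketch of that compatibility, the snippet below builds a chat-completions request in the OpenAI request schema using only the standard library. The API key and model ID are placeholders, and Bearer authentication is assumed (the OpenAI convention); check the API reference for the exact requirements.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"      # placeholder: use your real key
MODEL = "your-model-id"       # placeholder: pick an ID from the Models API

# Request body in the OpenAI chat-completions schema.
payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what model compression is."},
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "https://api.compactif.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",  # Bearer auth assumed
    },
    method="POST",
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches the OpenAI schema, existing OpenAI client libraries should also work by pointing their base URL at `https://api.compactif.ai/v1`.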
Getting Started
To start using the CompactifAI API:
- Sign up for an API key
- Follow our quickstart guide
- Explore the API reference for detailed documentation
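Once you have a key, a natural first call is listing the available models. This sketch builds that request with the standard library; again, Bearer authentication is an assumption based on the OpenAI-compatible design.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: the key from your sign-up

# GET request for the Models endpoint (path assumed from OpenAI convention).
req = urllib.request.Request(
    "https://api.compactif.ai/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Sending it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     models = json.load(resp)["data"]
#     print([m["id"] for m in models])
```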
Support
If you encounter any issues or have questions about using our API, please check our FAQ or contact our support team.