Cerebe Docs

Chat Completions

LLM chat completions with cognitive context enrichment and OpenAI compatibility.

Cerebe provides two chat completion endpoints: a native API with capability-based model routing, and an OpenAI-compatible endpoint that works as a drop-in replacement for ChatOpenAI or any OpenAI SDK client.

Native Chat API

POST /api/v1/llm/chat

The native endpoint supports model selection and domain context enrichment.

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `messages` | Message[] | Yes | List of chat messages |
| `model` | string | No | Model to use (default: `gpt-4o`) |
| `temperature` | float | No | Sampling temperature (default: 0.7) |
| `max_tokens` | integer | No | Maximum number of tokens in the response |
| `domain_context` | object | No | Domain context for enrichment |
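
These parameters can be assembled into a request body like the following sketch. The keys inside `domain_context` are purely illustrative placeholders; the real schema is not documented here.

```python
import json

# Build a native chat request body from the parameters above.
# The contents of domain_context are hypothetical, for illustration only.
body = {
    "messages": [
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "Explain photosynthesis simply."},
    ],
    "model": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 512,
    "domain_context": {"subject": "biology"},  # illustrative shape only
}

payload = json.dumps(body)
```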

Message Object

| Field | Type | Description |
| --- | --- | --- |
| `role` | string | `system`, `user`, or `assistant` |
| `content` | string \| object | Message content (supports multimodal) |
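
When `content` is an object, a plausible shape is the OpenAI-style list of typed content parts, given the OpenAI compatibility described below. This form is an assumption, not a documented contract:

```python
# Hypothetical multimodal message, assuming the OpenAI-style
# "content parts" format (a list of typed parts).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/leaf.png"}},
    ],
}
```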

Examples

Python

```python
from cerebe import AsyncCerebe

client = AsyncCerebe(api_key="ck_live_...")

response = await client.llm.chat(
    messages=[
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "Explain photosynthesis simply."},
    ],
)

print(response.choices[0]["message"]["content"])
```

TypeScript

```typescript
import Cerebe from '@cerebe/sdk'

const client = new Cerebe({ apiKey: 'ck_live_...' })

const response = await client.llm.chat({
  messages: [
    { role: 'system', content: 'You are a helpful tutor.' },
    { role: 'user', content: 'Explain photosynthesis simply.' },
  ],
})

console.log(response.choices[0].message.content)
```

cURL

```shell
curl -X POST https://api.cerebe.ai/api/v1/llm/chat \
  -H "X-API-Key: ck_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful tutor."},
      {"role": "user", "content": "Explain photosynthesis simply."}
    ]
  }'
```

Response

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Photosynthesis is how plants make food from sunlight..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```
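
Reading the assistant's reply and the token counts out of this response can be sketched with nothing but the standard library:

```python
import json

# The sample response above, parsed from JSON.
raw = '''{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Photosynthesis is how plants make food from sunlight..."
      }
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}'''

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]
usage = resp["usage"]
# total_tokens is the sum of prompt and completion tokens.
```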

OpenAI-Compatible Endpoint

POST /api/v1/openai/v1/chat/completions

This endpoint accepts the standard OpenAI request format, making Cerebe a drop-in replacement. It supports tools, tool_choice, and stream: everything ChatOpenAI expects.

Using with OpenAI SDK

Python

```python
from openai import AsyncOpenAI

# Point OpenAI client at Cerebe
client = AsyncOpenAI(
    api_key="ck_live_...",
    base_url="https://api.cerebe.ai/api/v1/openai/v1",
)

response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "What is 2+2?"},
    ],
)

print(response.choices[0].message.content)
```

TypeScript

```typescript
import OpenAI from 'openai'

// Point OpenAI client at Cerebe
const client = new OpenAI({
  apiKey: 'ck_live_...',
  baseURL: 'https://api.cerebe.ai/api/v1/openai/v1',
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful tutor.' },
    { role: 'user', content: 'What is 2+2?' },
  ],
})

console.log(response.choices[0].message.content)
```

Using with LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="ck_live_...",
    openai_api_base="https://api.cerebe.ai/api/v1/openai/v1",
)

response = await llm.ainvoke("Explain gravity")
```

Streaming

Both endpoints support streaming. For the OpenAI-compatible endpoint, set stream: true:

```python
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="ck_live_...",
    base_url="https://api.cerebe.ai/api/v1/openai/v1",
)

stream = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

async for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Capability-Based Routing

When using the OpenAI-compatible endpoint, set the X-Cerebe-Capability header to control model selection:

| Capability | Default Model | Use Case |
| --- | --- | --- |
| `standard` | GPT-5 | General chat and completions |
| `vision` | GPT-5 | Multimodal/image requests |
| `reasoning` | GPT-5 | Complex reasoning tasks |
| `classification` | GPT-5 Mini | Lightweight classification |

```shell
curl -X POST https://api.cerebe.ai/api/v1/openai/v1/chat/completions \
  -H "X-API-Key: ck_live_..." \
  -H "X-Cerebe-Capability: reasoning" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Solve this logic puzzle..."}]}'
```
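
The same header can be attached from Python. A stdlib-only sketch that builds (but does not send) the equivalent request; if you use the OpenAI SDK instead, its `default_headers` client option serves the same purpose:

```python
import json
import urllib.request

# Assemble the OpenAI-compatible request with a capability override.
headers = {
    "X-API-Key": "ck_live_...",
    "X-Cerebe-Capability": "reasoning",
    "Content-Type": "application/json",
}
body = json.dumps(
    {"messages": [{"role": "user", "content": "Solve this logic puzzle..."}]}
).encode()

req = urllib.request.Request(
    "https://api.cerebe.ai/api/v1/openai/v1/chat/completions",
    data=body,
    headers=headers,
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
```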

Tool Calling

The OpenAI-compatible endpoint fully supports function/tool calling:

```python
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }],
)
```
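
When the model decides to call the tool, the assistant message carries a `tool_calls` list in the standard OpenAI shape. A sketch of dispatching such a call and feeding the result back (the sample payload and the `get_weather` implementation below are illustrative):

```python
import json

# Illustrative assistant message in the standard OpenAI tool-call shape.
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "London"}'},
    }],
}


def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # stand-in implementation


results = []
for call in message.get("tool_calls", []):
    # Arguments arrive as a JSON string and must be parsed.
    args = json.loads(call["function"]["arguments"])
    output = get_weather(**args)
    # Feed the result back as a "tool" role message on the next turn.
    results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
```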

Error Responses

| Status | Description |
| --- | --- |
| 400 | Invalid request or preprocessing failed |
| 401 | Missing or invalid API key |
| 500 | LLM provider error |
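
A minimal handling sketch over these statuses. The split is a general assumption, not a documented guarantee: a 500 from the provider is often transient and worth retrying, while 400 and 401 will fail identically on retry:

```python
# Messages taken from the error table above.
ERRORS = {
    400: "Invalid request or preprocessing failed",
    401: "Missing or invalid API key",
    500: "LLM provider error",
}


def should_retry(status: int) -> bool:
    # 5xx provider errors are often transient; 4xx client errors are not.
    return status >= 500
```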
