Embed documents and retrieve relevant chunks for Retrieval-Augmented Generation.

RAG

Cerebe's RAG resource lets you embed documents into a tenant-scoped vector space and retrieve the most relevant chunks for any query. Use it to ground LLM responses in your own docs, knowledge base, or any corpus — without shipping entire files as context.

When to use which resource

Cerebe has three complementary retrieval surfaces. Pick the one that matches the data you're storing:

Resource	What it stores	Retrieval strategy	Typical use
`memory`	Personal / session context (facts, experiences, preferences)	Hybrid vector + graph + keyword	Long-term per-user memory
`knowledge`	Structured entities and relationships	Graph traversal + temporal queries	Evolving knowledge graphs
`rag`	Unstructured documents (text, markdown, HTML)	Semantic similarity over embedded chunks	Document search, docs grounding, knowledge-base Q&A

Core Operations

Embed a document

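from pathlib import Path

# Assumes `client` is an already-initialized async Cerebe client.
await client.rag.embed(
    source="docs/auth.md",
    content=Path("docs/auth.md").read_text(encoding="utf-8"),  # reads and closes the file
    doc_type="markdown",
)
# → chunks the document, embeds each chunk, stores in the vector index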
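
To embed a whole folder rather than one file, the same call can be repeated per file. The sketch below is one possible approach, not an official helper: `embed_markdown_dir` is a hypothetical name, and it relies only on the `client.rag.embed(source=..., content=..., doc_type=...)` call shown above. The API Reference lists a batch endpoint, but its SDK method is not shown on this page, so this version simply issues the single-document calls concurrently.

```python
import asyncio
from pathlib import Path


async def embed_markdown_dir(client, root: str) -> list[str]:
    """Embed every .md file under `root`; returns the `source` values used.

    Hypothetical helper: `client` is assumed to expose the async
    `client.rag.embed(...)` call shown in the example above.
    """
    paths = sorted(Path(root).rglob("*.md"))
    # Use the POSIX-style path (e.g. "docs/auth.md") as the document source.
    sources = [p.as_posix() for p in paths]
    await asyncio.gather(*(
        client.rag.embed(
            source=source,
            content=path.read_text(encoding="utf-8"),
            doc_type="markdown",
        )
        for source, path in zip(sources, paths)
    ))
    return sources
```

Usage would look like `await embed_markdown_dir(client, "docs")`. For large folders, consider bounding concurrency (for example with `asyncio.Semaphore`) so the calls are not all in flight at once.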

Search

results = await client.rag.search(
    query="how does authentication work?",
    k=3,
)
for r in results.data["results"]:
    print(r["source"], r["score"])
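
Once you have search results, the usual next step is to turn them into context for an LLM prompt. The following is a minimal sketch of that step. It assumes each result dictionary carries `source`, `score`, and a `content` key holding the chunk text (the key name `content` is an assumption based on the fields described under How It Works); `build_context`, `build_prompt`, and the `min_score` threshold are hypothetical names introduced for illustration.

```python
def build_context(hits: list[dict], min_score: float = 0.0) -> str:
    """Format retrieved chunks as a numbered, source-tagged context block."""
    kept = [h for h in hits if h["score"] >= min_score]
    parts = [
        f"[{i}] {hit['source']}\n{hit['content'].strip()}"
        for i, hit in enumerate(kept, start=1)
    ]
    return "\n\n".join(parts)


def build_prompt(question: str, hits: list[dict], min_score: float = 0.0) -> str:
    """Combine the retrieved context and the user's question into one prompt."""
    context = build_context(hits, min_score)
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With the search call above, this would be used as `build_prompt("how does authentication work?", results.data["results"])`. Dropping low-scoring chunks keeps weakly related text out of the prompt; a sensible threshold depends on your corpus.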

Clean up

await client.rag.delete_document("docs/auth.md")

How It Works

Chunking — documents are split into overlapping segments (default 1000 characters with 200-character overlap) so retrieval returns focused, self-contained snippets rather than entire files.
Embedding — each chunk is converted to a 1536-dimensional vector using text-embedding-3-small.
Storage — chunk metadata and content live in Postgres; vectors live in Qdrant. Both are scoped per organization.
Retrieval — query text is embedded with the same model, then cosine similarity is computed against the org's chunks. Top-k chunks are returned with their source, content, and similarity score.
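
The chunking and retrieval steps can be illustrated with a small, dependency-free sketch. It is a simplified model under stated assumptions, not Cerebe's implementation: the splitter uses plain fixed-size character windows (a production chunker may respect sentence or markdown boundaries), and the ranking is a brute-force cosine scan, whereas a vector database such as Qdrant typically uses an approximate index. The function names are hypothetical; only the defaults of 1000 and 200 characters come from the description above.

```python
import math


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600, and the last 200 characters of each chunk repeat as the first 200 of the next. That repetition is what keeps a sentence straddling a boundary retrievable from at least one chunk.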

Multi-tenancy

Every document and query is automatically scoped to the organization of your API key. You cannot see documents belonging to another org, and documents you embed are never exposed outside your org. This is enforced at the data layer — no bypass is possible through a missing header or malformed request.
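
To make "enforced at the data layer" concrete, here is a toy model of the idea. It is purely illustrative and says nothing about how Cerebe is actually built: `ScopedStore` and its methods are hypothetical. The point it shows is that when every read and write requires an organization ID, and that ID is resolved from the API key on the server rather than taken from the request, there is no code path that returns another organization's documents.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedStore:
    """Toy document store partitioned by organization ID (illustrative only)."""

    _docs: dict[str, dict[str, str]] = field(default_factory=dict)

    def put(self, org_id: str, source: str, content: str) -> None:
        # Writes always land in the caller's own partition.
        self._docs.setdefault(org_id, {})[source] = content

    def list_sources(self, org_id: str) -> list[str]:
        # Reads consult only the caller's partition; there is no global listing.
        return sorted(self._docs.get(org_id, {}))


store = ScopedStore()
store.put("org_a", "docs/auth.md", "Tokens expire after one hour.")
store.put("org_b", "notes.md", "Internal notes.")
```

Here `store.list_sources("org_a")` sees only `docs/auth.md`, and an organization with no documents gets an empty list rather than an error or someone else's data.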

API Reference

Method	Endpoint	Description
`POST`	`/api/v1/rag/search`	Semantic document search
`POST`	`/api/v1/rag/search/hybrid`	Weighted semantic + keyword search
`POST`	`/api/v1/rag/search/similar`	Find documents similar to given content
`POST`	`/api/v1/rag/documents`	Embed a document
`POST`	`/api/v1/rag/documents/batch`	Batch embed documents
`GET`	`/api/v1/rag/documents`	List embedded documents
`DELETE`	`/api/v1/rag/documents/{source}`	Delete a document
`GET`	`/api/v1/rag/stats`	Collection statistics
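
If you are calling the endpoints directly rather than through the SDK, a request might be assembled as in the sketch below. Several details are assumptions not confirmed on this page: the base URL is a placeholder, the bearer-token `Authorization` header is an assumed auth scheme, and the JSON body fields `query` and `k` simply mirror the SDK parameters shown earlier. The code only builds the request objects and does not send anything.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://cerebe.example.com"  # hypothetical host; replace with yours


def search_request(api_key: str, query: str, k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/rag/search request."""
    body = json.dumps({"query": query, "k": k}).encode("utf-8")  # assumed field names
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/rag/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def delete_url(source: str) -> str:
    """URL for DELETE /api/v1/rag/documents/{source} with the source percent-encoded."""
    # Sources such as "docs/auth.md" contain "/", so encode every reserved character.
    return f"{BASE_URL}/api/v1/rag/documents/{quote(source, safe='')}"
```

A request built this way could be sent with `urllib.request.urlopen(search_request(api_key, "how does authentication work?"))`. Whether the delete endpoint expects the slash in `{source}` to be percent-encoded is also an assumption, so check it against your deployment.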
