Cerebe Docs

RAG

Embed documents and retrieve relevant chunks for Retrieval-Augmented Generation.

Cerebe's RAG resource lets you embed documents into a tenant-scoped vector space and retrieve the most relevant chunks for any query. Use it to ground LLM responses in your own docs, knowledge base, or any corpus — without shipping entire files as context.

When to use which resource

Cerebe has three complementary retrieval surfaces. Pick the one that matches the data you're storing:

| Resource | What it stores | Retrieval strategy | Typical use |
| --- | --- | --- | --- |
| memory | Personal / session context (facts, experiences, preferences) | Hybrid vector + graph + keyword | Long-term per-user memory |
| knowledge | Structured entities and relationships | Graph traversal + temporal queries | Evolving knowledge graphs |
| rag | Unstructured documents (text, markdown, HTML) | Semantic similarity over embedded chunks | Document search, docs grounding, knowledge-base Q&A |

Core Operations

Embed a document

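# Assumes `client` is an already-initialized async Cerebe client and that
# this code runs inside an async function.
with open("docs/auth.md", encoding="utf-8") as f:
    content = f.read()

await client.rag.embed(
    source="docs/auth.md",
    content=content,
    doc_type="markdown",
)
# → chunks the document, embeds each chunk, stores in the vector index

Search

results = await client.rag.search(
    query="how does authentication work?",
    k=3,  # number of top-scoring chunks to return
)
# Print where each matching chunk came from and how similar it is
for r in results.data["results"]:
    print(r["source"], r["score"])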
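Once search returns chunks, a common next step is to fold them into an LLM prompt. The helper below is a minimal sketch of that step and is not part of the Cerebe SDK: `build_grounded_prompt` and `min_score` are hypothetical names, and it assumes each result is a dict with `source`, `content`, and `score` keys.

```python
from typing import Any


def build_grounded_prompt(
    question: str,
    results: list[dict[str, Any]],
    min_score: float = 0.0,
) -> str:
    """Format retrieved chunks as numbered context ahead of the question.

    Hypothetical helper: assumes each result has "source", "content",
    and "score" keys.
    """
    # Drop weak matches, then put the strongest chunks first.
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)

    context = "\n\n".join(
        f"[{i}] ({r['source']})\n{r['content']}"
        for i, r in enumerate(kept, start=1)
    )
    return (
        "Answer using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With the search call above, this could be used as `prompt = build_grounded_prompt("how does authentication work?", results.data["results"])`. Numbering the chunks and naming their sources lets the model cite where an answer came from.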

Clean up

await client.rag.delete_document("docs/auth.md")

How It Works

  1. Chunking — documents are split into overlapping segments (default 1000 characters with 200-character overlap) so retrieval returns focused, self-contained snippets rather than entire files.
  2. Embedding — each chunk is converted to a 1536-dimensional vector using text-embedding-3-small.
  3. Storage — chunk metadata and content live in Postgres; vectors live in Qdrant. Both are scoped per organization.
  4. Retrieval — query text is embedded with the same model, then cosine similarity is computed against the org's chunks. Top-k chunks are returned with their source, content, and similarity score.
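The chunking and retrieval steps above can be illustrated with a small, dependency-free sketch. This is not Cerebe's implementation: the function names are hypothetical, the chunker splits on raw character offsets (the service may pick different boundaries), and in production the similarity search runs in Qdrant over 1536-dimensional vectors rather than in plain Python.

```python
import math


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # each window starts `step` characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    k: int = 3,
) -> list[tuple[int, float]]:
    """Return (chunk index, score) for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the default settings, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600. The last 200 characters of each chunk repeat as the first 200 of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.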

Multi-tenancy

Every document and query is automatically scoped to the organization of your API key. You cannot see documents belonging to another org, and documents you embed are never exposed outside your org. This is enforced at the data layer — no bypass is possible through a missing header or malformed request.

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/rag/search | Semantic document search |
| POST | /api/v1/rag/search/hybrid | Weighted semantic + keyword search |
| POST | /api/v1/rag/search/similar | Find documents similar to given content |
| POST | /api/v1/rag/documents | Embed a document |
| POST | /api/v1/rag/documents/batch | Batch embed documents |
| GET | /api/v1/rag/documents | List embedded documents |
| DELETE | /api/v1/rag/documents/{source} | Delete a document |
| GET | /api/v1/rag/stats | Collection statistics |
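If you call these endpoints without the SDK, the requests might be built along the following lines. Treat this as a sketch under stated assumptions: the base URL is a placeholder, the `Authorization: Bearer` header is an assumed auth scheme, and the JSON body fields `query` and `k` are guessed from the SDK's `search` parameters rather than taken from a documented request schema. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values: substitute your real base URL and API key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def search_request(query: str, k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST to the semantic search endpoint."""
    # Body fields are assumed to mirror the SDK's search() arguments.
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/rag/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def delete_request(source: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE for one document."""
    # Percent-encode the source so the "/" in paths like docs/auth.md is not
    # read as an extra URL segment (assumes the server decodes it).
    encoded = urllib.parse.quote(source, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/rag/documents/{encoded}",
        method="DELETE",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
```

Either request can then be sent with `urllib.request.urlopen(req)`. Building the request separately from sending it makes the URL, method, and body easy to inspect before anything reaches the network.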
