RAG
Embed documents and retrieve relevant chunks for Retrieval-Augmented Generation.
Cerebe's RAG resource lets you embed documents into a tenant-scoped vector space and retrieve the most relevant chunks for any query. Use it to ground LLM responses in your own docs, knowledge base, or any corpus — without shipping entire files as context.
When to use which resource
Cerebe has three complementary retrieval surfaces. Pick the one that matches the data you're storing:
| Resource | What it stores | Retrieval strategy | Typical use |
|---|---|---|---|
| `memory` | Personal / session context (facts, experiences, preferences) | Hybrid vector + graph + keyword | Long-term per-user memory |
| `knowledge` | Structured entities and relationships | Graph traversal + temporal queries | Evolving knowledge graphs |
| `rag` | Unstructured documents (text, markdown, HTML) | Semantic similarity over embedded chunks | Document search, docs grounding, knowledge-base Q&A |
Core Operations
Embed a document
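
    # Assumes `client` is an initialized Cerebe client and the call runs inside an async function.
    await client.rag.embed(
        source="docs/auth.md",
        content=open("docs/auth.md").read(),
        doc_type="markdown",
    )
    # → chunks the document, embeds each chunk, stores in the vector index

Search

    results = await client.rag.search(
        query="how does authentication work?",
        k=3,
    )
    # Each result carries the chunk's source and its similarity score.
    for r in results.data["results"]:
        print(r["source"], r["score"])

Clean up

    # Removes the document identified by its source path.
    await client.rag.delete_document("docs/auth.md")

How It Works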
- Chunking: documents are split into overlapping segments (default 1000 characters with a 200-character overlap) so retrieval returns focused, self-contained snippets rather than entire files.
- Embedding: each chunk is converted to a 1536-dimensional vector using `text-embedding-3-small`.
- Storage: chunk metadata and content live in Postgres; vectors live in Qdrant. Both are scoped per organization.
- Retrieval: query text is embedded with the same model, then cosine similarity is computed against the org's chunks. The top-`k` chunks are returned with their source, content, and similarity score.
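
The chunking and retrieval steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Cerebe's implementation: the function names `chunk_text`, `cosine_similarity`, and `top_k` are hypothetical, chunking is done on raw character offsets, and the vectors are small hand-written lists standing in for real 1536-dimensional embeddings.

```python
import math


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts `step` characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the defaults, a 2500-character document yields windows starting at offsets 0, 800, and 1600, so the last 200 characters of each chunk reappear at the start of the next one. A production system would typically use an approximate nearest-neighbour index rather than scoring every chunk as `top_k` does here.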
Multi-tenancy
Every document and query is automatically scoped to the organization of your API key. You cannot see documents belonging to another org, and documents you embed are never exposed outside your org. This is enforced at the data layer — no bypass is possible through a missing header or malformed request.
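
One way to picture data-layer scoping is a store that derives the organization from the API key on every call and never accepts an org identifier from the caller. The sketch below is a toy in-memory model of that idea, not a description of Cerebe's internals: `ScopedStore`, `API_KEYS`, and the key and org names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical key-to-org table; a real service would verify keys against its auth backend.
API_KEYS = {"key-acme": "org_acme", "key-globex": "org_globex"}


@dataclass
class ScopedStore:
    # Documents are keyed by (org, source), so the org is part of every lookup.
    _docs: dict[tuple[str, str], str] = field(default_factory=dict)

    def _org_for(self, api_key: str) -> str:
        """Resolve the org from the key; callers cannot supply an org directly."""
        try:
            return API_KEYS[api_key]
        except KeyError:
            raise PermissionError("unknown API key") from None

    def put(self, api_key: str, source: str, content: str) -> None:
        self._docs[(self._org_for(api_key), source)] = content

    def list_sources(self, api_key: str) -> list[str]:
        org = self._org_for(api_key)
        return sorted(src for (owner, src) in self._docs if owner == org)
```

Because no method takes an org parameter, a request with a missing or malformed field has nothing to override: the only route to another org's documents would be another org's key.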
API Reference
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/rag/search` | Semantic document search |
| `POST` | `/api/v1/rag/search/hybrid` | Weighted semantic + keyword search |
| `POST` | `/api/v1/rag/search/similar` | Find documents similar to given content |
| `POST` | `/api/v1/rag/documents` | Embed a document |
| `POST` | `/api/v1/rag/documents/batch` | Batch embed documents |
| `GET` | `/api/v1/rag/documents` | List embedded documents |
| `DELETE` | `/api/v1/rag/documents/{source}` | Delete a document |
| `GET` | `/api/v1/rag/stats` | Collection statistics |
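
For callers not using the SDK, a search request might be assembled with the standard library as sketched below. Only the method and path come from the table above. The base URL is a placeholder, the `Authorization: Bearer` header is an assumed auth scheme, and the JSON body fields `query` and `k` are assumed to mirror the SDK call; check each against the actual API before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real Cerebe endpoint


def build_search_request(api_key: str, query: str, k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request for the semantic search endpoint."""
    # Body field names are an assumption based on the SDK's search(query=..., k=...) call.
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/rag/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The function stops short of that so the request can be inspected or logged first.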