Documentation Index
Fetch the complete documentation index at: https://docs.powabase.ai/llms.txt
Use this file to discover all available pages before exploring further.
What is a Knowledge Base?
A Knowledge Base (KB) is a container that holds one or more sources and makes their content searchable. When you add a source to a KB, the platform processes the extracted text using the configured indexing strategy and stores the results for retrieval. The indexing strategy determines how content is structured — from simple chunking to hierarchical document trees to structured JSON extraction. The retrieval strategy determines how queries find relevant content — from fast vector similarity to LLM-driven reasoning over document structure.
Indexing Pipeline
The indexing pipeline runs automatically when you add a source to a knowledge base. What happens during indexing depends on the strategy: ChunkEmbed splits text and generates embeddings, PageIndex builds a hierarchical tree with LLM-generated summaries, GraphIndex extends PageIndex with cross-reference detection and node embeddings, and Doc2JSON extracts structured fields. You can reindex at any time to reprocess with different settings.
Indexing Strategies
The indexing strategy controls how source content is processed and stored. Each strategy produces different data structures, supports different retrieval methods, and has different cost profiles. You set the strategy when creating or updating a knowledge base via the indexing_config field.
| Strategy | Best For | Cost | Indexing Speed | Retrieval Speed | Compatible Retrieval |
|---|---|---|---|---|---|
| ChunkEmbed | General RAG, most documents | Low (embedding only) | Fast | Fast | Vector, Hybrid, Full-text |
| PageIndex | Long structured PDFs, complex docs | Medium–High (many LLM calls) | Slow (many LLM calls) | Slow (LLM at query time) | Tree Search only |
| GraphIndex | Cross-referenced documents | High (PageIndex + enrichment + embedding) | Slow (PageIndex + enrichment + embedding) | Fast (vector/hybrid, no LLM) | Vector, Hybrid, Full-text + graph expansion |
| Doc2JSON | Structured field extraction | Medium (LLM per window) | Medium (LLM per window) | Fast | Vector on summary |
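As a concrete illustration of selecting a strategy via indexing_config, here is a sketch of building a KB-creation payload. The lowercase strategy identifiers and the inner field names are assumptions for illustration, not a confirmed API schema:

```python
# Hypothetical payload builder -- field names inside indexing_config
# and the strategy identifiers are illustrative assumptions.
def build_kb_payload(name: str, strategy: str = "chunk_embed") -> dict:
    allowed = {"chunk_embed", "page_index", "graph_index", "doc2json"}
    if strategy not in allowed:
        raise ValueError(f"unknown indexing strategy: {strategy}")
    return {
        "name": name,
        "indexing_config": {"strategy": strategy},
    }

payload = build_kb_payload("support-docs", "chunk_embed")
```

The strategy is fixed at creation time; switching it later requires a reindex, since each strategy produces different stored artifacts.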
ChunkEmbed (Default)
The standard RAG approach. Text is split into overlapping chunks using a configurable chunking strategy, each chunk is embedded into a vector, and the vectors are stored in pgvector for similarity search. A BM25 sparse index is also built over the chunk text, enabling keyword-based full-text search alongside vector search. This is the fastest and cheapest indexing strategy — no LLM calls are needed, only an embedding API call. ChunkEmbed pairs with vector search, hybrid search, or full-text search for retrieval. Hybrid search (which runs both vector and BM25 in parallel and fuses results) is the recommended default for production RAG applications.
Chunking Strategies
ChunkEmbed supports three chunking strategies that control how text is split into searchable units. The chunking strategy is independent of the embedding model.
| Chunker | How It Works | Best For |
|---|---|---|
| markdown_header (default) | Splits at Markdown headers (h1–h6), then subdivides each section by length using recursive chunking. Prepends the section header as context to each chunk. | The platform default — works well for documents with heading structure |
| recursive | Splits at natural boundaries in order of preference: paragraph breaks, line breaks, sentence endings, then words. Respects document flow. | Unstructured prose, articles, general text without clear heading structure |
| fixed_size | Splits at fixed token counts with word-boundary snapping. Simple and predictable chunk sizes. | Uniform content like logs, transcripts, or code where structure doesn’t matter |
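To make the fixed_size row concrete, here is a minimal sketch of fixed-length splitting with overlap. It approximates tokens by whitespace-separated words; the platform counts real tokens, so actual boundaries will differ:

```python
# Sketch of fixed-size chunking with overlap. Word-based token
# approximation -- illustrative only, not the platform's tokenizer.
def fixed_size_chunks(text: str, max_words: int = 50, overlap: int = 10):
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlapping windows preserve context
    return chunks
```

The overlap means each chunk repeats the tail of its predecessor, so a sentence straddling a boundary is still retrievable from at least one chunk.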
PageIndex
PageIndex builds a hierarchical tree of your document’s structure — sections, subsections, and their content — using LLM-powered analysis. The result is stored as two artifacts: a lightweight ToC (titles and LLM-generated summaries, no full text) and a flat list of section nodes (with the actual text). Retrieval works in two phases: an LLM first reasons over the lightweight ToC to identify relevant sections by structure, then the platform fetches the full text of those sections. PageIndex has two pipelines that are selected automatically. For Markdown content, it parses headers into a nested tree, splits oversized leaf nodes using LLM calls, and generates per-node summaries. For PDFs (when page_texts are provided), it scans the first pages for a table of contents, calibrates page-number offsets against actual headings, infers structure via LLM if no ToC is found, then assigns page text to tree nodes and merges small siblings. Both pipelines cap LLM concurrency at 7 calls by default to avoid rate limits.
GraphIndex
GraphIndex builds on PageIndex with two additional stages: cross-reference enrichment and node embedding. In stage one, it runs the same PageIndex pipeline to build the hierarchical document tree. In stage two, an LLM analyzes each node’s text against the full table of contents and identifies which other sections the node explicitly references — citations, mentions, dependencies, or cross-references (not structural parent/child relationships). These references are stored in each node’s metadata. In stage three, each node’s title and summary (plus its reference list) are embedded into a vector, and a BM25 sparse index is built over node text. The key advantage over plain PageIndex is retrieval flexibility. Because nodes have embeddings and a BM25 index, GraphIndex knowledge bases support vector search, hybrid search, and full-text search — the same fast retrieval methods as ChunkEmbed but over structured document sections instead of arbitrary chunks. After the initial retrieval, graph expansion automatically pulls in first-degree referenced nodes (sections that the matched sections explicitly cite), enriching results with related context. This makes GraphIndex well suited for documents with dense internal references — regulatory frameworks, technical standards, codebases with cross-module dependencies.
Doc2JSON
Doc2JSON extracts structured data from documents using a sliding-window LLM approach. You define a JSON schema with the fields you want to extract (names, types, descriptions, examples), and the platform slides a window across the document content. For each window, an LLM extracts a brief summary and fills in schema fields from the visible text. Extractions are merged across windows: scalar fields use last-value-wins, arrays accumulate new items, and objects are deep-merged. After all windows are processed, a final LLM call generates a combined document summary, which is embedded for vector retrieval. Doc2JSON supports two modes. Text mode (default) processes extracted text using token-based windows (default 4000 tokens, 200 overlap). Image mode processes page screenshots directly as multimodal content — useful for documents with complex layouts, tables, or forms where text extraction loses formatting. In image mode, pages are grouped into windows (default 3 pages per window) and sent as images to the LLM.
Retrieval Strategies
The retrieval strategy controls how queries find relevant content within a knowledge base. You set the retrieval method when creating a KB or when calling the search endpoint. The right choice depends on your indexing strategy, query patterns, latency requirements, and budget.
| Method | How It Works | Latency | Cost | Best For |
|---|---|---|---|---|
| vector_search | Embeds the query and finds nearest vectors via cosine similarity in pgvector | Very low (~100ms) | Low (one embed call) | Semantic matching — captures meaning even without shared keywords |
| full_text | BM25 keyword scoring with stemming via PostgreSQL tsvector | Low | None (no API calls) | Exact phrases, product names, error codes, IDs, proper nouns |
| hybrid (recommended) | Runs vector + BM25 in parallel, fuses results with Reciprocal Rank Fusion (k=60) | Low | Low (one embed call) | Production RAG — robust across query types |
| tree_search | LLM selects documents, then selects sections by reasoning over ToC structure | Medium (1–3s) | Medium (two LLM calls) | PageIndex KBs only — complex structural queries |
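The Reciprocal Rank Fusion used by the hybrid method can be sketched in a few lines. This is illustrative only; the platform's internal implementation may differ in details:

```python
# Minimal Reciprocal Rank Fusion. Each input is a ranked list of
# document IDs (best first). k=60 matches the constant cited in the
# table; per-list weights let one signal count more than another.
def rrf_fuse(ranked_lists, weights=None, k=60):
    weights = weights or [1.0] * len(ranked_lists)
    scores = {}
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    top = scores[fused[0]]
    # Normalize so the top result has score 1.0.
    return [(doc_id, scores[doc_id] / top) for doc_id in fused]
```

Because RRF works on rank positions rather than raw scores, it can fuse cosine similarities and BM25 scores without normalizing their incompatible ranges.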
Vector Search
Vector search embeds the query using the same model as indexing, then finds the most similar chunk embeddings via cosine similarity in pgvector. It captures semantic meaning — “How do I reset my credentials?” will match chunks about password resets even without shared keywords. Results are ranked by similarity score (higher = more relevant). An optional similarity_threshold filters out low-quality matches.
Full-Text Search (BM25)
Full-text search uses BM25 scoring — a keyword relevance algorithm that considers term frequency, document length, and inverse document frequency. Terms are stemmed using PostgreSQL’s English dictionary (to_tsvector), so “running” matches “run”. BM25 uses standard parameters: k1=1.2 for term frequency saturation and b=0.75 for length normalization. No API calls are needed — scoring runs entirely in PostgreSQL. This complements vector search by catching results that share keywords but may not be semantically close in embedding space.
Hybrid Search (Recommended)
Hybrid search runs both vector search and full-text search in parallel, then fuses the results using Reciprocal Rank Fusion (RRF). RRF merges ranked lists without needing to normalize incompatible score ranges — it uses rank positions, not scores. The formula: rrf_score(d) = sum of weight / (k + rank) across all lists, with k=60 (the original RRF paper constant). The vector_weight parameter (default 0.5) balances the two signals: higher values favor semantic matches, lower values favor keyword matches. Results are normalized so the top result has score 1.0.
Tree Search
Tree Search is a two-phase LLM-driven retrieval method for PageIndex knowledge bases. In phase one, the LLM reviews compact summaries of each indexed document (name, description, top-level section titles) and selects which documents are relevant. In phase two, the LLM examines the selected documents’ full ToC structure (section titles and summaries, no full text) and identifies the most relevant sections — returning up to top_k node IDs. The platform then fetches the full text of those sections from the database. For multi-document KBs, node IDs are globally prefixed (e.g., d0:0001, d1:0005) so the LLM can reference sections across documents. Response parsing is robust: it tries JSON first, then falls back to regex pattern matching, and validates all returned IDs against the actual tree structure to prevent hallucinated references.
Tree Search requires PageIndex
Tree Search reads from the page_index_toc and page_index_nodes tables. It is only compatible with the PageIndex strategy. GraphIndex uses vector/hybrid/full-text search over node embeddings instead.
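The ID-validation step at the end of Tree Search can be sketched as follows. The d0:0001 format follows the example IDs above; the regex and helper function are illustrative, not the platform's actual code:

```python
import re

# Accept only node IDs that exist in the actual tree, so hallucinated
# LLM references are silently dropped. Format: d<doc>:<4-digit node>.
NODE_ID = re.compile(r"d(\d+):(\d{4})")

def validate_node_ids(llm_output: str, known_ids: set) -> list:
    seen, valid = set(), []
    for match in NODE_ID.finditer(llm_output):
        node_id = match.group(0)
        if node_id in known_ids and node_id not in seen:
            seen.add(node_id)
            valid.append(node_id)
    return valid
```

Regex extraction doubles as the fallback parser: even if the LLM wraps its answer in prose instead of JSON, any well-formed IDs are still recovered.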
Reranking
Reranking is an optional second stage that improves retrieval precision. The initial retrieval (vector, hybrid, or full-text) fetches a broad candidate pool — by default 20 items (the candidate_count parameter). A cross-encoder reranker then re-scores each candidate by evaluating the query-document pair jointly. Cross-encoders are more accurate than bi-encoder embeddings because they see the query and document together, but they can’t be used for initial retrieval because they don’t produce storable vectors. After reranking, the top_k results are returned to the caller.
| Reranker | Provider | Notes |
|---|---|---|
| Rerank English v3.0 (default) | Cohere | High quality, English-optimized |
| Rerank Multilingual v3.0 | Cohere | Multilingual support |
| Jina Reranker v2 Base Multilingual | Jina AI | Multilingual, competitive quality |
| Rerank 2.5 | Voyage | Strong general-purpose reranker |
| Rerank 2.5 Lite | Voyage | Lighter variant, lower cost |
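The retrieve-then-rerank flow described above reduces to a small function. Here score_fn stands in for the cross-encoder API call; a toy keyword-overlap scorer is used in place of a real model:

```python
def rerank(query, candidates, score_fn, top_k=5):
    # Stage 2: jointly score each (query, document) pair, keep top_k.
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-in for a cross-encoder: count shared lowercase words.
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))
```

The two-stage shape is the point: the cheap first stage narrows millions of chunks to candidate_count items, so the expensive joint scorer only ever sees a small pool.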
Reranker API keys are platform-managed
Reranker API keys (Cohere, Jina, Voyage) are configured at the platform level by your administrator, not per-organization. If reranking returns errors, contact your platform admin to verify the reranker provider key is configured.
Embedding Models
Embeddings convert text into high-dimensional vectors that capture semantic meaning. The platform uses OpenAI’s text-embedding-3-small by default (1536 dimensions). All chunks in a knowledge base must use the same embedding model — if you change the model, you must reindex. Embedding calls are batched at up to 250,000 tokens per API call for efficiency. The platform supports embedding models from multiple providers via LiteLLM — select your preferred model in Settings > Knowledge Indexing.
| Model | Provider | Dimensions | Tradeoff |
|---|---|---|---|
| text-embedding-3-small (default) | OpenAI | 1536 | Best balance of quality, cost, and speed. Fits within HNSW index limit. |
| text-embedding-3-large | OpenAI | 3072 | Higher quality, 2x storage. Exceeds HNSW dimension limit — see note below. |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model — use text-embedding-3-small instead. |
| embed-english-v3.0 | Cohere | 1024 | High-quality English embeddings. Fits within HNSW limit. |
| embed-multilingual-v3.0 | Cohere | 1024 | Multilingual support across 100+ languages. |
| embed-english-light-v3.0 / embed-multilingual-light-v3.0 | Cohere | 384 | Lightweight variants — faster and cheaper, lower quality. |
| voyage/voyage-01 | Voyage AI | 1024 | Strong general-purpose embeddings from Voyage AI. |
| gemini/text-embedding-004 | Google | 768 | Google Gemini embedding model. |
| mistral/mistral-embed | Mistral | 1024 | Mistral AI embedding model. |
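These embeddings are compared by cosine similarity at query time. pgvector computes this natively; the following reference implementation shows what the measure means:

```python
import math

def cosine_similarity(a, b):
    # similarity = dot(a, b) / (|a| * |b|); 1.0 = identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Because cosine similarity ignores vector magnitude, two texts with the same meaning but different lengths can still score near 1.0.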
Embedding provider API keys
OpenAI embeddings use the platform-managed OPENAI_API_KEY. For other providers (Cohere, Voyage, Mistral, Google), the corresponding API key environment variable (e.g. COHERE_API_KEY, VOYAGE_API_KEY, MISTRAL_API_KEY) must be configured at the platform level by your administrator before creating projects. These keys are passed through to LiteLLM at runtime. Contact your platform admin if a non-OpenAI embedding model returns authentication errors.
HNSW index dimension limit
The platform uses pgvector HNSW indexes for fast approximate nearest-neighbor search. HNSW indexes support a maximum of 2000 dimensions. The default model text-embedding-3-small (1536 dimensions) fits within this limit and gets full HNSW acceleration. Models with more than 2000 dimensions (like text-embedding-3-large at 3072) fall back to sequential scan — still correct but significantly slower for large knowledge bases.
Searching a Knowledge Base
Once indexed, you can search a knowledge base with any natural language query. The search uses whichever retrieval method was configured on the KB, or you can override it per-request. Results include the matched text, relevance scores, and source metadata.
Reindexing
You can reindex a knowledge base at any time to reprocess its sources with different settings — for example, after changing the chunking configuration or the embedding model (changing the embedding model always requires a reindex, since all chunks in a KB must use the same model).
Recommended Configurations
| Use Case | Indexing | Retrieval | Notes |
|---|---|---|---|
| General RAG (default) | ChunkEmbed (2000 tokens, 50 overlap) | Hybrid Search | Works for most documents. Add a reranker for higher precision. |
| Long structured PDFs | PageIndex | Tree Search | Compliance, legal, technical specs. Higher cost but superior structure-aware retrieval. |
| Cross-referenced documents | GraphIndex | Hybrid Search | Regulations, standards. Vector/hybrid search over node embeddings with automatic graph expansion of referenced sections. |
| Keyword-heavy content | ChunkEmbed | Full-Text Search | Logs, code, error messages. BM25 excels at exact matches without embedding cost. |
| Invoice / form extraction | Doc2JSON | Vector Search | Define a schema and extract structured fields from documents. |
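The pairings in the table above can be captured as a small lookup with a safe default. The use-case keys and strategy identifiers here are illustrative, not platform constants:

```python
# Recommended indexing/retrieval pairings from the table above.
# Identifiers are illustrative assumptions, not confirmed API values.
RECOMMENDED = {
    "general_rag": ("chunk_embed", "hybrid"),
    "long_structured_pdfs": ("page_index", "tree_search"),
    "cross_referenced_docs": ("graph_index", "hybrid"),
    "keyword_heavy": ("chunk_embed", "full_text"),
    "form_extraction": ("doc2json", "vector_search"),
}

def recommended_config(use_case: str):
    # Fall back to the general-RAG default for unknown use cases.
    return RECOMMENDED.get(use_case, RECOMMENDED["general_rag"])
```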
Project-Level Defaults
You can configure project-wide defaults for all indexing and retrieval parameters in Settings > Knowledge Indexing and Settings > Knowledge Retrieval. These defaults apply to newly created knowledge bases unless overridden in the indexing_config or retrieval_config at creation time. Settings include chunk sizes, embedding models, LLM models for PageIndex and GraphIndex, reranker configuration, and many advanced tuning parameters.
Next Steps
Create a Knowledge Base
Step-by-step guide to creating and indexing a KB.
Agents & Tools
Attach a KB to an agent for RAG-powered conversations.
Knowledge Bases API Reference
Full endpoint documentation.