

What is a Knowledge Base?

A Knowledge Base (KB) is a container that holds one or more sources and makes their content searchable. When you add a source to a KB, the platform processes the extracted text using the configured indexing strategy and stores the results for retrieval. The indexing strategy determines how content is structured — from simple chunking to hierarchical document trees to structured JSON extraction. The retrieval strategy determines how queries find relevant content — from fast vector similarity to LLM-driven reasoning over document structure.

Indexing Pipeline

The indexing pipeline runs automatically when you add a source to a knowledge base. What happens during indexing depends on the strategy: ChunkEmbed splits text and generates embeddings, PageIndex builds a hierarchical tree with LLM-generated summaries, GraphIndex extends PageIndex with cross-reference detection and node embeddings, and Doc2JSON extracts structured fields. You can reindex at any time to reprocess with different settings.
Indexing pipeline: Source page texts → Indexing Strategy (ChunkEmbed, PageIndex, GraphIndex, or Doc2JSON) → Storage (pgvector chunks, tree nodes, or structured JSON).
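As a minimal sketch of how that trigger looks in practice, the request below assumes a sources sub-endpoint and a URL-based payload; the exact route and fields for adding a source may differ in your deployment, so treat it as illustrative only.
# Sketch: attaching a source starts the indexing pipeline for that KB.
# The "/sources" route and the payload shape are assumptions for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases/{kb_id}/sources",
    headers=headers,
    json={"url": "https://example.com/product-guide.pdf"},
)
print(response.status_code)  # indexing then runs in the background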

Indexing Strategies

The indexing strategy controls how source content is processed and stored. Each strategy produces different data structures, supports different retrieval methods, and has different cost profiles. You set the strategy when creating or updating a knowledge base via the indexing_config field.

| Strategy | Best For | Cost | Indexing Speed | Retrieval Speed | Compatible Retrieval |
| --- | --- | --- | --- | --- | --- |
| ChunkEmbed | General RAG, most documents | Low (embedding only) | Fast | Fast | Vector, Hybrid, Full-text |
| PageIndex | Long structured PDFs, complex docs | Medium–High (many LLM calls) | Slow (many LLM calls) | Slow (LLM at query time) | Tree Search only |
| GraphIndex | Cross-referenced documents | High (PageIndex + enrichment + embedding) | Slow (PageIndex + enrichment + embedding) | Fast (vector/hybrid, no LLM) | Vector, Hybrid, Full-text + graph expansion |
| Doc2JSON | Structured field extraction | Medium (LLM per window) | Medium (LLM per window) | Fast | Vector on summary |

ChunkEmbed (Default)

The standard RAG approach. Text is split into overlapping chunks using a configurable chunking strategy, each chunk is embedded into a vector, and the vectors are stored in pgvector for similarity search. A BM25 sparse index is also built over the chunk text, enabling keyword-based full-text search alongside vector search. This is the fastest and cheapest indexing strategy — no LLM calls are needed, only an embedding API call. ChunkEmbed pairs with vector search, hybrid search, or full-text search for retrieval. Hybrid search (which runs both vector and BM25 in parallel and fuses results) is the recommended default for production RAG applications.

Chunking Strategies

ChunkEmbed supports three chunking strategies that control how text is split into searchable units. The chunking strategy is independent of the embedding model.

| Chunker | How It Works | Best For |
| --- | --- | --- |
| markdown_header (default) | Splits at Markdown headers (h1–h6), then subdivides each section by length using recursive chunking. Prepends the section header as context to each chunk. | The platform default — works well for documents with heading structure |
| recursive | Splits at natural boundaries in order of preference: paragraph breaks, line breaks, sentence endings, then words. Respects document flow. | Unstructured prose, articles, general text without clear heading structure |
| fixed_size | Splits at fixed token counts with word-boundary snapping. Simple and predictable chunk sizes. | Uniform content like logs, transcripts, or code where structure doesn't matter |

Chunk size and overlap control the tradeoff between precision and context. Smaller chunks (500–1000 tokens) give more precise retrieval but may lose surrounding context. Larger chunks (2000–4000 tokens) preserve context but can dilute relevance. The defaults (2000 tokens, 50 token overlap) work well for most use cases.
# Create a KB with ChunkEmbed (the default strategy)
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Product Docs",
        "indexing_config": {
            "strategy": "chunk_embed",
            "chunk_size": 1500,
            "overlap": 100,
            "embedding_model": "text-embedding-3-small",
        },
    },
)
kb = response.json()
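If you want a different chunker, it is presumably selected in indexing_config alongside chunk_size; the "chunking_strategy" key in the sketch below is an assumed field name, so check the API reference for the exact key.
# Sketch: ChunkEmbed with the recursive chunker for unstructured prose.
# The "chunking_strategy" key is an assumed field name for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Blog Archive",
        "indexing_config": {
            "strategy": "chunk_embed",
            "chunking_strategy": "recursive",  # or "markdown_header" / "fixed_size"
            "chunk_size": 1000,
            "overlap": 100,
        },
    },
)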

PageIndex

PageIndex builds a hierarchical tree of your document's structure — sections, subsections, and their content — using LLM-powered analysis. The result is stored as two artifacts: a lightweight ToC (titles and LLM-generated summaries, no full text) and a flat list of section nodes (with the actual text). Retrieval works in two phases: an LLM first reasons over the lightweight ToC to identify relevant sections by structure, then the platform fetches the full text of those sections.

PageIndex has two pipelines that are selected automatically. For Markdown content, it parses headers into a nested tree, splits oversized leaf nodes using LLM calls, and generates per-node summaries. For PDFs (when page_texts are provided), it scans the first pages for a table of contents, calibrates page-number offsets against actual headings, infers structure via LLM if no ToC is found, then assigns page text to tree nodes and merges small siblings. Both pipelines cap LLM concurrency at 7 calls by default to avoid rate limits.
Note: PageIndex is expensive to index. PageIndex makes many LLM calls during indexing — for ToC detection, structure inference, oversized node splitting, and summary generation. A 100-page PDF may take several minutes and consume significant LLM tokens. Use this strategy when document structure matters for retrieval quality — compliance manuals, legal contracts, technical specifications, academic papers.
Note: PageIndex retrieval uses LLM reasoning, not vectors. Tree Search retrieval does not use vector similarity. It sends the document's structural outline (titles + summaries, no text) to an LLM and asks it to identify the most relevant sections. This means each retrieval call incurs LLM cost and latency — typically 1–3 seconds per query. For high-throughput, latency-sensitive workloads, ChunkEmbed with hybrid search is more appropriate.
# Create a KB with PageIndex for structured document retrieval
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Compliance Manual",
        "indexing_config": {
            "strategy": "page_index",
            "extra": {
                "model": "gpt-4o",
                "if_add_node_summary": "yes",
            },
        },
        "retrieval_config": {
            "method": "tree_search",
            "top_k": 5,
        },
    },
)

GraphIndex

GraphIndex builds on PageIndex with two additional stages: cross-reference enrichment and node embedding. In stage one, it runs the same PageIndex pipeline to build the hierarchical document tree. In stage two, an LLM analyzes each node's text against the full table of contents and identifies which other sections the node explicitly references — citations, mentions, dependencies, or cross-references (not structural parent/child relationships). These references are stored in each node's metadata. In stage three, each node's title and summary (plus its reference list) are embedded into a vector, and a BM25 sparse index is built over node text.

The key advantage over plain PageIndex is retrieval flexibility. Because nodes have embeddings and a BM25 index, GraphIndex knowledge bases support vector search, hybrid search, and full-text search — the same fast retrieval methods as ChunkEmbed but over structured document sections instead of arbitrary chunks. After the initial retrieval, graph expansion automatically pulls in first-degree referenced nodes (sections that the matched sections explicitly cite), enriching results with related context. This makes GraphIndex suited for documents with dense internal references — regulatory frameworks, technical standards, codebases with cross-module dependencies.
Note: GraphIndex is expensive to index, but cheap to retrieve. GraphIndex performs all the LLM work of PageIndex (tree building, node splitting, summary generation) plus one additional LLM call per node for cross-reference detection, plus embedding computation for every node. For a document with 50 sections, that means ~50 extra LLM calls on top of the PageIndex work. Enrichment concurrency is capped at 7 by default. However, unlike PageIndex, retrieval is fast and cheap — it uses vector, hybrid, or full-text search over node embeddings (no LLM calls at query time). Use GraphIndex when you want structural awareness during indexing with fast, scalable retrieval.
# Create a KB with GraphIndex for cross-referenced document retrieval
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Regulatory Framework",
        "indexing_config": {
            "strategy": "graph_index",
            "extra": {
                "model": "gpt-4o",
                "enrichment_model": "gpt-4o",
                "embedding_model": "text-embedding-3-small",
                "if_add_node_summary": "yes",
            },
        },
        "retrieval_config": {
            "method": "hybrid",
            "top_k": 10,
        },
    },
)

Doc2JSON

Doc2JSON extracts structured data from documents using a sliding-window LLM approach. You define a JSON schema with the fields you want to extract (names, types, descriptions, examples), and the platform slides a window across the document content. For each window, an LLM extracts a brief summary and fills in schema fields from the visible text. Extractions are merged across windows: scalar fields use last-value-wins, arrays accumulate new items, and objects are deep-merged. After all windows are processed, a final LLM call generates a combined document summary, which is embedded for vector retrieval.

Doc2JSON supports two modes. Text mode (default) processes extracted text using token-based windows (default 4000 tokens, 200 overlap). Image mode processes page screenshots directly as multimodal content — useful for documents with complex layouts, tables, or forms where text extraction loses formatting. In image mode, pages are grouped into windows (default 3 pages per window) and sent as images to the LLM.
# Create a KB with Doc2JSON for invoice extraction
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Invoice Extraction",
        "indexing_config": {
            "strategy": "doc2json",
            "extra": {
                "json_schema": {
                    "fields": [
                        {"name": "vendor_name", "type": "string", "description": "Company that issued the invoice"},
                        {"name": "invoice_date", "type": "string", "description": "Date of the invoice"},
                        {"name": "total_amount", "type": "number", "description": "Total amount due"},
                        {"name": "line_items", "type": "array", "description": "Individual line items",
                         "item_type": "object", "items": {
                            "type": "object", "fields": [
                                {"name": "description", "type": "string"},
                                {"name": "quantity", "type": "integer"},
                                {"name": "unit_price", "type": "number"},
                            ]
                        }},
                    ]
                },
                "extraction_model": "gpt-4o",
            },
        },
    },
)
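For layout-heavy documents, image mode can be worth the extra cost. The sketch below assumes the mode and window size are set via extra; the "mode" and "pages_per_window" keys are illustrative guesses rather than confirmed field names.
# Sketch: Doc2JSON in image mode for scanned or layout-heavy documents.
# "mode" and "pages_per_window" are assumed field names for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Scanned Purchase Orders",
        "indexing_config": {
            "strategy": "doc2json",
            "extra": {
                "mode": "image",           # default is text mode
                "pages_per_window": 3,     # pages sent to the LLM per window
                "json_schema": {
                    "fields": [
                        {"name": "po_number", "type": "string"},
                        {"name": "total_amount", "type": "number"},
                    ]
                },
                "extraction_model": "gpt-4o",
            },
        },
    },
)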

Retrieval Strategies

The retrieval strategy controls how queries find relevant content within a knowledge base. You set the retrieval method when creating a KB or when calling the search endpoint. The right choice depends on your indexing strategy, query patterns, latency requirements, and budget.

| Method | How It Works | Latency | Cost | Best For |
| --- | --- | --- | --- | --- |
| vector_search | Embeds the query and finds nearest vectors via cosine similarity in pgvector | Very low (~100ms) | Low (one embed call) | Semantic matching — captures meaning even without shared keywords |
| full_text | BM25 keyword scoring with stemming via PostgreSQL tsvector | Low | None (no API calls) | Exact phrases, product names, error codes, IDs, proper nouns |
| hybrid (recommended) | Runs vector + BM25 in parallel, fuses results with Reciprocal Rank Fusion (k=60) | Low | Low (one embed call) | Production RAG — robust across query types |
| tree_search | LLM selects documents, then selects sections by reasoning over ToC structure | Medium (1–3s) | Medium (two LLM calls) | PageIndex KBs only — complex structural queries |

Vector Search

Vector search embeds the query using the same model as indexing, then finds the most similar chunk embeddings via cosine similarity in pgvector. It captures semantic meaning — "How do I reset my credentials?" will match chunks about password resets even without shared keywords. Results are ranked by similarity score (higher = more relevant). An optional similarity_threshold filters out low-quality matches.
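As an illustrative sketch, a one-off vector search with a similarity floor might look like the request below; whether the search endpoint accepts "method" and "similarity_threshold" overrides in exactly this shape is an assumption.
# Sketch: vector search with a similarity floor. Passing "method" and
# "similarity_threshold" in the search payload is assumed for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases/{kb_id}/search",
    headers=headers,
    json={
        "query": "How do I reset my credentials?",
        "method": "vector_search",
        "top_k": 5,
        "similarity_threshold": 0.35,  # drop weak matches
    },
)
for chunk in response.json()["results"]:
    print(f"{chunk['similarity']:.3f}  {chunk['content'][:80]}")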

Full-Text Search (BM25)

Full-text search uses BM25 scoring — a keyword relevance algorithm that considers term frequency, document length, and inverse document frequency. Terms are stemmed using PostgreSQL's English dictionary (to_tsvector), so "running" matches "run". BM25 uses standard parameters: k1=1.2 for term frequency saturation and b=0.75 for length normalization. No API calls are needed — scoring runs entirely in PostgreSQL. This complements vector search by catching results that share keywords but may not be semantically close in embedding space.

Hybrid Search

Hybrid search runs both vector search and full-text search in parallel, then fuses the results using Reciprocal Rank Fusion (RRF). RRF merges ranked lists without needing to normalize incompatible score ranges — it uses rank positions, not scores. The formula: rrf_score(d) = sum over lists of weight / (k + rank), with k=60 (the constant from the original RRF paper). The vector_weight parameter (default 0.5) balances the two signals: higher values favor semantic matches, lower values favor keyword matches. Results are normalized so the top result has score 1.0.
# Create a KB with hybrid search retrieval
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Product Docs",
        "indexing_config": {
            "strategy": "chunk_embed",
            "chunk_size": 2000,
            "overlap": 50,
        },
        "retrieval_config": {
            "method": "hybrid",
            "top_k": 10,
            "vector_weight": 0.6,
        },
    },
)
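The fusion step itself is small enough to show directly. The following is a self-contained sketch of weighted Reciprocal Rank Fusion as described above, not the platform's internal implementation.
# Minimal sketch of weighted Reciprocal Rank Fusion (RRF) as described above.
# vector_hits and bm25_hits are ranked lists of result IDs, best first.
def rrf_fuse(vector_hits, bm25_hits, vector_weight=0.5, k=60):
    scores = {}
    for weight, ranked in ((vector_weight, vector_hits), (1 - vector_weight, bm25_hits)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_score = fused[0][1]
    # normalize so the top result has score 1.0
    return [(doc_id, score / top_score) for doc_id, score in fused]

print(rrf_fuse(["a", "b", "c"], ["b", "d", "a"], vector_weight=0.6))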

Tree Search

Tree Search is a two-phase LLM-driven retrieval method for PageIndex knowledge bases. In phase one, the LLM reviews compact summaries of each indexed document (name, description, top-level section titles) and selects which documents are relevant. In phase two, the LLM examines the selected documents' full ToC structure (section titles and summaries, no full text) and identifies the most relevant sections — returning up to top_k node IDs. The platform then fetches the full text of those sections from the database.

For multi-document KBs, node IDs are globally prefixed (e.g., d0:0001, d1:0005) so the LLM can reference sections across documents. Response parsing is robust: it tries JSON first, then falls back to regex pattern matching, and validates all returned IDs against the actual tree structure to prevent hallucinated references.
Note: Tree Search requires PageIndex. Tree Search reads from the page_index_toc and page_index_nodes tables. It is only compatible with the PageIndex strategy. GraphIndex uses vector/hybrid/full-text search over node embeddings instead.

Reranking

Reranking is an optional second stage that improves retrieval precision. The initial retrieval (vector, hybrid, or full-text) fetches a broad candidate pool — by default 20 items (the candidate_count parameter). A cross-encoder reranker then re-scores each candidate by evaluating the query-document pair jointly. Cross-encoders are more accurate than bi-encoder embeddings because they see the query and document together, but they can’t be used for initial retrieval because they don’t produce storable vectors. After reranking, the top_k results are returned to the caller.

| Reranker | Provider | Notes |
| --- | --- | --- |
| Rerank English v3.0 (default) | Cohere | High quality, English-optimized |
| Rerank Multilingual v3.0 | Cohere | Multilingual support |
| Jina Reranker v2 Base Multilingual | Jina AI | Multilingual, competitive quality |
| Rerank 2.5 | Voyage | Strong general-purpose reranker |
| Rerank 2.5 Lite | Voyage | Lighter variant, lower cost |

Note: Reranker API keys are platform-managed. Reranker API keys (Cohere, Jina, Voyage) are configured at the platform level by your administrator, not per-organization. If reranking returns errors, contact your platform admin to verify the reranker provider key is configured.
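Enabling a reranker presumably lives in retrieval_config next to top_k. Only candidate_count and top_k are documented on this page, so the "reranker" key and model identifier in the sketch below are assumptions.
# Sketch: hybrid retrieval with a reranking stage. candidate_count (first-stage
# pool) and top_k are described above; the "reranker" key and the model
# identifier are assumed names for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "Support Articles",
        "indexing_config": {"strategy": "chunk_embed"},
        "retrieval_config": {
            "method": "hybrid",
            "candidate_count": 20,
            "top_k": 5,
            "reranker": "rerank-english-v3.0",
        },
    },
)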

Embedding Models

Embeddings convert text into high-dimensional vectors that capture semantic meaning. The platform uses OpenAI’s text-embedding-3-small by default (1536 dimensions). All chunks in a knowledge base must use the same embedding model — if you change the model, you must reindex. Embedding calls are batched at up to 250,000 tokens per API call for efficiency. The platform supports embedding models from multiple providers via LiteLLM — select your preferred model in Settings > Knowledge Indexing.

| Model | Provider | Dimensions | Tradeoff |
| --- | --- | --- | --- |
| text-embedding-3-small (default) | OpenAI | 1536 | Best balance of quality, cost, and speed. Fits within HNSW index limit. |
| text-embedding-3-large | OpenAI | 3072 | Higher quality, 2x storage. Exceeds HNSW dimension limit — see note below. |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model — use text-embedding-3-small instead. |
| embed-english-v3.0 | Cohere | 1024 | High-quality English embeddings. Fits within HNSW limit. |
| embed-multilingual-v3.0 | Cohere | 1024 | Multilingual support across 100+ languages. |
| embed-english-light-v3.0 / embed-multilingual-light-v3.0 | Cohere | 384 | Lightweight variants — faster and cheaper, lower quality. |
| voyage/voyage-01 | Voyage AI | 1024 | Strong general-purpose embeddings from Voyage AI. |
| gemini/text-embedding-004 | Google | 768 | Google Gemini embedding model. |
| mistral/mistral-embed | Mistral | 1024 | Mistral AI embedding model. |

Note on embedding provider API keys: OpenAI embeddings use the platform-managed OPENAI_API_KEY. For other providers (Cohere, Voyage, Mistral, Google), the corresponding API key environment variable (e.g. COHERE_API_KEY, VOYAGE_API_KEY, MISTRAL_API_KEY) must be configured at the platform level by your administrator before creating projects. These keys are passed through to LiteLLM at runtime. Contact your platform admin if a non-OpenAI embedding model returns authentication errors.
Note on the HNSW index dimension limit: The platform uses pgvector HNSW indexes for fast approximate nearest-neighbor search. HNSW indexes support a maximum of 2000 dimensions. The default model text-embedding-3-small (1536 dimensions) fits within this limit and gets full HNSW acceleration. Models with more than 2000 dimensions (like text-embedding-3-large at 3072) fall back to sequential scan — still correct but significantly slower for large knowledge bases.
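For example, a KB indexed with Cohere embeddings (1024 dimensions, within the HNSW limit) can be created as shown below, assuming your administrator has configured COHERE_API_KEY at the platform level per the note above.
# Create a KB that embeds chunks with Cohere's embed-english-v3.0.
# Requires the platform-level COHERE_API_KEY to be configured (see note above).
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases",
    headers=headers,
    json={
        "name": "English Support Docs",
        "indexing_config": {
            "strategy": "chunk_embed",
            "embedding_model": "embed-english-v3.0",
        },
    },
)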

Searching a Knowledge Base

Once indexed, you can search a knowledge base with any natural language query. The search uses whichever retrieval method was configured on the KB, or you can override it per-request. Results include the matched text, relevance scores, and source metadata.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases/{kb_id}/search",
    headers=headers,
    json={"query": "How do I reset my password?", "top_k": 5},
)
results = response.json()
for chunk in results["results"]:
    print(f"Score: {chunk['similarity']:.3f}")
    print(chunk["content"][:200])

Reindexing

Note: Reindexing replaces all indexed content. When you reindex a knowledge base, all existing chunks, tree nodes, or extracted JSON are deleted and recreated from scratch. The KB remains searchable during reindexing but results may be incomplete until it finishes. For large KBs with PageIndex or GraphIndex, reindexing can take significant time and LLM tokens.
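A reindex is typically triggered with a single call; the route in the sketch below is an assumed endpoint name, so check the API reference for the exact path.
# Sketch: trigger a full reindex after changing indexing settings.
# The "/reindex" route is an assumed endpoint name for illustration.
response = requests.post(
    f"{BASE_URL}/api/knowledge-bases/{kb_id}/reindex",
    headers=headers,
)
print(response.status_code)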

Choosing a Configuration

| Use Case | Indexing | Retrieval | Notes |
| --- | --- | --- | --- |
| General RAG (default) | ChunkEmbed (2000 tokens, 50 overlap) | Hybrid Search | Works for most documents. Add a reranker for higher precision. |
| Long structured PDFs | PageIndex | Tree Search | Compliance, legal, technical specs. Higher cost but superior structure-aware retrieval. |
| Cross-referenced documents | GraphIndex | Hybrid Search | Regulations, standards. Vector/hybrid search over node embeddings with automatic graph expansion of referenced sections. |
| Keyword-heavy content | ChunkEmbed | Full-Text Search | Logs, code, error messages. BM25 excels at exact matches without embedding cost. |
| Invoice / form extraction | Doc2JSON | Vector Search | Define a schema and extract structured fields from documents. |

Project-Level Defaults

You can configure project-wide defaults for all indexing and retrieval parameters in Settings > Knowledge Indexing and Settings > Knowledge Retrieval. These defaults apply to newly created knowledge bases unless overridden in the indexing_config or retrieval_config at creation time. Settings include chunk sizes, embedding models, LLM models for PageIndex and GraphIndex, reranker configuration, and many advanced tuning parameters.

Next Steps

Create a Knowledge Base

Step-by-step guide to creating and indexing a KB.

Agents & Tools

Attach a KB to an agent for RAG-powered conversations.

Knowledge Bases API Reference

Full endpoint documentation.