Upload Your First Document

Uploading a document creates a Source — the platform’s representation of your file. After upload, an asynchronous extraction pipeline converts the file into structured page texts. You’ll poll for status and then retrieve the extracted content.

Prerequisites:

Authentication configured (see Authentication guide)

Upload a file

Send a multipart form-data request with your file. The server starts extraction automatically and returns the source metadata.

Endpoint: POST /api/sources/upload

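import requests

# BASE_URL and API_KEY come from your authentication setup (see Authentication guide).
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/api/sources/upload",
        headers={"apikey": API_KEY, "Authorization": f"Bearer {API_KEY}"},
        files={"file": ("document.pdf", f, "application/pdf")},
    )
response.raise_for_status()  # stop here if the upload was rejected
source = response.json()
source_id = source["id"]
print(f"Source created: {source_id}")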

Response:

{
  "id": "source-uuid",
  "filename": "document.pdf",
  "content_type": "application/pdf",
  "extraction_status": "pending",
  "created_at": "2026-01-01T00:00:00Z"
}
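
Before polling, it can help to confirm that the upload response carries the fields the later steps rely on. The following is a minimal sketch, not part of any official SDK: `SourceSummary` and `parse_source` are hypothetical names, and the required keys are taken only from the example response above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSummary:
    """Subset of the upload response used by the rest of this guide."""
    id: str
    filename: str
    extraction_status: str


def parse_source(payload: dict) -> SourceSummary:
    """Validate an upload response and keep the fields needed for polling.

    Hypothetical helper: it assumes only the keys shown in the example
    response, and raises ValueError if any of them is absent.
    """
    required = ("id", "filename", "extraction_status")
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"upload response is missing fields: {missing}")
    return SourceSummary(
        id=payload["id"],
        filename=payload["filename"],
        extraction_status=payload["extraction_status"],
    )
```

Calling `parse_source(response.json())` right after the upload fails early on an unexpected payload instead of surfacing a `KeyError` later in the polling loop.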

Check extraction status

Poll the source until extraction_status reaches a terminal state: extracted (success), attention_required (partial; some pages failed but the source is still indexable), failed, or cancelled. Extraction typically takes a few seconds for small documents.

Endpoint: GET /api/sources/{id}

import time

# Same auth headers as the upload request
headers = {"apikey": API_KEY, "Authorization": f"Bearer {API_KEY}"}

TERMINAL = {"extracted", "attention_required", "failed", "cancelled"}
while True:
    response = requests.get(
        f"{BASE_URL}/api/sources/{source_id}",
        headers=headers,
    )
    response.raise_for_status()
    body = response.json()
    status = body["extraction_status"]
    if status in TERMINAL:
        break
    time.sleep(2)  # wait before polling again

if status == "extracted":
    print("Extraction complete!")
elif status == "attention_required":
    print("Extraction partial:", body.get("error_message"))
else:  # failed or cancelled
    print(f"Extraction {status}:", body.get("error_message"))
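
The loop above runs until a terminal state arrives, so a stalled extraction would block forever. One way to bound the wait is sketched below. `wait_for_extraction`, its `fetch_source` parameter, and the 120-second default are assumptions made for illustration, not platform-defined values; only the terminal state names come from this guide.

```python
import time
from typing import Callable

TERMINAL_STATES = {"extracted", "attention_required", "failed", "cancelled"}


def wait_for_extraction(
    fetch_source: Callable[[], dict],
    timeout: float = 120.0,   # assumed default, tune for your documents
    interval: float = 2.0,
) -> dict:
    """Poll until extraction_status is terminal or the timeout elapses.

    fetch_source is any zero-argument callable that returns the source
    JSON as a dict, for example a lambda wrapping the GET request above.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_source()
        if body["extraction_status"] in TERMINAL_STATES:
            return body
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"extraction still {body['extraction_status']!r} after {timeout}s"
            )
        time.sleep(interval)
```

Passing the request as a callable keeps the helper independent of the HTTP client, which also makes it straightforward to exercise with canned responses.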

View extracted content

Retrieve the extracted text, organized by page. Each page includes its text content and page number.

Endpoint: GET /api/sources/{id}/page-texts

response = requests.get(
    f"{BASE_URL}/api/sources/{source_id}/page-texts",
    headers=headers,
)
pages = response.json()
for page in pages:
    print(f"Page {page['page']}: {page['text'][:100]}...")
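
If you need the whole document as one string, for example to save it or pass it to another tool, the pages can be concatenated in page order. This is a small sketch: `join_pages` is a hypothetical helper, and it assumes each item carries the `page` and `text` keys used in the loop above.

```python
def join_pages(pages: list[dict], separator: str = "\n\n") -> str:
    """Concatenate page texts in ascending page order.

    Sorting by the 'page' key guards against the list arriving out of
    order; whether the API guarantees ordering is not assumed here.
    """
    ordered = sorted(pages, key=lambda p: p["page"])
    return separator.join(p["text"] for p in ordered)
```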

What’s Next

Create a Knowledge Base

Index your extracted content for semantic search.

Sources & Extraction

Understand the extraction pipeline in depth.

Sources API Reference

Full endpoint documentation.
