Documents and Storage
Importing Documents
Select the import tool
In the Storage tab, click Import. A selector displays the available import tools, organized by category (Files, Triggers, API, Databases).
Configure the fields
Fill in the fields specific to the selected tool:
- Document name and description
- Additional settings (delimiter, format, etc.)
- Import scheduling (optional)
Fill in the metadata (optional)
Below the main fields, expand the Metadata accordion to set business attributes that travel with the document and are inherited by every chunk:
| Field | Description |
|---|---|
| Source | Origin of the content (e.g. manual, meeting, crm). Used by quick filters and duplicate detection. |
| Subject | Subject or topic of the document. |
| Language | Document language (e.g. pt-BR, en-US). |
| Tags | Comma-separated free-form labels. |

Each metadata field supports Magic Fill: click the sparkle icon to let AI suggest a value based on the file (its content for PDFs, otherwise its name) and the fields already filled in.
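Because these attributes are inherited by every chunk, the relationship can be pictured with a minimal Python sketch: the document-level fields are normalized once (Tags split on commas) and copied into each chunk's metadata. The function names and the dictionary layout are illustrative assumptions, not Prodgy's API.

```python
import copy
from typing import Any


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated Tags value, trimming spaces and dropping empty labels."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def build_document_metadata(source: str, subject: str, language: str, tags: str) -> dict[str, Any]:
    # Hypothetical layout: one flat dict of business attributes per document.
    return {
        "source": source,
        "subject": subject,
        "language": language,
        "tags": parse_tags(tags),
    }


def attach_to_chunks(doc_metadata: dict[str, Any], chunks: list[str]) -> list[dict[str, Any]]:
    """Give every chunk an independent deep copy of the document metadata plus its index."""
    result = []
    for i, text in enumerate(chunks):
        metadata = copy.deepcopy(doc_metadata)
        metadata["chunk_index"] = i
        result.append({"text": text, "metadata": metadata})
    return result
```

Copying (rather than referencing) the document metadata keeps each chunk self-describing, which is what lets chunk-level search filter on source, language or tags directly.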
Upload the file
Drag and drop the file into the upload area or click to select. The system validates the format and size before sending.
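A client-side pre-check of this kind can be sketched as follows. The accepted extensions are the file types this page mentions (the Files category lists "etc.", so the real set is larger), and the 50 MB limit is a hypothetical placeholder because the actual limit is not documented here.

```python
from pathlib import Path

# Extensions mentioned on this page (Files category and spreadsheets); not exhaustive.
ALLOWED_EXTENSIONS = {".csv", ".pdf", ".docx", ".md", ".xlsx", ".xls", ".tsv", ".ods"}
# Hypothetical limit: the real maximum upload size is not stated in this document.
MAX_SIZE_BYTES = 50 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return validation errors; an empty list means the file can be sent."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_SIZE_BYTES:
        errors.append("file exceeds the maximum size")
    return errors
```

Checking before the request avoids uploading a file the server would reject anyway.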
Track processing
A progress bar starts at 1% as soon as the upload begins and advances as the document moves through the upload → storage → embedding pipeline. Unlike an indeterminate spinner, the bar shows the user concrete progress at every stage.
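One way to model a bar that starts at 1% and advances through the three stages is to give each stage a slice of the 0 to 100 range. The stage weights below are hypothetical; the page only states the starting value and the stage order.

```python
# Hypothetical share of the bar assigned to each pipeline stage, in pipeline order.
STAGES = [("upload", 30), ("storage", 20), ("embedding", 50)]


def overall_progress(stage: str, stage_fraction: float) -> int:
    """Map (current stage, fraction of that stage done) to a 1-100 percentage."""
    stage_fraction = min(max(stage_fraction, 0.0), 1.0)
    completed = 0
    for name, weight in STAGES:
        if name == stage:
            percent = completed + weight * stage_fraction
            # Never show 0%: the bar starts at 1% as soon as the upload begins.
            return max(1, round(percent))
        completed += weight
    raise ValueError(f"unknown stage: {stage}")
```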
Spreadsheets: exact answers from tabular data
Spreadsheets uploaded to the knowledge base (.xlsx, .xls, .csv, .tsv, .ods) get dedicated handling: besides being indexed for search, their data becomes queryable for analytical questions.
When you ask something like “what are the total sales by segment?” or “how many units of product X were sold in Germany?”, the assistant runs a real query over the spreadsheet data — instead of guessing from text — and returns exact numbers: totals, sums, averages, counts, group-bys and filters.
- Where it works: Playground, communicators (Slack/Teams), workflows (RAG nodes) and MCP clients — the same capability across every surface.
- Each sheet is a queryable unit: multi-sheet workbooks are indexed per sheet, each with its own schema (columns and types).
- Multiple spreadsheets: when more than one spreadsheet is relevant, the system picks the right one or returns the result per spreadsheet. It does not add up values that are not comparable (different currencies or definitions); in that case it shows the per-source breakdown instead of a meaningless total.
- Descriptive questions (“what’s in this spreadsheet?”) get a column summary and a sample of rows.
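The difference between guessing from text and running a real query can be illustrated with Python's built-in sqlite3. The first part loads one sheet into a table so the two example questions become a GROUP BY and a filtered SUM; the second part shows the multi-spreadsheet rule, producing a combined total only when every source shares the same currency. The table, columns, rows and the combine helper are made up for the example; the page does not say which query engine Prodgy uses.

```python
import sqlite3
from typing import Optional

# Made-up rows standing in for one sheet: (segment, country, product, units, sales)
ROWS = [
    ("Retail", "Germany", "X", 10, 1000.0),
    ("Retail", "France", "Y", 5, 400.0),
    ("Enterprise", "Germany", "X", 7, 2100.0),
    ("Enterprise", "Germany", "Y", 3, 900.0),
]


def load_sheet(rows) -> sqlite3.Connection:
    """Load one sheet into an in-memory table with a typed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sheet (segment TEXT, country TEXT, product TEXT, units INTEGER, sales REAL)"
    )
    conn.executemany("INSERT INTO sheet VALUES (?, ?, ?, ?, ?)", rows)
    return conn


conn = load_sheet(ROWS)

# "What are the total sales by segment?"
sales_by_segment = conn.execute(
    "SELECT segment, SUM(sales) FROM sheet GROUP BY segment ORDER BY segment"
).fetchall()

# "How many units of product X were sold in Germany?"
units_x_germany = conn.execute(
    "SELECT SUM(units) FROM sheet WHERE product = ? AND country = ?", ("X", "Germany")
).fetchone()[0]


def combine(results: list[tuple[str, float, str]]) -> dict:
    """results holds (spreadsheet, value, currency); a total exists only if all currencies match."""
    breakdown = {name: (value, currency) for name, value, currency in results}
    currencies = {currency for _, _, currency in results}
    total: Optional[float] = None
    if len(currencies) == 1:
        total = sum(value for _, value, _ in results)
    return {"breakdown": breakdown, "total": total}
```

The per-source breakdown is always returned; the total is simply left out (None) when adding the values would mix units.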
Storage-First Architecture
Every document, regardless of how it entered the Knowledge Base (manual upload, workflow, agent, API), is first persisted as a storage entry and only then indexed as embeddings. This guarantees:
- A single inventory of documents visible in the listing, no matter the origin.
- Business metadata is owned by the storage row and replicated into every chunk.
- Cascade deletion: removing a storage entry automatically drops all embeddings linked to it.
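A relational sketch of this layout uses Python's sqlite3 with a foreign key declared ON DELETE CASCADE. The table and column names are assumptions chosen for the example, vectors are omitted, and the page does not describe Prodgy's actual schema or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and therefore cascades) only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE storage (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        origin TEXT NOT NULL,      -- manual upload, workflow, agent, API
        metadata TEXT NOT NULL     -- business metadata owned by the storage row (JSON)
    );
    CREATE TABLE embeddings (
        id INTEGER PRIMARY KEY,
        storage_id INTEGER NOT NULL REFERENCES storage(id) ON DELETE CASCADE,
        chunk_text TEXT NOT NULL,
        metadata TEXT NOT NULL     -- copy of storage.metadata
    );
    """
)


def ingest(name: str, origin: str, metadata_json: str, chunks: list[str]) -> int:
    """Storage row first, then one embedding row per chunk pointing back to it."""
    cur = conn.execute(
        "INSERT INTO storage (name, origin, metadata) VALUES (?, ?, ?)",
        (name, origin, metadata_json),
    )
    storage_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO embeddings (storage_id, chunk_text, metadata) VALUES (?, ?, ?)",
        [(storage_id, text, metadata_json) for text in chunks],
    )
    return storage_id
```

Deleting a row from storage then removes its embeddings in the same statement, so no orphaned chunks remain.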
Import Categories
| Category | Description |
|---|---|
| Files | Direct document upload (CSV, PDF, DOCX, MD, etc.) |
| Triggers | Event-based imports |
| API | Data obtained from REST endpoints |
| Databases | Direct database connections |
REST API Import
For external data sources via API, you can configure:
| Field | Description |
|---|---|
| Base URL | Server address |
| Endpoint | Resource path |
| HTTP method | GET, POST, PUT, PATCH, DELETE |
| Authentication | None, Basic, Bearer, API Key |
| Parameters | Query parameters and headers |
| Response format | JSON, CSV, XML |
| Pagination | Automatic pagination configuration |
| Retries | Retry configuration on failure |
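The Authentication, Pagination and Retries settings can be sketched in Python without any HTTP library: fetch_page stands in for the actual request and returns a page of items plus a has-next flag. Page-number pagination, exponential backoff, the TransientError class and the X-API-Key header name are all assumptions; the page does not specify which pagination styles or retry policy Prodgy supports.

```python
import base64
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical: raised by fetch_page for failures worth retrying (timeouts, 5xx)."""


def auth_headers(kind: str, *, user: str = "", password: str = "",
                 token: str = "", api_key: str = "", key_header: str = "X-API-Key") -> dict:
    """Request headers for the four authentication modes in the table."""
    if kind == "none":
        return {}
    if kind == "basic":
        pair = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {pair}"}
    if kind == "bearer":
        return {"Authorization": f"Bearer {token}"}
    if kind == "api_key":
        # X-API-Key is only a common convention; the header name is configurable here.
        return {key_header: api_key}
    raise ValueError(f"unknown authentication mode: {kind}")


def fetch_all(
    fetch_page: Callable[[int], tuple[list, bool]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Collect items from every page, retrying each page on TransientError."""
    items: list = []
    page = 1
    while True:
        for attempt in range(max_retries + 1):
            try:
                page_items, has_next = fetch_page(page)
                break
            except TransientError:
                if attempt == max_retries:
                    raise
                # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
                sleep(base_delay * 2 ** attempt)
        items.extend(page_items)
        if not has_next:
            return items
        page += 1
```

Injecting sleep keeps the retry loop testable without real waiting.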
Import Scheduling
Configure recurring automatic imports:
| Frequency | Options |
|---|---|
| Hourly | Every N hours |
| Daily | Specific time |
| Weekly | Days of the week + time |
| Monthly | Day of the month + time |
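The next-run calculation for each frequency can be sketched with Python's datetime and calendar modules. The function names are hypothetical, time zones are ignored (naive datetimes), and clamping a monthly day such as 31 to the last day of shorter months is an assumption; the page does not say how Prodgy handles it.

```python
import calendar
from datetime import date, datetime, time, timedelta


def next_hourly(last_run: datetime, every_n_hours: int) -> datetime:
    return last_run + timedelta(hours=every_n_hours)


def next_daily(now: datetime, at: time) -> datetime:
    candidate = datetime.combine(now.date(), at)
    return candidate if candidate > now else candidate + timedelta(days=1)


def next_weekly(now: datetime, weekdays: set[int], at: time) -> datetime:
    """weekdays follows Python's convention: Monday == 0 ... Sunday == 6."""
    for offset in range(8):  # up to 7 days ahead covers "same weekday, next week"
        day = now.date() + timedelta(days=offset)
        candidate = datetime.combine(day, at)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    raise ValueError("weekdays must not be empty")


def next_monthly(now: datetime, day_of_month: int, at: time) -> datetime:
    """Days past the end of a short month are clamped to its last day (assumption)."""
    year, month = now.year, now.month
    for _ in range(2):  # this month, otherwise next month
        last_day = calendar.monthrange(year, month)[1]
        candidate = datetime.combine(date(year, month, min(day_of_month, last_day)), at)
        if candidate > now:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    raise AssertionError("unreachable: next month's date is always in the future")
```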
Managing Documents
The Storage tab displays all imported documents in a table, regardless of whether they were uploaded manually or produced automatically by an agent or workflow.
Listing Columns
| Column | Description |
|---|---|
| Name | Document name prefixed by the tool logo (the icon identifies the import source at a glance). |
| Updated at | Date of the last update. |
| File size | Size of the original file (formatted, e.g. 1.2 MB). |
| Status | Processing state as a progress bar that advances during indexing. |
| Quality | Overall quality score rendered as a 5-star scale (0–10 mapped to half-stars, amber). |
| Actions | Sticky column on the right side — stays visible while the user scrolls the table horizontally. |
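The two formatted columns can be sketched as small helpers: File size as a human-readable string and Quality as a 0 to 5 star value in half-star steps. The 1024-based units and the round-half-up rule are assumptions; the page only gives an example (1.2 MB) and states that the 0–10 score maps to half-stars.

```python
def format_file_size(size_bytes: int) -> str:
    """Human-readable size with 1024-based units (assumed)."""
    size = float(size_bytes)
    unit = "B"
    for next_unit in ("KB", "MB", "GB", "TB"):
        if size < 1024:
            break
        size /= 1024
        unit = next_unit
    return f"{int(size)} {unit}" if unit == "B" else f"{size:.1f} {unit}"


def quality_to_stars(score: float) -> float:
    """Map a 0-10 score to 0-5 stars: each point on the 0-10 scale is one half-star."""
    score = min(max(score, 0.0), 10.0)
    half_stars = int(score + 0.5)  # round half up (score is non-negative here)
    return half_stars / 2
```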
Sorting
The table is sorted by Updated at in descending order (most recent first) by default. Click a column header to sort by it; click again to toggle the direction (ascending/descending). An arrow in the header indicates the active column and direction.
Sortable columns: Name, Updated at, File size, Status and Quality. Sorting is applied on the server, so it spans the entire collection — not just the current page — and changing it returns the listing to the first page. Column resizing keeps working as usual, without triggering sorting.
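The sorting rules can be sketched as a small piece of listing state plus a server-side page function. The key detail is that the whole collection is ordered before the page is sliced, and any sort change resets the page to 1. The column keys, the page size and the ascending start direction for a newly clicked column are assumptions.

```python
from dataclasses import dataclass

SORTABLE = {"name", "updated_at", "file_size", "status", "quality"}


@dataclass
class ListingState:
    sort_column: str = "updated_at"  # default: Updated at, most recent first
    descending: bool = True
    page: int = 1

    def click_header(self, column: str) -> None:
        if column not in SORTABLE:
            return  # e.g. Actions: clicking does not sort
        if column == self.sort_column:
            self.descending = not self.descending
        else:
            self.sort_column = column
            self.descending = False  # assumed starting direction for a new column
        self.page = 1  # a new ordering always returns to the first page


def server_page(docs: list[dict], state: ListingState, page_size: int) -> list[dict]:
    """Order the entire collection first, then slice out the requested page."""
    ordered = sorted(docs, key=lambda d: d[state.sort_column], reverse=state.descending)
    start = (state.page - 1) * page_size
    return ordered[start:start + page_size]
```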
Search and Quick Filters
The search bar shares its row with the filter button and a set of quick filter cards that apply client-side:
| Quick filter | Behavior |
|---|---|
| All | Default state — no client-side filter applied. |
| Type | Popover with the document types present in the current page (markdown, pdf, csv, etc.). |
| Facets | Popover with the metadata facets indexed on the documents (source, subject, language, tags). |
Quick filters reset automatically when the user changes the text search or the server-side filters, keeping the listing coherent.
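The reset rule can be sketched as a small state object: quick filters only narrow the rows already loaded for the current page, and any change to the text search or to the server-side filters clears them. The field names and the row shape are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


def _as_list(value) -> list:
    """Facet values may be single strings (language) or lists (tags)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


@dataclass
class SearchState:
    text: str = ""
    server_filters: dict = field(default_factory=dict)
    quick_type: Optional[str] = None     # e.g. "pdf"; None means "All"
    quick_facet: Optional[tuple] = None  # e.g. ("language", "en-US")

    def _reset_quick_filters(self) -> None:
        self.quick_type = None
        self.quick_facet = None

    def set_text(self, text: str) -> None:
        if text != self.text:
            self.text = text
            self._reset_quick_filters()

    def set_server_filters(self, filters: dict) -> None:
        if filters != self.server_filters:
            self.server_filters = dict(filters)
            self._reset_quick_filters()

    def apply_quick_filters(self, page_rows: list[dict]) -> list[dict]:
        """Client-side only: filters the rows already loaded for the current page."""
        rows = page_rows
        if self.quick_type is not None:
            rows = [r for r in rows if r.get("type") == self.quick_type]
        if self.quick_facet is not None:
            key, value = self.quick_facet
            rows = [r for r in rows if value in _as_list(r.get("metadata", {}).get(key))]
        return rows
```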
Additional search capabilities:
- Text search: name, description and metadata.
- Date search: filter by creation or update period.
- Status filter: filter by processing state.
- Quality and file-size filters: available through the main filter button (Category Filter).
- Column visibility: show or hide table columns.
Document Status
| Status | Description |
|---|---|
| Active | Document available for querying |
| Completed | Processing finished |
| Embedded | Embeddings generated successfully |
| Processing | Embedding generation in progress (progress bar advancing) |
| Pending | Awaiting processing |
| Stored | File saved in the system |
| Partial | Partially processed |
| Failed | Processing error |
Actions
The row actions menu (right side of each row) contains:
| Action | Icon | Description |
|---|---|---|
| Info | i inside a circle | Opens the Document Details modal. |
| Download | Download icon | Available when the document has a file in Storage. |
| Delete | Trash icon | Removes the document with a confirmation dialog. |
Document Details Modal
Selecting Info opens a fixed-height modal organized into four tabs:
| Tab | Content |
|---|---|
| Details | Document name, description (full text), file information (size, type), origin, dates. |
| Metadata | All keys persisted in storage.metadata, including facets and processing data. JSON values (e.g. attendee lists, structured objects) are detected automatically and rendered as formatted blocks instead of raw strings. For spreadsheets, it also shows a readable Schema block (sheets, columns and types). |
| Quality | Quality, completeness and relevance scores; processing block with chunk method, document type, model, provider and timestamp. For spreadsheets, the chunk method is sheet (one unit per sheet). |
| Chunks | Paginated list of chunks generated for the document. Visible only to super-admin users. For spreadsheets, the tab is labeled Sheets, showing the sheet name, row count and columns. |
The footer of the modal exposes Download and Delete as the primary actions, alongside Close.
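Detecting JSON-valued metadata, as the Metadata tab does, can be sketched as follows: strings that start like an object or array are parsed, and only successful parses are rendered as formatted blocks. This heuristic is an assumption about how such detection could work, not a description of the modal's code.

```python
import json
from typing import Any


def render_metadata_value(value: Any) -> tuple[str, str]:
    """Return (kind, text): 'json' with pretty-printed text, or 'text' with the value as-is."""
    if isinstance(value, (dict, list)):
        return "json", json.dumps(value, indent=2, ensure_ascii=False)
    if isinstance(value, str):
        stripped = value.strip()
        # Only strings that look like an object or array are parsed, so plain
        # values such as "42" or "true" stay ordinary text.
        if stripped[:1] in ("{", "["):
            try:
                parsed = json.loads(stripped)
            except json.JSONDecodeError:
                pass  # looked like JSON but is not: fall through to plain text
            else:
                return "json", json.dumps(parsed, indent=2, ensure_ascii=False)
    return "text", str(value)
```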
Idempotent Indexing
Documents produced automatically (e.g. by agents and workflows) use an upsert flow rather than a blind insert:
- A set of `upsertKeys` (typically `source` plus business identifiers such as `meeting_title`, `meeting_date`, `organizer_email`, `document_type`) is matched against existing storage entries.
- Match found → the existing storage file is overwritten in place, old embeddings are removed, and re-indexing produces a fresh set of chunks while preserving the original `storage_id` and `created_at`.
- No match → a new storage entry is created normally.
- Identical content → the duplicate is detected during indexing, the orphan storage row created in the meantime is removed, and the listing keeps the original entry untouched.
This makes re-runs of the same source safe and prevents the listing from filling up with duplicates.
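The three branches can be sketched as an in-memory store. Using a SHA-256 hash of the content to detect identical documents, the blank-line chunker, and every class and method name here are assumptions for illustration; the page describes the behavior, not the implementation.

```python
import hashlib
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _chunk(content: str) -> list[str]:
    """Placeholder chunker (hypothetical): one chunk per blank-line-separated block."""
    return [part for part in content.split("\n\n") if part.strip()]


def _digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class StorageEntry:
    storage_id: int
    metadata: dict
    content: str
    created_at: datetime
    chunks: list[str] = field(default_factory=list)


class Storage:
    def __init__(self) -> None:
        self.entries: dict[int, StorageEntry] = {}
        self._ids = itertools.count(1)

    def upsert(self, content: str, metadata: dict, upsert_keys: list[str]) -> int:
        wanted = tuple(metadata.get(k) for k in upsert_keys)

        # 1. Match on the upsert keys: overwrite in place, keep storage_id and created_at.
        for entry in self.entries.values():
            if tuple(entry.metadata.get(k) for k in upsert_keys) == wanted:
                entry.content = content
                entry.metadata = dict(metadata)
                entry.chunks = _chunk(content)  # old chunks dropped, fresh set indexed
                return entry.storage_id

        # 2. No match: create the storage row first, then index.
        new = StorageEntry(next(self._ids), dict(metadata), content, datetime.now(timezone.utc))
        self.entries[new.storage_id] = new

        # 3. Identical content already stored: drop the orphan row, keep the original.
        digest = _digest(content)
        duplicate = next(
            (e for e in self.entries.values() if e is not new and _digest(e.content) == digest),
            None,
        )
        if duplicate is not None:
            del self.entries[new.storage_id]
            return duplicate.storage_id

        new.chunks = _chunk(content)
        return new.storage_id
```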
Chunking Strategy
Before a document is indexed, Prodgy splits it into chunks. By default the platform picks the best strategy automatically by analyzing the content (headings, speaker turns, code blocks, page breaks, length, and so on).
Workflow Knowledge Base nodes (operations Save in storage and Upsert in storage) expose an optional Chunk strategy field that lets you override this automatic choice when a specific behavior is required — for example, keeping a meeting summary as a single block instead of letting it be split by heading.
| Option | Behavior |
|---|---|
| Automatic (content-based) | Default. Prodgy analyzes the content and selects the strategy, the same behavior as a node with no Chunk strategy set. |
| Single block | Stores the whole document as a single chunk. |
| By heading | Splits on Markdown / section headings. |
| By speaker | Splits on speaker turns (meeting transcriptions). |
| By page | Splits on page breaks (PDFs). |
| By code block | Splits on code fences. |
| Semantic | Groups semantically related passages. |
| By sentence | Splits on sentence boundaries. |
| Fixed size | Splits into fixed-size windows. |
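A few of the simpler strategies can be sketched in plain Python to show how they differ on the same text. The strategy keys, the 500-character default window and the toy automatic rule (Markdown headings present → by heading, otherwise fixed size) are assumptions; Prodgy's real content analysis weighs more signals, as described above, and the speaker, page, code-block and semantic strategies are not shown.

```python
import re


def single_block(text: str) -> list[str]:
    return [text]


def by_heading(text: str) -> list[str]:
    """Start a new chunk at every Markdown ATX heading (#, ##, ...)."""
    chunks, current = [], []
    for line in text.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def by_sentence(text: str) -> list[str]:
    # Naive boundary: ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def fixed_size(text: str, size: int = 500) -> list[str]:
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


STRATEGIES = {
    "single_block": single_block,
    "by_heading": by_heading,
    "by_sentence": by_sentence,
    "fixed_size": fixed_size,
}


def chunk(text: str, strategy: str = "auto") -> list[str]:
    if strategy == "auto":
        # Toy heuristic standing in for the platform's content analysis.
        has_headings = re.search(r"^#{1,6}\s", text, re.MULTILINE) is not None
        strategy = "by_heading" if has_headings else "fixed_size"
    return STRATEGIES[strategy](text)
```

Overriding the strategy (for example with single_block for a meeting summary) bypasses the automatic choice entirely, which is what the Chunk strategy field is for.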