Why it exists
- iterate pages deterministically
- locate per-page artifacts under one document root
- support downstream incremental reads without semantic assumptions
get_document_structure() is the stable page-index contract. It returns a flat document -> pages[] structure
and never folds semantic hierarchy (headings, sections) into it.

    documents/<documentId>/
      document.json
      structure.json
      pages/
        0001.json
        0002.json

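As a rough illustration of deterministic page iteration over this layout, the sketch below lists the files under pages/ in page-number order. It assumes only the directory layout shown above. list_page_files is a hypothetical helper name, not part of the documented API, and the sketch reads the directory directly instead of calling get_document_structure().

```python
from pathlib import Path


def list_page_files(document_root: Path) -> list[Path]:
    """Return the page JSON files under <document_root>/pages in page order.

    Hypothetical helper: it relies only on the on-disk layout
    documents/<documentId>/pages/NNNN.json described above.
    """
    pages_dir = document_root / "pages"
    # Keep only files whose stem is purely numeric, e.g. 0001.json.
    candidates = [p for p in pages_dir.glob("*.json") if p.stem.isdecimal()]
    # Sort by numeric value so the order does not depend on filesystem
    # listing order or on the zero-padding width.
    return sorted(candidates, key=lambda p: int(p.stem))
```

Sorting on the integer value of the file stem, instead of trusting directory listing order, is what keeps the iteration stable across filesystems.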
Stable contract
| Artifact | Purpose | Safe downstream assumption |
|---|---|---|
| document.json | source metadata | tracks source path, snapshot, page count, artifact roots |
| structure.json | page index | root.children stays a page list, not a semantic tree |
| pages/0001.json | page content | contains page text, preview, and artifact path |
If you need headings or sections, use the semantic layer explicitly. The page index is intentionally flatter and more boring than that layer, because downstream tooling depends on it being mechanically stable.
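A downstream reader that wants to fail fast if the page index ever stops being flat could guard against it with a check along these lines. This is a minimal sketch under stated assumptions: the only field taken from the contract is root.children, while the function name flat_page_entries and the idea that nesting would show up as a children key on a page entry are hypothetical.

```python
from typing import Any


def flat_page_entries(structure: dict[str, Any]) -> list[dict[str, Any]]:
    """Return root.children after checking that it is a flat page list.

    `structure` is the parsed content of structure.json. Only the
    root.children path comes from the documented contract; treating a
    nested "children" key as a sign of semantic hierarchy is an assumption.
    """
    children = structure["root"]["children"]
    if not isinstance(children, list):
        raise ValueError("root.children is not a list")
    for index, entry in enumerate(children):
        if not isinstance(entry, dict):
            raise ValueError(f"page entry {index} is not an object")
        # A non-empty nested list would mean the index has become a tree.
        if entry.get("children"):
            raise ValueError(f"page entry {index} has nested children")
    return children
```

A caller would typically pass in the result of json.loads on the contents of structure.json and then iterate the returned list in order.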