echo-pdf docs
Concept

Page index stays stable.

get_document_structure() is the stable page-index contract. It returns a flat document -> pages[] listing and never folds semantic hierarchy (headings, sections) into that listing.

Why it exists

  • iterate pages deterministically
  • locate per-page artifacts under one document root
  • support downstream incremental reads without semantic assumptions
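
The three goals above can be illustrated with a minimal sketch. It assumes the structure payload is a dict shaped like {"root": {"children": [...]}} and that each page entry carries a numeric pageNumber field. The field name pageNumber, the helper iter_pages, and its start_after parameter are hypothetical, chosen for illustration only; they are not taken from the echo-pdf API.

```python
from typing import Any, Dict, Iterator


def iter_pages(structure: Dict[str, Any], start_after: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield page entries in ascending page order, skipping pages <= start_after.

    Assumption: every entry in root.children has a numeric "pageNumber"
    (hypothetical field name).
    """
    pages = structure["root"]["children"]
    # Sort explicitly so the iteration order never depends on serialization order.
    for page in sorted(pages, key=lambda p: p["pageNumber"]):
        if page["pageNumber"] > start_after:
            yield page


# Hypothetical payload, deliberately out of order.
example = {"root": {"children": [{"pageNumber": 2}, {"pageNumber": 1}, {"pageNumber": 3}]}}

first_pass = [p["pageNumber"] for p in iter_pages(example)]
# An incremental read resumes after the last page already processed.
resumed = [p["pageNumber"] for p in iter_pages(example, start_after=2)]
```

Because the index is only a page list, a consumer can resume from a page number without knowing anything about headings or sections.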

Artifact mapping

documents/<documentId>/
  document.json
  structure.json
  pages/
    0001.json
    0002.json

Artifact          Purpose          Safe downstream assumption
document.json     source metadata  tracks source path, snapshot, page count, artifact roots
structure.json    page index       root.children stays a page list, not a semantic tree
pages/0001.json   page content     contains page text, preview, and artifact path
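
As a rough sketch of how a consumer might locate per-page artifacts under one document root, the helpers below build the path to a page file and load it. The four-digit zero padding is inferred from the 0001.json and 0002.json names in the layout. The workspace argument and the function names page_artifact_path and read_page are assumptions made for this example, not part of echo-pdf.

```python
import json
from pathlib import Path
from typing import Any, Dict


def page_artifact_path(workspace: Path, document_id: str, page_number: int) -> Path:
    """Build documents/<documentId>/pages/NNNN.json for a 1-based page number."""
    if page_number < 1:
        raise ValueError("page numbers are assumed to start at 1")
    # Assumption: file names are the page number zero-padded to four digits.
    return workspace / "documents" / document_id / "pages" / f"{page_number:04d}.json"


def read_page(workspace: Path, document_id: str, page_number: int) -> Dict[str, Any]:
    """Load one page artifact as a plain dict, without interpreting its fields."""
    path = page_artifact_path(workspace, document_id, page_number)
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

The sketch deliberately returns the raw dict, since the exact field names inside a page file are not specified here.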

No hidden promotion to semantics.

If you need headings or sections, use the semantic layer explicitly. The page index is intentionally flatter and more boring, because downstream tooling depends on it being mechanically stable.
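
Downstream tooling that relies on this flatness could check it defensively. The sketch below assumes the structure payload is a dict with a root.children list of dict entries, and treats any entry with its own non-empty children as a sign of semantic nesting. The helper name assert_flat_page_index and the idea that nesting would appear under a children key are assumptions for illustration.

```python
from typing import Any, Dict


def assert_flat_page_index(structure: Dict[str, Any]) -> int:
    """Return the number of page entries, or raise if any entry nests children.

    Assumption: a semantic tree would show up as a non-empty "children"
    list on an entry of root.children (hypothetical shape).
    """
    pages = structure["root"]["children"]
    for position, entry in enumerate(pages):
        if entry.get("children"):
            raise ValueError(f"entry at position {position} has nested children")
    return len(pages)


# Hypothetical payloads: one flat, one with a nested section.
flat = {"root": {"children": [{"pageNumber": 1}, {"pageNumber": 2}]}}
nested = {"root": {"children": [{"title": "Intro", "children": [{"pageNumber": 1}]}]}}

flat_count = assert_flat_page_index(flat)
```

A check like this fails loudly if the contract ever changes, rather than letting a consumer misread section nodes as pages.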