get_semantic_document_structure() writes semantic-structure.json alongside the page index. It adds heading and section structure without changing the shape of pages[].
Primary path
Agent-first extraction using the local provider/model configuration.
detector = agent-structured-v1
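The write step can be pictured roughly as follows. This is a minimal sketch, not the actual implementation of get_semantic_document_structure(): the helper name write_semantic_structure, its parameters, and the "sections" key are hypothetical. It shows the two properties stated above. The output lands next to the page index, and the page index is only read, so pages[] keeps its shape.

```python
import json
from pathlib import Path


def write_semantic_structure(page_index_path: Path, sections: list[dict], detector: str) -> Path:
    """Hypothetical sketch: write semantic-structure.json next to the page index.

    The page index is read but never rewritten, so its pages[] array is untouched.
    """
    page_index = json.loads(page_index_path.read_text())
    # Sanity check only: the page index is expected to carry a pages[] array.
    assert isinstance(page_index.get("pages"), list)

    artifact = {
        "detector": detector,  # which extraction path produced this artifact
        "pageIndexArtifactPath": str(page_index_path),  # link back to the stable page index
        "sections": sections,  # hypothetical key holding heading/section nodes
    }
    # Same directory as the page index, fixed file name.
    out_path = page_index_path.with_name("semantic-structure.json")
    out_path.write_text(json.dumps(artifact, indent=2))
    return out_path
```

Keeping the semantic layer in a separate file means consumers that only understand the page index are unaffected.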
Fallback path
Conservative heuristic fallback, used when no model is configured or when agent extraction fails.
detector = heading-heuristic-v1
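The choice between the two paths can be sketched as agent-first with a fallback. This is an illustration under stated assumptions, not the real dispatch code. The function names, the agent_extract callable, and the numbered-heading regex used as a stand-in heuristic are all hypothetical. Only the two detector values come from this page.

```python
import re
from typing import Callable, Optional

# Hypothetical toy heuristic: a short line that starts with a section number,
# e.g. "2. Overview" or "3.1 Pin layout", is treated as a heading.
HEADING_RE = re.compile(r"^\d+(\.\d+)*\.?\s+\S")


def heading_heuristic(pages: list[str]) -> list[dict]:
    sections = []
    for page_number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            stripped = line.strip()
            if stripped and len(stripped) < 80 and HEADING_RE.match(stripped):
                sections.append({"title": stripped, "page": page_number})
    return sections


def extract_structure(
    pages: list[str],
    model: Optional[str],
    agent_extract: Callable[[list[str], str], list[dict]],
) -> tuple[str, list[dict]]:
    """Hypothetical sketch: try the agent path first, fall back to the heuristic."""
    if model is not None:
        try:
            return "agent-structured-v1", agent_extract(pages, model)
        except Exception:
            pass  # any agent failure drops through to the heuristic path
    # No model configured, or the agent path failed.
    return "heading-heuristic-v1", heading_heuristic(pages)
```

Returning the detector together with the sections keeps the provenance attached to the result, so the artifact always records which path actually ran.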
Check the detector and strategyKey fields before assuming that an artifact is semantically rich or that a cached copy can be reused.
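A consumer-side check might look like the following. It is a hypothetical sketch: the ArtifactCheck type and inspect_artifact function are not part of any documented API. It also assumes that only agent-structured-v1 output should be treated as semantically rich, which is a reading of "conservative heuristic fallback", not a stated rule.

```python
from dataclasses import dataclass


@dataclass
class ArtifactCheck:
    reusable: bool           # cached artifact was built under the current strategy
    semantically_rich: bool  # assumption: only the agent path counts as rich


def inspect_artifact(artifact: dict, current_strategy_key: str) -> ArtifactCheck:
    """Hypothetical sketch: read metadata before trusting a semantic-structure artifact."""
    return ArtifactCheck(
        # A missing or different strategyKey means the cache should be rebuilt.
        reusable=artifact.get("strategyKey") == current_strategy_key,
        semantically_rich=artifact.get("detector") == "agent-structured-v1",
    )
```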
Metadata that matters
| Field | Why it exists |
|---|---|
| detector | identifies which semantic extraction path produced the artifact |
| strategyKey | changes when the provider, model, or extraction budget changes enough to invalidate reuse |
| pageIndexArtifactPath | links the semantic layer back to the stable page index |
| pageArtifactPath | lets section nodes point back to the originating page artifact |
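One way a strategyKey could behave as the table describes is a short hash over the provider, the model, and a bucketed budget. This is a hypothetical sketch. The bucket size, the hash choice, and the key length are assumptions, and bucketing is only one interpretation of "changes enough".

```python
import hashlib
import json


def strategy_key(provider: str, model: str, budget_tokens: int, bucket: int = 4096) -> str:
    """Hypothetical sketch of a cache key for semantic extraction results."""
    # Bucket the budget so small adjustments map to the same key and keep the cache valid.
    budget_bucket = budget_tokens // bucket
    payload = json.dumps(
        {"provider": provider, "model": model, "budgetBucket": budget_bucket},
        sort_keys=True,  # stable serialization, so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Under these assumptions, moving from an 8000 to an 8100 token budget keeps the same key, while changing the model, the provider, or the budget bucket produces a new key and forces re-extraction.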
The semantic layer describes general document structure only. It should not encode datasheet-specific, EDA-specific, or other downstream product semantics.