# Unsterwerx JSON Output

Unsterwerx now exposes a single control-plane JSON transport for the full CLI surface.

## Root-Global Flag

The root-level `--json` flag can be passed before or after any subcommand:

```bash
unsterwerx --json status
unsterwerx status --json
unsterwerx knowledge build --json
unsterwerx rules source list --json
```

Literal positional values stay literal:

```bash
unsterwerx search -- --json
```

That command searches for the literal text `--json`; everything after `--` is treated as a positional value, so it does not enable JSON mode.

Success Envelope

Successful commands emit:

```json
{
  "schema_version": 1,
  "command": "knowledge_build",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "data": {}
}
```

### Envelope Fields

| Field | Meaning |
|---|---|
| `schema_version` | Transport schema selector. Current value: `1` |
| `command` | Command ID derived from the clap path, normalized to snake_case |
| `version` | Unsterwerx version string |
| `timestamp` | RFC 3339 UTC timestamp |
| `data` | Command payload |
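
Consumers should pin `schema_version` before touching anything else in the envelope. A minimal Python sketch of such a guard (the `read_envelope` helper is illustrative, not part of the CLI):

```python
import json

SUPPORTED_SCHEMA_VERSION = 1

def read_envelope(raw: str) -> dict:
    """Parse one envelope and refuse transport versions we don't understand."""
    envelope = json.loads(raw)
    if envelope.get("schema_version") != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version: {envelope.get('schema_version')!r}"
        )
    return envelope

raw = ('{"schema_version": 1, "command": "knowledge_build", '
       '"version": "0.5.4", "timestamp": "2026-04-15T12:34:56Z", "data": {}}')
envelope = read_envelope(raw)
print(envelope["command"])  # knowledge_build
```

Rejecting an unknown `schema_version` up front keeps a consumer from half-reading a future transport shape.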

## Command IDs

The `command` field is derived from the command path only. Flags do not change it.

Examples:

| CLI invocation | `command` |
|---|---|
| `unsterwerx ingest --dry-run --json <path>` | `ingest` |
| `unsterwerx status errors --json` | `status_errors` |
| `unsterwerx status dismiss <id> --json` | `status_dismiss` |
| `unsterwerx jobs status <run> --json` | `jobs_status` |
| `unsterwerx knowledge build --json` | `knowledge_build` |
| `unsterwerx knowledge dedup scan --json` | `knowledge_dedup_scan` |
| `unsterwerx metadata extract --json` | `metadata_extract` |
| `unsterwerx metadata find --concept-key ... --json` | `metadata_find` |
| `unsterwerx rules metadata alias list --json` | `rules_metadata_alias_list` |
| `unsterwerx rules metadata rebuild --all --json` | `rules_metadata_rebuild` |
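
When correlating logs or metrics, the normalization rule can be mimicked on the consumer side. This hypothetical helper only restates the rule above (subcommand path joined in snake_case; flags ignored):

```python
def command_id(subcommand_path: list[str]) -> str:
    """Join the clap subcommand path into the snake_case command ID.
    Flags never appear in the path, so they cannot change the result."""
    return "_".join(part.replace("-", "_") for part in subcommand_path)

print(command_id(["knowledge", "dedup", "scan"]))  # knowledge_dedup_scan
```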

## Error Envelope

Unhandled command failures in JSON mode emit:

```json
{
  "schema_version": 1,
  "command": "knowledge_build",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "error": {
    "kind": "runtime_error",
    "message": "no training labels available",
    "exit_code": 1,
    "details": {}
  }
}
```

Parse failures use the synthetic command ID `cli_parse` because clap could not resolve a valid command path:

```json
{
  "schema_version": 1,
  "command": "cli_parse",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "error": {
    "kind": "parse_error",
    "message": "unexpected argument '--bogus' found",
    "exit_code": 2,
    "details": {}
  }
}
```

### Error Fields

| Field | Meaning |
|---|---|
| `kind` | Error class such as `parse_error` or `runtime_error` |
| `message` | Human-readable failure text |
| `exit_code` | CLI exit code |
| `details` | Reserved JSON object for future structured detail |
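
Telling the two envelope shapes apart is mechanical: a success envelope carries `data`, an error envelope carries `error`. A hedged Python sketch of consumer-side dispatch (the `dispatch` helper is illustrative):

```python
import json
import sys

def dispatch(raw: str) -> tuple[int, dict]:
    """Return (exit_code, payload). Success envelopes carry `data` and
    imply exit code 0; error envelopes carry `error` with an explicit
    `exit_code`."""
    envelope = json.loads(raw)
    if "error" in envelope:
        err = envelope["error"]
        print(f"{err['kind']}: {err['message']}", file=sys.stderr)
        return err["exit_code"], err.get("details", {})
    return 0, envelope["data"]

code, details = dispatch('{"schema_version": 1, "command": "cli_parse", '
                         '"error": {"kind": "parse_error", "message": "bad flag", '
                         '"exit_code": 2, "details": {}}}')
print(code)  # 2
```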

## Payload Contracts

### `payload_version`

Some commands wrap their enveloped payload with an additional `data.payload_version: number` discriminant. Consumers should switch on `payload_version` before reading the rest of `data` — the outer `schema_version` pins the transport (envelope shape), while `payload_version` pins the command-local payload shape. Currently:

| Command | `data.payload_version` |
|---|---|
| `metadata_find` | 1 |
| `metadata_values` | 2 |

New values are introduced additively; a bump is a breaking change for downstream consumers of that specific payload.
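
In consumer code the required switch can look like this (Python sketch; only the version number comes from the table above, the handling itself is illustrative):

```python
def metadata_find_documents(data: dict) -> list:
    """Reject payload shapes this consumer was not written against
    before reading any other field out of `data`."""
    version = data.get("payload_version")
    if version == 1:
        return data["documents"]
    raise ValueError(f"unhandled metadata_find payload_version: {version!r}")

docs = metadata_find_documents({"payload_version": 1, "documents": []})
print(docs)  # []
```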

### `status`

Mutation and operational-summary payloads use a top-level `data.status` when the command did work or produced a plan/result summary.

| `status` | Meaning |
|---|---|
| `completed` | All operations succeeded |
| `completed_with_errors` | Command ran to completion but some items failed; counts include both success and failure |
| `dry_run` | No mutations occurred; payload contains preview data |
| `aborted` | Command did not mutate because confirmation was declined or omitted in JSON mode |
| `current` | `upgrade --check` found the installed version already current |
| `update_available` | `upgrade --check` found a newer version |
| `no_op` | Idempotent mutation found the target already in the requested state |

Read-only commands omit `status`.

### `mode`

Multi-mode commands expose a discriminant so consumers can switch on payload shape.

| Command | `mode` values |
|---|---|
| `classify` | `bulk`, `single`, `show` |
| `rules resolve` | `document`, `class` |
| `rules source resolve` | `document`, `recompute` |
| `upgrade` | `check`, `install` |
| `search` | `text`, `metadata` |

### `notices`

Commands that would otherwise emit warnings on stderr expose them as `data.notices: string[]` in JSON mode. Empty notice sets are serialized as `[]`.
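
A wrapper that runs commands in JSON mode can replay the notices on stderr to keep the warning visibility that text mode has. This forwarding helper is illustrative:

```python
import sys

def forward_notices(data: dict) -> int:
    """Print each notice to stderr and return how many were forwarded.
    Empty notice sets are serialized as [], so iteration is always safe."""
    notices = data.get("notices", [])
    for notice in notices:
        print(f"warning: {notice}", file=sys.stderr)
    return len(notices)

print(forward_notices({"notices": ["2 documents skipped"]}))  # 1
```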

Current notice-producing commands:

### Optional Policy Fields

### Upgrade Error Contract

`upgrade --check --json` uses:

## Payload Reference

The table below lists the structured `data` payload fields for the commands that now emit native JSON DTOs.

| Command | `data` fields |
|---|---|
| `config_show` | source: string, config_path: string, config: object |
| `config_get` | source: string, key: string, value: any, found: bool |
| `config_set` | status: string, key: string, value: any, previous_value: any, config_path: string |
| `config_init` | status: string, created: bool, config_path: string |
| `search` | query: string, total: number, results: [{document_id, title, snippet, rank}], notices: string[] |
| `reindex` | status: string, canonical_docs: number, indexed: number, missing_content: number, total_in_fts5: number, pending_canonical: number |
| `canonical` | status: string, documents_processed: number, documents_extracted: number, documents_failed: number, total_elements: number, total_words: number, documents_reclassified: number, stopped: bool |
| `reconstruct` | status: string, document_id: string, output_path: string, format: string, point_in_time?: string |
| `classify` (bulk) | mode: "bulk", status: string, documents_classified: number, rules_applied: number, skipped_rules: number, errors: number, classifications: [] |
| `classify` (single) | mode: "single", status: "completed", document_id: string, classifications: [{rule_id, rule_name, class, confidence}] |
| `classify` (show) | mode: "show", document_id: string, classifications: [{class, confidence, rule_name, created_at}] |
| `rules_list` | rules: [{id, name, document_class, scope, scope_id?, priority, definition, is_active}], total: number |
| `rules_add` | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string, notices: string[] |
| `rules_remove` | status: string, id: string, name: string, purged: bool, classifications_removed: number, documents_reset: number, already_retired: bool |
| `rules_reactivate` | status: "completed", id: string, name: string, was_already_active: bool |
| `rules_policy` | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string |
| `rules_policies` | policies: [{id, name, document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, scope, scope_id?}], total: number |
| `rules_assign_scope` | status: "completed", document_id: string, scope_path: string |
| `rules_resolve` (document) | mode: "document", document_id: string, scope_path: string, effective_policy?: {document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, source_rules, applied_scopes} |
| `rules_resolve` (class) | mode: "class", class: string, scope_path: string, input_policies: [...], merged_policy?: {...}, cascade_error?: string |
| `rules_source_list` | rules: [{id, trust_class, weight, priority, pattern?, is_active}], total: number |
| `rules_source_set` | status: "completed", id: string, trust_class: string, weight: number, action: "created" \| "updated" |
| `rules_source_remove` | status: "completed", id: string, trust_class: string |
| `rules_source_resolve` (document) | mode: "document", document_id: string, sources: [{source_name, trust_class?, effective_weight, matched_rule_id?, explanation, default_weight, evaluations: [{rule_id, trust_class?, pattern?, weight, priority, outcome}]}], notices: string[] |
| `rules_source_resolve` (recompute) | mode: "recompute", status: string, documents_updated: number, documents_unchanged: number, errors: number |
| `import_sources` | adapters: [{name, description}] |
| `import_status` | status: "completed", sources: [{name, display_name, trust_class?, default_weight, item_count}], recent_batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}] |
| `import_history` | batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}], total: number |
| `import_backfill` | status: "completed", orphan_documents: number, provenance_rows_created: number, import_item_rows_backfilled: number, local_fallback_rows: number, audit_events_logged: number |
| `knowledge_labels_add` | status: "completed", doc_a: string, doc_b: string, label: string, is_candidate: bool, posterior?: number, feature_scores?: {jaccard, cosine, title_overlap, structural_overlap, temporal_proximity, source_weight_delta}, notices: string[] |
| `knowledge_labels_list` | labels: [{doc_a, doc_b, label, label_type, confidence, created_at}], total: number |
| `knowledge_vectors_build` | status: string, vectors_created: number, vectors_updated: number, vectors_deleted: number, edges_created: number, documents_clustered: number, singletons_dropped: number, oversize_warned: number, timing_secs: number, pipeline_run_id?: string, upstream_similarity_run_id?: string, upstream_knowledge_run_id?: string, dry_run: bool, notices: string[] |
| `knowledge_vectors_list` | vectors: [{id, label, confidence, member_count, representative_doc_id?, method}], total: number |
| `knowledge_vectors_show` | id: string, label: string, confidence: number, member_count: number, method: string, representative_doc_id?: string, members: [{document_id, file_name, score, is_primary}], edges: [{other_vector_id, other_label, edge_type, weight}] |
| `knowledge_vectors_search` | query: string, results: [{vector_id, vector_label, document_id, title, snippet, rank}], total: number |
| `knowledge_vectors_traverse` | origin: {id, label, confidence, member_count, representative_doc_id?, method}, depth: number, hops: [{depth, vector, edge_weight}], total_reachable: number |
| `knowledge_dedup_scan` | threshold: number, vector_filter?: string, keep_override?: string, plan: {pipeline_run_id?, upstream_vector_run_id?, model_id, threshold, groups, total_kept, total_removed, total_vectors_affected} |
| `knowledge_dedup_apply` | status: string, documents_removed: number, diffs_computed: number, provenance_merged: number, labels_inserted: number, bytes_freed: number, errors: number, skipped_no_canonical: number, rule_id: string, timing_secs: number, plan?: object, notices: string[] |
| `knowledge_dedup_list` | rules: [{id, name, threshold, action_count, is_active, created_at}], total: number |
| `knowledge_dedup_show` | rule_id: string, actions: [{id, document_id, vector_id, action_type, original_status, created_at}], total: number |
| `knowledge_dedup_rollback` | status: string, target_type: "action" \| "rule", target_id: string, documents_restored: number, labels_removed: number, errors: number, timing_secs: number, notices: string[] |
| `upgrade` (check) | mode: "check", status: "current" \| "update_available", installed: string, latest: string |
| `upgrade` (install) | mode: "install", status: string, from_version?: string, to_version?: string, install_path?: string, backup_path?: string, source?: "remote" \| "local" |
| `status` (summary) | mode: "summary", detailed: bool, reconciled_legacy_stuck_imports: number, data_directory: string, total_documents: number, total_size_bytes: number, indexed_documents: number, audit_events: number, metadata_bearing_documents: number, metadata_stale_documents: number, metadata_seed_conflicts: number, by_status: object, by_file_type?: object, similarity?: object, classification?: object |
| `metadata_find` | payload_version: 1, concept_key: string, value?: string, value_pattern?: string, documents: [{document_id, file_name, matches: [{fact_id, concept_key, extractor, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed}]}], stale_documents: number, notices: string[] |
| `metadata_values` | payload_version: 2, items: [{canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types} \| {concept_key, canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types}], notices: string[], stale_documents: number |
| `metadata_show` (LegacyRaw) | document_id: string, file_name: string, stale: bool, extractions: [...], facts: [{concept_family, concept_key, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed, extractor}] |
| `rules_metadata_alias_list` | rules: [{rule_id, concept_key, from_canonical, to_canonical, created_at, updated_at}], total: number |
| `rules_metadata_alias_set` | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, from_canonical, to_canonical, generation_bumped: bool} |
| `rules_metadata_alias_remove` | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
| `rules_metadata_alias_resolve` | concept_key: string, input: string, canonical_input: string, resolved: string, steps: [{from, to, rule_id}] |
| `rules_metadata_noise_list` | rules: [{rule_id, concept_key, extractor?, match_canonical, action, created_at, updated_at}], total: number |
| `rules_metadata_noise_set` | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, action, generation_bumped: bool} |
| `rules_metadata_noise_remove` | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
| `rules_metadata_noise_resolve` | concept_key: string, canonical_value: string, extractor?: string, effective_action: "none" \| "low-confidence" \| "suppress-facet", matched_rule_ids: [string] |
| `rules_metadata_concept_config_list` | config: [{concept_key, low_confidence_threshold, created_at, updated_at}], total: number |
| `rules_metadata_concept_config_set` | outcome: {status: "created" \| "updated" \| "no_op", concept_key, low_confidence_threshold, generation_bumped: bool} |
| `rules_metadata_concept_config_remove` | outcome: {status: "removed" \| "no_op", concept_key, generation_bumped: bool} |
| `rules_metadata_settings_show` | current_generation: number, default_low_confidence_threshold: number, updated_at: string |
| `rules_metadata_settings_set` | outcome: {status: "updated" \| "no_op", default_low_confidence_threshold, generation_bumped: bool} |
| `rules_metadata_rebuild` | outcome: {scope: "document" \| "all", documents_rebuilt: number, documents_skipped: number, target_generation: number, dry_run: bool} |

## Metadata Control Plane

Group B and Group C of the Metadata Control Plane introduced 14 new `rules metadata ...` subcommands and two additive surfaces on existing `metadata` commands. All 14 mutation-side commands use the enveloped transport (`schema_version = 1`) and — in contrast to most other control-plane commands — use `error_semantics = TransportEnvelope`, which means operational errors (unknown concept key, invalid regex, etc.) surface as a top-level `error` object inside the same envelope shape instead of plain-text stderr output. stderr remains empty.

New `rules metadata` command paths:

Mutation outcomes (`set`, `remove`, `rebuild`, `settings set`) always bump the control-plane generation when they change state and flip `metadata_dirty = 1` on every metadata-bearing document, so operators should run `rules metadata rebuild --all` to refresh derived facts.

### `metadata find` — new in Group C

`metadata find` is a new command that searches for documents matching a concept/value pair. It uses the enveloped transport with `data.payload_version = 1`:

```json
{
  "schema_version": 1,
  "command": "metadata_find",
  "version": "0.5.6",
  "timestamp": "2026-04-16T12:00:00Z",
  "data": {
    "payload_version": 1,
    "concept_key": "origin_software_name",
    "value": "libreoffice",
    "documents": [
      {
        "document_id": "...",
        "file_name": "a.pdf",
        "matches": [
          {
            "fact_id": 42,
            "concept_key": "origin_software_name",
            "extractor": "builtin_pdf",
            "raw_key": "Producer",
            "raw_value": "LibreOffice 7.5",
            "normalized_value": "libreoffice",
            "canonical_value": "libreoffice",
            "confidence": 0.9,
            "low_confidence": false,
            "facet_suppressed": false
          }
        ]
      }
    ],
    "stale_documents": 0,
    "notices": []
  }
}
```

Flags: `--concept-key`, `--value` | `--value-pattern` (mutually exclusive), `--file-type`, `--extractor`, `--min-confidence`, `--match-quality {usable|any|low-confidence}`, `--limit`.

### `metadata values` — BREAKING CHANGE

`metadata values` moved from LegacyRaw to LegacyEnvelope (`schema_version = 1`). The payload now lives under `data` with `payload_version = 2`.

**Migration note.** Previously `--json` returned a bare array:

```json
[{"canonical_value": "...", "total_doc_count": 3, ...}]
```

Now it returns the standard envelope. To adapt an existing consumer, replace a top-level array read (`.[]`) with:

```bash
unsterwerx metadata values --concept-key ... --json \
  | jq '.data.items[]'
```

The new items carry additional facet counts (`usable_doc_count`, `low_confidence_doc_count`, `suppressed_doc_count`) that were not present in the legacy shape. `stale_documents` and `notices` have moved out of the item array and into `data` siblings.
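
A consumer that must keep working across the transition can sniff the shape instead of hard-cutting over. This shim is illustrative, not a supported compatibility layer:

```python
import json

def metadata_values_items(raw: str) -> list:
    """Accept both the legacy bare array and the new envelope, so the
    same consumer works before and after the upgrade."""
    parsed = json.loads(raw)
    if isinstance(parsed, list):  # legacy LegacyRaw shape: bare array
        return parsed
    data = parsed["data"]         # new LegacyEnvelope shape
    if data.get("payload_version") != 2:
        raise ValueError(
            f"unhandled payload_version: {data.get('payload_version')!r}"
        )
    return data["items"]

legacy = '[{"canonical_value": "libreoffice", "total_doc_count": 3}]'
print(len(metadata_values_items(legacy)))  # 1
```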

### `metadata show` — additive fields

`metadata show` remains on LegacyRaw. The payload gains four additive fields (no shape break):

- `stale`: the document's metadata generation is out of sync.
- `canonical_value`: the value after alias rules.
- `low_confidence`: the fact was marked below-threshold by noise rules or concept config.
- `facet_suppressed`: the fact is excluded from facet aggregates by a suppress-facet noise rule.

Consumers that ignore unknown fields are forward-compatible.

### `search` (Group D)

Group D extended `search` with five metadata-fact filters and a metadata-only query mode. When no free-text query is provided but at least one metadata filter is, `search` runs in metadata-only mode and orders results by (`file_name ASC`, `document_id ASC`). Filters match against:

- `origin_software_name`.
- `documents.file_type`.
- `document_created_at`.
- `document_modified_at`.

Exact filter inputs are canonicalized against concept alias rules before query planning, so `--author "Robert C. Whetsel"` and `--author "robert c. whetsel"` find the same documents.

Successful metadata-only responses have the same `search` envelope shape, with `data.mode = "metadata"` and `data.stale_documents` set to the number of metadata-bearing documents excluded because their fact rows are stale. A notice describing the exclusion is appended to `data.notices`.

### Status — new operator-visibility counters

The `status` summary payload (default and `--detailed`) gains three metadata-control-plane counters so operators can monitor freshness and rule conflicts:

- `metadata_bearing_documents`: documents with at least one `document_metadata_extractions` row where `status = 'ok'`.
- `metadata_stale_documents`: metadata-bearing documents where `metadata_dirty = 1`, `metadata_generation IS NULL`, or `metadata_generation != current_generation`. Non-metadata-bearing documents (including ones with `metadata_dirty = 1` but no ok extraction) are excluded by design.
- `metadata_seed_conflicts`.

The same counters are printed in the non-JSON text output under "Metadata bearing", "Metadata stale", and "Seed conflicts".
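
One way to consume the counters is an alerting helper; the threshold and messages below are illustrative, not a recommended policy:

```python
def metadata_alerts(data: dict, max_stale: int = 0) -> list[str]:
    """Flag staleness and seed conflicts from a `status --json` payload."""
    alerts = []
    if data["metadata_stale_documents"] > max_stale:
        alerts.append(
            f"{data['metadata_stale_documents']} of "
            f"{data['metadata_bearing_documents']} metadata-bearing documents "
            "are stale; run `rules metadata rebuild --all`"
        )
    if data["metadata_seed_conflicts"] > 0:
        alerts.append(f"{data['metadata_seed_conflicts']} metadata seed conflicts")
    return alerts

print(metadata_alerts({"metadata_bearing_documents": 10,
                       "metadata_stale_documents": 0,
                       "metadata_seed_conflicts": 0}))  # []
```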

## jq Examples

```bash
# Show the effective runtime config source
unsterwerx config show --json | jq -r '.data.source'

# Inspect classify bulk mode + status
unsterwerx classify --json | jq '.data | {mode, status, documents_classified}'

# Read the cascaded policy preview and any cascade violation
unsterwerx rules resolve --class playbook --scope acme/security --json \
  | jq '.data | {mode, scope_path, cascade_error, merged_policy}'

# Inspect source hierarchy resolution traces
unsterwerx rules source resolve --document <DOC_ID> --json \
  | jq '.data.sources[] | {source_name, effective_weight, evaluations}'

# Preview dedup without mutating
unsterwerx knowledge dedup apply --dry-run --json \
  | jq '.data | {status, total_removed: .plan.total_removed, notices}'

# Read structured upgrade errors
unsterwerx upgrade --check --json | jq '.error // .data'

# Monitor metadata freshness from the status summary
unsterwerx status --json \
  | jq '.data | {metadata_bearing_documents, metadata_stale_documents, metadata_seed_conflicts}'

# Iterate metadata-find documents (switch on payload_version)
unsterwerx metadata find --concept-key origin_software_name --value libreoffice --json \
  | jq 'if .data.payload_version == 1 then .data.documents[] else error("unexpected payload_version") end'

# Migrate legacy `metadata values` consumers: old top-level array →
# new envelope with payload_version=2
unsterwerx metadata values --concept-key origin_software_name --json \
  | jq '.data.items[]'

# Preview a rules-metadata rebuild without mutating
unsterwerx rules metadata rebuild --all --dry-run --json \
  | jq '.data.outcome | {documents_rebuilt, documents_skipped, target_generation}'
```