JSON Output
Unsterwerx now exposes one control-plane JSON transport for the full CLI surface.
- --json is a root-global flag and works before or after every command path.
- Every command and subcommand emits the same envelope shape on success.
- Runtime failures in JSON mode emit the same envelope with a top-level error object.
- stderr stays quiet in JSON mode: tracing is disabled and direct stderr noise is suppressed.
- --help and --version remain human-readable even when --json is present.
Root-Global Flag
unsterwerx --json status
unsterwerx status --json
unsterwerx knowledge build --json
unsterwerx rules source list --json
Literal positional values stay literal:
unsterwerx search -- --json
That command searches for the text --json; it does not enable JSON mode.
Success Envelope
Successful commands emit:
{
"schema_version": 1,
"command": "knowledge_build",
"version": "0.5.4",
"timestamp": "2026-04-15T12:34:56Z",
"data": {}
}
Envelope Fields
| Field | Meaning |
|---|---|
schema_version | Transport schema selector. Current value: 1 |
command | Command ID derived from the clap path, normalized to snake_case |
version | Unsterwerx version string |
timestamp | RFC 3339 UTC timestamp |
data | Command payload |
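Consumers can pin the transport by checking schema_version before touching data. A minimal consumer-side sketch in Python (the envelope literal is illustrative):

```python
import json

def read_envelope(raw: str) -> dict:
    """Parse one CLI response and reject transport versions we don't know."""
    env = json.loads(raw)
    if env.get("schema_version") != 1:
        raise ValueError(f"unsupported schema_version: {env.get('schema_version')}")
    return env

raw = """{"schema_version": 1, "command": "knowledge_build",
          "version": "0.5.4", "timestamp": "2026-04-15T12:34:56Z", "data": {}}"""
env = read_envelope(raw)
print(env["command"])  # knowledge_build
```

Rejecting unknown schema_version values up front means a future transport bump fails loudly instead of silently misparsing.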
Command IDs
The command field is derived from the command path only. Flags do not change it.
Examples:
| CLI invocation | command |
|---|---|
unsterwerx ingest --dry-run --json <path> | ingest |
unsterwerx status errors --json | status_errors |
unsterwerx status dismiss <id> --json | status_dismiss |
unsterwerx jobs status <run> --json | jobs_status |
unsterwerx knowledge build --json | knowledge_build |
unsterwerx knowledge dedup scan --json | knowledge_dedup_scan |
unsterwerx metadata extract --json | metadata_extract |
unsterwerx metadata find --concept-key ... --json | metadata_find |
unsterwerx rules metadata alias list --json | rules_metadata_alias_list |
unsterwerx rules metadata rebuild --all --json | rules_metadata_rebuild |
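The normalization rule the table implies (join the resolved subcommand path with underscores; hyphens become underscores) can be sketched as follows. The real derivation lives inside the CLI; this is only a mirror for consumers who want to predict IDs:

```python
def command_id(path: list[str]) -> str:
    # Flags never participate; only the resolved subcommand path does.
    return "_".join(seg.replace("-", "_") for seg in path)

print(command_id(["knowledge", "dedup", "scan"]))  # knowledge_dedup_scan
print(command_id(["rules", "metadata", "concept-config", "list"]))
# rules_metadata_concept_config_list
```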
Error Envelope
Unhandled command failures in JSON mode emit:
{
"schema_version": 1,
"command": "knowledge_build",
"version": "0.5.4",
"timestamp": "2026-04-15T12:34:56Z",
"error": {
"kind": "runtime_error",
"message": "no training labels available",
"exit_code": 1,
"details": {}
}
}
Parse failures use the synthetic command ID cli_parse because clap could not resolve a valid command path:
{
"schema_version": 1,
"command": "cli_parse",
"version": "0.5.4",
"timestamp": "2026-04-15T12:34:56Z",
"error": {
"kind": "parse_error",
"message": "unexpected argument '--bogus' found",
"exit_code": 2,
"details": {}
}
}
Error Fields
| Field | Meaning |
|---|---|
kind | Error class such as parse_error or runtime_error |
message | Human-readable failure text |
exit_code | CLI exit code |
details | Reserved JSON object for future structured detail |
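Because success and failure share one envelope shape, a consumer can dispatch on the presence of the error key. A hedged sketch that mirrors the CLI exit code on failure:

```python
import json
import sys

def dispatch(raw: str) -> dict:
    """Return data on success; report and mirror the exit code on failure."""
    env = json.loads(raw)
    if "error" in env:
        err = env["error"]
        print(f"{env['command']}: {err['kind']}: {err['message']}", file=sys.stderr)
        sys.exit(err["exit_code"])
    return env["data"]

ok = ('{"schema_version": 1, "command": "status", "version": "0.5.4",'
      ' "timestamp": "2026-04-15T12:34:56Z", "data": {"total_documents": 3}}')
print(dispatch(ok)["total_documents"])  # 3
```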
Payload Contracts
payload_version
Some commands wrap their enveloped payload with an additional data.payload_version: number discriminant. Consumers should switch on payload_version before reading the rest of data — the outer schema_version pins the transport (envelope shape), while payload_version pins the command-local payload shape. Currently:
| Command | data.payload_version |
|---|---|
metadata_find | 1 |
metadata_values | 2 |
New values are introduced additively; a bump is a breaking change for downstream consumers of that specific payload.
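A consumer honoring both pins might look like the sketch below; only the two payload versions listed above are handled, and anything else is rejected:

```python
def items_from(env: dict) -> list:
    """Switch on the command-local payload discriminant before reading data."""
    data = env["data"]
    pv = data.get("payload_version")
    if pv == 1:                  # metadata_find shape
        return data["documents"]
    if pv == 2:                  # metadata_values shape
        return data["items"]
    raise ValueError(f"unhandled payload_version: {pv}")

env = {"schema_version": 1, "command": "metadata_find",
       "data": {"payload_version": 1, "documents": [{"document_id": "d1"}]}}
print(items_from(env))  # [{'document_id': 'd1'}]
```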
status
Mutation and operational-summary payloads use a top-level data.status when the command did work or produced a plan/result summary.
| status | Meaning |
|---|---|
completed | All operations succeeded |
completed_with_errors | Command ran to completion but some items failed; counts include both success and failure |
dry_run | No mutations occurred; payload contains preview data |
aborted | Command did not mutate because confirmation was declined or omitted in JSON mode |
current | upgrade --check found the installed version already current |
update_available | upgrade --check found a newer version |
no_op | Idempotent mutation found the target already in the requested state |
Read-only commands omit status.
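Which statuses deserve operator follow-up is a policy choice; one plausible sketch flags the two failure-adjacent values and treats the absent key (read-only commands) as success:

```python
ATTENTION = {"completed_with_errors", "aborted"}

def needs_follow_up(data: dict) -> bool:
    # Read-only payloads omit "status" entirely, so .get() returns None.
    return data.get("status") in ATTENTION

print(needs_follow_up({"status": "completed_with_errors"}))  # True
print(needs_follow_up({"results": [], "total": 0}))          # False
```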
mode
Multi-mode commands expose a discriminant so consumers can switch on payload shape.
| Command | mode values |
|---|---|
classify | bulk, single, show |
rules resolve | document, class |
rules source resolve | document, recompute |
upgrade | check, install |
search | text, metadata |
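Switching on the discriminant keeps per-mode parsing separate. A sketch for upgrade payloads, with field names taken from the payload reference later in this document:

```python
def describe_upgrade(data: dict) -> str:
    mode = data["mode"]
    if mode == "check":
        return f"{data['installed']} -> {data['latest']} ({data['status']})"
    if mode == "install":
        return f"installed {data.get('to_version', 'unknown')}"
    raise ValueError(f"unknown mode: {mode}")

print(describe_upgrade({"mode": "check", "status": "update_available",
                        "installed": "0.5.4", "latest": "0.5.6"}))
# 0.5.4 -> 0.5.6 (update_available)
```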
notices
Commands that would otherwise emit warnings on stderr expose them as data.notices: string[] in JSON mode. Empty notice sets are serialized as [].
Current notice-producing commands:
- search (includes stale-metadata notices when fact filters are used)
- rules add
- knowledge labels add
- knowledge vectors build
- knowledge dedup apply
- knowledge dedup rollback
- rules source resolve --document
- metadata find
- metadata values
Optional Policy Fields
- rules resolve --class omits merged_policy when cascade preview fails.
- rules resolve --class omits cascade_error when cascade preview succeeds.
- rules resolve --document omits effective_policy when no policy applies.
Upgrade Error Contract
upgrade --check --json uses:
- success envelope + data.status = current when the installed version is already current
- success envelope + data.status = update_available with exit code 1 when a newer version exists
- error envelope + error.kind = server_unreachable with exit code 2 when the update endpoint cannot be reached
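Note that update_available arrives in a success envelope even though the process exits 1, so consumers should branch on the envelope rather than the exit code alone. A sketch:

```python
def check_outcome(env: dict) -> str:
    """Collapse the three upgrade --check outcomes into one label."""
    if "error" in env:
        return env["error"]["kind"]   # e.g. "server_unreachable"
    return env["data"]["status"]      # "current" or "update_available"

print(check_outcome({"data": {"status": "update_available"}}))  # update_available
print(check_outcome({"error": {"kind": "server_unreachable",
                               "exit_code": 2}}))               # server_unreachable
```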
Payload Reference
The table below lists the structured data payload fields for the commands that now emit native JSON DTOs.
| Command | data fields |
|---|---|
config_show | source: string, config_path: string, config: object |
config_get | source: string, key: string, value: any, found: bool |
config_set | status: string, key: string, value: any, previous_value: any, config_path: string |
config_init | status: string, created: bool, config_path: string |
search | query: string, total: number, results: [{document_id, title, snippet, rank}], notices: string[] |
reindex | status: string, canonical_docs: number, indexed: number, missing_content: number, total_in_fts5: number, pending_canonical: number |
canonical | status: string, documents_processed: number, documents_extracted: number, documents_failed: number, total_elements: number, total_words: number, documents_reclassified: number, stopped: bool |
reconstruct | status: string, document_id: string, output_path: string, format: string, point_in_time?: string |
classify (bulk) | mode: "bulk", status: string, documents_classified: number, rules_applied: number, skipped_rules: number, errors: number, classifications: [] |
classify (single) | mode: "single", status: "completed", document_id: string, classifications: [{rule_id, rule_name, class, confidence}] |
classify (show) | mode: "show", document_id: string, classifications: [{class, confidence, rule_name, created_at}] |
rules_list | rules: [{id, name, document_class, scope, scope_id?, priority, definition, is_active}], total: number |
rules_add | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string, notices: string[] |
rules_remove | status: string, id: string, name: string, purged: bool, classifications_removed: number, documents_reset: number, already_retired: bool |
rules_reactivate | status: "completed", id: string, name: string, was_already_active: bool |
rules_policy | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string |
rules_policies | policies: [{id, name, document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, scope, scope_id?}], total: number |
rules_assign_scope | status: "completed", document_id: string, scope_path: string |
rules_resolve (document) | mode: "document", document_id: string, scope_path: string, effective_policy?: {document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, source_rules, applied_scopes} |
rules_resolve (class) | mode: "class", class: string, scope_path: string, input_policies: [...], merged_policy?: {...}, cascade_error?: string |
rules_source_list | rules: [{id, trust_class, weight, priority, pattern?, is_active}], total: number |
rules_source_set | status: "completed", id: string, trust_class: string, weight: number, action: "created" \| "updated" |
rules_source_remove | status: "completed", id: string, trust_class: string |
rules_source_resolve (document) | mode: "document", document_id: string, sources: [{source_name, trust_class?, effective_weight, matched_rule_id?, explanation, default_weight, evaluations: [{rule_id, trust_class?, pattern?, weight, priority, outcome}]}], notices: string[] |
rules_source_resolve (recompute) | mode: "recompute", status: string, documents_updated: number, documents_unchanged: number, errors: number |
import_sources | adapters: [{name, description}] |
import_status | status: "completed", sources: [{name, display_name, trust_class?, default_weight, item_count}], recent_batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}] |
import_history | batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}], total: number |
import_backfill | status: "completed", orphan_documents: number, provenance_rows_created: number, import_item_rows_backfilled: number, local_fallback_rows: number, audit_events_logged: number |
knowledge_labels_add | status: "completed", doc_a: string, doc_b: string, label: string, is_candidate: bool, posterior?: number, feature_scores?: {jaccard, cosine, title_overlap, structural_overlap, temporal_proximity, source_weight_delta}, notices: string[] |
knowledge_labels_list | labels: [{doc_a, doc_b, label, label_type, confidence, created_at}], total: number |
knowledge_vectors_build | status: string, vectors_created: number, vectors_updated: number, vectors_deleted: number, edges_created: number, documents_clustered: number, singletons_dropped: number, oversize_warned: number, timing_secs: number, pipeline_run_id?: string, upstream_similarity_run_id?: string, upstream_knowledge_run_id?: string, dry_run: bool, notices: string[] |
knowledge_vectors_list | vectors: [{id, label, confidence, member_count, representative_doc_id?, method}], total: number |
knowledge_vectors_show | id: string, label: string, confidence: number, member_count: number, method: string, representative_doc_id?: string, members: [{document_id, file_name, score, is_primary}], edges: [{other_vector_id, other_label, edge_type, weight}] |
knowledge_vectors_search | query: string, results: [{vector_id, vector_label, document_id, title, snippet, rank}], total: number |
knowledge_vectors_traverse | origin: {id, label, confidence, member_count, representative_doc_id?, method}, depth: number, hops: [{depth, vector, edge_weight}], total_reachable: number |
knowledge_dedup_scan | threshold: number, vector_filter?: string, keep_override?: string, plan: {pipeline_run_id?, upstream_vector_run_id?, model_id, threshold, groups, total_kept, total_removed, total_vectors_affected} |
knowledge_dedup_apply | status: string, documents_removed: number, diffs_computed: number, provenance_merged: number, labels_inserted: number, bytes_freed: number, errors: number, skipped_no_canonical: number, rule_id: string, timing_secs: number, plan?: object, notices: string[] |
knowledge_dedup_list | rules: [{id, name, threshold, action_count, is_active, created_at}], total: number |
knowledge_dedup_show | rule_id: string, actions: [{id, document_id, vector_id, action_type, original_status, created_at}], total: number |
knowledge_dedup_rollback | status: string, target_type: "action" \| "rule", target_id: string, documents_restored: number, labels_removed: number, errors: number, timing_secs: number, notices: string[] |
upgrade (check) | mode: "check", status: "current" \| "update_available", installed: string, latest: string |
upgrade (install) | mode: "install", status: string, from_version?: string, to_version?: string, install_path?: string, backup_path?: string, source?: "remote" \| "local" |
status (summary) | mode: "summary", detailed: bool, reconciled_legacy_stuck_imports: number, data_directory: string, total_documents: number, total_size_bytes: number, indexed_documents: number, audit_events: number, metadata_bearing_documents: number, metadata_stale_documents: number, metadata_seed_conflicts: number, by_status: object, by_file_type?: object, similarity?: object, classification?: object |
metadata_find | payload_version: 1, concept_key: string, value?: string, value_pattern?: string, documents: [{document_id, file_name, matches: [{fact_id, concept_key, extractor, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed}]}], stale_documents: number, notices: string[] |
metadata_values | payload_version: 2, items: [{canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types} \| {concept_key, canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types}], notices: string[], stale_documents: number |
metadata_show (LegacyRaw) | document_id: string, file_name: string, stale: bool, extractions: [...], facts: [{concept_family, concept_key, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed, extractor}] |
rules_metadata_alias_list | rules: [{rule_id, concept_key, from_canonical, to_canonical, created_at, updated_at}], total: number |
rules_metadata_alias_set | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, from_canonical, to_canonical, generation_bumped: bool} |
rules_metadata_alias_remove | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
rules_metadata_alias_resolve | concept_key: string, input: string, canonical_input: string, resolved: string, steps: [{from, to, rule_id}] |
rules_metadata_noise_list | rules: [{rule_id, concept_key, extractor?, match_canonical, action, created_at, updated_at}], total: number |
rules_metadata_noise_set | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, action, generation_bumped: bool} |
rules_metadata_noise_remove | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
rules_metadata_noise_resolve | concept_key: string, canonical_value: string, extractor?: string, effective_action: "none" \| "low-confidence" \| "suppress-facet", matched_rule_ids: [string] |
rules_metadata_concept_config_list | config: [{concept_key, low_confidence_threshold, created_at, updated_at}], total: number |
rules_metadata_concept_config_set | outcome: {status: "created" \| "updated" \| "no_op", concept_key, low_confidence_threshold, generation_bumped: bool} |
rules_metadata_concept_config_remove | outcome: {status: "removed" \| "no_op", concept_key, generation_bumped: bool} |
rules_metadata_settings_show | current_generation: number, default_low_confidence_threshold: number, updated_at: string |
rules_metadata_settings_set | outcome: {status: "updated" \| "no_op", default_low_confidence_threshold, generation_bumped: bool} |
rules_metadata_rebuild | outcome: {scope: "document" \| "all", documents_rebuilt: number, documents_skipped: number, target_generation: number, dry_run: bool} |
Metadata Control Plane
Group B and Group C of the Metadata Control Plane introduced 14 new rules metadata ... subcommands and two additive surfaces on existing metadata commands. All 14 mutation-side commands use the enveloped transport (schema_version = 1) and — in contrast to most other control-plane commands — use error_semantics = TransportEnvelope, which means operational errors (unknown concept key, invalid regex, etc.) surface as a top-level error object inside the same envelope shape instead of plain-text stderr output. stderr remains empty.
New rules metadata command paths:
- rules metadata alias list | set | remove | resolve
- rules metadata noise list | set | remove | resolve
- rules metadata concept-config list | set | remove
- rules metadata settings show | set
- rules metadata rebuild (--document <id> or --all, optional --dry-run)
Mutation outcomes (set, remove, rebuild, settings set) always bump the control-plane generation when they change state and flip metadata_dirty = 1 on every metadata-bearing document, so operators should run rules metadata rebuild --all to refresh derived facts.
metadata find — new in Group C
metadata find is a new command that searches for documents matching a concept/value pair. It uses the enveloped transport with data.payload_version = 1:
{
"schema_version": 1,
"command": "metadata_find",
"version": "0.5.6",
"timestamp": "2026-04-16T12:00:00Z",
"data": {
"payload_version": 1,
"concept_key": "origin_software_name",
"value": "libreoffice",
"documents": [
{
"document_id": "...",
"file_name": "a.pdf",
"matches": [
{
"fact_id": 42,
"concept_key": "origin_software_name",
"extractor": "builtin_pdf",
"raw_key": "Producer",
"raw_value": "LibreOffice 7.5",
"normalized_value": "libreoffice",
"canonical_value": "libreoffice",
"confidence": 0.9,
"low_confidence": false,
"facet_suppressed": false
}
]
}
],
"stale_documents": 0,
"notices": []
}
}
Flags: --concept-key, --value | --value-pattern (mutually exclusive), --file-type, --extractor, --min-confidence, --match-quality {usable|any|low-confidence}, --limit.
metadata values — BREAKING CHANGE
metadata values moved from LegacyRaw to LegacyEnvelope (schema_version = 1). The payload now lives under data with payload_version = 2.
Migration note. Previously --json returned a bare array:
[{"canonical_value": "...", "total_doc_count": 3, ...}]
Now it returns the standard envelope. To adapt an existing consumer, replace a top-level array read (.[]) with:
unsterwerx metadata values --concept-key ... --json \
| jq '.data.items[]'
The new items carry additional facet counts (usable_doc_count, low_confidence_doc_count, suppressed_doc_count) that were not present in the legacy shape. stale_documents and notices have moved out of the item array and into data siblings.
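During migration, an adapter that tolerates both the legacy bare array and the new envelope might look like this sketch:

```python
import json

def load_values(raw: str) -> list:
    """Accept both the legacy bare array and the payload_version-2 envelope."""
    parsed = json.loads(raw)
    if isinstance(parsed, list):           # legacy pre-envelope shape
        return parsed
    data = parsed["data"]                  # enveloped shape
    if data.get("payload_version") != 2:
        raise ValueError(f"unexpected payload_version: {data.get('payload_version')}")
    return data["items"]

legacy = '[{"canonical_value": "libreoffice", "total_doc_count": 3}]'
new = ('{"schema_version": 1, "data": {"payload_version": 2,'
       ' "items": [{"canonical_value": "libreoffice", "total_doc_count": 3,'
       ' "usable_doc_count": 3}], "notices": [], "stale_documents": 0}}')
print(load_values(legacy)[0]["canonical_value"])  # libreoffice
print(load_values(new)[0]["usable_doc_count"])    # 3
```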
metadata show — additive fields
metadata show remains on LegacyRaw. The payload gains four additive fields (no shape break):
- data.stale: bool — true when the document is metadata-bearing and its generation is out of sync.
- data.facts[].canonical_value: string | null — canonicalized value after alias rules.
- data.facts[].low_confidence: bool — fact was marked below-threshold by noise rules or concept config.
- data.facts[].facet_suppressed: bool — fact was hidden from aggregates by a suppress-facet noise rule.
Consumers that ignore unknown fields are forward-compatible.
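A forward-compatible reader keys only on fields it knows and defaults the additive flags, so it also works against payloads that predate Group C. A sketch:

```python
def usable_facts(payload: dict) -> list:
    # .get() defaults keep this working against older payloads
    # that lack the additive low_confidence / facet_suppressed flags.
    return [f for f in payload.get("facts", [])
            if not f.get("low_confidence", False)
            and not f.get("facet_suppressed", False)]

payload = {"facts": [
    {"concept_key": "origin_software_name", "raw_value": "LibreOffice 7.5"},
    {"concept_key": "document_author", "raw_value": "x", "facet_suppressed": True},
]}
print(len(usable_facts(payload)))  # 1
```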
Metadata-aware search
Group D extended search with five metadata-fact filters and a metadata-only query mode. When no free-text query is provided but at least one metadata filter is, search runs in metadata-only mode and orders results by (file_name ASC, document_id ASC). Filters:
- --author <value> (repeatable) — matches document_author.
- --origin-software <value> (repeatable) — matches origin_software_name.
- --file-type <ext> (repeatable) — structural filter on documents.file_type.
- --created-from <date> / --created-to <date> — bounds on document_created_at.
- --modified-from <date> / --modified-to <date> — bounds on document_modified_at.
Exact filter inputs are canonicalized against concept alias rules before query planning, so --author "Robert C. Whetsel" and --author "robert c. whetsel" find the same documents.
Successful metadata-only responses have the same search envelope shape, with data.mode = "metadata" and data.stale_documents set to the number of metadata-bearing documents excluded because their fact rows are stale. A notice describing the exclusion is appended to data.notices.
Status — new operator-visibility counters
The status summary payload (default and --detailed) gains three metadata-control-plane counters so operators can monitor freshness and rule conflicts:
- metadata_bearing_documents: number — distinct documents with at least one document_metadata_extractions row where status = 'ok'.
- metadata_stale_documents: number — metadata-bearing documents where metadata_dirty = 1, metadata_generation IS NULL, or metadata_generation != current_generation. Non-metadata-bearing documents (including ones with metadata_dirty = 1 but no ok extraction) are excluded by design.
- metadata_seed_conflicts: number — row count in metadata_seed_conflicts.
The same counters are printed in the non-JSON text output under "Metadata bearing", "Metadata stale", and "Seed conflicts".
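A monitoring hook might derive a freshness ratio from the two counters; a sketch (the threshold is illustrative, not part of any contract):

```python
def stale_ratio(data: dict) -> float:
    """Fraction of metadata-bearing documents whose facts are stale."""
    bearing = data.get("metadata_bearing_documents", 0)
    stale = data.get("metadata_stale_documents", 0)
    return stale / bearing if bearing else 0.0

summary = {"metadata_bearing_documents": 200, "metadata_stale_documents": 10,
           "metadata_seed_conflicts": 0}
ratio = stale_ratio(summary)
print(f"{ratio:.0%} stale")  # 5% stale
if ratio > 0.25:  # illustrative alert threshold: run `rules metadata rebuild --all`
    print("rebuild recommended")
```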
jq Examples
# Show the effective runtime config source
unsterwerx config show --json | jq -r '.data.source'
# Inspect classify bulk mode + status
unsterwerx classify --json | jq '.data | {mode, status, documents_classified}'
# Read the cascaded policy preview and any cascade violation
unsterwerx rules resolve --class playbook --scope acme/security --json \
| jq '.data | {mode, scope_path, cascade_error, merged_policy}'
# Inspect source hierarchy resolution traces
unsterwerx rules source resolve --document <DOC_ID> --json \
| jq '.data.sources[] | {source_name, effective_weight, evaluations}'
# Preview dedup without mutating
unsterwerx knowledge dedup apply --dry-run --json \
| jq '.data | {status, total_removed: .plan.total_removed, notices}'
# Read structured upgrade errors
unsterwerx upgrade --check --json | jq '.error // .data'
# Monitor metadata freshness from the status summary
unsterwerx status --json \
| jq '.data | {metadata_bearing_documents, metadata_stale_documents, metadata_seed_conflicts}'
# Iterate metadata-find documents (switch on payload_version)
unsterwerx metadata find --concept-key origin_software_name --value libreoffice --json \
| jq 'if .data.payload_version == 1 then .data.documents[] else error("unexpected payload_version") end'
# Migrate legacy `metadata values` consumers: old top-level array →
# new envelope with payload_version=2
unsterwerx metadata values --concept-key origin_software_name --json \
| jq '.data.items[]'
# Preview a rules-metadata rebuild without mutating
unsterwerx rules metadata rebuild --all --dry-run --json \
| jq '.data.outcome | {documents_rebuilt, documents_skipped, target_generation}'