# Unsterwerx JSON Output

Unsterwerx now exposes a single control-plane JSON transport for the full CLI surface.

## Root-Global Flag

The root-level `--json` flag can be passed before or after any subcommand:

```bash
unsterwerx --json status
unsterwerx status --json
unsterwerx knowledge build --json
unsterwerx rules source list --json
```

Literal positional values stay literal:

```bash
unsterwerx search -- --json
```

That command searches for the literal text `--json`; everything after `--` is treated as a positional value, so it does not enable JSON mode.

Success Envelope

Successful commands emit:

```json
{
  "schema_version": 1,
  "command": "knowledge_build",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "data": {}
}
```

### Envelope Fields

| Field | Meaning |
|---|---|
| `schema_version` | Transport schema selector. Current value: `1` |
| `command` | Command ID derived from the clap path, normalized to snake_case |
| `version` | Unsterwerx version string |
| `timestamp` | RFC 3339 UTC timestamp |
| `data` | Command payload |
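
Consumers should pin `schema_version` before touching anything else in the envelope. A minimal Python sketch of such a guard (the `read_envelope` helper is illustrative, not part of the CLI):

```python
import json

SUPPORTED_SCHEMA_VERSION = 1

def read_envelope(raw: str) -> dict:
    """Parse one envelope and refuse transport versions we don't understand."""
    envelope = json.loads(raw)
    if envelope.get("schema_version") != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version: {envelope.get('schema_version')!r}"
        )
    return envelope

raw = ('{"schema_version": 1, "command": "knowledge_build", '
       '"version": "0.5.4", "timestamp": "2026-04-15T12:34:56Z", "data": {}}')
envelope = read_envelope(raw)
print(envelope["command"])  # knowledge_build
```

Rejecting an unknown `schema_version` up front keeps a consumer from half-reading a future transport shape.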

## Command IDs

The `command` field is derived from the command path only. Flags do not change it.

Examples:

| CLI invocation | `command` |
|---|---|
| `unsterwerx ingest --dry-run --json <path>` | `ingest` |
| `unsterwerx status errors --json` | `status_errors` |
| `unsterwerx status dismiss <id> --json` | `status_dismiss` |
| `unsterwerx jobs status <run> --json` | `jobs_status` |
| `unsterwerx knowledge build --json` | `knowledge_build` |
| `unsterwerx knowledge dedup scan --json` | `knowledge_dedup_scan` |
| `unsterwerx metadata extract --json` | `metadata_extract` |
| `unsterwerx metadata find --concept-key ... --json` | `metadata_find` |
| `unsterwerx rules metadata alias list --json` | `rules_metadata_alias_list` |
| `unsterwerx rules metadata rebuild --all --json` | `rules_metadata_rebuild` |
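
When correlating logs or metrics, the normalization rule can be mimicked on the consumer side. This hypothetical helper only restates the rule above (subcommand path joined in snake_case; flags ignored):

```python
def command_id(subcommand_path: list[str]) -> str:
    """Join the clap subcommand path into the snake_case command ID.
    Flags never appear in the path, so they cannot change the result."""
    return "_".join(part.replace("-", "_") for part in subcommand_path)

print(command_id(["knowledge", "dedup", "scan"]))  # knowledge_dedup_scan
```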

## Error Envelope

Unhandled command failures in JSON mode emit:

```json
{
  "schema_version": 1,
  "command": "knowledge_build",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "error": {
    "kind": "runtime_error",
    "message": "no training labels available",
    "exit_code": 1,
    "details": {}
  }
}
```

Parse failures use the synthetic command ID `cli_parse` because clap could not resolve a valid command path:

```json
{
  "schema_version": 1,
  "command": "cli_parse",
  "version": "0.5.4",
  "timestamp": "2026-04-15T12:34:56Z",
  "error": {
    "kind": "parse_error",
    "message": "unexpected argument '--bogus' found",
    "exit_code": 2,
    "details": {}
  }
}
```

### Error Fields

| Field | Meaning |
|---|---|
| `kind` | Error class such as `parse_error` or `runtime_error` |
| `message` | Human-readable failure text |
| `exit_code` | CLI exit code |
| `details` | Reserved JSON object for future structured detail |
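
Telling the two envelope shapes apart is mechanical: a success envelope carries `data`, an error envelope carries `error`. A hedged Python sketch of consumer-side dispatch (the `dispatch` helper is illustrative):

```python
import json
import sys

def dispatch(raw: str) -> tuple[int, dict]:
    """Return (exit_code, payload). Success envelopes carry `data` and
    imply exit code 0; error envelopes carry `error` with an explicit
    `exit_code`."""
    envelope = json.loads(raw)
    if "error" in envelope:
        err = envelope["error"]
        print(f"{err['kind']}: {err['message']}", file=sys.stderr)
        return err["exit_code"], err.get("details", {})
    return 0, envelope["data"]

code, details = dispatch('{"schema_version": 1, "command": "cli_parse", '
                         '"error": {"kind": "parse_error", "message": "bad flag", '
                         '"exit_code": 2, "details": {}}}')
print(code)  # 2
```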

## Payload Contracts

### `payload_version`

Some commands wrap their enveloped payload with an additional `data.payload_version: number` discriminant. Consumers should switch on `payload_version` before reading the rest of `data` — the outer `schema_version` pins the transport (envelope shape), while `payload_version` pins the command-local payload shape. Currently:

| Command | `data.payload_version` |
|---|---|
| `metadata_find` | 1 |
| `metadata_values` | 2 |

New values are introduced additively; a bump is a breaking change for downstream consumers of that specific payload.
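
In consumer code the required switch can look like this (Python sketch; only the version number comes from the table above, the handling itself is illustrative):

```python
def metadata_find_documents(data: dict) -> list:
    """Reject payload shapes this consumer was not written against
    before reading any other field out of `data`."""
    version = data.get("payload_version")
    if version == 1:
        return data["documents"]
    raise ValueError(f"unhandled metadata_find payload_version: {version!r}")

docs = metadata_find_documents({"payload_version": 1, "documents": []})
print(docs)  # []
```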

### `status`

Mutation and operational-summary payloads use a top-level `data.status` when the command did work or produced a plan/result summary.

| `status` | Meaning |
|---|---|
| `completed` | All operations succeeded |
| `completed_with_errors` | Command ran to completion but some items failed; counts include both success and failure |
| `dry_run` | No mutations occurred; payload contains preview data |
| `aborted` | Command did not mutate because confirmation was declined or omitted in JSON mode |
| `current` | `upgrade --check` found the installed version already current |
| `update_available` | `upgrade --check` found a newer version |
| `no_op` | Idempotent mutation found the target already in the requested state |

Read-only commands omit `status`.

### `mode`

Multi-mode commands expose a discriminant so consumers can switch on payload shape.

| Command | `mode` values |
|---|---|
| `classify` | `bulk`, `single`, `show` |
| `rules resolve` | `document`, `class` |
| `rules source resolve` | `document`, `recompute` |
| `upgrade` | `check`, `install` |
| `search` | `text`, `metadata` |

### `notices`

Commands that would otherwise emit warnings on stderr expose them as `data.notices: string[]` in JSON mode. Empty notice sets are serialized as `[]`.
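
A wrapper that runs commands in JSON mode can replay the notices on stderr to keep the warning visibility that text mode has. This forwarding helper is illustrative:

```python
import sys

def forward_notices(data: dict) -> int:
    """Print each notice to stderr and return how many were forwarded.
    Empty notice sets are serialized as [], so iteration is always safe."""
    notices = data.get("notices", [])
    for notice in notices:
        print(f"warning: {notice}", file=sys.stderr)
    return len(notices)

print(forward_notices({"notices": ["2 documents skipped"]}))  # 1
```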

Current notice-producing commands:

### Optional Policy Fields

### Upgrade Error Contract

`upgrade --check --json` uses:

## Payload Reference

The table below lists the structured `data` payload fields for the commands that now emit native JSON DTOs.

| Command | `data` fields |
|---|---|
| `config_show` | source: string, config_path: string, config: object |
| `config_get` | source: string, key: string, value: any, found: bool |
| `config_set` | status: string, key: string, value: any, previous_value: any, config_path: string |
| `config_init` | status: string, created: bool, config_path: string |
| `search` | query: string, total: number, results: [{document_id, title, snippet, rank}], notices: string[] |
| `reindex` | status: string, canonical_docs: number, indexed: number, missing_content: number, total_in_fts5: number, pending_canonical: number |
| `canonical` | status: string, documents_processed: number, documents_extracted: number, documents_failed: number, total_elements: number, total_words: number, documents_reclassified: number, stopped: bool |
| `reconstruct` | status: string, document_id: string, output_path: string, format: string, point_in_time?: string |
| `classify` (bulk) | mode: "bulk", status: string, documents_classified: number, rules_applied: number, skipped_rules: number, errors: number, classifications: [] |
| `classify` (single) | mode: "single", status: "completed", document_id: string, classifications: [{rule_id, rule_name, class, confidence}] |
| `classify` (show) | mode: "show", document_id: string, classifications: [{class, confidence, rule_name, created_at}] |
| `rules_list` | rules: [{id, name, document_class, scope, scope_id?, priority, definition, is_active}], total: number |
| `rules_add` | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string, notices: string[] |
| `rules_remove` | status: string, id: string, name: string, purged: bool, classifications_removed: number, documents_reset: number, already_retired: bool |
| `rules_reactivate` | status: "completed", id: string, name: string, was_already_active: bool |
| `rules_policy` | status: "completed", id: string, name: string, document_class: string, scope: string, scope_id?: string |
| `rules_policies` | policies: [{id, name, document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, scope, scope_id?}], total: number |
| `rules_assign_scope` | status: "completed", document_id: string, scope_path: string |
| `rules_resolve` (document) | mode: "document", document_id: string, scope_path: string, effective_policy?: {document_class, retention_years, retention_days, is_mutable, legal_hold, archive_action, source_rules, applied_scopes} |
| `rules_resolve` (class) | mode: "class", class: string, scope_path: string, input_policies: [...], merged_policy?: {...}, cascade_error?: string |
| `rules_source_list` | rules: [{id, trust_class, weight, priority, pattern?, is_active}], total: number |
| `rules_source_set` | status: "completed", id: string, trust_class: string, weight: number, action: "created" \| "updated" |
| `rules_source_remove` | status: "completed", id: string, trust_class: string |
| `rules_source_resolve` (document) | mode: "document", document_id: string, sources: [{source_name, trust_class?, effective_weight, matched_rule_id?, explanation, default_weight, evaluations: [{rule_id, trust_class?, pattern?, weight, priority, outcome}]}], notices: string[] |
| `rules_source_resolve` (recompute) | mode: "recompute", status: string, documents_updated: number, documents_unchanged: number, errors: number |
| `import_sources` | adapters: [{name, description}] |
| `import_status` | status: "completed", sources: [{name, display_name, trust_class?, default_weight, item_count}], recent_batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}] |
| `import_history` | batches: [{id, source_name, input_path, status, started_at, completed_at?, items_total, items_imported, items_duplicate, items_unsupported, items_skipped, items_error, items_image_only}], total: number |
| `import_backfill` | status: "completed", orphan_documents: number, provenance_rows_created: number, import_item_rows_backfilled: number, local_fallback_rows: number, audit_events_logged: number |
| `knowledge_labels_add` | status: "completed", doc_a: string, doc_b: string, label: string, is_candidate: bool, posterior?: number, feature_scores?: {jaccard, cosine, title_overlap, structural_overlap, temporal_proximity, source_weight_delta}, notices: string[] |
| `knowledge_labels_list` | labels: [{doc_a, doc_b, label, label_type, confidence, created_at}], total: number |
| `knowledge_vectors_build` | status: string, vectors_created: number, vectors_updated: number, vectors_deleted: number, edges_created: number, documents_clustered: number, singletons_dropped: number, oversize_warned: number, timing_secs: number, pipeline_run_id?: string, upstream_similarity_run_id?: string, upstream_knowledge_run_id?: string, dry_run: bool, notices: string[] |
| `knowledge_vectors_list` | vectors: [{id, label, confidence, member_count, representative_doc_id?, method}], total: number |
| `knowledge_vectors_show` | id: string, label: string, confidence: number, member_count: number, method: string, representative_doc_id?: string, members: [{document_id, file_name, score, is_primary}], edges: [{other_vector_id, other_label, edge_type, weight}] |
| `knowledge_vectors_search` | query: string, results: [{vector_id, vector_label, document_id, title, snippet, rank}], total: number |
| `knowledge_vectors_traverse` | origin: {id, label, confidence, member_count, representative_doc_id?, method}, depth: number, hops: [{depth, vector, edge_weight}], total_reachable: number |
| `knowledge_dedup_scan` | threshold: number, vector_filter?: string, keep_override?: string, plan: {pipeline_run_id?, upstream_vector_run_id?, model_id, threshold, groups, total_kept, total_removed, total_vectors_affected} |
| `knowledge_dedup_apply` | status: string, documents_removed: number, diffs_computed: number, provenance_merged: number, labels_inserted: number, bytes_freed: number, errors: number, skipped_no_canonical: number, rule_id: string, timing_secs: number, plan?: object, notices: string[] |
| `knowledge_dedup_list` | rules: [{id, name, threshold, action_count, is_active, created_at}], total: number |
| `knowledge_dedup_show` | rule_id: string, actions: [{id, document_id, vector_id, action_type, original_status, created_at}], total: number |
| `knowledge_dedup_rollback` | status: string, target_type: "action" \| "rule", target_id: string, documents_restored: number, labels_removed: number, errors: number, timing_secs: number, notices: string[] |
| `upgrade` (check) | mode: "check", status: "current" \| "update_available", installed: string, latest: string |
| `upgrade` (install) | mode: "install", status: string, from_version?: string, to_version?: string, install_path?: string, backup_path?: string, source?: "remote" \| "local" |
| `status` (summary) | mode: "summary", detailed: bool, reconciled_legacy_stuck_imports: number, data_directory: string, total_documents: number, total_size_bytes: number, indexed_documents: number, audit_events: number, metadata_bearing_documents: number, metadata_stale_documents: number, metadata_seed_conflicts: number, by_status: object, by_file_type?: object, similarity?: object, classification?: object |
| `metadata_find` | payload_version: 1, concept_key: string, value?: string, value_pattern?: string, documents: [{document_id, file_name, matches: [{fact_id, concept_key, extractor, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed}]}], stale_documents: number, notices: string[] |
| `metadata_values` | payload_version: 2, items: [{canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types} \| {concept_key, canonical_value, total_doc_count, usable_doc_count, low_confidence_doc_count, suppressed_doc_count, file_types}], notices: string[], stale_documents: number |
| `metadata_show` (LegacyRaw) | document_id: string, file_name: string, stale: bool, extractions: [...], facts: [{concept_family, concept_key, raw_key, raw_value, normalized_value?, canonical_value?, confidence, low_confidence, facet_suppressed, extractor}] |
| `rules_metadata_alias_list` | rules: [{rule_id, concept_key, from_canonical, to_canonical, created_at, updated_at}], total: number |
| `rules_metadata_alias_set` | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, from_canonical, to_canonical, generation_bumped: bool} |
| `rules_metadata_alias_remove` | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
| `rules_metadata_alias_resolve` | concept_key: string, input: string, canonical_input: string, resolved: string, steps: [{from, to, rule_id}] |
| `rules_metadata_noise_list` | rules: [{rule_id, concept_key, extractor?, match_canonical, action, created_at, updated_at}], total: number |
| `rules_metadata_noise_set` | outcome: {status: "created" \| "updated" \| "no_op", rule_id, concept_key, action, generation_bumped: bool} |
| `rules_metadata_noise_remove` | outcome: {status: "removed" \| "no_op", rule_id?, generation_bumped: bool} |
| `rules_metadata_noise_resolve` | concept_key: string, canonical_value: string, extractor?: string, effective_action: "none" \| "low-confidence" \| "suppress-facet", matched_rule_ids: [string] |
| `rules_metadata_concept_config_list` | config: [{concept_key, low_confidence_threshold, created_at, updated_at}], total: number |
| `rules_metadata_concept_config_set` | outcome: {status: "created" \| "updated" \| "no_op", concept_key, low_confidence_threshold, generation_bumped: bool} |
| `rules_metadata_concept_config_remove` | outcome: {status: "removed" \| "no_op", concept_key, generation_bumped: bool} |
| `rules_metadata_settings_show` | current_generation: number, default_low_confidence_threshold: number, updated_at: string |
| `rules_metadata_settings_set` | outcome: {status: "updated" \| "no_op", default_low_confidence_threshold, generation_bumped: bool} |
| `rules_metadata_rebuild` | outcome: {scope: "document" \| "all", documents_rebuilt: number, documents_skipped: number, target_generation: number, dry_run: bool} |

## Metadata Control Plane

Group B and Group C of the Metadata Control Plane introduced 14 new `rules metadata ...` subcommands and two additive surfaces on existing `metadata` commands. All 14 mutation-side commands use the enveloped transport (`schema_version = 1`) and — in contrast to most other control-plane commands — use `error_semantics = TransportEnvelope`, which means operational errors (unknown concept key, invalid regex, etc.) surface as a top-level `error` object inside the same envelope shape instead of plain-text stderr output. stderr remains empty.

New `rules metadata` command paths:

Mutation outcomes (`set`, `remove`, `rebuild`, `settings set`) always bump the control-plane generation when they change state and flip `metadata_dirty = 1` on every metadata-bearing document, so operators should run `rules metadata rebuild --all` to refresh derived facts.

### `metadata find` — new in Group C

`metadata find` is a new command that searches for documents matching a concept/value pair. It uses the enveloped transport with `data.payload_version = 1`:

```json
{
  "schema_version": 1,
  "command": "metadata_find",
  "version": "0.5.6",
  "timestamp": "2026-04-16T12:00:00Z",
  "data": {
    "payload_version": 1,
    "concept_key": "origin_software_name",
    "value": "libreoffice",
    "documents": [
      {
        "document_id": "...",
        "file_name": "a.pdf",
        "matches": [
          {
            "fact_id": 42,
            "concept_key": "origin_software_name",
            "extractor": "builtin_pdf",
            "raw_key": "Producer",
            "raw_value": "LibreOffice 7.5",
            "normalized_value": "libreoffice",
            "canonical_value": "libreoffice",
            "confidence": 0.9,
            "low_confidence": false,
            "facet_suppressed": false
          }
        ]
      }
    ],
    "stale_documents": 0,
    "notices": []
  }
}
```

Flags: `--concept-key`, `--value` | `--value-pattern` (mutually exclusive), `--file-type`, `--extractor`, `--min-confidence`, `--match-quality {usable|any|low-confidence}`, `--limit`.

### `metadata values` — BREAKING CHANGE

`metadata values` moved from LegacyRaw to LegacyEnvelope (`schema_version = 1`). The payload now lives under `data` with `payload_version = 2`.

**Migration note.** Previously `--json` returned a bare array:

```json
[{"canonical_value": "...", "total_doc_count": 3, ...}]
```

Now it returns the standard envelope. To adapt an existing consumer, replace a top-level array read (`.[]`) with:

```bash
unsterwerx metadata values --concept-key ... --json \
  | jq '.data.items[]'
```

The new items carry additional facet counts (`usable_doc_count`, `low_confidence_doc_count`, `suppressed_doc_count`) that were not present in the legacy shape. `stale_documents` and `notices` have moved out of the item array and into `data` siblings.
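
A consumer that must keep working across the transition can sniff the shape instead of hard-cutting over. This shim is illustrative, not a supported compatibility layer:

```python
import json

def metadata_values_items(raw: str) -> list:
    """Accept both the legacy bare array and the new envelope, so the
    same consumer works before and after the upgrade."""
    parsed = json.loads(raw)
    if isinstance(parsed, list):  # legacy LegacyRaw shape: bare array
        return parsed
    data = parsed["data"]         # new LegacyEnvelope shape
    if data.get("payload_version") != 2:
        raise ValueError(
            f"unhandled payload_version: {data.get('payload_version')!r}"
        )
    return data["items"]

legacy = '[{"canonical_value": "libreoffice", "total_doc_count": 3}]'
print(len(metadata_values_items(legacy)))  # 1
```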

### `metadata show` — additive fields

`metadata show` remains on LegacyRaw. The payload gains four additive fields (no shape break):

- `stale`: the document's metadata generation is out of sync.
- `canonical_value`: the value after alias rules.
- `low_confidence`: the fact was marked below-threshold by noise rules or concept config.
- `facet_suppressed`: the fact is excluded from facet aggregates by a suppress-facet noise rule.

Consumers that ignore unknown fields are forward-compatible.

### `search` (Group D)

Group D extended `search` with five metadata-fact filters and a metadata-only query mode. When no free-text query is provided but at least one metadata filter is, `search` runs in metadata-only mode and orders results by (`file_name ASC`, `document_id ASC`). Filters match against:

- `origin_software_name`.
- `documents.file_type`.
- `document_created_at`.
- `document_modified_at`.

Exact filter inputs are canonicalized against concept alias rules before query planning, so `--author "Robert C. Whetsel"` and `--author "robert c. whetsel"` find the same documents.

Successful metadata-only responses have the same `search` envelope shape, with `data.mode = "metadata"` and `data.stale_documents` set to the number of metadata-bearing documents excluded because their fact rows are stale. A notice describing the exclusion is appended to `data.notices`.

### Status — new operator-visibility counters

The `status` summary payload (default and `--detailed`) gains three metadata-control-plane counters so operators can monitor freshness and rule conflicts:

- `metadata_bearing_documents`: documents with at least one `document_metadata_extractions` row where `status = 'ok'`.
- `metadata_stale_documents`: metadata-bearing documents where `metadata_dirty = 1`, `metadata_generation IS NULL`, or `metadata_generation != current_generation`. Non-metadata-bearing documents (including ones with `metadata_dirty = 1` but no ok extraction) are excluded by design.
- `metadata_seed_conflicts`.

The same counters are printed in the non-JSON text output under "Metadata bearing", "Metadata stale", and "Seed conflicts".
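
One way to consume the counters is an alerting helper; the threshold and messages below are illustrative, not a recommended policy:

```python
def metadata_alerts(data: dict, max_stale: int = 0) -> list[str]:
    """Flag staleness and seed conflicts from a `status --json` payload."""
    alerts = []
    if data["metadata_stale_documents"] > max_stale:
        alerts.append(
            f"{data['metadata_stale_documents']} of "
            f"{data['metadata_bearing_documents']} metadata-bearing documents "
            "are stale; run `rules metadata rebuild --all`"
        )
    if data["metadata_seed_conflicts"] > 0:
        alerts.append(f"{data['metadata_seed_conflicts']} metadata seed conflicts")
    return alerts

print(metadata_alerts({"metadata_bearing_documents": 10,
                       "metadata_stale_documents": 0,
                       "metadata_seed_conflicts": 0}))  # []
```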

## jq Examples

```bash
# Show the effective runtime config source
unsterwerx config show --json | jq -r '.data.source'

# Inspect classify bulk mode + status
unsterwerx classify --json | jq '.data | {mode, status, documents_classified}'

# Read the cascaded policy preview and any cascade violation
unsterwerx rules resolve --class playbook --scope acme/security --json \
  | jq '.data | {mode, scope_path, cascade_error, merged_policy}'

# Inspect source hierarchy resolution traces
unsterwerx rules source resolve --document <DOC_ID> --json \
  | jq '.data.sources[] | {source_name, effective_weight, evaluations}'

# Preview dedup without mutating
unsterwerx knowledge dedup apply --dry-run --json \
  | jq '.data | {status, total_removed: .plan.total_removed, notices}'

# Read structured upgrade errors
unsterwerx upgrade --check --json | jq '.error // .data'

# Monitor metadata freshness from the status summary
unsterwerx status --json \
  | jq '.data | {metadata_bearing_documents, metadata_stale_documents, metadata_seed_conflicts}'

# Iterate metadata-find documents (switch on payload_version)
unsterwerx metadata find --concept-key origin_software_name --value libreoffice --json \
  | jq 'if .data.payload_version == 1 then .data.documents[] else error("unexpected payload_version") end'

# Migrate legacy `metadata values` consumers: old top-level array →
# new envelope with payload_version=2
unsterwerx metadata values --concept-key origin_software_name --json \
  | jq '.data.items[]'

# Preview a rules-metadata rebuild without mutating
unsterwerx rules metadata rebuild --all --dry-run --json \
  | jq '.data.outcome | {documents_rebuilt, documents_skipped, target_generation}'
```