Artefacts - The Building Blocks of Proof
An artefact is a small, verifiable package of evidence: a quote from a web page, a paragraph from a PDF, a table cell, or a snippet of code, together with the metadata that proves where it came from and when it was seen.
Artefacts are stored and versioned in Perofant™ (the CueCrux evidence database) with provenance and policy metadata so they can be reused safely across answers, receipts, and audits.
Receipts and answers are built from artefacts. If artefacts are clean and well‑described, proof becomes simple and trustworthy.
How an Artefact Is Created (No Jargon)
%%{init: {'theme':'base','themeVariables': {
'primaryColor':'#f6f8fa',
'primaryTextColor':'#24292e',
'tertiaryTextColor':'#24292e',
'lineColor':'#24292e',
'fontSize':'14px'
}, 'themeCSS': '.node > rect, .node > polygon, .cluster rect, .label rect { filter: none !important; box-shadow: none !important; }' }}%%
flowchart LR
A[User or Crawler] --> B[Fetch]
B --> C[Scan]
C --> D[Parse]
D --> E[Chunk]
E --> F[Embed]
F --> G[Commit]
G --> H[Provenance Ledger]
%% Notes
B:::note
C:::note
D:::note
E:::note
F:::note
G:::note
classDef note fill:#f6f8fa,stroke:#24292e,color:#24292e;
%% Highlight endpoints (start A, finish H) in menu‑blue without shadows
style A fill:#f6f8fa,stroke:#3b82f6
style H fill:#f6f8fa,stroke:#3b82f6
- Fetch: download a page or file.
- Scan: antivirus, licence/robots checks, redaction for sensitive data when required.
- Parse: turn PDFs/HTML into clean text and structure.
- Chunk: split long documents into small, meaningful pieces (artefacts).
- Embed: add vectors so the Engine can find the artefact quickly.
- Commit: write the artefact and its metadata; record a hash in the Provenance Ledger.
Every step is logged so the result is auditable later.
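To make the last step concrete, here is a minimal sketch of what Commit might look like, assuming a hypothetical `ledger` client and BLAKE3 for the fingerprint; the function and field names are illustrative, not the real CueCrux API.

```python
# Illustrative sketch of the Commit step, not the real CueCrux code.
# Assumes the `blake3` package (pip install blake3) and a hypothetical ledger client.
from datetime import datetime, timezone

import blake3


def commit_artefact(payload_text: str, envelope: dict, ledger) -> dict:
    """Write an artefact and record its fingerprint in the provenance ledger."""
    # Fingerprint the normalised payload so later changes can be detected.
    content_hash = "b3:" + blake3.blake3(payload_text.encode("utf-8")).hexdigest()

    artefact = {
        **envelope,                      # source_url, licence, robots_status, ...
        "payload_text": payload_text,
        "content_hash": content_hash,
        "observed_at": datetime.now(timezone.utc).isoformat(),
    }

    # Hypothetical append-only ledger call: records the hash so the artefact
    # can be re-verified later.
    ledger.append(artefact["content_hash"])
    return artefact
```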
What an Artefact Contains
Each artefact has two parts: the payload (what was said) and the envelope (everything that proves and describes it).
Payload (examples)
- Normalised text of the quote or snippet
- Optional structure (e.g., table cell coordinates)
Envelope (metadata)
- Source URL or document ID
- observed_at (timestamp) and last_verified
- Content hash (BLAKE3): a fingerprint used to detect changes
- Licence (e.g., CC‑BY, public domain) and jurisdiction
- Domain, author, publication date (if known)
- Language, MIME type, content_type (text, table, code)
- Version number, canonical_url, ETag
- Parent/child links (for pages → sections → paragraphs)
- Robots/ingest status (ok, blocked, metadata‑only)
- OCR flag (true/false)
- PII/risk flags (if any), redaction notes
- Reputation prior for the domain (0–1)
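As a rough illustration only (the field names mirror the list above; the real schema has more fields and may type them differently), the envelope can be pictured as a typed record:

```python
# Illustrative only: a trimmed, typed view of the envelope fields listed above.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArtefactEnvelope:
    source_url: str
    observed_at: str                 # ISO 8601 timestamp
    last_verified: str               # ISO 8601 timestamp
    content_hash: str                # e.g. "b3:..." (BLAKE3)
    licence: str                     # e.g. "CC-BY", "public-domain"
    jurisdiction: str
    domain: str
    language: str
    content_type: str                # "text", "table", "code"
    mime: str
    version: int
    robots_status: str               # "ok", "blocked", "metadata-only"
    ocr: bool
    reputation: float                # domain reputation prior, 0-1
    author: Optional[str] = None
    published_at: Optional[str] = None
    canonical_url: Optional[str] = None
    etag: Optional[str] = None
    parent_id: Optional[str] = None
    pii_flags: list[str] = field(default_factory=list)
    risk_flags: list[str] = field(default_factory=list)
```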
Visual Shape of an Artefact
classDiagram
class Artefact {
id
source_url
canonical_url
observed_at
last_verified
content_hash
licence
jurisdiction
domain
author
published_at
language
content_type
mime
payload_text
version
parent_id
robots_status
ocr
pii_flags_count
risk_flags_count
reputation
}
class SourceFetch {
source_url
canonical_url
observed_at
mime
domain
}
class ScanPolicy {
licence
jurisdiction
robots_status
pii_flags_count
risk_flags_count
}
class ParseContent {
content_type
language
payload_text
}
class LedgerEntry {
content_hash
last_verified
version
}
class SourcePageMeta {
author
published_at
}
class DomainReputation {
reputation
}
SourceFetch --> Artefact
ScanPolicy --> Artefact
ParseContent --> Artefact
LedgerEntry --> Artefact
SourcePageMeta --> Artefact
DomainReputation --> Artefact
This is a simplified view; the real object includes more fields, but the idea is the same. The additional boxes show where each group of fields comes from (fetch headers, scan/policy checks, parsing, provenance ledger, page metadata, and the domain reputation prior).
How Artefacts Are Scored
Scoring is not a single “magic number”. It’s several signals that the UI can explain:
- Recency: newer observations are preferred for time‑sensitive topics.
- Authority: official or peer‑reviewed venues carry more weight.
- Licence clarity: clearly licenced material is preferred; incompatible licences are down‑weighted.
- Diversity group: five sources from different domains carry more weight than five from the same site.
- Duplication: near‑duplicates don’t add trust; they are damped.
- Counterfactuals: credible disagreement is shown and reduces confidence until resolved.
- Venue risk: retraction lists or low‑credibility venues reduce influence.
- Reuse: artefacts reused by independent organisations get a small boost.
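As a purely illustrative sketch (the real weights are not published and will differ; every name below is an assumption), the signals might combine roughly like this:

```python
# Illustrative only: CueCrux does not publish its weights, and the real model
# uses more signals. This just shows the shape of an explainable combination.
def artefact_influence(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted blend of per-artefact signals, each normalised to 0..1."""
    score = sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return max(0.0, min(1.0, score))


example_signals = {
    "recency": 0.9,          # recent observation of a time-sensitive topic
    "authority": 0.8,        # official or peer-reviewed venue
    "licence_clarity": 1.0,  # clearly licensed
    "diversity": 0.7,        # distinct domain within the evidence set
    "duplication": 0.2,      # degree of near-duplication with other artefacts
    "venue_risk": 0.0,       # retraction / low-credibility signals
}
illustrative_weights = {
    "recency": 0.25, "authority": 0.30, "licence_clarity": 0.15,
    "diversity": 0.20, "duplication": -0.05, "venue_risk": -0.10,
}
print(artefact_influence(example_signals, illustrative_weights))  # about 0.75
```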
Mode matters:
- Light: quick, less strict.
- Verified: enforces provenance and domain diversity.
- Audit: requires deterministic replay and counterfactual checks.
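A hedged sketch of how the three modes might translate into enforcement flags (the flag names are assumptions; the actual checks live in the Engine):

```python
# Illustrative mapping from mode to the checks it enforces; not the real configuration.
MODE_REQUIREMENTS = {
    "light":    dict(strict_provenance=False, domain_diversity=False,
                     deterministic_replay=False, counterfactual_checks=False),
    "verified": dict(strict_provenance=True,  domain_diversity=True,
                     deterministic_replay=False, counterfactual_checks=False),
    "audit":    dict(strict_provenance=True,  domain_diversity=True,
                     deterministic_replay=True,  counterfactual_checks=True),
}
```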
Exact weights are not published (to prevent gaming) and evolve with audits; see Trust Scoring for the explainable set of signals.
Example Artefact (Simplified)
{
"id": "art_01JXZ3A",
"source_url": "https://energy.example.gov/reports/solar-2024#installations",
"observed_at": "2025-10-11T09:23:00Z",
"last_verified": "2025-11-01T10:02:00Z",
"content_hash": "b3:9f4a...e12c",
"licence": "CC-BY",
"jurisdiction": "UK",
"domain": "energy.example.gov",
"language": "en",
"content_type": "text",
"mime": "text/html",
"payload_text": "In 2024, Region X added 3,214 community solar installations...",
"version": 3,
"robots_status": "ok",
"ocr": false,
"pii_flags": [],
"risk_flags": [],
"reputation": 0.92
}
How to read this
- It shows exactly where the quote came from, when we saw it, and a fingerprint to detect changes.
- The licence says you may reuse with attribution; the domain looks official.
- The high reputation and recent timestamps mean it’s likely a strong source.
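For example, a reader (or an automated re-check) could confirm the quote has not changed since it was observed by recomputing the fingerprint. This sketch assumes the hash is BLAKE3 over the normalised payload text; the real pipeline may normalise and hash content differently.

```python
# Sketch of a change check; assumes the fingerprint is BLAKE3 over payload_text.
import blake3


def payload_unchanged(artefact: dict, current_text: str) -> bool:
    """True if freshly fetched text still matches the stored fingerprint."""
    recomputed = "b3:" + blake3.blake3(current_text.encode("utf-8")).hexdigest()
    return recomputed == artefact["content_hash"]
```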
Where You See Artefacts in the UI
- Citation chips under an answer show domain, licence, and timestamp; they come from artefacts.
- Clicking a citation reveals the exact quote and its hash.
- Trust reports and receipts link back to the artefacts selected by QUORUM (MiSES), the minimal evidence set.
See Also
- Receipt Anatomy: how proofs are packaged and checked
- Trust Scoring: explainable signals for domains, artefacts, and users
- Public Proof Pages: shareable, read‑only receipts

