Artefacts - The Building Blocks of Proof
An artefact is a small, verifiable package of evidence: a quote from a web page, a paragraph from a PDF, a table cell, or a snippet of code, together with the metadata that proves where it came from and when it was seen.
Artefacts are stored and versioned in Perofant™ (the CueCrux evidence database) with provenance and policy metadata so they can be reused safely across answers, receipts, and audits.
Receipts and answers are built from artefacts. If artefacts are clean and well‑described, proof becomes simple and trustworthy.
How an Artefact Is Created (No Jargon)
%%{init: {'theme':'base','themeVariables': {
'primaryColor':'#f6f8fa',
'primaryTextColor':'#24292e',
'tertiaryTextColor':'#24292e',
'lineColor':'#24292e',
'fontSize':'14px'
}, 'themeCSS': '.node > rect, .node > polygon, .cluster rect, .label rect { filter: none !important; box-shadow: none !important; }' }}%%
flowchart LR
A[User or Crawler] --> B[Fetch]
B --> C[Scan]
C --> D[Parse]
D --> E[Chunk]
E --> F[Embed]
F --> G[Commit]
G --> H[Provenance Ledger]
%% Notes
B:::note
C:::note
D:::note
E:::note
F:::note
G:::note
classDef note fill:#f6f8fa,stroke:#24292e,color:#24292e;
%% Highlight endpoints (start A, finish H) in menu‑blue without shadows
style A fill:#f6f8fa,stroke:#3b82f6
style H fill:#f6f8fa,stroke:#3b82f6
- Fetch: download a page or file.
- Scan: antivirus, licence/robots checks, redaction for sensitive data when required.
- Parse: turn PDFs/HTML into clean text and structure.
- Chunk: split long documents into small, meaningful pieces (artefacts).
- Embed: add vectors so the Engine can find the artefact quickly.
- Commit: write the artefact and its metadata; record a hash in the Provenance Ledger.
Every step is logged so the result is auditable later.
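To make the last step concrete, here is a minimal sketch of what Commit might look like, assuming a hypothetical `ledger` client and BLAKE3 for the fingerprint; the function and field names are illustrative, not the real CueCrux API.

```python
# Illustrative sketch of the Commit step, not the real CueCrux code.
# Assumes the `blake3` package (pip install blake3) and a hypothetical ledger client.
from datetime import datetime, timezone

import blake3


def commit_artefact(payload_text: str, envelope: dict, ledger) -> dict:
    """Write an artefact and record its fingerprint in the provenance ledger."""
    # Fingerprint the normalised payload so later changes can be detected.
    content_hash = "b3:" + blake3.blake3(payload_text.encode("utf-8")).hexdigest()

    artefact = {
        **envelope,                      # source_url, licence, robots_status, ...
        "payload_text": payload_text,
        "content_hash": content_hash,
        "observed_at": datetime.now(timezone.utc).isoformat(),
    }

    # Hypothetical append-only ledger call: records the hash so the artefact
    # can be re-verified later.
    ledger.append(artefact["content_hash"])
    return artefact
```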
What an Artefact Contains
Each artefact has two parts: the payload (what was said) and the envelope (everything that proves and describes it).
Payload (examples)
- Normalised text of the quote or snippet
- Optional structure (e.g., table cell coordinates)
Envelope (metadata)
- Source URL or document ID
- observed_at (timestamp) and last_verified
- Content hash (BLAKE3): a fingerprint used to detect changes
- Licence (e.g., CC‑BY, public domain) and jurisdiction
- Domain, author, publication date (if known)
- Language, MIME type, content_type (text, table, code)
- Version number, canonical_url, ETag
- Parent/child links (for pages → sections → paragraphs)
- Robots/ingest status (ok, blocked, metadata‑only)
- OCR flag (true/false)
- PII/risk flags (if any), redaction notes
- Reputation prior for the domain (0–1)
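As a rough illustration only (the field names mirror the list above; the real schema has more fields and may type them differently), the envelope can be pictured as a typed record:

```python
# Illustrative only: a trimmed, typed view of the envelope fields listed above.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArtefactEnvelope:
    source_url: str
    observed_at: str                 # ISO 8601 timestamp
    last_verified: str               # ISO 8601 timestamp
    content_hash: str                # e.g. "b3:..." (BLAKE3)
    licence: str                     # e.g. "CC-BY", "public-domain"
    jurisdiction: str
    domain: str
    language: str
    content_type: str                # "text", "table", "code"
    mime: str
    version: int
    robots_status: str               # "ok", "blocked", "metadata-only"
    ocr: bool
    reputation: float                # domain reputation prior, 0-1
    author: Optional[str] = None
    published_at: Optional[str] = None
    canonical_url: Optional[str] = None
    etag: Optional[str] = None
    parent_id: Optional[str] = None
    pii_flags: list[str] = field(default_factory=list)
    risk_flags: list[str] = field(default_factory=list)
```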
Visual Shape of an Artefact
classDiagram
class Artefact {
id
source_url
canonical_url
observed_at
last_verified
content_hash
licence
jurisdiction
domain
author
published_at
language
content_type
mime
payload_text
version
parent_id
robots_status
ocr
pii_flags_count
risk_flags_count
reputation
}
class SourceFetch {
source_url
canonical_url
observed_at
mime
domain
}
class ScanPolicy {
licence
jurisdiction
robots_status
pii_flags_count
risk_flags_count
}
class ParseContent {
content_type
language
payload_text
}
class LedgerEntry {
content_hash
last_verified
version
}
class SourcePageMeta {
author
published_at
}
class DomainReputation {
reputation
}
SourceFetch --> Artefact
ScanPolicy --> Artefact
ParseContent --> Artefact
LedgerEntry --> Artefact
SourcePageMeta --> Artefact
DomainReputation --> Artefact
This is a simplified view; the real object includes more fields, but the idea is the same. The additional boxes show where each group of fields comes from (fetch headers, scan/policy checks, parsing, provenance ledger, page metadata, and the domain reputation prior).
How Artefacts Are Scored
Scoring is not a single “magic number”. It’s several signals that the UI can explain:
- Recency: newer observations are preferred for time‑sensitive topics.
- Authority: official or peer‑reviewed venues carry more weight.
- Licence clarity: clearly licenced material is preferred; incompatible licences are down‑weighted.
- Diversity group: five sources from different domains carry more weight than five from the same site.
- Duplication: near‑duplicates don’t add trust; they are damped.
- Counterfactuals: credible disagreement is shown and reduces confidence until resolved.
- Venue risk: retraction lists or low‑credibility venues reduce influence.
- Reuse: artefacts reused by independent organisations get a small boost.
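As a purely illustrative sketch (the real weights are not published and will differ; every name below is an assumption), the signals might combine roughly like this:

```python
# Illustrative only: CueCrux does not publish its weights, and the real model
# uses more signals. This just shows the shape of an explainable combination.
def artefact_influence(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted blend of per-artefact signals, each normalised to 0..1."""
    score = sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return max(0.0, min(1.0, score))


example_signals = {
    "recency": 0.9,          # recent observation of a time-sensitive topic
    "authority": 0.8,        # official or peer-reviewed venue
    "licence_clarity": 1.0,  # clearly licensed
    "diversity": 0.7,        # distinct domain within the evidence set
    "duplication": 0.2,      # degree of near-duplication with other artefacts
    "venue_risk": 0.0,       # retraction / low-credibility signals
}
illustrative_weights = {
    "recency": 0.25, "authority": 0.30, "licence_clarity": 0.15,
    "diversity": 0.20, "duplication": -0.05, "venue_risk": -0.10,
}
print(artefact_influence(example_signals, illustrative_weights))  # about 0.75
```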
Mode matters:
- Light: quick, less strict.
- Verified: enforces provenance and domain diversity.
- Audit: requires deterministic replay and counterfactual checks.
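A hedged sketch of how the three modes might translate into enforcement flags (the flag names are assumptions; the actual checks live in the Engine):

```python
# Illustrative mapping from mode to the checks it enforces; not the real configuration.
MODE_REQUIREMENTS = {
    "light":    dict(strict_provenance=False, domain_diversity=False,
                     deterministic_replay=False, counterfactual_checks=False),
    "verified": dict(strict_provenance=True,  domain_diversity=True,
                     deterministic_replay=False, counterfactual_checks=False),
    "audit":    dict(strict_provenance=True,  domain_diversity=True,
                     deterministic_replay=True,  counterfactual_checks=True),
}
```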
Exact weights are not published (to prevent gaming) and evolve with audits; see Trust Scoring for the explainable set of signals.
Example Artefact (Simplified)
{
"id": "art_01JXZ3A",
"source_url": "https://energy.example.gov/reports/solar-2024#installations",
"observed_at": "2025-10-11T09:23:00Z",
"last_verified": "2025-11-01T10:02:00Z",
"content_hash": "b3:9f4a...e12c",
"licence": "CC-BY",
"jurisdiction": "UK",
"domain": "energy.example.gov",
"language": "en",
"content_type": "text",
"mime": "text/html",
"payload_text": "In 2024, Region X added 3,214 community solar installations...",
"version": 3,
"robots_status": "ok",
"ocr": false,
"pii_flags": [],
"risk_flags": [],
"reputation": 0.92
}
How to read this
- It shows exactly where the quote came from, when we saw it, and a fingerprint to detect changes.
- The licence says you may reuse with attribution; the domain looks official.
- The high reputation and recent timestamps mean it’s likely a strong source.
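For example, a reader (or an automated re-check) could confirm the quote has not changed since it was observed by recomputing the fingerprint. This sketch assumes the hash is BLAKE3 over the normalised payload text; the real pipeline may normalise and hash content differently.

```python
# Sketch of a change check; assumes the fingerprint is BLAKE3 over payload_text.
import blake3


def payload_unchanged(artefact: dict, current_text: str) -> bool:
    """True if freshly fetched text still matches the stored fingerprint."""
    recomputed = "b3:" + blake3.blake3(current_text.encode("utf-8")).hexdigest()
    return recomputed == artefact["content_hash"]
```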
Where You See Artefacts in the UI
- Citation chips under an answer show domain, licence, and timestamp; they come from artefacts.
- Clicking a citation reveals the exact quote and its hash.
- Trust reports and receipts link back to the artefacts selected by QUORUM (MiSES), the minimal evidence set.
See Also
- Receipt Anatomy: how proofs are packaged and checked
- Trust Scoring: explainable signals for domains, artefacts, and users
- Public Proof Pages: shareable, read‑only receipts

