#Data Flow

#The Problem

Understanding how data moves through WRIT, from raw CSV files to the questions and actions it produces, is essential for debugging unexpected results, designing effective domain models, and explaining the system to stakeholders.

#How WRIT Handles It

WRIT follows a unidirectional pipeline: data never flows back to an earlier stage, and each stage transforms it into a more refined form.

#The Pipeline

graph TD
  A["Attestation Sources<br/>(CSV, APIs)"] --> B["Entity Sets<br/>(collections of entities)"]
  B --> C["FluentFacts<br/>(time-aware attribute values with 3vl)"]
  C --> D["Partitions<br/>(classify entities via predicates)"]
  D --> E["Expectations<br/>(measure alignment)"]
  E --> F["Actions<br/>(false → expectee)"]
  E --> G["Questions<br/>(unknown → attestor)"]
  style A fill:#ecfeff,stroke:#06B6D4,color:#0e7490
  style B fill:#ecfeff,stroke:#06B6D4,color:#0e7490
  style C fill:#ecfeff,stroke:#06B6D4,color:#0e7490
  style D fill:#ecfeff,stroke:#06B6D4,color:#0e7490
  style E fill:#ecfeff,stroke:#06B6D4,color:#0e7490
  style F fill:#fef3c7,stroke:#f59e0b,color:#92400e
  style G fill:#fef3c7,stroke:#f59e0b,color:#92400e

#Stage by Stage

1. Attestation Sources → Entity Sets

Raw data from CSV files (or other sources) is loaded into entity sets. Each row becomes an entity. The entity type schema determines which fields map to which attributes.

services_data.csv:
  name,owner_email,tier
  CRM,alice@co.com,critical
  Email,,standard
  HR Portal,N/A,critical

→ Entity Set "all_it_services" with 3 entities
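
As a rough illustration of this stage, the sketch below loads the sample rows into a dictionary of entities using Python's standard csv module. The function name load_entity_set and the choice of a plain dict per entity are assumptions made for this example, not WRIT's actual loader or schema mechanism.

```python
import csv
import io

# Sample data copied from the services_data.csv example above.
SERVICES_CSV = """name,owner_email,tier
CRM,alice@co.com,critical
Email,,standard
HR Portal,N/A,critical
"""


def load_entity_set(csv_text, key_field):
    """Turn each CSV row into an entity (a plain dict), keyed by key_field.

    Hypothetical helper: a real entity type schema would decide which
    columns map to which attributes.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: dict(row) for row in reader}


all_it_services = load_entity_set(SERVICES_CSV, key_field="name")
print(len(all_it_services))  # 3
print(sorted(all_it_services))  # ['CRM', 'Email', 'HR Portal']
```

Note that the raw values are kept untouched at this point: the blank owner_email for Email and the "N/A" for HR Portal are only interpreted in the next stage.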

2. Entity Sets → FluentFacts

Each attribute value is wrapped in a FluentFact — a time-aware descriptor that:

Records when the data was attested (timestamp from the data source).
Applies sentinel mappings ("N/A" → Unknown, "DELETED" → Void).
Checks the validity period (is the data stale?).
Produces a logic value (True, False, Unknown, or Void; that is, three-valued logic plus a Void state) with a reason chain explaining how the value was derived.
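
The steps above can be sketched as a small function that wraps a raw value. Everything named here (Status, SENTINELS, Fact, make_fact, max_age) is hypothetical and chosen for illustration. Two behaviours are assumptions rather than documented rules: stale data resolves to Unknown, and the empty string is treated as a sentinel (as the worked example on this page suggests).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    KNOWN = "known"
    UNKNOWN = "unknown"
    VOID = "void"


# Hypothetical sentinel table; "" is an assumption based on the worked example.
SENTINELS = {"N/A": Status.UNKNOWN, "": Status.UNKNOWN, "DELETED": Status.VOID}


@dataclass(frozen=True)
class Fact:
    attribute: str
    value: str
    status: Status
    reasons: tuple  # reason chain: one string per derivation step


def make_fact(attribute, raw, source, attested_at, now, max_age):
    """Wrap a raw value, applying sentinel mapping and then a staleness check."""
    reasons = (f"{attribute}={raw} from {source}",)
    if raw in SENTINELS:
        status = SENTINELS[raw]
        reasons += (f"{raw!r} is a sentinel -> {status.value}",)
    elif now - attested_at > max_age:
        # Assumption: data outside its validity period is treated as Unknown.
        status = Status.UNKNOWN
        reasons += (f"attested {attested_at:%Y-%m-%d}, older than {max_age.days} days -> unknown",)
    else:
        status = Status.KNOWN
    return Fact(attribute, raw, status, reasons)


now = datetime(2024, 6, 1)
limit = timedelta(days=90)
fresh = make_fact("owner_email", "alice@co.com", "services_data", datetime(2024, 5, 20), now, limit)
stale = make_fact("owner_email", "alice@co.com", "services_data", datetime(2023, 1, 1), now, limit)
missing = make_fact("owner_email", "N/A", "services_data", datetime(2024, 5, 20), now, limit)
print(fresh.status, stale.status, missing.status)
```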

3. FluentFacts → Partitions

Partition predicates evaluate FluentFact values for each entity. For example, the predicate "owner_email is known" checks whether the FluentFact for owner_email holds a known value rather than Unknown or Void.

Each entity lands in exactly one of four branches: is_true, is_false, is_unknown, or is_void. A fifth, convenience branch, is_known, is also available.
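
A minimal sketch of this classification step follows. It sorts entities into the is_true, is_false, is_unknown, and is_void branches; the is_known convenience branch is left out because its exact definition is not given on this page. The names Branch, owner_is_known, and partition are hypothetical, and the predicate works directly on raw strings to keep the example short.

```python
from enum import Enum


class Branch(Enum):
    TRUE = "is_true"
    FALSE = "is_false"
    UNKNOWN = "is_unknown"
    VOID = "is_void"


def owner_is_known(entity):
    """Hypothetical predicate mirroring '"owner_email" is known'."""
    raw = entity["owner_email"]
    if raw == "DELETED":
        return Branch.VOID
    if raw in ("", "N/A"):  # assumption: blank and N/A both mean Unknown
        return Branch.UNKNOWN
    return Branch.TRUE


def partition(entities, predicate):
    """Place every entity name in exactly one branch."""
    branches = {branch.value: [] for branch in Branch}
    for name, entity in entities.items():
        branches[predicate(entity).value].append(name)
    return branches


entities = {
    "CRM": {"owner_email": "alice@co.com"},
    "Email": {"owner_email": ""},
    "HR Portal": {"owner_email": "N/A"},
}
has_owner = partition(entities, owner_is_known)
print(has_owner["is_true"])     # ['CRM']
print(has_owner["is_unknown"])  # ['Email', 'HR Portal']
```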

4. Partitions → Expectations

Expectations assert that specific partition branches should be empty. "Expect empty: is_false" means the is_false branch of the partition should contain zero entities.

The expectation evaluates to:

Met (true) — The branch is empty.
Not met (false) — The branch contains entities. Actions are generated.
Uncertain (unknown) — The is_unknown branch contains entities. Questions are generated.
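
The three outcomes can be sketched as a single function over the partition branches. The function name and the string results are hypothetical. The ordering is also an assumption: a non-empty target branch is reported as "not met" even when is_unknown also contains entities, since the page does not say how that combined case is handled.

```python
def evaluate_expect_empty(branches, target="is_false"):
    """Evaluate an 'expect empty' expectation against partition branches."""
    if branches[target]:
        return "not met"    # at least one entity definitely violates it
    if branches["is_unknown"]:
        return "uncertain"  # nothing violates it, but some data is missing
    return "met"


example = {"is_true": ["CRM"], "is_false": [], "is_unknown": ["Email"], "is_void": []}
print(evaluate_expect_empty(example))  # uncertain
```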

5. Expectations → Questions & Actions

The final outputs are directed at specific people:

Actions go to the expectee (found via relationship traversal from the failing entity).
Questions go to the attestor (the person responsible for the data source).
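
The routing rule can be illustrated with a short sketch. Output, route_outputs, and the expectee_of callback are hypothetical names; the callback stands in for the relationship traversal, which is not described in detail here, and the action wording is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    kind: str       # "action" or "question"
    recipient: str
    text: str


def route_outputs(branches, attribute, attestor, expectee_of):
    """Create actions for is_false entities and questions for is_unknown ones."""
    outputs = []
    for name in branches["is_false"]:
        # expectee_of is a stand-in for relationship traversal.
        outputs.append(Output("action", expectee_of(name), f"Fix {attribute} for {name}"))
    for name in branches["is_unknown"]:
        outputs.append(Output("question", attestor, f"What is the {attribute} for {name}?"))
    return outputs


branches = {"is_true": ["CRM"], "is_false": [], "is_unknown": ["Email", "HR Portal"], "is_void": []}
outputs = route_outputs(
    branches,
    attribute="owner_email",
    attestor="attestor of services_data",
    expectee_of=lambda name: f"expectee for {name}",
)
for output in outputs:
    print(output.kind, "->", output.recipient, ":", output.text)
```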

#Reason Chains

Every value in the pipeline carries a reason chain — a tuple of strings explaining how it was derived:

('owner_email=alice@co.com from services_data',
 'owner_email (alice@co.com) is known -> True')

These chains provide full provenance for debugging. When a result is unexpected, the reason chain shows exactly which data and which logical steps produced it.
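
Because a chain is a tuple of strings, each stage can extend it without mutating what earlier stages recorded. The sketch below shows that idea with two hypothetical helpers, extend_chain and format_chain; it does not reflect how WRIT builds or prints chains internally.

```python
def extend_chain(chain, step):
    """Return a new chain with one more step; the input tuple is unchanged."""
    return chain + (step,)


def format_chain(chain):
    """Render a chain as numbered lines for reading during debugging."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(chain, start=1))


loaded = ("owner_email=alice@co.com from services_data",)
evaluated = extend_chain(loaded, "owner_email (alice@co.com) is known -> True")
print(format_chain(evaluated))
```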

#Campaign Scoping

The pipeline runs within the scope of a campaign. Before evaluation begins, WRIT builds a scoped registry containing only the entity types, entity sets, and data sources relevant to the campaign's expectations. This means:

Unrelated data is not loaded.
Unrelated partitions are not evaluated.
Reports are focused on the campaign's goals.
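
One way to picture scoping is as a filter over a full registry. In the sketch below the registry layout, the build_scoped_registry function, and the names it_service, employee, all_employees, and hr_data are all invented for illustration; only all_it_services, services_data, and all_services_owned come from this page.

```python
# Hypothetical full registry: each entity set names its type and data source.
REGISTRY = {
    "entity_sets": {
        "all_it_services": {"entity_type": "it_service", "source": "services_data"},
        "all_employees": {"entity_type": "employee", "source": "hr_data"},
    },
}

# Hypothetical campaign containing a single expectation.
CAMPAIGN_EXPECTATIONS = [
    {"name": "all_services_owned", "entity_set": "all_it_services"},
]


def build_scoped_registry(registry, expectations):
    """Keep only the entity sets, types, and sources the expectations need."""
    wanted = {exp["entity_set"] for exp in expectations}
    entity_sets = {
        name: spec
        for name, spec in registry["entity_sets"].items()
        if name in wanted
    }
    return {
        "entity_sets": entity_sets,
        "entity_types": sorted({spec["entity_type"] for spec in entity_sets.values()}),
        "sources": sorted({spec["source"] for spec in entity_sets.values()}),
    }


scoped = build_scoped_registry(REGISTRY, CAMPAIGN_EXPECTATIONS)
print(scoped["sources"])  # ['services_data']
```

In this sketch hr_data never appears in the scoped registry, so it would not be loaded or evaluated for the campaign.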

#Worked Example

End-to-end flow for "all IT services must have an owner":

1. services_data.csv loaded → 3 entities in "all_it_services"

2. FluentFacts created:
   CRM.owner_email       = "alice@co.com" (known, True)
   Email.owner_email     = "" (sentinel → Unknown)
   HR Portal.owner_email = "N/A" (sentinel → Unknown)

3. Partition "has_owner" evaluated:
   is_true:    [CRM]           (owner is known)
   is_false:   []              (none definitely lack an owner)
   is_unknown: [Email, HR Portal] (ownership data missing)
   is_void:    []

4. Expectation "all_services_owned" checked:
   expect empty: "is_false" → is_false is empty ✓
   But is_unknown has 2 entities → Questions generated

5. Outputs:
   Question: "What is the owner_email for Email?" → attestor of services_data
   Question: "What is the owner_email for HR Portal?" → attestor of services_data

#Common Pitfalls

Debugging without reason chains — When a result is unexpected, always check the reason chain first. It shows exactly which data values and logical operations produced the result.
Forgetting campaign scope — If an expectation is not being evaluated, check that it is included in the campaign's scope tree (via step towards).
Assuming data loads instantly — Entity sets are loaded from files at evaluation time. If the source file has changed, re-evaluation is needed.

#What to Read Next

This is the capstone page of the Core Concepts learning path. From here, you can continue with:

Domain Modelling — Detailed guide to building domain models.
Writing Expectations (DSL) — Full DSL syntax reference.
Glossary — All terms in one place.