#Domain Modeling

This document explains how to model a business domain in kbpy — the entity types, entity sets, partitions, relationships, attestation sources, and FluentFacts that make up a domain. For the foundational theory behind these concepts (three-valued logic, Unknown vs Void), see Core Concepts. For the DSL syntax used to write .kb files, see DSL Reference. For running and operating kbpy, see Serving & Operations.

#How to Think About Domain Modeling

Domain modeling in kbpy follows a consistent workflow:

  1. Identify your business concern — What misalignment do you want to detect and address?
  2. Write expectations — What should be true? (e.g., "no IT service should lack an owner")
  3. Define entity types — What kinds of things are mentioned in the expectations? (e.g., IT services, people, departments)
  4. Declare relationships — How do these things connect? (e.g., a service has an owner, an employee belongs to a department)
  5. Create partitions — How do you classify entities to evaluate the expectations? (e.g., "services without owners", "employees in high-budget departments")
  6. Collect attestations — Where does the data needed to evaluate the expectations come from? (e.g., HR exports, Splunk queries, manual input)

Every step exists to support the expectation at the end. If a partition doesn't feed into an expectation, it doesn't belong in the model.

Domain models are nested: expectations are steps towards other expectations; partitions derive entity sets from the members of other entity sets. As a result, domain modelling proceeds iteratively downward from the high-level statement of an overarching expectation through lower and lower levels of vocabulary, until the statements can be grounded in attestations.

#Exhaustive vs Partial: A Critical Choice

When defining an entity type, you must decide whether the entity set is exhaustive (complete) or partial (incomplete). This decision has significant downstream consequences:

| | Exhaustive | Partial |
|---|---|---|
| Meaning | This set contains ALL entities of this type | This set may be missing entities |
| Missing lookup | Returns Void (definitely doesn't exist) | Returns Unknown (might exist elsewhere) |
| Typed unknown | Not added | Added automatically (see below) |
| Expectation status | Definitive | May be Unknown |
| Example | "We have a complete list of all departments" | "We have some employees but not necessarily all" |

When to choose exhaustive: Your data source is the authoritative, complete list. If something isn't in the set, it genuinely doesn't exist in your domain. Examples: departments from the official HR system, clusters from the infrastructure registry.

When to choose partial: Your data source might not have everything. Missing entities should generate questions, not false conclusions. Examples: employee records from a single office, services discovered by automated scanning.
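The difference can be sketched as a single lookup rule (a hypothetical illustration; kbpy's real lookups return richer VoidLookup/UnknownLookup objects carrying reason chains):

```python
# Hypothetical sketch: what a failed lookup means depends on completeness.
# Plain dicts stand in for entity sets; names are illustrative only.
def lookup(entity_set, exhaustive, key):
    if key in entity_set:
        return entity_set[key]
    # Absent from an exhaustive set: it definitely doesn't exist.
    # Absent from a partial set: it might exist elsewhere.
    return "Void" if exhaustive else "Unknown"

departments = {"HR": {"name": "HR"}}       # authoritative, complete list
employees = {"alice": {"name": "alice"}}   # one office's export, incomplete

lookup(departments, True, "Sales")   # → "Void"
lookup(employees, False, "bob")      # → "Unknown"
```

A Void result can safely end an evaluation; an Unknown result is what later drives question generation.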

Note: in practice, a partition of a partial entity set may itself be an exhaustive entity set. This case is not yet representable in the model. Supporting it could involve the attestation source listing entity sets, derived at any level, that are exhaustive. Alternatively, the Unknown entity could carry a sentinel (or default values rather than Missing) for attributes that are exhaustive; whether a set was partial would then depend on the presence of an Unknown entity.


#Entity System

Entities are the core data objects in kbpy. They are defined using the .kb Domain Definition Language.

#Defining an Entity Type

entity type: "Person"
    display name: "Person"
    display format: "{name}"
    source entity sets: "all_people"
    sink entity set: "all_people"
    predicate: true
    exhaustive: true
    identifying attributes: "name"

attribute: "name"
    of: "all_people"
    type: string

attribute: "salary"
    of: "all_people"
    type: int

The DSL is parsed and domain registries are built automatically. Attestation sources (data mapping, sentinels, validity) are defined separately — see DSL Reference for the full syntax.

#Missing Data Handling

The unknown if missing and void if missing clauses on an entity type control what happens when a data field is absent from the source data:

  • unknown if missing: "salary" — If the salary field is absent, the attribute returns Unknown. This means "we don't know the salary yet" and will generate a question.
  • void if missing: "name" — If the name field is absent, the attribute returns Void. This means "the name fundamentally cannot be determined."

For a deeper explanation of Unknown vs Void, see Core Concepts: Unknown vs Void.

#Exhaustive vs Partial Sets

Exhaustive (complete): The entity set contains ALL entities of this type. Looking up a missing entity returns Void.

entity type: "Department"
    ...
    exhaustive: true
    ...

// Looking up non-existent department → Void (definitely doesn't exist)

Partial (incomplete): The entity set may not contain all entities. Looking up a missing entity returns Unknown.

entity type: "Employee"
    ...
    exhaustive: false
    ...

// Looking up non-existent employee → Unknown (might exist elsewhere)

#Typed Unknowns

When an entity set is partial, kbpy automatically adds a typed unknown — an instance of the entity class with empty data and is_known=False. This entity represents "there may be additional entities we don't know about."

Why typed unknowns exist: Without them, partial entity sets can't generate completeness questions. The problem is: how do you ask "are there more entities?" when there's no entity to attach the question to? The typed unknown solves this by acting as a probe that:

  1. Flows through partition predicates: All its attribute accesses return Unknown values, so it naturally lands in the is_unknown branch. Along the way, the FluentFact descriptors record which attributes the predicate touches — this is how scoped registries know which attributes to show in the domain graph.

  2. Carries attestation source metadata: The typed unknown inherits the attestation source from its parent entity set. When it lands in is_unknown, the question generation machinery creates a completeness question. Because it has a real attestation source, the attestor resolution chain can identify who to ask about additional entities.

  3. Uses the same question model: Completeness questions use the same EntityQuestion type as partition, traversal, and lookup questions. The question_type field distinguishes them: "partition", "traversal", "lookup", or "completeness". This means all questions share the same attestor resolution, display, and grouping logic.

Typed unknowns are visible in entity listings — they appear as entities with all-unknown attribute values. In contexts where they shouldn't be processed (entity lookup, search), is_known guards prevent them from being treated as real entities.
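The probe behaviour can be sketched in a few lines (a hypothetical illustration; in kbpy the FluentFact descriptors intercept attribute access, where here a plain get() method stands in for that machinery):

```python
# Hypothetical sketch of a typed-unknown "probe". Illustrative names only.
class TypedUnknown:
    is_known = False            # guards keep probes out of lookup/search

    def __init__(self):
        self.touched = []       # attributes the predicate reads,
                                # recorded for the domain graph

    def get(self, attr):
        self.touched.append(attr)
        return "Unknown"        # every attribute access is Unknown

def well_paid(entity):
    # A partition predicate under three-valued logic: salary > 100000.
    salary = entity.get("salary")
    return "Unknown" if salary == "Unknown" else salary > 100000

probe = TypedUnknown()
result = well_paid(probe)       # → "Unknown": lands in is_unknown,
                                # and probe.touched records ["salary"]
```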

#Entity Sets

Entity sets are lazy-loaded collections declared in .kb files. The sink entity set clause on an entity type definition names the entity set it produces:

entity type: "Employee"
    ...
    sink entity set: "all_employees"
    ...

Entity sets load their data from SQLite on first access. Multiple entity sets can share the same entity type, and an entity set can accept data from multiple attestation sources.

Entities are frozen (immutable) after creation. This enables safe sharing across scoped registries via LRU caching — the same entity objects can be used by concurrent requests without risk of mutation.

#Entity Set Provenance

Every entity set and entity carries provenance information:

  • entity_set.reason — A string explaining why the entity set exists as a whole. For root sets: "loaded from employees_data". For partition branches: "well_paid partition is_true".
  • entity_set.reason_for(entity) — Returns the full reason chain (tuple of strings) explaining why a specific entity is in this set. For partition branches, this includes every step of the predicate evaluation.
# Set-level provenance
print(well_paid.is_true.reason)
# "well_paid partition is_true"

# Entity-level provenance
for emp in well_paid.is_true:
    chain = well_paid.is_true.reason_for(emp)
    # ('salary: 150000', 'salary (150000) > 100000', 'well_paid: true')

This is useful for determining why an entity has, or hasn't, met an expectation. Predicates allow the full expressiveness of first-order logic, including quantifiers (for all, there exists), conjunction (and), and disjunction (or). When chained together via entity sets, complex rules can be represented. The reason statement explains why the conclusion was reached; if an entity ends up in an unexpected partition branch, reason_for() shows every step of the evaluation.


#Partitions

Partitions divide entity sets into groups based on a predicate (a boolean test).

#Defining a Partition

partition: "well_paid"
    source entity sets: "all_employees"
    predicate: "salary" is greater than 100000
    when true: "well_paid"
    when false: "not_well_paid"
    when unknown: "__IRRELEVANT__"
    when void: "__IRRELEVANT__"

#Five Branches

Every partition creates five branches:

| Branch | Contains |
|---|---|
| is_true | Entities where predicate returns True |
| is_false | Entities where predicate returns False |
| is_known | Union of is_true and is_false |
| is_unknown | Entities where predicate returns Unknown |
| is_void | Entities where predicate returns Void |

The when true: and when false: clauses explicitly name the first two branches. The when unknown: and when void: clauses name those branches (use "__IRRELEVANT__" when they don't need a named entity set). The is_known branch (union of is_true and is_false) is created automatically.

#Automatic Void and Unknown Handling

DSL predicates automatically propagate Void and Unknown through kbpy's three-valued logic system. When a predicate accesses an attribute that is Void or Unknown, the result propagates correctly without any defensive code. For example, if "salary" is Unknown, then "salary" is greater than 100000 evaluates to Unknown, and the entity lands in the is_unknown branch.

#Void vs Unknown Propagation Rules

Understanding how Void and Unknown interact in logical operations is critical for predicate design. The key insight: definite values win over Void, and Void wins over Unknown.

AND truth table (with Void):

| A | B | A AND B | Why |
|---|---|---|---|
| False | Unknown | False | False AND anything is False |
| False | Void | False | False AND anything is False |
| Void | Unknown | Void | Void is "stronger" than Unknown |
| True | Unknown | Unknown | Could go either way |
| True | Void | Void | Void propagates |

OR truth table (with Void):

| A | B | A OR B | Why |
|---|---|---|---|
| True | Unknown | True | True OR anything is True |
| True | Void | True | True OR anything is True |
| Void | Unknown | Void | Void is "stronger" than Unknown |
| False | Unknown | Unknown | Could go either way |
| False | Void | Void | Void propagates |

NOT:

| A | NOT A |
|---|---|
| True | False |
| False | True |
| Unknown | Unknown |
| Void | Void |

Why Void beats Unknown: Void means "this value is fundamentally impossible" — for example, looking up a department that doesn't exist in an exhaustive list. Unknown means "this value might exist but we haven't found it yet." When you combine the two, the impossibility takes precedence: if one part of a condition is impossible, the result is impossible (not merely unknown).

Practical consequence: If a partition predicate traverses a relationship to an exhaustive set and the lookup fails (Void), the entity will land in is_void even if other attributes are merely Unknown. This is by design — it prevents generating questions about entities whose fundamental preconditions don't hold.
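The precedence rules above can be sketched as plain functions (a hypothetical illustration; kbpy's real 3vl values carry reason chains, which this omits):

```python
# Hypothetical sketch of the connectives: definite values win over Void,
# and Void wins over Unknown. String markers stand in for real 3vl types.
TRUE, FALSE, UNKNOWN, VOID = "true", "false", "unknown", "void"

def and3(a, b):
    if FALSE in (a, b):
        return FALSE        # a definite False decides the conjunction
    if VOID in (a, b):
        return VOID         # impossibility dominates mere ignorance
    if UNKNOWN in (a, b):
        return UNKNOWN
    return TRUE

def or3(a, b):
    if TRUE in (a, b):
        return TRUE         # a definite True decides the disjunction
    if VOID in (a, b):
        return VOID
    if UNKNOWN in (a, b):
        return UNKNOWN
    return FALSE

def not3(a):
    # Unknown and Void are fixed points of negation.
    return {TRUE: FALSE, FALSE: TRUE}.get(a, a)

and3(VOID, UNKNOWN)   # → "void": Void is stronger than Unknown
or3(TRUE, VOID)       # → "true": definite values win
```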

#Interaction Between Relationships and Partitions

When a partition predicate traverses a relationship and the lookup returns Void or Unknown, the resulting entity placement depends on the target set:

  • Target is exhaustive + no match → VoidLookup → Entity lands in is_void. No question is generated because the entity definitely doesn't exist.
  • Target is partial + no match → UnknownLookup → Entity lands in is_unknown. A lookup question is generated asking the attestor of the target set.
  • Target entity found → Normal evaluation continues using the target entity's attributes.

The DSL handles this automatically. You don't need to write defensive code.

#Chaining Partitions

Partitions can be applied to derived entity sets:

partition: "elderly"
    source entity sets: "all_people"
    predicate: "age" is greater than 65
    when true: "elderly"
    when false: "youth"
    when unknown: "__IRRELEVANT__"
    when void: "__IRRELEVANT__"

partition: "wise"
    source entity sets: "elderly"
    predicate: "wisdom" is greater than 80
    when true: "wise_elderly"
    when false: "__IRRELEVANT__"
    when unknown: "__IRRELEVANT__"
    when void: "__IRRELEVANT__"

#Set Operations on Entity Types

The source entity sets: field accepts a set expression using three infix keywords: union, intersect, and except. Parentheses are supported for grouping; mixing operators without parentheses is a parse error.

#Union

entity type: "MergedEmployee"
    display name: "Merged Employee"
    display format: "{name}"
    source entity sets: "employees_from_hr" union "employees_from_payroll"
    sink entity set: "all_merged_employees"
    predicate: true
    exhaustive: true
    identifying attributes: "name"

An entity appears in the result if its identity exists in at least one source. When multiple sources are declared:

  • Entity union: Entities from all source entity sets are combined into the sink entity set, deduplicated by their identifying attributes.
  • Attribute inheritance: Attributes from all source branches are unioned via BFS traversal. The derivation chain becomes a DAG rather than a linear chain.
  • Topological ordering: Entity types are automatically sorted so that all source producers are built before their dependents, regardless of declaration order.
  • exhaustive: is the domain modeller's explicit assertion — two partial sources may together cover all entities of a type, or may not.

#Intersection

entity type: "CloudAndCMDBService"
    display name: "Cloud and CMDB Service"
    display format: "{name}"
    source entity sets: "cloud_services" intersect "cmdb_services"
    sink entity set: "verified_services"
    predicate: true
    exhaustive: false
    identifying attributes: "name"

An entity appears in the result if its identity exists in all listed source entity sets. Use intersection when you want entities that can be confirmed by multiple independent sources — for example, "services that appear in both the cloud inventory and the CMDB."

  • Attribute merging: Attributes from all source branches are available (same merge-or-unknown rule as union).
  • exhaustive: is ignored for intersection types — is_partial is computed from source partiality: if all sources are exhaustive, the intersection is exhaustive; if any source is partial, the result is partial.

Three-way intersection (entity must appear in all three):

source entity sets: "cloud_assets" intersect "cmdb_assets" intersect "finance_assets"

#Difference

entity type: "ActiveCluster"
    display name: "Active Cluster"
    display format: "{name}"
    source entity sets: "all_clusters" except "decommissioned_clusters"
    sink entity set: "active_clusters"
    predicate: true
    exhaustive: false
    identifying attributes: "cluster_id"

An entity appears in the result if its identity exists in the primary source and does not exist in the secondary source. Use difference when you want to exclude a known set — for example, "all clusters except those tagged as decommissioned."

  • Attributes from the secondary source are not available on the sink. collect_all_attributes walks only the primary branch.
  • Groundedness: only the primary source's attributes require attestation coverage.
  • exhaustive: is ignored for difference types — is_partial is derived from source partiality.

Compound exclusion (exclude from multiple sets):

source entity sets: "all_services" except ("decommissioned_services" union "excluded_services")
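All three operations compare entity identities derived from the identifying attributes. A minimal sketch, with plain dicts standing in for entities and illustrative names throughout:

```python
# Hypothetical sketch of union / intersect / except over identities.
def ids(entity_set, key="name"):
    """Project an entity set onto its identifying attribute."""
    return {e[key] for e in entity_set}

cloud = [{"name": "svc-a"}, {"name": "svc-b"}]
cmdb = [{"name": "svc-b"}, {"name": "svc-c"}]
decommissioned = [{"name": "svc-c"}]

union_ids = ids(cloud) | ids(cmdb)        # identity in at least one source
intersect_ids = ids(cloud) & ids(cmdb)    # identity in all sources
except_ids = (ids(cloud) | ids(cmdb)) - ids(decommissioned)
                                          # primary minus the excluded set
```

The real pipeline additionally merges attributes and resolves conflicts, as described in the following sections.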

#Schema-Level Conflict Resolution (Union and Intersection)

When the same attribute name is inherited from multiple source branches, the definitions are compared:

  • Same definition (same data field and descriptor type): The attribute is kept as-is.
  • Different definition (different data field or descriptor type): The attribute is marked as a conflict. Accessing the attribute on any entity returns Unknown with a reason explaining the conflict.

#Entity-Level Conflict Resolution (Union and Intersection)

When the same entity (matched by identifying attributes) appears in multiple source entity sets:

  • Identical values: One copy is kept in the union.
  • Different values: Conflicting attribute values are set to Unknown. Non-conflicting attributes retain their values. The entity's source field records both contributing sources.
  • Attestation precedence: Entities loaded from attestation data take precedence over entities inherited from source entity sets. If an entity exists in both an attestation source and a source entity set, the attestation version is used.
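The merge rule for a single entity can be sketched as follows (a hypothetical illustration; real kbpy also records the contributing sources and builds reason chains):

```python
# Hypothetical sketch: identical values are kept, conflicts become Unknown,
# attributes present in only one source are carried through.
def merge_entities(a, b):
    merged = {}
    for key in a.keys() | b.keys():
        if key in a and key in b and a[key] != b[key]:
            merged[key] = "Unknown"   # sources disagree
        else:
            merged[key] = a.get(key, b.get(key))
    return merged

hr = {"name": "alice", "salary": 120000}
payroll = {"name": "alice", "salary": 125000, "grade": "L5"}

merged = merge_entities(hr, payroll)
# name agrees and is kept; salary conflicts → "Unknown";
# grade exists only in payroll and is carried through
```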

#Derivation Sankey

The kbpy web UI provides a Derivation Sankey view (/d/{domain}/x/{campaign}/derivation_sankey) — a D3-based flow diagram showing the full entity set derivation chain. Each entity set appears as a rectangle (height proportional to entity count). Each entity type appears as an intermediate operation node (pill-shaped, labeled with entity type name and operation symbol: ∪ for union, ∩ for intersection, ∖ for difference). Flow bands connect source → operation → sink, with width proportional to entity count.

Visual encoding per operation:

| Operation | Visual |
|---|---|
| Union | Flows from all sources merge into the op node; sink is narrower than the combined input when overlap exists |
| Intersection | Bottleneck: discarded portions of each source appear as thin gray bands; sink is narrower than either source |
| Difference | Primary source flows fully into the op node; secondary enters as a dashed exclusion line; excluded portion exits as a thin gray flow |

The writ-kb Remix app provides an equivalent React SVG Sankey at /derivation-sankey?domainId=…&campaignId=….


#Relationships

Relationships between entities are declared in .kb files using the relationship: block.

#Forward Relationships

attribute: "department_name"
    of: "Employee"
    from data field: "department_name"
    match on: "name"
    type: string
    unknowns: "__EMPTY__"
    voids: "__EMPTY__"
    trues: "__EMPTY__"
    falses: "__EMPTY__"
    validity: 365 days

relationship: "department"
    of: "Employee"
    from: "department_name"
    to: "all_departments" of "department"
    match on: "name"

Now accessing an employee's department returns the actual Department entity, enabling traversal across the domain model.

#Lookup Semantics

| Scenario | Result |
|---|---|
| Match found | Target entity |
| No match, target exhaustive | VoidLookup with reason chain |
| No match, target partial | UnknownLookup (generates lookup question if entity lands in is_unknown) |
| Source field is Unknown | UnknownLookup |
| Source field is Void | VoidLookup |

#Reverse Relationships

Reverse relationships are created automatically. If Employee has a department relationship to Department, then each Department entity gains an employees attribute returning the set of employees in that department.
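The reverse direction is just a grouping over the forward links. A hypothetical sketch (kbpy derives this automatically from the relationship declaration; the dicts and names here are illustrative):

```python
# Hypothetical sketch: a forward Employee→Department link induces a
# Department→employees collection by inverting the index.
from collections import defaultdict

employees = [
    {"name": "alice", "department_name": "HR"},
    {"name": "bob", "department_name": "HR"},
    {"name": "carol", "department_name": "Eng"},
]

reverse = defaultdict(list)
for emp in employees:
    reverse[emp["department_name"]].append(emp["name"])

reverse["HR"]   # → ["alice", "bob"]: the employees in that department
```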

DSL predicates can traverse relationships directly:

# Access a related entity's attribute in a predicate
predicate: "budget" of "department" is greater than 250000

# If department is Void (e.g. "Sales" not found), the entire
# expression is Void and the entity lands in is_void

#Attestation Sources

Attestation data is loaded from external sources and attached to entity types via the from: clause. All attestations have a timestamp indicating when the data was attested to be valid.

#Data Flow

The primary workflow uses kbpy history to extract attestation data from a git repository into a SQLite DatomStore, then serves from that database:

# 1. Extract history into SQLite
uv run kbpy history <domain-repo-path> -o data.sqlite -v

# 2. Serve from SQLite
uv run kbpy serve --db data.sqlite --port 8000

The .kb DSL references attestation sources by name:

attestation source: "employees_data"
    entity set: "all_employees"
    ...

#Multiple Attestation Sources

Multiple attestation sources can feed the same entity set. Records are merged at the data-loading layer, matched by the identifying attributes:

attestation source: "hr_data"
    entity set: "all_employees"
    ...

attestation source: "payroll_data"
    entity set: "all_employees"
    ...

The same employee can have some fields from HR and others from payroll. See DSL Reference for the full attestation source syntax.

#Supported Source Types

The history extraction tool supports CSV files, Splunk queries, SQLite databases, and inline data. These are handled internally during extraction — users define their domain structure in .kb files and the extraction pipeline populates the SQLite DatomStore.

#Attestation Source Descriptions

Each attestation source carries a structured description: block that documents where the data comes from, how to obtain it, and what it provides:

attestation source: "hr_data"
    description:
        data_source: "HR system export"
        method: "Weekly CSV download from HR portal"
        returns: "Employee names, departments, and salaries"
    ...

These descriptions make the domain model self-documenting. When an expectation is ungrounded (see below), the descriptions on existing attestation sources provide context for what kind of data source is needed to fill the gap.

#Groundedness

Groundedness is a structural property of the domain model. It answers the question: is the model wired up correctly? Specifically, do attestation sources exist that cover all attributes an expectation needs to be evaluated?

Groundedness does not inspect individual entity data — that is the concern of partiality and uncertainty. A grounded expectation can still evaluate to Unknown (if the data source is partial) or have stale data (if attestations have expired). Groundedness only checks that the plumbing is in place.

#What makes an expectation grounded

An expectation is grounded when attestation sources exist that cover all its evaluation-relevant attributes. There are two kinds:

  • Partition-relevant attributes: attributes referenced by the partition predicate (e.g., "legacy_flag" in predicate: "legacy_flag").
  • Traversal-relevant attributes: attributes referenced by the expectee path (e.g., "owner_ref" in expectee: SELF->"owner_ref").

Both kinds must have attestation source coverage for the expectation to be grounded.

#Example: grounded vs ungrounded

# This expectation depends on "legacy_flag" (partition) and "owner_ref" (expectee)
partition: "is_legacy"
    predicate: "legacy_flag"
    ...

expectation: "all_legacy_clusters_have_migration_plan"
    partition: "is_legacy"
    expectee: SELF->"owner_ref"
    ...

If an attestation source covers legacy_flag on the clusters entity set AND another (or the same) attestation source covers owner_ref, the expectation is grounded. If no attestation source covers owner_ref, the expectation is ungrounded — and a warning identifies exactly which attribute on which entity set is missing coverage.
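The check itself is a set-coverage question. A minimal sketch (function and variable names are illustrative; real kbpy derives the required attributes from the partition predicate and the expectee path):

```python
# Hypothetical sketch of a groundedness check: every evaluation-relevant
# attribute must be covered by some attestation source.
def groundedness_warnings(required_attrs, coverage):
    """Return the required attributes that no attestation source covers."""
    covered = set().union(*coverage.values()) if coverage else set()
    return sorted(required_attrs - covered)

# Partition-relevant ("legacy_flag") plus traversal-relevant ("owner_ref"):
required = {"legacy_flag", "owner_ref"}
coverage = {"cluster_scan": {"legacy_flag"}}   # nothing covers owner_ref

groundedness_warnings(required, coverage)   # → ["owner_ref"]: ungrounded
```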

#Campaign-level groundedness

A campaign is grounded when the root expectation and every step towards it are individually grounded. If any expectation in the tree is ungrounded, the campaign is ungrounded.

#How to resolve ungrounded expectations

  1. Check the groundedness warnings (visible in the web UI discrepancies page, the expectations index, or the JSON API). Each warning names the expectation, the entity set, and the missing attribute.
  2. Either add an attestation source that covers the missing attribute, or adjust the predicate/expectee path so it only references attributes that have attestation sources.
  3. The description: blocks on existing attestation sources document what data is already available — use them to understand what kind of new data source is needed.

#Where groundedness appears

  • Web UI: Orange "ungrounded" badges on expectations (index, detail, and campaign dashboard). Detailed warnings on the discrepancies page.
  • CLI: The sources command shows attestation sources and their descriptions.
  • JSON API: The grounded boolean on expectation and campaign endpoints. Groundedness warnings on the discrepancies endpoint.

#FluentFacts and Attestation

kbpy supports time-dimensioned facts (FluentFacts) that track when data was attested and who attested it. This enables reasoning about data freshness and directing questions to the appropriate attestor.

#Why Time Matters

Without time-awareness, old data looks the same as fresh data. This is a problem because:

  • A salary figure from three years ago might no longer be accurate
  • A compliance certification that expired last month shouldn't count as current
  • An infrastructure scan from six months ago might miss recent changes

FluentFacts solve this by treating expired attestations as Unknown, which automatically generates questions directed to the attestor. When data goes stale, the system doesn't silently use outdated values — it flags them as unknown and asks the data source for updates.

Debugging tip: If all values for an entity appear as Unknown, check the attestation timestamps and validity_days settings. A common cause is that the validity window has expired and the data needs to be re-attested.

#What is a Fluent?

In artificial intelligence, a fluent is a condition or fact that can change over time. Unlike eternal predicates (e.g., "Alice is human"), fluents represent time-varying properties (e.g., "Alice is employed" or "The system budget is $50,000"). The term comes from the situation calculus and event calculus formalisms.

#FluentFacts

FluentFacts are Python descriptors (internal implementation) that sit on entity attributes and produce 3vl values from raw data. When you access an attribute like employee.salary, the FluentFact descriptor intercepts the access and returns an appropriate 3vl value.

The descriptor decides what to return based on two checks:

  1. Sentinel check — if the raw value matches an entry in the unknowns list, the descriptor returns Unknown; if it matches an entry in the voids list, it returns Void.
  2. Staleness check — if the attestation has expired (i.e. validity_days has been exceeded), the descriptor returns Unknown regardless of the raw value. The reason chain preserves what the value was and why it became unknown, so no information is silently lost.

If neither check triggers, the descriptor wraps the raw value in the corresponding 3vl type and returns it as a known value.
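The two checks can be sketched as a single function (hypothetical names; the real descriptors are Python descriptor classes that also build reason chains, omitted here):

```python
from datetime import date, timedelta

# Hypothetical sketch of FluentFact resolution: sentinel check first,
# then staleness check, otherwise return the value as known.
def resolve(raw, attested_on, today,
            unknowns=("N/A", ""), voids=(), validity_days=365):
    # 1. Sentinel check: the source explicitly marks the value.
    if raw in unknowns:
        return "Unknown"
    if raw in voids:
        return "Void"
    # 2. Staleness check: expired attestations become Unknown rather
    #    than being silently used.
    if attested_on + timedelta(days=validity_days) < today:
        return "Unknown"
    return raw   # fresh, non-sentinel value: known

resolve("120000", date(2024, 1, 15), date(2024, 6, 1))   # → "120000"
resolve("120000", date(2023, 1, 15), date(2024, 6, 1))   # → "Unknown" (stale)
resolve("N/A", date(2024, 1, 15), date(2024, 6, 1))      # → "Unknown" (sentinel)
```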

Each FluentFact type maps to a specific 3vl value type:

| FluentFact descriptor | 3vl value produced |
|---|---|
| IntFluentFact | Int3vl |
| StrFluentFact | Str3vl |
| FloatFluentFact | Float3vl |
| BoolFluentFact | Bool3vl |
| ConstantFluentFact | Depends on the constant value type |

#Attestation Model

AttestationSource (e.g., CSV file, Splunk query)
       ↓
   Attestation (a row of data with timestamp)
       ↓
   FluentFact (time-aware attribute on Entity)
       ↓
   If attestation_date + validity_days < now:
       → Value is UNKNOWN (stale)
       → Question directed to Attestor
   Else:
       → Value is known (fresh)

#Defining Validity and Sentinels

Validity, unknowns, and voids are configured in the attestation source's field mappings:

attestation source: "employees_data"
    entity set: "all_employees"
    ...
    from data field: "salary" -> "salary"
        unknowns: "N/A", ""
        validity: 365 days

The validity: 365 days clause sets the staleness window: 365 days after the attestation timestamp, the value becomes Unknown, with the reason chain recording the expiry.

ConstantFluentFact has no time dimension — it always returns the same value regardless of time, and is never subject to staleness.

#Attestor Attribution

Attestors are defined at the attestation source level, not per-attribute. This design choice was driven by a practical constraint: per-attribute attestor traversals caused infinite recursion when an attribute used for a relationship lookup also had a traversal that went through that relationship. For example, if department_name had an attestor traversal via department, accessing department_name would trigger the department relationship, which would access department_name again, creating an infinite loop.

Moving attestors to the attestation source level eliminates this entirely — no traversal happens during attribute access. It also simplifies the mental model: a data source (CSV file, Splunk query, database) has a single responsible party, and all attributes from that source share the same attestor. The web interface groups questions by attestor, making it clear who needs to provide updated data.

#Reason Chains with Attestation

FluentFacts include attestation information in their reason chains:

# Fresh attestation
"salary=120000 attested by HR_System on 2024-01-15 (valid for 365 days)"

# Stale attestation
"salary unknown: attestation from HR_System on 2023-01-15 expired (365 day validity)"

This provides full provenance: what the value was, who attested it, when, and why it may have become unknown.

#Expectations as Entity Set Creators

Expectations evaluate entities against a partition predicate and classify them as aligned, misaligned, or uncertain. These classifications can be materialised as named entity sets, making expectations a third kind of EntitySetCreator alongside entity types and partitions.

#Sink Entity Sets

Each expectation can produce up to three sink entity sets via when aligned:, when misaligned:, and when uncertain: clauses:

expectation: "no_legacy_clusters"
    expector: "all_stakeholders" ? "id" = "PLATFORM_LEAD"
    of partition: "is_legacy" of "all_clusters" of "cluster"
    expect empty: "is_true"
    when aligned: "modern_clusters"
    when misaligned: "legacy_clusters_to_migrate"
    when uncertain: "__IRRELEVANT__"

After evaluation, modern_clusters contains entities where is_legacy is false (aligned with the expectation that is_true should be empty), and legacy_clusters_to_migrate contains entities where is_legacy is true (misaligned).

Use "__IRRELEVANT__" for sinks that have no downstream use. No entity set is registered for irrelevant sinks.

#Downstream Derivation

Expectation sink entity sets participate in the derivation hierarchy. They can be used as source entity sets for other entity types or partitions, enabling chained analysis:

entity type: "LegacyCluster"
    source entity sets: "legacy_clusters_to_migrate"
    sink entity set: "legacy_cluster_details"
    predicate: true
    exhaustive: false

Sink entity sets inherit is_partial from their partition's parent entity set and participate in collect_all_attributes chain traversal.