Multi-Table Metadata Extraction
Subject Level Privacy
Subject Level Privacy is currently in pre-release.
Overview
Subject-level privacy means that privacy guarantees are applied to people (subjects) rather than to individual rows.
In practice, a subject is defined by the subject table (for example, a users or patients table). When other tables can be linked back to the same subject (for example, one user with many orders), privacy accounting must consider that subject’s total contribution across those linked records.
Max subject references
max_subject_references is a table-level value that estimates the maximum number of records in a given table that can be impacted by (or associated with) a single subject record.
It is a worst-case bound used in downstream DP-related calculations.
How it is computed
For a given table:
- Identify all foreign-key (FK) paths from the subject table to the target table.
- For each path, compute the product of the per-FK max_references values along the path. max_references is the maximum fan-out for that FK when traversing the path.
- If multiple FK paths exist, take the sum of the path products across all paths.
- Round the result up to the next power of two.
In other words, if $S$ is the sum of path products, the rounded value is:
$$2^{\lceil \log_2 S \rceil}$$
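Rounding up keeps the bound conservative, because the rounded value is never smaller than the sum of path products, and stable, because small changes in the underlying counts usually leave the rounded value unchanged.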
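The computation can be sketched in a few lines of Python. This is only an illustration of the steps described here, not the product's implementation: the function names and the input shape (one list of per-FK max_references values per FK path) are assumptions, and the sketch assumes that a sum that is already a power of two is left unchanged.

```python
import math
from typing import Iterable, Sequence


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n.

    Assumption: an exact power of two is returned unchanged.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def max_subject_references(paths: Iterable[Sequence[int]]) -> int:
    """Hypothetical helper: each element of `paths` holds the per-FK
    max_references values along one FK path from the subject table."""
    # Product along each path, then sum across all paths.
    total = sum(math.prod(path) for path in paths)
    # Round the sum up to a power of two.
    return next_power_of_two(total)


# One path with fan-outs 20 and 10: 20 * 10 = 200, rounded up to 256.
print(max_subject_references([[20, 10]]))  # 256
```

Using bit_length keeps the rounding in exact integer arithmetic, which avoids floating-point error from evaluating a base-2 logarithm on large sums.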
Example
Assume users is the subject table.
Consider a typical e-commerce schema:
- orders.user_id -> users.id (one user can have many orders), with max_references(users -> orders) = 20
- order_items.order_id -> orders.id (one order can have many items), with max_references(orders -> order_items) = 10
For the order_items table, one FK path from users to order_items is:
users -> orders -> order_items
The path product is 20 * 10 = 200. Rounding up to the next power of two yields 256, so:
max_subject_references(order_items) = 256
If there are multiple distinct paths from users to order_items, compute the product for each path and sum the products before rounding.
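To make the multi-path case concrete, the sketch below walks a small FK graph and sums the path products. Everything beyond the two FKs in the example is hypothetical: the second FK (order_items.gift_recipient_id -> users.id with a fan-out of 100), the FK_EDGES structure, and the function names are invented for illustration. The traversal also assumes the FK graph has no cycles.

```python
# Hypothetical FK graph: parent table -> list of (child table, max_references).
FK_EDGES = {
    "users": [
        ("orders", 20),        # orders.user_id -> users.id (from the example)
        ("order_items", 100),  # hypothetical: order_items.gift_recipient_id -> users.id
    ],
    "orders": [
        ("order_items", 10),   # order_items.order_id -> orders.id (from the example)
    ],
}


def path_products(edges, source, target):
    """Yield the product of max_references along every FK path source -> target.

    Assumes the FK graph is acyclic.
    """
    def walk(table, product):
        if table == target:
            yield product
            return
        for child, fan_out in edges.get(table, []):
            yield from walk(child, product * fan_out)

    yield from walk(source, 1)


def round_up_to_power_of_two(n):
    """Smallest power of two >= n, for positive integers n."""
    return 1 << (n - 1).bit_length()


products = sorted(path_products(FK_EDGES, "users", "order_items"))
total = sum(products)
result = round_up_to_power_of_two(total)

print(products)  # [100, 200]
print(total)     # 300
print(result)    # 512
```

With the hypothetical second path, the sum of path products is 200 + 100 = 300, which rounds up to 512 rather than 256.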
Why it matters
max_subject_references is used anywhere the system needs a conservative bound on how strongly a table can be influenced by a single subject when relationships are followed across tables (for example, scaling noise or thresholds in differentially private metadata extraction).