
Privacy Budget

Overview

As mentioned in the previous section, epsilon (ε) and delta (δ) are the base parameters that quantify and manage privacy protection in differential privacy. AGENT distributes these values to teams and data scientists through the Privacy Budget. Every Dataset has a lifetime Privacy Budget, which is allocated to Teams and data scientists, and each query they make spends Epsilon and Delta from that budget. The amount of Epsilon and Delta spent on an analysis determines the tradeoff between protecting the privacy of individual data points and preserving the dataset's utility for data scientists.

Administrators can ensure privacy effectively by creating a hierarchy and allocating Privacy Budget amounts, even when different teams and members use the same dataset.

Budget Accountant

AGENT maintains an in-built differential privacy accountant that tracks the cumulative privacy loss (ε, δ) over the lifetime of a project or team. Given a list of spends (one record per query to AGENT), the accountant chooses the tightest valid composition theorem and returns the total spent.

Basic Composition (always valid)

If every query i provides (εᵢ, δᵢ)-DP, simply add the per-query values:

$$\varepsilon_{\text{total}}=\sum_{i=1}^{n}\varepsilon_i, \qquad \delta_{\text{total}}=\sum_{i=1}^{n}\delta_i.$$

This is the worst‑case bound but applies to all mechanisms.
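As a minimal sketch of basic composition, assume each spend is recorded as an `(epsilon, delta)` tuple. The function name `basic_composition` and the `Spend` alias are hypothetical and not part of AGENT's API.

```python
from typing import Iterable, Tuple

Spend = Tuple[float, float]  # (epsilon_i, delta_i) recorded for one query


def basic_composition(spends: Iterable[Spend]) -> Tuple[float, float]:
    """Return (epsilon_total, delta_total) by summing the per-query values."""
    spends = list(spends)
    eps_total = sum(eps for eps, _ in spends)
    delta_total = sum(delta for _, delta in spends)
    return eps_total, delta_total


if __name__ == "__main__":
    # Three hypothetical queries; the third is pure epsilon-DP (delta = 0).
    spends = [(0.1, 1e-7), (0.2, 1e-7), (0.5, 0.0)]
    eps_total, delta_total = basic_composition(spends)
    print(f"epsilon_total={eps_total:.2f}, delta_total={delta_total:.1e}")
```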

Advanced Composition (tighter for heterogeneous ε/δ)

Fix an extra slack parameter δ′ > 0. For (εᵢ, δᵢ)-DP queries you may use

$$\varepsilon_{\text{total}} = \sqrt{2\log\left(1/\delta'\right)\sum_{i=1}^{n}\varepsilon_i^{2}} + \sum_{i=1}^{n}\varepsilon_i\left(e^{\varepsilon_i}-1\right), \qquad \delta_{\text{total}} = \sum_{i=1}^{n}\delta_i+\delta'.$$

When every εᵢ ≪ 1, the second term above is often negligible.
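The advanced composition bound can be computed directly from the formula. This is a sketch with a hypothetical function name; it compares the result against basic composition for 100 queries that are each 0.1-DP, with δ′ = 10⁻⁶ chosen purely for illustration.

```python
import math
from typing import Iterable, Tuple

Spend = Tuple[float, float]  # (epsilon_i, delta_i) for one query


def advanced_composition(spends: Iterable[Spend], delta_prime: float) -> Tuple[float, float]:
    """Heterogeneous advanced composition with slack delta_prime > 0."""
    if delta_prime <= 0:
        raise ValueError("delta_prime must be > 0")
    spends = list(spends)
    sum_sq = sum(eps ** 2 for eps, _ in spends)
    # First term: sqrt(2 log(1/delta') * sum of eps_i^2)
    first = math.sqrt(2 * math.log(1 / delta_prime) * sum_sq)
    # Second term: sum of eps_i * (e^{eps_i} - 1); expm1 stays accurate for small eps_i
    second = sum(eps * math.expm1(eps) for eps, _ in spends)
    delta_total = sum(delta for _, delta in spends) + delta_prime
    return first + second, delta_total


if __name__ == "__main__":
    spends = [(0.1, 0.0)] * 100  # 100 hypothetical queries, each 0.1-DP
    eps_adv, delta_adv = advanced_composition(spends, delta_prime=1e-6)
    eps_basic = sum(eps for eps, _ in spends)
    print(f"advanced: eps={eps_adv:.2f}, delta={delta_adv:.0e}")
    print(f"basic:    eps={eps_basic:.2f}, delta=0")
```

For these values advanced composition reports roughly ε ≈ 6.3 (at the cost of δ′ = 10⁻⁶), compared with ε = 10 from basic composition.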

zCDP Composition (when rho=True)

For mechanisms that satisfy zero-Concentrated DP (zCDP) with parameters ρᵢ (e.g., the Gaussian mechanism):

$$\rho_{\text{total}}=\sum_{i=1}^{n}\rho_i.$$

Convert back to (ε, δ) with

$$\varepsilon(\delta)=\rho_{\text{total}}+\sqrt{4\rho_{\text{total}}\log\left(1/\delta\right)}.$$

If you only know (εᵢ, δᵢ) for each query, you can lift them to zCDP via

$$\rho_i=\tfrac{1}{2}\varepsilon_i^{2}+\varepsilon_i\sqrt{2\log\left(1/\delta_i\right)}.$$
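The zCDP path can be sketched in a few helpers. All names are hypothetical. `gaussian_rho` uses the standard result that a Gaussian mechanism with L2 sensitivity Δ and standard deviation σ is Δ²/(2σ²)-zCDP; whether AGENT computes ρᵢ this way is an assumption. The lifting rule is copied from the formula above and, as written, needs δᵢ > 0.

```python
import math
from typing import Iterable


def zcdp_compose(rhos: Iterable[float]) -> float:
    """zCDP composes additively: rho_total = sum of rho_i."""
    return sum(rhos)


def zcdp_to_dp(rho_total: float, delta: float) -> float:
    """Convert rho-zCDP to (epsilon, delta)-DP: eps = rho + sqrt(4 rho log(1/delta))."""
    return rho_total + math.sqrt(4 * rho_total * math.log(1 / delta))


def gaussian_rho(sensitivity: float, sigma: float) -> float:
    """Gaussian mechanism with L2 sensitivity and std sigma is sens^2/(2 sigma^2)-zCDP."""
    return sensitivity ** 2 / (2 * sigma ** 2)


def lift_to_zcdp(eps: float, delta: float) -> float:
    """Lifting rule from the text: rho_i = eps_i^2 / 2 + eps_i * sqrt(2 log(1/delta_i))."""
    if delta <= 0:
        raise ValueError("this lifting rule needs delta > 0 (log(1/0) is undefined)")
    return 0.5 * eps ** 2 + eps * math.sqrt(2 * math.log(1 / delta))


if __name__ == "__main__":
    # Ten hypothetical Gaussian queries, sensitivity 1, sigma 10.
    rho_total = zcdp_compose(gaussian_rho(1.0, 10.0) for _ in range(10))
    print(f"rho_total={rho_total:.3f}, eps at delta=1e-6: {zcdp_to_dp(rho_total, 1e-6):.3f}")
```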

Understanding δ′ (delta‑prime) in Advanced Composition

In the advanced composition theorem we introduce an extra parameter δ′ > 0. It is not one of the per-query δᵢ; instead, it is a tunable slack variable that lets the accountant trade a little extra failure probability for a tighter bound on ε. AGENT uses the remaining delta in the budget as this slack variable. For example, if your total delta budget is δ_budget and your queries have already spent Σδᵢ, the remaining delta budget δ_budget − Σδᵢ is used as δ′ in the advanced composition theorem.

Warning

When much of the δ budget remains, the reported total ε can be very low, because the large slack δ′ lets advanced composition give a tight bound on ε. As you spend δ from your budget, δ′ shrinks, so the reported total ε increases. When the delta budget is completely exhausted, the accountant uses a very small value, 10⁻¹², as δ′ and adds it to your total spent. As a result, the total δ you have spent can be slightly more than your budget once you exhaust your delta completely.
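To see the effect described in the warning, the sketch below recomputes the advanced composition bound with δ′ set to the remaining delta budget. The helper names and the exact floor rule (use the remaining budget while it is positive, otherwise 10⁻¹²) are assumptions based on the description above, not AGENT's code.

```python
import math
from typing import List, Tuple

Spend = Tuple[float, float]  # (epsilon_i, delta_i) for one query
DELTA_PRIME_FLOOR = 1e-12    # assumed value of delta' once the delta budget is exhausted


def slack_from_budget(spends: List[Spend], delta_budget: float) -> float:
    """delta' = remaining delta budget, or the floor when nothing remains."""
    remaining = delta_budget - sum(delta for _, delta in spends)
    return remaining if remaining > 0 else DELTA_PRIME_FLOOR


def advanced_total(spends: List[Spend], delta_budget: float) -> Tuple[float, float]:
    """Advanced composition using the remaining budget as the slack delta'."""
    delta_prime = slack_from_budget(spends, delta_budget)
    first = math.sqrt(2 * math.log(1 / delta_prime) * sum(e ** 2 for e, _ in spends))
    second = sum(e * math.expm1(e) for e, _ in spends)
    delta_total = sum(d for _, d in spends) + delta_prime
    return first + second, delta_total


if __name__ == "__main__":
    budget = 1e-5
    queries = [(0.1, 0.0)] * 50  # hypothetical pure-epsilon queries
    for spent_delta in (0.0, 5e-6, 9.9e-6, 1e-5):
        # One extra hypothetical query spends `spent_delta` of the delta budget.
        eps, delta = advanced_total(queries + [(0.1, spent_delta)], budget)
        print(f"delta spent={spent_delta:.1e}  eps_total={eps:.3f}  delta_total={delta:.6e}")
```

Running it shows the reported ε rising as more of the δ budget is spent, and the last row reporting a total δ equal to the budget plus 10⁻¹².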

Total Spent Calculation

For low ε values, the accountant uses advanced composition. High ε values (by default ε > 1.0, unless the admin sets a different threshold) are handled with basic composition, because in that regime advanced composition gives a looser bound than basic composition. The accountant uses zCDP composition when a query uses the Gaussian noise mechanism.
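The routing above can be sketched as follows. This is a hypothetical illustration: the per-query threshold test, the fallback to basic composition when advanced composition turns out looser, and the way the two groups are added together are assumptions consistent with the accountant choosing "the tightest valid composition theorem", not AGENT's documented implementation. Gaussian queries, which would go through zCDP composition, are left out.

```python
import math
from typing import List, Tuple

Spend = Tuple[float, float]  # (epsilon_i, delta_i) for one non-Gaussian query


def basic(spends: List[Spend]) -> Tuple[float, float]:
    """Basic composition: add up epsilons and deltas."""
    return sum(e for e, _ in spends), sum(d for _, d in spends)


def advanced(spends: List[Spend], delta_prime: float) -> Tuple[float, float]:
    """Heterogeneous advanced composition with slack delta_prime > 0."""
    first = math.sqrt(2 * math.log(1 / delta_prime) * sum(e ** 2 for e, _ in spends))
    second = sum(e * math.expm1(e) for e, _ in spends)
    return first + second, sum(d for _, d in spends) + delta_prime


def total_spent(
    spends: List[Spend], delta_prime: float, threshold: float = 1.0
) -> Tuple[float, float]:
    """Hypothetical routing by per-query epsilon, keeping the tighter valid bound."""
    low = [s for s in spends if s[0] <= threshold]
    high = [s for s in spends if s[0] > threshold]
    # Low-epsilon group: try advanced composition, fall back to basic if it is looser.
    low_total = basic(low)
    if low:
        candidate = advanced(low, delta_prime)
        if candidate[0] < low_total[0]:
            low_total = candidate
    # High-epsilon group: basic composition only.
    high_total = basic(high)
    # The two group totals are combined with basic composition.
    return low_total[0] + high_total[0], low_total[1] + high_total[1]


if __name__ == "__main__":
    spends = [(0.05, 0.0)] * 200 + [(2.0, 0.0)] * 2
    print("routed:", total_spent(spends, delta_prime=1e-6))
    print("basic only:", basic(spends))
```

The fallback matters for small workloads: for a single query, the square-root term alone already exceeds εᵢ, so advanced composition only pays off once many small-ε queries accumulate.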

Privacy Budget Hierarchy

The Privacy Budget is handled hierarchically, and different users with different roles control its distribution. The process begins when a user creates a dataset, becomes the Dataset Admin, and defines the total Privacy Budget available. The Dataset Admin manages and delegates the Privacy Budget amount each team and member can use.

The primary Privacy Budget hierarchy flow works as follows:

Team Admins and Members can request more Privacy Budget:

Privacy Budget Hierarchy Example

The diagram below presents an example of Privacy Budget distribution.

In the diagram, we can observe the following:

In this example, both the Dataset Admin and the Team Admins allocated their entire Privacy Budget, but they can also allocate it partially, leaving the remainder available to allocate later.
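The allocation rules described in this section (a Dataset Admin sets the total, budgets flow down to teams and members, and partial allocations leave a remainder) can be modelled as a small tree. This is a sketch under assumptions: `BudgetNode`, `allocate`, and `remaining` are hypothetical names, and it only models allocation, not spending or budget requests.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


def _exceeds(amount: float, available: float) -> bool:
    """True if `amount` is larger than `available`, ignoring float rounding noise."""
    return amount > available and not math.isclose(
        amount, available, rel_tol=1e-9, abs_tol=1e-15
    )


@dataclass
class BudgetNode:
    """Hypothetical node in the budget tree: a dataset, a team, or a member."""

    name: str
    epsilon: float
    delta: float
    children: Dict[str, "BudgetNode"] = field(default_factory=dict)

    def remaining(self) -> Tuple[float, float]:
        """(epsilon, delta) of this node not yet allocated to children."""
        eps_used = sum(c.epsilon for c in self.children.values())
        delta_used = sum(c.delta for c in self.children.values())
        return self.epsilon - eps_used, self.delta - delta_used

    def allocate(self, name: str, epsilon: float, delta: float) -> "BudgetNode":
        """Carve a child allocation out of this node's unallocated budget."""
        if name in self.children:
            raise ValueError(f"{name} already has an allocation from {self.name}")
        eps_left, delta_left = self.remaining()
        if _exceeds(epsilon, eps_left) or _exceeds(delta, delta_left):
            raise ValueError(f"{self.name} cannot allocate ({epsilon}, {delta}) to {name}")
        child = BudgetNode(name, epsilon, delta)
        self.children[name] = child
        return child


if __name__ == "__main__":
    dataset = BudgetNode("dataset", epsilon=10.0, delta=1e-6)  # set by the Dataset Admin
    team_a = dataset.allocate("team-a", 6.0, 6e-7)
    dataset.allocate("team-b", 4.0, 4e-7)                     # whole budget now allocated
    team_a.allocate("alice", 3.0, 3e-7)                       # partial: 3.0 left in team-a
    print(dataset.remaining(), team_a.remaining())
```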

Spending Privacy Budget for Your Needs

When allocating Epsilon (ε) and Delta (δ) from the Privacy Budget to teams and members, consider the following:

1. Understand the parameters

It is essential to understand how Epsilon and Delta affect the utility or accuracy of the data analysis.

2. Consider the context of the data usage

The nature of the data and its intended use are crucial in determining ε and δ. Highly sensitive data like health records may require a smaller ε for stronger privacy.

3. Define a desired level of privacy

Determine the acceptable level of privacy risk. In scenarios where individual privacy is critical, opting for a smaller ε is advisable.

4. Understand the data analysis goals

Consider the required accuracy and specificity of the data analysis results. For broader, less granular insights, a smaller ε may suffice.
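To make the first step concrete, the sketch below shows how ε and δ translate into noise for two textbook mechanisms: the Laplace mechanism (scale b = Δ/ε) and the classic Gaussian mechanism calibration σ = Δ·√(2 ln(1.25/δ))/ε, which holds for 0 < ε < 1. These are standard calibrations from the differential privacy literature; whether AGENT calibrates its mechanisms exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism scale b = sensitivity / epsilon (also its mean absolute error)."""
    return sensitivity / epsilon


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian calibration, valid for 0 < epsilon < 1:
    sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


if __name__ == "__main__":
    # A counting query has sensitivity 1: adding or removing one record changes it by 1.
    for eps in (0.1, 0.5, 0.9):
        print(
            f"eps={eps}: laplace b={laplace_scale(1, eps):.2f}, "
            f"gaussian sigma (delta=1e-6)={gaussian_sigma(1, eps, 1e-6):.2f}"
        )
```

Smaller ε means proportionally more noise, and a smaller δ also increases the Gaussian noise, which is the utility cost behind choosing stronger privacy.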

Selecting suitable Epsilon (ε) and Delta (δ) values is a critical decision that impacts the balance between data utility and privacy. Depending on the scenario, consider the following factors:

Data sensitivity

Highly sensitive data needs lower ε values to ensure data is more strictly protected.

Population size

Larger datasets typically will have better signal-to-noise ratios, mitigating the effect of the noise on the insights created.

Data access frequency

Frequent access to data for analysis might require more restrictive ε settings to maintain privacy over time, especially when data is reused in new analyses, because the privacy loss of each query compounds.

Collaborative environments

When multiple parties are involved, consider the cumulative privacy risk and adjust ε and δ accordingly.

Caching and budgeting

Similar questions are often asked, and their answers can be derived from one another: for example, a sum and a count, and later the mean computed from them. Caching query responses and reusing them across queries minimises the net privacy loss.
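A minimal sketch of caching, assuming a hypothetical `CachedQueryRunner` that adds Laplace noise, charges ε only on a cache miss, and tallies spends with basic composition. It assumes values are clipped to [0, upper], so a sum has sensitivity `upper` and a count has sensitivity 1; none of this is AGENT's actual caching behaviour.

```python
import random
from typing import Dict, List


class CachedQueryRunner:
    """Hypothetical runner that charges epsilon only when a query is not cached."""

    def __init__(self, values: List[float], upper: float, epsilon_per_query: float):
        self.values = [min(max(v, 0.0), upper) for v in values]  # clip to [0, upper]
        self.upper = upper
        self.eps = epsilon_per_query
        self.spent = 0.0
        self.cache: Dict[str, float] = {}

    def _laplace(self, scale: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        return random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def _answer(self, key: str, true_value: float, sensitivity: float) -> float:
        if key not in self.cache:  # cache miss: add noise and spend budget
            self.cache[key] = true_value + self._laplace(sensitivity / self.eps)
            self.spent += self.eps
        return self.cache[key]     # cache hit: no additional privacy loss

    def noisy_sum(self) -> float:
        return self._answer("sum", sum(self.values), self.upper)

    def noisy_count(self) -> float:
        return self._answer("count", float(len(self.values)), 1.0)

    def mean(self) -> float:
        # Post-processing of cached noisy answers costs no extra budget.
        return self.noisy_sum() / max(self.noisy_count(), 1.0)


if __name__ == "__main__":
    data = [random.uniform(0, 100) for _ in range(10_000)]
    runner = CachedQueryRunner(data, upper=100.0, epsilon_per_query=0.5)
    runner.noisy_sum()
    runner.noisy_count()
    print(f"mean={runner.mean():.2f}, epsilon spent={runner.spent}")
```

The mean is derived from the cached sum and count, so it adds no privacy loss, and repeating the sum query returns the cached answer instead of spending ε again.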

Note

When setting a lifetime Privacy Budget for a dataset, choose δ no greater than the inverse of the dataset size; for example, for a dataset with one million records, use a δ of at most 10⁻⁶.
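The rule of thumb above is a one-line check. The helper names below are hypothetical, and the check encodes only the δ ≤ 1/n guideline from this note.

```python
def max_recommended_delta(num_records: int) -> float:
    """Upper bound on delta suggested above: no greater than 1 / dataset size."""
    if num_records <= 0:
        raise ValueError("num_records must be positive")
    return 1.0 / num_records


def check_delta(delta: float, num_records: int) -> bool:
    """True if delta is non-negative and within the 1 / n guideline."""
    return 0 <= delta <= max_recommended_delta(num_records)


if __name__ == "__main__":
    print(max_recommended_delta(1_000_000))  # 1e-06
    print(check_delta(1e-5, 1_000_000))      # False
```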