Writing Differential Privacy queries | Antigranular Enterprise Docs

Overview

A comprehensive guide to writing Differential Privacy queries with AGENT SQL.

Basic Syntax

Supported Features

Known Limitations

Privacy Protection

Best Practices

Common Patterns

Column Naming & Practical Tips

Basic Syntax

Every query must start with a Differential Privacy clause:

WITH DP(epsilon=0.1, delta=1e-6)
SELECT ...

Parameters:

epsilon (or shorthand eps)

Privacy budget (smaller = more private, typically 0.001 to 1.0)

delta (or shorthand del)

Failure probability (typically 1e-6 to 1e-9)

Supported Features

Basic Aggregates

All standard aggregation functions work perfectly on numeric columns:

-- Basic aggregates (numeric columns only for MIN/MAX)
WITH DP(epsilon=0.1)
SELECT
    COUNT(*) AS total_count,
    SUM(salary) AS total_salary,
    AVG(age) AS average_age,
    MIN(age) AS min_age,                    -- ✅ Works on numeric
    MAX(experience_years) AS max_experience -- ✅ Works on numeric
FROM employees;

MIN/MAX for numeric columns only

MIN/MAX only work on numeric columns, not on datetime or string columns.

Arithmetic on Aggregates

You can perform arithmetic operations on aggregate results:

-- Arithmetic expressions outside aggregates
WITH DP(epsilon=0.1)
SELECT
    region,
    COUNT(*) + 10 AS adjusted_count,
    SUM(salary) * 1.1 AS salary_with_bonus,
    AVG(age) - 25 AS age_relative_to_25,
    (COUNT(*) * 100) / SUM(salary) AS efficiency_ratio
FROM employees
GROUP BY region;

Mathematical Functions

All common mathematical functions are supported:

-- Mathematical functions on aggregates
WITH DP(epsilon=0.1)
SELECT
    department,
    POWER(AVG(salary), 2) AS salary_squared,
    SQRT(AVG(experience_years)) AS sqrt_experience,
    ABS(AVG(score) - 75) AS score_deviation,
    FLOOR(AVG(salary) / 1000) * 1000 AS salary_floor_thousands,
    CEILING(AVG(bonus_percent)) AS bonus_ceiling,
    ROUND(AVG(rating), 2) AS rounded_rating
FROM employees
GROUP BY department;

Supported Math Functions:

POWER(x, y)

x raised to power y

SQRT(x)

Square root

ABS(x)

Absolute value

FLOOR(x)

Round down to integer

CEILING(x)

Round up to integer

ROUND(x, decimals)

Round to specified decimals

TRUNC(x, decimals)

Round to specified decimals

Mathematical Functions in GROUP BY

Use math functions to create grouping categories:

-- Age brackets using FLOOR
WITH DP(epsilon=0.1)
SELECT
    FLOOR(age / 10) * 10 AS age_decade,
    COUNT(*) AS count_in_decade,
    AVG(salary) AS avg_salary_in_decade
FROM employees
GROUP BY age_decade
ORDER BY age_decade;

-- Salary tiers using CEILING
WITH DP(epsilon=0.1)
SELECT
    CEILING(salary / 25000) * 25000 AS salary_tier,
    COUNT(*) AS employees_in_tier,
    AVG(experience_years) AS avg_experience
FROM employees
GROUP BY salary_tier;

COALESCE Function

Handle NULL values elegantly in GROUP BY:

-- Basic COALESCE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
    COALESCE(race, 'Unknown') AS race_group,
    COUNT(*) AS employee_count,
    AVG(salary) AS average_salary
FROM employees
GROUP BY race_group
ORDER BY race_group;

-- Multiple COALESCE fallbacks
WITH DP(epsilon=0.1)
SELECT
    COALESCE(department, region, 'Unassigned') AS work_group,
    COUNT(*) AS employee_count,
    AVG(performance_score) AS avg_performance
FROM employees
GROUP BY work_group;

CASE Expressions

Create conditional logic and categories:

-- CASE expressions with aggregates
WITH DP(epsilon=0.1)
SELECT
    CASE
        WHEN AVG(salary) > 80000 THEN 'High-paying'
        WHEN AVG(salary) > 60000 THEN 'Medium-paying'
        ELSE 'Entry-level'
    END AS department_tier,
    COUNT(*) AS department_count
FROM employees
GROUP BY department;

-- Complex CASE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
    CASE
        WHEN age < 30 AND salary > 70000 THEN 'Young High Earner'
        WHEN age >= 50 AND experience_years > 20 THEN 'Senior Expert'
        WHEN salary < 40000 THEN 'Entry Level'
        ELSE 'Standard Employee'
    END AS employee_category,
    COUNT(*) AS category_size,
    AVG(performance_rating) AS avg_performance
FROM employees
GROUP BY employee_category;

Date Functions

Extract date components for time-based analysis:

-- Extract date parts (avoid reserved keywords like 'year', 'quarter')
WITH DP(epsilon=0.1)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,     -- ✅ Use descriptive names
    EXTRACT(MONTH FROM hire_date) AS hire_month,
    COUNT(*) AS hires_count,
    AVG(starting_salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_month
ORDER BY hire_year, hire_month;

-- Day of week analysis
WITH DP(epsilon=0.1)
SELECT
    EXTRACT(DOW FROM hire_date) AS day_of_week,  -- 0=Sunday, 6=Saturday
    COUNT(*) AS hires_on_day,
    AVG(age_at_hire) AS avg_hire_age
FROM employees
GROUP BY day_of_week
ORDER BY day_of_week;

-- Quarterly breakdown
WITH DP(epsilon=0.1)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    EXTRACT(QUARTER FROM hire_date) AS hire_quarter,  -- ✅ Avoid 'quarter'
    COUNT(*) AS quarterly_hires,
    AVG(salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_quarter;

Supported Date Extractions:

YEAR, MONTH, DAY
QUARTER (1-4)
DOW (Day of Week: 0=Sunday, 6=Saturday)
HOUR, MINUTE, SECOND
DECADE

Reserved Keywords

Avoid using SQL reserved words like year, quarter, month as column aliases. Use descriptive names like hire_year, hire_quarter instead.

Complex Combinations

Combine multiple features for powerful analysis:

-- Everything together: COALESCE + Math + CASE + Arithmetic
WITH DP(epsilon=0.2)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    COALESCE(race, 'Unknown') AS race_group,
    CASE
        WHEN FLOOR(salary / 10000) * 10000 < 50000 THEN 'Low Income'
        WHEN FLOOR(salary / 10000) * 10000 < 80000 THEN 'Middle Income'
        ELSE 'High Income'
    END AS income_bracket,
    COUNT(*) + 5 AS adjusted_count,
    ROUND(AVG(salary) * 1.05, 2) AS projected_salary,
    POWER(AVG(performance_score), 2) AS performance_squared
FROM employees
WHERE hire_date IS NOT NULL
GROUP BY hire_year, race_group, income_bracket
ORDER BY hire_year, race_group;

ORDER BY with Complex Expressions

Sort results using any supported expression:

-- ORDER BY with COALESCE
WITH DP(epsilon=0.1)
SELECT
    COALESCE(department, 'Unknown') AS dept_group,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_group
ORDER BY dept_group, avg_salary DESC;

-- ORDER BY with mathematical expressions
WITH DP(epsilon=0.1)
SELECT
    FLOOR(experience_years / 5) * 5 AS experience_bracket,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY experience_bracket
ORDER BY experience_bracket;

Known Limitations

A number of limitations on the use of SQL statements are listed below. Some are features that are currently not supported, others are incompatible with Differential Privacy. We have given alternative approaches where possible.

Datetime Aggregates

MIN/MAX are not supported on datetime columns:

-- NOT SUPPORTED - MIN/MAX on datetime columns
WITH DP(epsilon=0.1)
SELECT
    MIN(hire_date) AS earliest_hire,  -- ❌ Expected to fail - datetime not supported
    MAX(hire_date) AS latest_hire     -- ❌ Expected to fail - datetime not supported
FROM employees;

Alternative

use EXTRACT to convert to a numeric type:

-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
    MIN(EXTRACT(YEAR FROM hire_date)) AS earliest_year,  -- ✅ Works on numeric
    MAX(EXTRACT(YEAR FROM hire_date)) AS latest_year     -- ✅ Works on numeric
FROM employees;

Nested Aggregates

Nested aggregates are not currently supported:

-- NOT SUPPORTED - Nested aggregates
WITH DP(epsilon=0.1)
SELECT
    region,
    COUNT(SUM(salary)) AS nested_agg  -- ❌ Expected to fail - nested aggregates not supported
FROM employees
GROUP BY region;

Advanced Date Functions

DATE_TRUNC, DATE_PART and LAST_DAY are not currently supported:

-- NOT SUPPORTED - These date functions are not supported
WITH DP(epsilon=0.1)
SELECT
    DATE_TRUNC('month', hire_date) AS month,     -- ❌ Expected to fail
    DATE_PART('year', hire_date) AS year,        -- ❌ Expected to fail
    LAST_DAY(hire_date) AS month_end             -- ❌ Expected to fail
FROM employees;

Alternative

Instead use EXTRACT:

-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    EXTRACT(MONTH FROM hire_date) AS hire_month,
    COUNT(*) AS hires
FROM employees
GROUP BY hire_year, hire_month;

Trigonometric & Logarithmic Functions

Trigonometric & Logarithmic Functions are not currently supported:

-- NOT SUPPORTED - Bounds calculation issues
WITH DP(epsilon=0.1)
SELECT
    SIN(AVG(score)) AS sin_score,    -- ❌ Expected to fail
    LOG(AVG(salary)) AS log_salary,  -- ❌ Expected to fail
    EXP(AVG(rating)) AS exp_rating   -- ❌ Expected to fail
FROM employees;

Window Functions

Window functions are not compatible with Differential Privacy:

-- NOT SUPPORTED - Incompatible with Differential Privacy
WITH DP(epsilon=0.1)
SELECT
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rank  -- ❌ Expected to fail
FROM employees;

Why?

Window functions would expose individual row information, violating Differential Privacy.

String Functions

String functions such as UPPER and SUBSTRING are not currently supported:

-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
    UPPER(department) AS dept_upper,     -- ❌ Expected to fail
    SUBSTRING(name, 1, 3) AS name_prefix -- ❌ Expected to fail
FROM employees;

Subqueries in SELECT

Subqueries in SELECT statements are not currently supported

-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
    department,
    COUNT(*) AS dept_count,
    (SELECT AVG(salary) FROM employees e2 WHERE e2.region = e1.region) AS region_avg  -- ❌ Expected to fail
FROM employees e1
GROUP BY department;

Set Operations

The set operations UNION, INTERSECT and EXCEPT are not currently supported

-- NOT SUPPORTED - DP composition complexity
WITH DP(epsilon=0.1)
(SELECT region, COUNT(*) FROM employees WHERE age < 40 GROUP BY region)
UNION ALL
(SELECT region, COUNT(*) FROM employees WHERE age >= 40 GROUP BY region);  -- ❌ Expected to fail

Privacy Protection

AGENT SQL automatically prevents privacy violations:

Safe Patterns

AGENT enforces safe patterns so row-level data cannot be exposed

-- SAFE - Only aggregated results exposed
WITH DP(epsilon=0.1)
SELECT
    FLOOR(salary / 10000) * 10000 AS salary_bracket,
    COUNT(*) AS employee_count,
    AVG(age) AS average_age
FROM employees
GROUP BY salary_bracket;
-- Result columns: [salary_bracket, employee_count, average_age]
-- ✅ No raw 'salary' or 'age' columns exposed

Automatic Prevention

AGENT prevents the following privacy violations:

Raw columns in output: (e.g. salary, age, employee_count)

Individual row data exposure

Non-aggregated personal information

Best Practices

Epsilon Budget Management

For exploratory analysis, use a higher Epsilon value for lower privacy but with greater accuracy:

-- For exploratory analysis
WITH DP(epsilon=1.0, delta=1e-6)  -- Higher epsilon = less privacy, more accuracy

For production or sensitive data, use a lower Epsilon value for greater privacy but with less accuracy:

-- For production/sensitive data
WITH DP(epsilon=0.01, delta=1e-9) -- Lower epsilon = more privacy, less accuracy

Column Aliasing

Avoid using reserved keywords for column aliases as they can cause parsing errors, e.g. AS year, AS quarter, AS month, AS day, AS count, etc.

Instead choose other aliases:

-- GOOD - Avoid reserved keywords
WITH DP(epsilon=0.1)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,       -- ✅ Not 'year'
    EXTRACT(QUARTER FROM hire_date) AS hire_quarter, -- ✅ Not 'quarter'
    COUNT(*) AS employee_count                       -- ✅ Explicit alias
FROM employees
GROUP BY hire_year, hire_quarter;

Effective Grouping

Create meaningful groups and filter out smaller groups for better privacy:

-- GOOD - Creates meaningful groups
WITH DP(epsilon=0.1)
SELECT
    FLOOR(age / 10) * 10 AS age_decade,
    COALESCE(department, 'Unknown') AS dept,
    COUNT(*) AS employee_count
FROM employees
GROUP BY age_decade, dept
HAVING COUNT(*) > 5;  -- Filter out small groups for better privacy

Handling Nulls

Use explicit handling of NULL values:

-- GOOD - Explicit NULL handling
WITH DP(epsilon=0.1)
SELECT
    COALESCE(race, 'Not Specified') AS race_category,
    COUNT(*) AS employee_count
FROM employees
WHERE salary IS NOT NULL  -- Filter NULLs in WHERE clause
GROUP BY race_category;

Meaningful Ranges

Create interpretable ranges:

-- GOOD - Create interpretable ranges
WITH DP(epsilon=0.1)
SELECT
    CASE
        WHEN age BETWEEN 18 AND 25 THEN '18-25'
        WHEN age BETWEEN 26 AND 35 THEN '26-35'
        WHEN age BETWEEN 36 AND 50 THEN '36-50'
        WHEN age > 50 THEN '50+'
        ELSE 'Unknown'
    END AS age_group,
    AVG(salary) AS avg_salary,
    COUNT(*) AS group_size
FROM employees
GROUP BY age_group;

Common Patterns

Here are some examples of common query patterns.

Pattern 1

Age Demographics

WITH DP(epsilon=0.1)
SELECT
    FLOOR(age / 10) * 10 AS age_decade,
    COALESCE(gender, 'Unknown') AS gender_group,
    COUNT(*) AS population,
    AVG(salary) AS avg_salary,
    ROUND(AVG(performance_score), 1) AS avg_performance
FROM employees
GROUP BY age_decade, gender_group
ORDER BY age_decade, gender_group;

Pattern 2

Salary Analysis

WITH DP(epsilon=0.2)
SELECT
    CEILING(salary / 25000) * 25000 AS salary_tier,
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    COUNT(*) AS employee_count,
    AVG(experience_years) AS avg_experience,
    POWER(AVG(performance_score), 2) AS performance_index
FROM employees
WHERE salary IS NOT NULL
GROUP BY salary_tier, hire_year
HAVING COUNT(*) > 3
ORDER BY hire_year, salary_tier;

Pattern 3

Time-based Trends

WITH DP(epsilon=0.15)
SELECT
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    EXTRACT(QUARTER FROM hire_date) AS hire_quarter,
    COALESCE(department, 'Unknown') AS dept,
    COUNT(*) + 2 AS adjusted_hires,
    ROUND(AVG(starting_salary) * 1.03, 0) AS inflation_adjusted_salary
FROM employees
WHERE hire_date >= '2015-01-01'
GROUP BY hire_year, hire_quarter, dept
ORDER BY hire_year, hire_quarter, dept;

Pattern 4

Performance Categories

WITH DP(epsilon=0.1)
SELECT
    CASE
        WHEN performance_score >= 4.5 THEN 'Excellent'
        WHEN performance_score >= 3.5 THEN 'Good'
        WHEN performance_score >= 2.5 THEN 'Satisfactory'
        ELSE 'Needs Improvement'
    END AS performance_category,
    FLOOR(experience_years / 5) * 5 AS experience_bracket,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary,
    SQRT(AVG(bonus_percent)) AS bonus_factor
FROM employees
WHERE performance_score IS NOT NULL
GROUP BY performance_category, experience_bracket;

Column Naming & Practical Tips

Column Naming Behavior

AGENT SQL handles column aliases as follows.

Explicit aliases are preseved exactly:

-- Explicit aliases are preserved exactly
WITH DP(epsilon=0.1)
SELECT
    COUNT(*) AS employee_count  -- ✅ Result column: 'employee_count'
FROM employees;

Without an explicit alias, an auto-generated alias is produced:

-- Without alias, COUNT(*) becomes 'count_result'
WITH DP(epsilon=0.1)
SELECT
    COUNT(*)  -- ⚠️ Result column: 'count_result' (auto-generated)
FROM employees;

Recommendation

Always use explicit, descriptive aliases for better code readability and predictable column names.

Error Messages

When queries fail, you may see:

Parser errors

"Bad token 'keyword' expecting IDENT" → Usually reserved keyword issues

Type errors

"Aggregations must be over numeric or boolean" → Using wrong data types

Privacy errors

Automatic prevention of privacy violations

Testing Your Queries

Start with simple queries and build complexity:

-- 1. Basic aggregation
WITH DP(epsilon=0.1)
SELECT COUNT(*) AS total FROM employees;

Add grouping:

WITH DP(epsilon=0.1)
SELECT region, COUNT(*) AS count FROM employees GROUP BY region;

Add complexity gradually:

WITH DP(epsilon=0.1)
SELECT
    COALESCE(region, 'Unknown') AS region_group,
    FLOOR(age / 10) * 10 AS age_decade,
    COUNT(*) AS population
FROM employees
GROUP BY region_group, age_decade;

Quick Reference

Mathematical Functions

Function	Example	Description
`POWER(x,y)`	`POWER(AVG(score), 2)`	x raised to power y
`SQRT(x)`	`SQRT(AVG(years))`	Square root
`ABS(x)`	`ABS(AVG(score) - 75)`	Absolute value
`FLOOR(x)`	`FLOOR(age / 10) * 10`	Round down
`CEILING(x)`	`CEILING(salary / 1000)`	Round up
`ROUND(x,d)`	`ROUND(AVG(score), 2)`	Round to d decimals

Date Extractions

Extract	Example	Returns
`YEAR`	`EXTRACT(YEAR FROM date)`	2024
`MONTH`	`EXTRACT(MONTH FROM date)`	1-12
`QUARTER`	`EXTRACT(QUARTER FROM date)`	1-4
`DOW`	`EXTRACT(DOW FROM date)`	0-6 (Sun-Sat)
`DAY`	`EXTRACT(DAY FROM date)`	1-31

Privacy Parameters

Use Case	Epsilon	Delta	Privacy Level
Exploration	1.0	1e-6	Low privacy, high accuracy
Analysis	0.1	1e-6	Balanced
Production	0.01	1e-9	High privacy, lower accuracy

Reserved Keywords to Avoid

Avoid these as column aliases: year, quarter, month, day, count, order, group, select, where, from

Use descriptive alternatives: hire_year, hire_quarter, hire_month, hire_day, employee_count

Remember

Always test your queries with different epsilon values to find the right balance between privacy and utility for your use case!

Overview​

Basic Syntax​

Supported Features​

Basic Aggregates​

Arithmetic on Aggregates​

Mathematical Functions​

Mathematical Functions in GROUP BY​

COALESCE Function​

CASE Expressions​

Date Functions​

Complex Combinations​

ORDER BY with Complex Expressions​

Known Limitations​

Datetime Aggregates​

Alternative​

Nested Aggregates​

Advanced Date Functions​

Alternative​

Trigonometric & Logarithmic Functions​

Window Functions​

String Functions​

Subqueries in SELECT​

Set Operations​

Privacy Protection​

Safe Patterns​

Automatic Prevention​

Best Practices​

Epsilon Budget Management​

Column Aliasing​

Effective Grouping​

Handling Nulls​

Meaningful Ranges​

Common Patterns​

Age Demographics​

Salary Analysis​

Time-based Trends​

Performance Categories​

Column Naming & Practical Tips​

Column Naming Behavior​

Error Messages​

Testing Your Queries​

Quick Reference​

Mathematical Functions​

Date Extractions​

Privacy Parameters​

Reserved Keywords to Avoid​

Overview

Basic Syntax

Supported Features

Basic Aggregates

Arithmetic on Aggregates

Mathematical Functions

Mathematical Functions in GROUP BY

COALESCE Function

CASE Expressions

Date Functions

Complex Combinations

ORDER BY with Complex Expressions

Known Limitations

Datetime Aggregates

Alternative

Nested Aggregates

Advanced Date Functions

Alternative

Trigonometric & Logarithmic Functions

Window Functions

String Functions

Subqueries in SELECT

Set Operations

Privacy Protection

Safe Patterns

Automatic Prevention

Best Practices

Epsilon Budget Management

Column Aliasing

Effective Grouping

Handling Nulls

Meaningful Ranges

Common Patterns

Age Demographics

Salary Analysis

Time-based Trends

Performance Categories

Column Naming & Practical Tips

Column Naming Behavior

Error Messages

Testing Your Queries

Quick Reference

Mathematical Functions

Date Extractions

Privacy Parameters

Reserved Keywords to Avoid