Skip to main content

Writing Differential Privacy queries with AGENT SQL

Overview

A comprehensive guide to writing Differential Privacy queries with AGENT SQL.

Basic Syntax

Every query must start with a Differential Privacy clause:

WITH DP(epsilon=0.1, delta=1e-6)
SELECT ...

Parameters:

1
epsilon (or shorthand eps)

Privacy budget (smaller = more private, typically 0.001 to 1.0)

2
delta (or shorthand del)

Failure probability (typically 1e-6 to 1e-9)

Supported Features

Basic Aggregates

All standard aggregation functions work perfectly on numeric columns:

-- Basic aggregates (numeric columns only for MIN/MAX)
WITH DP(epsilon=0.1)
SELECT
COUNT(*) AS total_count,
SUM(salary) AS total_salary,
AVG(age) AS average_age,
MIN(age) AS min_age, -- ✅ Works on numeric
MAX(experience_years) AS max_experience -- ✅ Works on numeric
FROM employees;
MIN/MAX for numeric columns only

MIN/MAX only work on numeric columns, not on datetime or string columns.

Arithmetic on Aggregates

You can perform arithmetic operations on aggregate results:

-- Arithmetic expressions outside aggregates
WITH DP(epsilon=0.1)
SELECT
region,
COUNT(*) + 10 AS adjusted_count,
SUM(salary) * 1.1 AS salary_with_bonus,
AVG(age) - 25 AS age_relative_to_25,
(COUNT(*) * 100) / SUM(salary) AS efficiency_ratio
FROM employees
GROUP BY region;

Mathematical Functions

All common mathematical functions are supported:

-- Mathematical functions on aggregates
WITH DP(epsilon=0.1)
SELECT
department,
POWER(AVG(salary), 2) AS salary_squared,
SQRT(AVG(experience_years)) AS sqrt_experience,
ABS(AVG(score) - 75) AS score_deviation,
FLOOR(AVG(salary) / 1000) * 1000 AS salary_floor_thousands,
CEILING(AVG(bonus_percent)) AS bonus_ceiling,
ROUND(AVG(rating), 2) AS rounded_rating
FROM employees
GROUP BY department;

Supported Math Functions:

1
POWER(x, y)

x raised to power y

2
SQRT(x)

Square root

3
ABS(x)

Absolute value

4
FLOOR(x)

Round down to integer

5
CEILING(x)

Round up to integer

6
ROUND(x, decimals)

Round to specified decimals

7
TRUNC(x, decimals)

Round to specified decimals

Mathematical Functions in GROUP BY

Use math functions to create grouping categories:

-- Age brackets using FLOOR
WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COUNT(*) AS count_in_decade,
AVG(salary) AS avg_salary_in_decade
FROM employees
GROUP BY age_decade
ORDER BY age_decade;

-- Salary tiers using CEILING
WITH DP(epsilon=0.1)
SELECT
CEILING(salary / 25000) * 25000 AS salary_tier,
COUNT(*) AS employees_in_tier,
AVG(experience_years) AS avg_experience
FROM employees
GROUP BY salary_tier;

COALESCE Function

Handle NULL values elegantly in GROUP BY:

-- Basic COALESCE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
COALESCE(race, 'Unknown') AS race_group,
COUNT(*) AS employee_count,
AVG(salary) AS average_salary
FROM employees
GROUP BY race_group
ORDER BY race_group;

-- Multiple COALESCE fallbacks
WITH DP(epsilon=0.1)
SELECT
COALESCE(department, region, 'Unassigned') AS work_group,
COUNT(*) AS employee_count,
AVG(performance_score) AS avg_performance
FROM employees
GROUP BY work_group;

CASE Expressions

Create conditional logic and categories:

-- CASE expressions with aggregates
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN AVG(salary) > 80000 THEN 'High-paying'
WHEN AVG(salary) > 60000 THEN 'Medium-paying'
ELSE 'Entry-level'
END AS department_tier,
COUNT(*) AS department_count
FROM employees
GROUP BY department;

-- Complex CASE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN age < 30 AND salary > 70000 THEN 'Young High Earner'
WHEN age >= 50 AND experience_years > 20 THEN 'Senior Expert'
WHEN salary < 40000 THEN 'Entry Level'
ELSE 'Standard Employee'
END AS employee_category,
COUNT(*) AS category_size,
AVG(performance_rating) AS avg_performance
FROM employees
GROUP BY employee_category;

Date Functions

Extract date components for time-based analysis:

-- Extract date parts (avoid reserved keywords like 'year', 'quarter')
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year, -- ✅ Use descriptive names
EXTRACT(MONTH FROM hire_date) AS hire_month,
COUNT(*) AS hires_count,
AVG(starting_salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_month
ORDER BY hire_year, hire_month;

-- Day of week analysis
WITH DP(epsilon=0.1)
SELECT
EXTRACT(DOW FROM hire_date) AS day_of_week, -- 0=Sunday, 6=Saturday
COUNT(*) AS hires_on_day,
AVG(age_at_hire) AS avg_hire_age
FROM employees
GROUP BY day_of_week
ORDER BY day_of_week;

-- Quarterly breakdown
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(QUARTER FROM hire_date) AS hire_quarter, -- ✅ Avoid 'quarter'
COUNT(*) AS quarterly_hires,
AVG(salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_quarter;

Supported Date Extractions:

  • YEAR, MONTH, DAY
  • QUARTER (1-4)
  • DOW (Day of Week: 0=Sunday, 6=Saturday)
  • HOUR, MINUTE, SECOND
  • DECADE
Reserved Keywords

Avoid using SQL reserved words like year, quarter, month as column aliases. Use descriptive names like hire_year, hire_quarter instead.

Complex Combinations

Combine multiple features for powerful analysis:

-- Everything together: COALESCE + Math + CASE + Arithmetic
WITH DP(epsilon=0.2)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
COALESCE(race, 'Unknown') AS race_group,
CASE
WHEN FLOOR(salary / 10000) * 10000 < 50000 THEN 'Low Income'
WHEN FLOOR(salary / 10000) * 10000 < 80000 THEN 'Middle Income'
ELSE 'High Income'
END AS income_bracket,
COUNT(*) + 5 AS adjusted_count,
ROUND(AVG(salary) * 1.05, 2) AS projected_salary,
POWER(AVG(performance_score), 2) AS performance_squared
FROM employees
WHERE hire_date IS NOT NULL
GROUP BY hire_year, race_group, income_bracket
ORDER BY hire_year, race_group;

ORDER BY with Complex Expressions

Sort results using any supported expression:

-- ORDER BY with COALESCE
WITH DP(epsilon=0.1)
SELECT
COALESCE(department, 'Unknown') AS dept_group,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_group
ORDER BY dept_group, avg_salary DESC;

-- ORDER BY with mathematical expressions
WITH DP(epsilon=0.1)
SELECT
FLOOR(experience_years / 5) * 5 AS experience_bracket,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY experience_bracket
ORDER BY experience_bracket;

Known Limitations

A number of limitations on the use of SQL statements are listed below. Some are features that are currently not supported, others are incompatible with Differential Privacy. We have given alternative approaches where possible.

Datetime Aggregates

MIN/MAX are not supported on datetime columns:

-- NOT SUPPORTED - MIN/MAX on datetime columns
WITH DP(epsilon=0.1)
SELECT
MIN(hire_date) AS earliest_hire, -- ❌ Expected to fail - datetime not supported
MAX(hire_date) AS latest_hire -- ❌ Expected to fail - datetime not supported
FROM employees;

Alternative

use EXTRACT to convert to a numeric type:

-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
MIN(EXTRACT(YEAR FROM hire_date)) AS earliest_year, -- ✅ Works on numeric
MAX(EXTRACT(YEAR FROM hire_date)) AS latest_year -- ✅ Works on numeric
FROM employees;

Nested Aggregates

Nested aggregates are not currently supported:

-- NOT SUPPORTED - Nested aggregates
WITH DP(epsilon=0.1)
SELECT
region,
COUNT(SUM(salary)) AS nested_agg -- ❌ Expected to fail - nested aggregates not supported
FROM employees
GROUP BY region;

Advanced Date Functions

DATE_TRUNC, DATE_PART and LAST_DAY are not currently supported:

-- NOT SUPPORTED - These date functions are not supported
WITH DP(epsilon=0.1)
SELECT
DATE_TRUNC('month', hire_date) AS month, -- ❌ Expected to fail
DATE_PART('year', hire_date) AS year, -- ❌ Expected to fail
LAST_DAY(hire_date) AS month_end -- ❌ Expected to fail
FROM employees;

Alternative

Instead use EXTRACT:

-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(MONTH FROM hire_date) AS hire_month,
COUNT(*) AS hires
FROM employees
GROUP BY hire_year, hire_month;

Trigonometric & Logarithmic Functions

Trigonometric & Logarithmic Functions are not currently supported:

-- NOT SUPPORTED - Bounds calculation issues
WITH DP(epsilon=0.1)
SELECT
SIN(AVG(score)) AS sin_score, -- ❌ Expected to fail
LOG(AVG(salary)) AS log_salary, -- ❌ Expected to fail
EXP(AVG(rating)) AS exp_rating -- ❌ Expected to fail
FROM employees;

Window Functions

Window functions are not compatible with Differential Privacy:

-- NOT SUPPORTED - Incompatible with Differential Privacy
WITH DP(epsilon=0.1)
SELECT
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rank -- ❌ Expected to fail
FROM employees;
Why?

Window functions would expose individual row information, violating Differential Privacy.

String Functions

String functions such as UPPER and SUBSTRING are not currently supported:

-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
UPPER(department) AS dept_upper, -- ❌ Expected to fail
SUBSTRING(name, 1, 3) AS name_prefix -- ❌ Expected to fail
FROM employees;

Subqueries in SELECT

Subqueries in SELECT statements are not currently supported

-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
department,
COUNT(*) AS dept_count,
(SELECT AVG(salary) FROM employees e2 WHERE e2.region = e1.region) AS region_avg -- ❌ Expected to fail
FROM employees e1
GROUP BY department;

Set Operations

The set operations UNION, INTERSECT and EXCEPT are not currently supported

-- NOT SUPPORTED - DP composition complexity
WITH DP(epsilon=0.1)
(SELECT region, COUNT(*) FROM employees WHERE age < 40 GROUP BY region)
UNION ALL
(SELECT region, COUNT(*) FROM employees WHERE age >= 40 GROUP BY region); -- ❌ Expected to fail

Privacy Protection

AGENT SQL automatically prevents privacy violations:

Safe Patterns

AGENT enforces safe patterns so row-level data cannot be exposed

-- SAFE - Only aggregated results exposed
WITH DP(epsilon=0.1)
SELECT
FLOOR(salary / 10000) * 10000 AS salary_bracket,
COUNT(*) AS employee_count,
AVG(age) AS average_age
FROM employees
GROUP BY salary_bracket;
-- Result columns: [salary_bracket, employee_count, average_age]
-- ✅ No raw 'salary' or 'age' columns exposed

Automatic Prevention

AGENT prevents the following privacy violations:

Best Practices

Epsilon Budget Management

For exploratory analysis, use a higher Epsilon value for lower privacy but with greater accuracy:

-- For exploratory analysis
WITH DP(epsilon=1.0, delta=1e-6) -- Higher epsilon = less privacy, more accuracy

For production or sensitive data, use a lower Epsilon value for greater privacy but with less accuracy:

-- For production/sensitive data
WITH DP(epsilon=0.01, delta=1e-9) -- Lower epsilon = more privacy, less accuracy

Column Aliasing

Avoid using reserved keywords for column aliases as they can cause parsing errors, e.g. AS year, AS quarter, AS month, AS day, AS count, etc.

Instead choose other aliases:

-- GOOD - Avoid reserved keywords
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year, -- ✅ Not 'year'
EXTRACT(QUARTER FROM hire_date) AS hire_quarter, -- ✅ Not 'quarter'
COUNT(*) AS employee_count -- ✅ Explicit alias
FROM employees
GROUP BY hire_year, hire_quarter;

Effective Grouping

Create meaningful groups and filter out smaller groups for better privacy:

-- GOOD - Creates meaningful groups
WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COALESCE(department, 'Unknown') AS dept,
COUNT(*) AS employee_count
FROM employees
GROUP BY age_decade, dept
HAVING COUNT(*) > 5; -- Filter out small groups for better privacy

Handling Nulls

Use explicit handling of NULL values:

-- GOOD - Explicit NULL handling
WITH DP(epsilon=0.1)
SELECT
COALESCE(race, 'Not Specified') AS race_category,
COUNT(*) AS employee_count
FROM employees
WHERE salary IS NOT NULL -- Filter NULLs in WHERE clause
GROUP BY race_category;

Meaningful Ranges

Create interpretable ranges:

-- GOOD - Create interpretable ranges
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN age BETWEEN 18 AND 25 THEN '18-25'
WHEN age BETWEEN 26 AND 35 THEN '26-35'
WHEN age BETWEEN 36 AND 50 THEN '36-50'
WHEN age > 50 THEN '50+'
ELSE 'Unknown'
END AS age_group,
AVG(salary) AS avg_salary,
COUNT(*) AS group_size
FROM employees
GROUP BY age_group;

Common Patterns

Here are some examples of common query patterns.

Pattern 1

Age Demographics

WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COALESCE(gender, 'Unknown') AS gender_group,
COUNT(*) AS population,
AVG(salary) AS avg_salary,
ROUND(AVG(performance_score), 1) AS avg_performance
FROM employees
GROUP BY age_decade, gender_group
ORDER BY age_decade, gender_group;
Pattern 2

Salary Analysis

WITH DP(epsilon=0.2)
SELECT
CEILING(salary / 25000) * 25000 AS salary_tier,
EXTRACT(YEAR FROM hire_date) AS hire_year,
COUNT(*) AS employee_count,
AVG(experience_years) AS avg_experience,
POWER(AVG(performance_score), 2) AS performance_index
FROM employees
WHERE salary IS NOT NULL
GROUP BY salary_tier, hire_year
HAVING COUNT(*) > 3
ORDER BY hire_year, salary_tier;
Pattern 3
WITH DP(epsilon=0.15)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(QUARTER FROM hire_date) AS hire_quarter,
COALESCE(department, 'Unknown') AS dept,
COUNT(*) + 2 AS adjusted_hires,
ROUND(AVG(starting_salary) * 1.03, 0) AS inflation_adjusted_salary
FROM employees
WHERE hire_date >= '2015-01-01'
GROUP BY hire_year, hire_quarter, dept
ORDER BY hire_year, hire_quarter, dept;
Pattern 4

Performance Categories

WITH DP(epsilon=0.1)
SELECT
CASE
WHEN performance_score >= 4.5 THEN 'Excellent'
WHEN performance_score >= 3.5 THEN 'Good'
WHEN performance_score >= 2.5 THEN 'Satisfactory'
ELSE 'Needs Improvement'
END AS performance_category,
FLOOR(experience_years / 5) * 5 AS experience_bracket,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SQRT(AVG(bonus_percent)) AS bonus_factor
FROM employees
WHERE performance_score IS NOT NULL
GROUP BY performance_category, experience_bracket;

Column Naming & Practical Tips

Column Naming Behavior

AGENT SQL handles column aliases as follows.

Explicit aliases are preseved exactly:

-- Explicit aliases are preserved exactly
WITH DP(epsilon=0.1)
SELECT
COUNT(*) AS employee_count -- ✅ Result column: 'employee_count'
FROM employees;

Without an explicit alias, an auto-generated alias is produced:

-- Without alias, COUNT(*) becomes 'count_result'
WITH DP(epsilon=0.1)
SELECT
COUNT(*) -- ⚠️ Result column: 'count_result' (auto-generated)
FROM employees;
Recommendation

Always use explicit, descriptive aliases for better code readability and predictable column names.

Error Messages

When queries fail, you may see:

1
Parser errors

"Bad token 'keyword' expecting IDENT" → Usually reserved keyword issues

2
Type errors

"Aggregations must be over numeric or boolean" → Using wrong data types

3
Privacy errors

Automatic prevention of privacy violations


Testing Your Queries

  1. Start with simple queries and build complexity:
-- 1. Basic aggregation
WITH DP(epsilon=0.1)
SELECT COUNT(*) AS total FROM employees;
  1. Add grouping:
WITH DP(epsilon=0.1)
SELECT region, COUNT(*) AS count FROM employees GROUP BY region;
  1. Add complexity gradually:
WITH DP(epsilon=0.1)
SELECT
COALESCE(region, 'Unknown') AS region_group,
FLOOR(age / 10) * 10 AS age_decade,
COUNT(*) AS population
FROM employees
GROUP BY region_group, age_decade;

Quick Reference

Mathematical Functions

FunctionExampleDescription
POWER(x,y)POWER(AVG(score), 2)x raised to power y
SQRT(x)SQRT(AVG(years))Square root
ABS(x)ABS(AVG(score) - 75)Absolute value
FLOOR(x)FLOOR(age / 10) * 10Round down
CEILING(x)CEILING(salary / 1000)Round up
ROUND(x,d)ROUND(AVG(score), 2)Round to d decimals

Date Extractions

ExtractExampleReturns
YEAREXTRACT(YEAR FROM date)2024
MONTHEXTRACT(MONTH FROM date)1-12
QUARTEREXTRACT(QUARTER FROM date)1-4
DOWEXTRACT(DOW FROM date)0-6 (Sun-Sat)
DAYEXTRACT(DAY FROM date)1-31

Privacy Parameters

Use CaseEpsilonDeltaPrivacy Level
Exploration1.01e-6Low privacy, high accuracy
Analysis0.11e-6Balanced
Production0.011e-9High privacy, lower accuracy

Reserved Keywords to Avoid

Avoid these as column aliases: year, quarter, month, day, count, order, group, select, where, from

Use descriptive alternatives: hire_year, hire_quarter, hire_month, hire_day, employee_count

Remember

Always test your queries with different epsilon values to find the right balance between privacy and utility for your use case!