Basic Syntax
Writing Differential Privacy queries with AGENT SQL
Overview
A comprehensive guide to writing Differential Privacy queries with AGENT SQL.
Basic Syntax
Every query must start with a Differential Privacy clause:
WITH DP(epsilon=0.1, delta=1e-6)
SELECT ...
Parameters:
epsilon (or shorthand eps)Privacy budget (smaller = more private, typically 0.001 to 1.0)
delta (or shorthand del)Failure probability (typically 1e-6 to 1e-9)
Supported Features
Basic Aggregates
All standard aggregation functions work perfectly on numeric columns:
-- Basic aggregates (numeric columns only for MIN/MAX)
WITH DP(epsilon=0.1)
SELECT
COUNT(*) AS total_count,
SUM(salary) AS total_salary,
AVG(age) AS average_age,
MIN(age) AS min_age, -- ✅ Works on numeric
MAX(experience_years) AS max_experience -- ✅ Works on numeric
FROM employees;
MIN/MAX only work on numeric columns, not on datetime or string columns.
Arithmetic on Aggregates
You can perform arithmetic operations on aggregate results:
-- Arithmetic expressions outside aggregates
WITH DP(epsilon=0.1)
SELECT
region,
COUNT(*) + 10 AS adjusted_count,
SUM(salary) * 1.1 AS salary_with_bonus,
AVG(age) - 25 AS age_relative_to_25,
(COUNT(*) * 100) / SUM(salary) AS efficiency_ratio
FROM employees
GROUP BY region;
Mathematical Functions
All common mathematical functions are supported:
-- Mathematical functions on aggregates
WITH DP(epsilon=0.1)
SELECT
department,
POWER(AVG(salary), 2) AS salary_squared,
SQRT(AVG(experience_years)) AS sqrt_experience,
ABS(AVG(score) - 75) AS score_deviation,
FLOOR(AVG(salary) / 1000) * 1000 AS salary_floor_thousands,
CEILING(AVG(bonus_percent)) AS bonus_ceiling,
ROUND(AVG(rating), 2) AS rounded_rating
FROM employees
GROUP BY department;
Supported Math Functions:
POWER(x, y)x raised to power y
SQRT(x)Square root
ABS(x)Absolute value
FLOOR(x)Round down to integer
CEILING(x)Round up to integer
ROUND(x, decimals)Round to specified decimals
TRUNC(x, decimals)Round to specified decimals
Mathematical Functions in GROUP BY
Use math functions to create grouping categories:
-- Age brackets using FLOOR
WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COUNT(*) AS count_in_decade,
AVG(salary) AS avg_salary_in_decade
FROM employees
GROUP BY age_decade
ORDER BY age_decade;
-- Salary tiers using CEILING
WITH DP(epsilon=0.1)
SELECT
CEILING(salary / 25000) * 25000 AS salary_tier,
COUNT(*) AS employees_in_tier,
AVG(experience_years) AS avg_experience
FROM employees
GROUP BY salary_tier;
COALESCE Function
Handle NULL values elegantly in GROUP BY:
-- Basic COALESCE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
COALESCE(race, 'Unknown') AS race_group,
COUNT(*) AS employee_count,
AVG(salary) AS average_salary
FROM employees
GROUP BY race_group
ORDER BY race_group;
-- Multiple COALESCE fallbacks
WITH DP(epsilon=0.1)
SELECT
COALESCE(department, region, 'Unassigned') AS work_group,
COUNT(*) AS employee_count,
AVG(performance_score) AS avg_performance
FROM employees
GROUP BY work_group;
CASE Expressions
Create conditional logic and categories:
-- CASE expressions with aggregates
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN AVG(salary) > 80000 THEN 'High-paying'
WHEN AVG(salary) > 60000 THEN 'Medium-paying'
ELSE 'Entry-level'
END AS department_tier,
COUNT(*) AS department_count
FROM employees
GROUP BY department;
-- Complex CASE in GROUP BY
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN age < 30 AND salary > 70000 THEN 'Young High Earner'
WHEN age >= 50 AND experience_years > 20 THEN 'Senior Expert'
WHEN salary < 40000 THEN 'Entry Level'
ELSE 'Standard Employee'
END AS employee_category,
COUNT(*) AS category_size,
AVG(performance_rating) AS avg_performance
FROM employees
GROUP BY employee_category;
Date Functions
Extract date components for time-based analysis:
-- Extract date parts (avoid reserved keywords like 'year', 'quarter')
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year, -- ✅ Use descriptive names
EXTRACT(MONTH FROM hire_date) AS hire_month,
COUNT(*) AS hires_count,
AVG(starting_salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_month
ORDER BY hire_year, hire_month;
-- Day of week analysis
WITH DP(epsilon=0.1)
SELECT
EXTRACT(DOW FROM hire_date) AS day_of_week, -- 0=Sunday, 6=Saturday
COUNT(*) AS hires_on_day,
AVG(age_at_hire) AS avg_hire_age
FROM employees
GROUP BY day_of_week
ORDER BY day_of_week;
-- Quarterly breakdown
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(QUARTER FROM hire_date) AS hire_quarter, -- ✅ Avoid 'quarter'
COUNT(*) AS quarterly_hires,
AVG(salary) AS avg_starting_salary
FROM employees
GROUP BY hire_year, hire_quarter;
Supported Date Extractions:
YEAR,MONTH,DAYQUARTER(1-4)DOW(Day of Week: 0=Sunday, 6=Saturday)HOUR,MINUTE,SECONDDECADE
Avoid using SQL reserved words like year, quarter, month as column aliases. Use descriptive names like hire_year, hire_quarter instead.
Complex Combinations
Combine multiple features for powerful analysis:
-- Everything together: COALESCE + Math + CASE + Arithmetic
WITH DP(epsilon=0.2)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
COALESCE(race, 'Unknown') AS race_group,
CASE
WHEN FLOOR(salary / 10000) * 10000 < 50000 THEN 'Low Income'
WHEN FLOOR(salary / 10000) * 10000 < 80000 THEN 'Middle Income'
ELSE 'High Income'
END AS income_bracket,
COUNT(*) + 5 AS adjusted_count,
ROUND(AVG(salary) * 1.05, 2) AS projected_salary,
POWER(AVG(performance_score), 2) AS performance_squared
FROM employees
WHERE hire_date IS NOT NULL
GROUP BY hire_year, race_group, income_bracket
ORDER BY hire_year, race_group;
ORDER BY with Complex Expressions
Sort results using any supported expression:
-- ORDER BY with COALESCE
WITH DP(epsilon=0.1)
SELECT
COALESCE(department, 'Unknown') AS dept_group,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_group
ORDER BY dept_group, avg_salary DESC;
-- ORDER BY with mathematical expressions
WITH DP(epsilon=0.1)
SELECT
FLOOR(experience_years / 5) * 5 AS experience_bracket,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY experience_bracket
ORDER BY experience_bracket;
Known Limitations
A number of limitations on the use of SQL statements are listed below. Some are features that are currently not supported, others are incompatible with Differential Privacy. We have given alternative approaches where possible.
Datetime Aggregates
MIN/MAX are not supported on datetime columns:
-- NOT SUPPORTED - MIN/MAX on datetime columns
WITH DP(epsilon=0.1)
SELECT
MIN(hire_date) AS earliest_hire, -- ❌ Expected to fail - datetime not supported
MAX(hire_date) AS latest_hire -- ❌ Expected to fail - datetime not supported
FROM employees;
Alternative
use EXTRACT to convert to a numeric type:
-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
MIN(EXTRACT(YEAR FROM hire_date)) AS earliest_year, -- ✅ Works on numeric
MAX(EXTRACT(YEAR FROM hire_date)) AS latest_year -- ✅ Works on numeric
FROM employees;
Nested Aggregates
Nested aggregates are not currently supported:
-- NOT SUPPORTED - Nested aggregates
WITH DP(epsilon=0.1)
SELECT
region,
COUNT(SUM(salary)) AS nested_agg -- ❌ Expected to fail - nested aggregates not supported
FROM employees
GROUP BY region;
Advanced Date Functions
DATE_TRUNC, DATE_PART and LAST_DAY are not currently supported:
-- NOT SUPPORTED - These date functions are not supported
WITH DP(epsilon=0.1)
SELECT
DATE_TRUNC('month', hire_date) AS month, -- ❌ Expected to fail
DATE_PART('year', hire_date) AS year, -- ❌ Expected to fail
LAST_DAY(hire_date) AS month_end -- ❌ Expected to fail
FROM employees;
Alternative
Instead use EXTRACT:
-- SUPPORTED ALTERNATIVE
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(MONTH FROM hire_date) AS hire_month,
COUNT(*) AS hires
FROM employees
GROUP BY hire_year, hire_month;
Trigonometric & Logarithmic Functions
Trigonometric & Logarithmic Functions are not currently supported:
-- NOT SUPPORTED - Bounds calculation issues
WITH DP(epsilon=0.1)
SELECT
SIN(AVG(score)) AS sin_score, -- ❌ Expected to fail
LOG(AVG(salary)) AS log_salary, -- ❌ Expected to fail
EXP(AVG(rating)) AS exp_rating -- ❌ Expected to fail
FROM employees;
Window Functions
Window functions are not compatible with Differential Privacy:
-- NOT SUPPORTED - Incompatible with Differential Privacy
WITH DP(epsilon=0.1)
SELECT
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rank -- ❌ Expected to fail
FROM employees;
Window functions would expose individual row information, violating Differential Privacy.
String Functions
String functions such as UPPER and SUBSTRING are not currently supported:
-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
UPPER(department) AS dept_upper, -- ❌ Expected to fail
SUBSTRING(name, 1, 3) AS name_prefix -- ❌ Expected to fail
FROM employees;
Subqueries in SELECT
Subqueries in SELECT statements are not currently supported
-- NOT SUPPORTED
WITH DP(epsilon=0.1)
SELECT
department,
COUNT(*) AS dept_count,
(SELECT AVG(salary) FROM employees e2 WHERE e2.region = e1.region) AS region_avg -- ❌ Expected to fail
FROM employees e1
GROUP BY department;
Set Operations
The set operations UNION, INTERSECT and EXCEPT are not currently supported
-- NOT SUPPORTED - DP composition complexity
WITH DP(epsilon=0.1)
(SELECT region, COUNT(*) FROM employees WHERE age < 40 GROUP BY region)
UNION ALL
(SELECT region, COUNT(*) FROM employees WHERE age >= 40 GROUP BY region); -- ❌ Expected to fail
Privacy Protection
AGENT SQL automatically prevents privacy violations:
Safe Patterns
AGENT enforces safe patterns so row-level data cannot be exposed
-- SAFE - Only aggregated results exposed
WITH DP(epsilon=0.1)
SELECT
FLOOR(salary / 10000) * 10000 AS salary_bracket,
COUNT(*) AS employee_count,
AVG(age) AS average_age
FROM employees
GROUP BY salary_bracket;
-- Result columns: [salary_bracket, employee_count, average_age]
-- ✅ No raw 'salary' or 'age' columns exposed
Automatic Prevention
AGENT prevents the following privacy violations:
Raw columns in output: (e.g. salary, age, employee_count)
Individual row data exposure
Non-aggregated personal information
Best Practices
Epsilon Budget Management
For exploratory analysis, use a higher Epsilon value for lower privacy but with greater accuracy:
-- For exploratory analysis
WITH DP(epsilon=1.0, delta=1e-6) -- Higher epsilon = less privacy, more accuracy
For production or sensitive data, use a lower Epsilon value for greater privacy but with less accuracy:
-- For production/sensitive data
WITH DP(epsilon=0.01, delta=1e-9) -- Lower epsilon = more privacy, less accuracy
Column Aliasing
Avoid using reserved keywords for column aliases as they can cause parsing errors, e.g. AS year, AS quarter, AS month, AS day, AS count, etc.
Instead choose other aliases:
-- GOOD - Avoid reserved keywords
WITH DP(epsilon=0.1)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year, -- ✅ Not 'year'
EXTRACT(QUARTER FROM hire_date) AS hire_quarter, -- ✅ Not 'quarter'
COUNT(*) AS employee_count -- ✅ Explicit alias
FROM employees
GROUP BY hire_year, hire_quarter;
Effective Grouping
Create meaningful groups and filter out smaller groups for better privacy:
-- GOOD - Creates meaningful groups
WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COALESCE(department, 'Unknown') AS dept,
COUNT(*) AS employee_count
FROM employees
GROUP BY age_decade, dept
HAVING COUNT(*) > 5; -- Filter out small groups for better privacy
Handling Nulls
Use explicit handling of NULL values:
-- GOOD - Explicit NULL handling
WITH DP(epsilon=0.1)
SELECT
COALESCE(race, 'Not Specified') AS race_category,
COUNT(*) AS employee_count
FROM employees
WHERE salary IS NOT NULL -- Filter NULLs in WHERE clause
GROUP BY race_category;
Meaningful Ranges
Create interpretable ranges:
-- GOOD - Create interpretable ranges
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN age BETWEEN 18 AND 25 THEN '18-25'
WHEN age BETWEEN 26 AND 35 THEN '26-35'
WHEN age BETWEEN 36 AND 50 THEN '36-50'
WHEN age > 50 THEN '50+'
ELSE 'Unknown'
END AS age_group,
AVG(salary) AS avg_salary,
COUNT(*) AS group_size
FROM employees
GROUP BY age_group;
Common Patterns
Here are some examples of common query patterns.
Age Demographics
WITH DP(epsilon=0.1)
SELECT
FLOOR(age / 10) * 10 AS age_decade,
COALESCE(gender, 'Unknown') AS gender_group,
COUNT(*) AS population,
AVG(salary) AS avg_salary,
ROUND(AVG(performance_score), 1) AS avg_performance
FROM employees
GROUP BY age_decade, gender_group
ORDER BY age_decade, gender_group;
Salary Analysis
WITH DP(epsilon=0.2)
SELECT
CEILING(salary / 25000) * 25000 AS salary_tier,
EXTRACT(YEAR FROM hire_date) AS hire_year,
COUNT(*) AS employee_count,
AVG(experience_years) AS avg_experience,
POWER(AVG(performance_score), 2) AS performance_index
FROM employees
WHERE salary IS NOT NULL
GROUP BY salary_tier, hire_year
HAVING COUNT(*) > 3
ORDER BY hire_year, salary_tier;
Time-based Trends
WITH DP(epsilon=0.15)
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
EXTRACT(QUARTER FROM hire_date) AS hire_quarter,
COALESCE(department, 'Unknown') AS dept,
COUNT(*) + 2 AS adjusted_hires,
ROUND(AVG(starting_salary) * 1.03, 0) AS inflation_adjusted_salary
FROM employees
WHERE hire_date >= '2015-01-01'
GROUP BY hire_year, hire_quarter, dept
ORDER BY hire_year, hire_quarter, dept;
Performance Categories
WITH DP(epsilon=0.1)
SELECT
CASE
WHEN performance_score >= 4.5 THEN 'Excellent'
WHEN performance_score >= 3.5 THEN 'Good'
WHEN performance_score >= 2.5 THEN 'Satisfactory'
ELSE 'Needs Improvement'
END AS performance_category,
FLOOR(experience_years / 5) * 5 AS experience_bracket,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SQRT(AVG(bonus_percent)) AS bonus_factor
FROM employees
WHERE performance_score IS NOT NULL
GROUP BY performance_category, experience_bracket;
Column Naming & Practical Tips
Column Naming Behavior
AGENT SQL handles column aliases as follows.
Explicit aliases are preseved exactly:
-- Explicit aliases are preserved exactly
WITH DP(epsilon=0.1)
SELECT
COUNT(*) AS employee_count -- ✅ Result column: 'employee_count'
FROM employees;
Without an explicit alias, an auto-generated alias is produced:
-- Without alias, COUNT(*) becomes 'count_result'
WITH DP(epsilon=0.1)
SELECT
COUNT(*) -- ⚠️ Result column: 'count_result' (auto-generated)
FROM employees;
Always use explicit, descriptive aliases for better code readability and predictable column names.
Error Messages
When queries fail, you may see:
"Bad token 'keyword' expecting IDENT" → Usually reserved keyword issues
"Aggregations must be over numeric or boolean" → Using wrong data types
Automatic prevention of privacy violations
Testing Your Queries
- Start with simple queries and build complexity:
-- 1. Basic aggregation
WITH DP(epsilon=0.1)
SELECT COUNT(*) AS total FROM employees;
- Add grouping:
WITH DP(epsilon=0.1)
SELECT region, COUNT(*) AS count FROM employees GROUP BY region;
- Add complexity gradually:
WITH DP(epsilon=0.1)
SELECT
COALESCE(region, 'Unknown') AS region_group,
FLOOR(age / 10) * 10 AS age_decade,
COUNT(*) AS population
FROM employees
GROUP BY region_group, age_decade;
Quick Reference
Mathematical Functions
| Function | Example | Description |
|---|---|---|
POWER(x,y) | POWER(AVG(score), 2) | x raised to power y |
SQRT(x) | SQRT(AVG(years)) | Square root |
ABS(x) | ABS(AVG(score) - 75) | Absolute value |
FLOOR(x) | FLOOR(age / 10) * 10 | Round down |
CEILING(x) | CEILING(salary / 1000) | Round up |
ROUND(x,d) | ROUND(AVG(score), 2) | Round to d decimals |
Date Extractions
| Extract | Example | Returns |
|---|---|---|
YEAR | EXTRACT(YEAR FROM date) | 2024 |
MONTH | EXTRACT(MONTH FROM date) | 1-12 |
QUARTER | EXTRACT(QUARTER FROM date) | 1-4 |
DOW | EXTRACT(DOW FROM date) | 0-6 (Sun-Sat) |
DAY | EXTRACT(DAY FROM date) | 1-31 |
Privacy Parameters
| Use Case | Epsilon | Delta | Privacy Level |
|---|---|---|---|
| Exploration | 1.0 | 1e-6 | Low privacy, high accuracy |
| Analysis | 0.1 | 1e-6 | Balanced |
| Production | 0.01 | 1e-9 | High privacy, lower accuracy |
Reserved Keywords to Avoid
Avoid these as column aliases: year, quarter, month, day, count, order, group, select, where, from
Use descriptive alternatives: hire_year, hire_quarter, hire_month, hire_day, employee_count
Always test your queries with different epsilon values to find the right balance between privacy and utility for your use case!