AGENT SQL Reference
Overview
AGENT SQL is a differentially private SQL framework designed to provide a secure and private way of querying sensitive data. This innovative solution enables users to integrate differential privacy into their SQL workflows, ensuring that their data analysis and querying processes comply with stringent privacy standards.
AGENT SQL allows seamless integration with various SQL client applications, providing a flexible and adaptable solution for data scientists and analysts. By leveraging AGENT SQL, users can perform complex data analysis tasks while maintaining the confidentiality and integrity of their data.
Supported and Unsupported Query Types
This section provides an overview of the query types supported by AGENT SQL, as well as those that are not supported. Understanding these capabilities will help you effectively utilize AGENT SQL for your data querying needs. First and the most important, the dialect of the SQL query should match the dialect of the database you are querying. Unless stated otherwise, AGENT SQL doesn't support any functions which are not supported in the underlying database.
Supported Query Types
| Category | Query Type | Description |
|---|---|---|
| Aggregates | COUNT, SUM, AVG, STDDEV, VAR | Basic aggregate functions for statistical analysis. |
| Subqueries | Subquery in the FROM, WHERE, HAVING clause, Subquery as a Scalar Value | See below for examples of the supported subquery types. |
| Joins | FULL JOIN, OUTER JOIN, LEFT JOIN, RIGHT JOIN | Various types of joins for combining tables within the same dataset. |
| Conditions | WHERE, WHERE IN, WHERE BETWEEN, WHERE IS NULL / IS NOT NULL | Conditional filtering of data. |
| Grouping | GROUP BY | Groups rows that have the same values in specified columns. |
| Aliases | AS | Renames columns or tables in output for better readability. |
| Expressions | Arithmetic Operations | Supports operations like '(+)', '(-)', '(*)', '(/)', '(%)'. |
| Functions | ABS, CEIL, FLOOR, SQRT, etc. | Mathematical functions for data manipulation. |
| Date and Time Functions | CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT, DAYNAME | Functions for handling and manipulating date and time values. |
Unsupported Query Types
| Category | Query Type | Description |
|---|---|---|
| Aggregates | MEDIAN, MODE | More complex statistical functions are not supported. |
| Subqueries | Correlated Subqueries | Subqueries that reference columns from the outer query are not supported. |
| Joins | Cross Join, Joins across different datasets | Cross joins that produce a Cartesian product of the two tables are not supported. Joins are only supported within the same dataset. |
| DML | MERGE, UPDATE, INSERT, DELETE | Data manipulation operations are not supported. |
| DDL | CREATE, DROP, ALTER, RENAME, TRUNCATE, etc | Manipulating the structure of tables or columns is not supported. |
| Functions | NULL IF | Conditional functions like NULL IF are not supported. |
| Date and Time Functions | DATEADD, DATEDIFF, DATEPART, FORMAT, GETDATE, ISDATE, SYSDATETIME | These functions are not supported. |
| Table References | db_name.schema_name.table_name | This format is not supported; use aliases instead. For single table queries, dataset_name.dataset_name or simply dataset_name is supported. |
Subquery Patterns
AGENT SQL supports several patterns of subquery usage, with specific considerations for Differential Privacy:
Subquery in the FROM Clause:
- These subqueries act as temporary tables that the outer query can operate on.
- Example:
WITH DP (eps = 1, delta = 0.0001, mechanism = 'Laplace')
SELECT SUM(sum_n2)
FROM (SELECT SUM(num1) AS sum_n2 FROM public.my_table GROUP BY num2)
Subquery as a Scalar Value:
- These subqueries return a single value used in the outer query, often in a
SELECTlist or aWHEREclause. TheWITH DPclause can be applied to specify privacy parameters for each subquery. - Example:
WITH DP (eps = 1, delta = 0.0001, mechanism = 'Laplace')
SELECT COUNT(num1),
(
WITH DP (eps = 1, delta = 0.0001, mechanism = 'Laplace')
SELECT COUNT(num1) FROM public.my_table2
) AS b
FROM public.my_table2
WHERE num1 < b
Subquery in the WHERE Clause:
- These subqueries are used to filter results based on a condition that involves another query. The
WITH DPclause can be used to manage privacy parameters. - Example:
SELECT employee_id, employee_name
FROM employees
WHERE department_id IN (
WITH DP (eps = 1, delta = 0.0001, mechanism = 'Laplace')
SELECT department_id
FROM departments
WHERE location = 'New York'
)
Subquery in the HAVING Clause:
- These subqueries are used to filter groups of rows based on aggregate conditions. The
WITH DPclause can be applied to ensure privacy. - Example:
SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id
HAVING COUNT(*) > (
WITH DP (eps = 1, delta = 0.0001, mechanism = 'Laplace')
SELECT AVG(dept_count)
FROM (
SELECT COUNT(*) AS dept_count
FROM employees
GROUP BY department_id
) AS dept_counts
)
When using subqueries and multiple queries, the WITH DP clause allows specifying Epsilon and Delta for each step. Advanced composition techniques are used to deduct from the privacy budget, ensuring compliance with privacy standards.
Basic aggregate functions such as COUNT, SUM, AVG, STDDEV, and VAR are supported. However, more complex functions like MEDIAN and MODE are not available.
A variety of conditional statements, including comparisons, range checks, and null checks are supported.
A wide range of arithmetic operations and mathematical functions are supported, including date and time functions like CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT, and DAYNAME.
Various join operations are supported, including FULL, OUTER, LEFT, and RIGHT joins, but only within the same dataset.
Operations like MERGE, UPDATE, INSERT, and RENAME are not supported. AGENT SQL only supports querying capabilities.
For single table queries, dataset_name.dataset_name or simply dataset_name is supported. The db_name.schema_name.table_name format is not supported. Use aliases to reference tables in complex queries.