Skip to main content

Connecting to AGENT SQL

Overview

AGENT SQL allows you to run SQL queries directly on the data stored in the AGENT platform. This guide helps you get started with AGENT SQL. The dataset upload and budget allocation process are the same as described in the Creating Datasets guide.

Requirements

You will need:

  • Your AGENT API key, available in the AGENT UI within your profile.
  • The Console URL of your AGENT server, available in the AGENT UI under the Configuration tab.
  • The AGENT SQL Client Driver, a JDBC driver for SQL client applications like DBeaver, SQL Workbench, etc. This is available to download in the AGENT UI under the Configuration tab.
  • Some additional requirements also exist for specific SQL clients. For example, to connect to Tableau, you need a Tableau connector file.

Connection string

The JDBC connection string follows the format:

jdbc:avatica:remote:url=<console_url>

Replace <console_url> with the console URL as described above, for example:

jdbc:avatica:remote:url=https://agent.yourcompany.com

If your client application or JDBC library is unable to pass connection parameters separately, the API key can be added to the URL as a query parameter, giving a URL of the form:

jdbc:avatica:remote:url=<console_url>?apiKey=<Your_API_Key>

replacing <Your_API_Key> with your API key. Please note the capitalisation of apiKey.

Warning!

Adding the API key to the URL is not recommended when a connection parameter is available.

If apiKey is present as both a connection parameter and in the URL, the query parameter in the URL takes precedence.

Sample queries

Below are sample queries demonstrating the use of Differential Privacy parameters (epsilon and delta) and their shorthand versions (eps and del):

Snowflake

SELECT COUNT(t1."CARD_ID")
FROM "bank_db_snowflake_2"."TRANSACTIONS" AS t1
JOIN "bank_db_snowflake_2"."CREDIT_CARDS" AS t2
ON t1."CARD_ID" = t2."CARD_ID"

PostgreSQL

WITH DP(epsilon=0.001, delta=0.0005)
SELECT
s."age",
AVG(sc."score") AS avg_score
FROM "school"."students" AS s
JOIN "school"."scores" AS sc
ON sc."student_id" = s."id"
GROUP BY s."age"
ORDER BY s."age";

MySQL

WITH DP(epsilon=0.01, delta=0.001)
SELECT
`e`.`department`,
COUNT(*) AS `employee_count`
FROM `employees` AS `e`
JOIN `departments` AS `d`
ON `e`.`department_id` = `d`.`id`
GROUP BY `e`.`department`;

These examples illustrate how to use the WITH DP(...) clause for differential privacy in different SQL environments. Adjust the table and column names as needed for your specific database schema.

WITH DP(epsilon=1.0, delta=0.0001), SELECT COUNT(*) FROM paqu
SELECT COUNT(*) FROM paqu
-- Uses epsilon = 1e-3 and delta = 1e-5 by default.
WITH dp(eps=0.1), SELECT income_category, COUNT(*) AS total
FROM adult
GROUP BY income_category
-- Delta is not specified, so delta = 1e-5 by default.
WITH dp(eps = 2, epsilon=0.1, skip_cache=true), SELECT sex, AVG(age) AS "Average Age"
FROM adult
GROUP BY sex
-- When both epsilon and eps are provided, epsilon is used (0.1 in this example).

Connection parameters

JDBC connection parameters allow you to customize the connection to the AGENT platform. Most SQL client applications allow these to be specified. The following parameters are supported:

Parameterdescription
apiKey
(case-sensitive)
The API key to authenticate the connection. It is also possible to provide this in the connection string (see above). If both are provided, the query parameter in the connection string takes precedence.
epsilonThe epsilon value for the session. This value serves as the default epsilon for all queries executed on this connection. The value provided in an SQL query takes precedence.
deltaThe delta value for the connection. This value serves as the default delta for all queries executed on this connection. The value provided in an SQL query takes precedence.
team_name
(Under Development)
The team name for budget deduction. This value applies to all queries executed.
CACHE_TIMEOUTThe number of seconds to cache the results of a query. Doing this saves privacy budget if the same query is executed multiple times. The cached output (with the same noise) is returned if the results are found in the cache. The default value for this variable is set by your AGENT administrator.
SKIP_CACHEIf set to true, the cache is not used for this session. This is useful when you want to ensure that the results are always fresh and not cached. By default, the cache is not skipped.
Note

Specify these parameters in the connection properties of the SQL client application you are using. To override these values for a single query (except apiKey which cannot be overridden), use the WITH dp(...) clause in the query itself and enter a valid value for these parameters. They will be overridden for just the current query. Unless stated otherwise, the parameter key and values are not case-sensitive.

DP parameters details

1
Parameter names

epsilon and eps (with epsilon taking precedence if both are specified); delta and del (with delta taking precedence). Apart from this, CACHE_TIMEOUT, SKIP_CACHE are also supported.

2
Case insensitivity

The WITH dp(...) clause is case-insensitive.

3
Defaults

epsilon defaults to 1e-3 and delta defaults to 1e-5 if not specified. CACHE_TIMEOUT defaults to 3000 seconds. However, this can be modified by the admin.

4
Syntax

The rest of the query must follow the proper syntax for the underlying data source. The WITH dp(...) clause must be the first clause in the query and is optional if you want to use defaults.

Note

Ensure you have access to the relevant dataset and sufficient epsilon and delta within your privacy budget.

Note

The WITH dp(...) should be present before the query that is used to release the aggregates specified in the main query. Apart from this, it can also be specified inside the subquery which returns a single value. However, in this case you can only specify epsilon, delta in the subquery.

Tip

AGENT SQL supports limited set of SQL queries. More details here.

Frequently Asked Questions (FAQ)

I am seeing more Epsilon deducted from the budget than I specified in the query. Is this normal?

Yes, this is normal. The budget that you specify is just for one query. If your query needs computation of multiple aggregates, then the Epsilon and Delta for each of those aggregates are added to the total spent.

I spent some Epsilon and Delta in a query but the deduction from the total budget is less than what I specified. How is this possible?

We are using advanced composition to calculate the total budget spent. You can read more about it here.

I am 100% sure that my query should return 5 rows but it sometimes returns fewer than 5 rows. Also sometimes, I am getting an empty result set when I select just one aggregate

To protect the sensitive columns, we perform thresholding and drop rows with insignificant values. This is to ensure that addition or removal of a single row in the results doesn't reveal any information about the change in the dataset. You can read this article to know more.

Why am I getting the same results when I am executing the same query again. Shouldn't differentially private results be different every time?

AGENT SQL caches the result set and returns the same result when same query is executed by the same user again with same parameters. This ensures that you don't spend your privacy budget on the same query again and again. The cache is valid for a certain period of time, which can be configured by the admin. You can also skip the cache by setting the SKIP_CACHE parameter to true in the connection properties or in the query itself.