Opacus

Overview

Opacus is a library developed by Meta allowing for differentially private training of PyTorch models.

We have developed a custom library, op_opacus, designed to integrate Opacus with the private data objects available in op_pandas. Users familiar with PyTorch and Opacus should find its APIs intuitive. The methods and classes supported by op_opacus are described below.

PrivateDPDataLoader

This is the counterpart of DPDataLoader within Opacus. It allows for the creation of DataLoaders from PrivateDataFrames for training purposes.

During the training step, Poisson sampling is utilised to select samples from the DataLoader.
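As a rough illustration of what Poisson sampling means for batch construction, the sketch below draws batches with plain Python. It is not the op_opacus or Opacus implementation; the helper name `poisson_sample_batches`, the `seed` argument and the steps-per-epoch rule are assumptions made for this example. Each sample is included in a batch independently with probability `batch_size / num_samples`, so `batch_size` is only the expected size of a batch.

```python
import random


def poisson_sample_batches(num_samples, batch_size, seed=None):
    """Yield one epoch of Poisson-sampled batches as lists of sample indices.

    Hypothetical helper: every index joins a batch independently with
    probability sample_rate, so batch sizes vary and a batch can be empty.
    """
    rng = random.Random(seed)
    sample_rate = batch_size / num_samples
    # Assumption for this sketch: one epoch is num_samples // batch_size steps.
    steps = max(1, num_samples // batch_size)
    for _ in range(steps):
        yield [i for i in range(num_samples) if rng.random() < sample_rate]


batches = list(poisson_sample_batches(num_samples=1000, batch_size=100, seed=0))
sizes = [len(batch) for batch in batches]
print(sizes)  # batch sizes fluctuate around 100 instead of being fixed at 100
```

Because the batch size is random, training code should not assume a fixed number of rows per batch.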

op_opacus.PrivateDPDataLoader.from_private_dataframe(
    dataset: Union[
        PrivateDataFrame,
        PrivateSeries,
        List[Union[PrivateDataFrame, PrivateSeries]],
    ],
    dtypes: Any | List[Any] = None,
    batch_size: int = 1,
    num_workers=0,
    pin_memory=False,
    drop_last=False,
    timeout=0,
    multiprocessing_context=None,
    generator=None,
    prefetch_factor=2,
    persistent_workers=False,
    pin_memory_device="",
    execution_engine=None,
) -> PrivateDPDataLoader

The available parameters of from_private_dataframe include:

Parameter	Description
`dataset`	The dataset or list of datasets to be used for creating the DataLoader.
`dtypes`	The data type, or list of data types (one per dataset), used to convert the respective datasets into tensors.
`execution_engine`	The name of the device to which the tensors are moved, for example "cuda" or "cpu". If left as None, the .to() method is not called on the tensors.

Tip

For details on the remaining arguments, please refer to torch.utils.data.DataLoader.

PrivacyEngine

The main entry point to the op_opacus API is through the PrivacyEngine, which enables differential privacy during model training.

class op_opacus.PrivacyEngine(accountant: str = "prv")

The available parameters include:

Parameter	Description
`accountant`	Accounting mechanism. Currently supported: "rdp", "prv" and "gdp".

PrivacyEngine.make_private_with_epsilon(
    module: nn.Module,
    optimizer: optim.Optimizer,
    data_loader: PrivateDPDataLoader,
    target_epsilon: float,
    target_delta: float,
    epochs: int,
    max_grad_norm: Union[float, List[float]],
    batch_first: bool = True,
    loss_reduction: str = "mean",
    poisson_sampling: bool = True,
    clipping: str = "flat",
    noise_generator=None,
    grad_sample_mode: str = "hooks",
)

This API attaches the module, optimiser and dataloader to the PrivacyEngine, thereby making them Differentially Private.

It computes the privacy parameters needed to stay within the specified privacy budget (target_epsilon and target_delta) over the given number of epochs. For additional information, refer to opacus.PrivacyEngine.make_private_with_epsilon.
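To make the roles of `max_grad_norm` and the added noise more concrete, here is a minimal pure-Python sketch of the generic DP-SGD gradient step: clip each per-sample gradient, sum, add Gaussian noise, then average. It is an illustration under simplifying assumptions, not the op_opacus or Opacus code; the function name `dp_average_gradient` and the `noise_multiplier` argument are introduced for this example, and the privacy accounting that turns a target epsilon into a noise level is left out entirely.

```python
import math
import random


def dp_average_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient, sum them, add Gaussian noise, then average.

    Hypothetical helper for illustration only. Gradients are plain lists of floats.
    """
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down gradients whose L2 norm exceeds max_grad_norm.
        scale = min(1.0, max_grad_norm / (norm + 1e-6))
        for j, g in enumerate(grad):
            total[j] += g * scale
    # Noise standard deviation grows with both the multiplier and the clipping bound.
    std = noise_multiplier * max_grad_norm
    noisy = [t + rng.gauss(0.0, std) for t in total]
    # Simplification: divide by the actual number of samples in the batch.
    return [v / len(per_sample_grads) for v in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
result = dp_average_gradient(grads, max_grad_norm=1.0, noise_multiplier=0.0)
print(result)  # with zero noise: roughly [0.45, 0.6]
```

A smaller `max_grad_norm` limits how much any single sample can influence the update, while a larger noise level gives a stronger privacy guarantee at the cost of noisier updates.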

PrivateLoss

Within op_opacus, loss objects should be privatised with make_loss_private. This allows the average loss per epoch to be calculated during model training.

op_opacus.make_loss_private(LossClass)

The available parameters include:

Parameter	Description
`LossClass`	A PyTorch loss class (the class itself, not an instance), for example nn.CrossEntropyLoss.

Example:

%%ag
from torch import nn
from op_opacus import make_loss_private

PrivateCrossEntropyLoss = make_loss_private(nn.CrossEntropyLoss)
loss_function = PrivateCrossEntropyLoss()  # private loss, so the per-epoch average loss can be shown
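The class-factory pattern used above (pass in a loss class, get back a new class) can be sketched in plain Python. The example below is purely hypothetical and does not reflect how op_opacus implements make_loss_private: `make_loss_recording`, `RecordingLoss`, `epoch_average` and the stand-in `SquaredError` loss are all invented for illustration, and no privacy protection is applied. It only shows how a wrapped loss could keep track of per-batch values so that a per-epoch average becomes available.

```python
def make_loss_recording(loss_class):
    """Return a subclass of loss_class that remembers every loss it computes.

    Hypothetical illustration of a class factory; not the op_opacus implementation.
    """

    class RecordingLoss(loss_class):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.recorded = []

        def __call__(self, *args, **kwargs):
            value = super().__call__(*args, **kwargs)
            self.recorded.append(float(value))
            return value

        def epoch_average(self):
            # Average the recorded batch losses and reset for the next epoch.
            average = sum(self.recorded) / len(self.recorded)
            self.recorded.clear()
            return average

    RecordingLoss.__name__ = f"Recording{loss_class.__name__}"
    return RecordingLoss


class SquaredError:
    """Stand-in loss class so the sketch runs without PyTorch."""

    def __call__(self, prediction, target):
        return (prediction - target) ** 2


RecordingSquaredError = make_loss_recording(SquaredError)
loss_function = RecordingSquaredError()
loss_function(3.0, 1.0)  # batch loss 4.0
loss_function(2.0, 2.0)  # batch loss 0.0
print(loss_function.epoch_average())  # 2.0
```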

TrainModel

A helper class that facilitates training a PyTorch model in a differentially private manner:

class op_opacus.TrainModel(privacy_engine: PrivacyEngine, loss_function)

The available parameters include:

Parameter	Description
`privacy_engine`	Instance of PrivacyEngine class of op_opacus, which encompasses the DP module, optimiser, and DataLoader.
`loss_function`	The loss function utilised for model training.

TrainModel.train(train_callable: Callable, verbose: int = 1, include_nan_in_loss: bool = False)

This API trains the model. It accepts the following parameters:

train_callable

This is the function that contains the training logic. It is called with the following arguments: train_callable(model, optimizer, data_loader_batch, loss_function)
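A training callable matching that argument order might look like the sketch below. It follows the usual PyTorch single-step pattern; two things are assumptions rather than documented behaviour: that `data_loader_batch` unpacks into an `(inputs, targets)` pair, and that no return value is required.

```python
def train_callable(model, optimizer, data_loader_batch, loss_function):
    """Run one optimisation step on a single batch."""
    # Assumption: the batch holds one tensor per dataset, here (inputs, targets).
    inputs, targets = data_loader_batch

    optimizer.zero_grad()                   # clear gradients from the previous step
    outputs = model(inputs)                 # forward pass
    loss = loss_function(outputs, targets)  # compute the batch loss
    loss.backward()                         # backward pass
    optimizer.step()                        # update the model parameters
```

The function would then be handed to TrainModel.train as the train_callable argument.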

verbose

Sets the verbosity level (0, 1, or 2).

If verbose is 0, nothing is printed while the model is being trained.
If verbose is 1, the epoch number and the privacy budget spent so far are shown.
If verbose is 2, the epoch number, the privacy budget spent so far, the average loss for the epoch (only if the loss_function has been made private) and the time taken for the epoch are shown.

include_nan_in_loss

Boolean indicating whether NaNs should be included in the average loss computation per epoch. Applicable only when verbose is set to 2.
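The effect of such a flag can be illustrated with a small stand-alone function. The semantics shown here are an assumption: the sketch supposes that excluding NaNs means skipping NaN batch losses before averaging, while including them lets a single NaN propagate into the epoch average. The helper `epoch_average_loss` is hypothetical and not part of op_opacus.

```python
import math


def epoch_average_loss(batch_losses, include_nan_in_loss=False):
    """Average per-batch losses, optionally skipping NaN values (assumed semantics)."""
    if include_nan_in_loss:
        values = list(batch_losses)
    else:
        values = [v for v in batch_losses if not math.isnan(v)]
    if not values:
        return math.nan  # nothing left to average
    return sum(values) / len(values)


losses = [0.5, float("nan"), 1.5]
print(epoch_average_loss(losses))                            # 1.0
print(epoch_average_loss(losses, include_nan_in_loss=True))  # nan
```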

ApplyModel

A helper class for obtaining model predictions within the torch.no_grad context.

class op_opacus.ApplyModel(model: nn.Module, privacy_engine: PrivacyEngine)

Pass either a PyTorch model or an instance of the op_opacus PrivacyEngine class; whichever is supplied is used to obtain the predictions.

ApplyModel.apply_model_private(
    private_data: PrivateDataFrame | PrivateSeries,
    dtype=None,
    output_col_names: list = None,
    eps: float = 0.0,
    output_bounds: dict = None,
) -> PrivateDataFrame

The available parameters include:

Parameter	Description
`private_data`	Private data to be sent as input to get the predictions of the model.
`dtype`	Data type of the private data used to create tensors.
`output_col_names`	Names for the columns of the output PrivateDataFrame. By default, the columns are named `col_{i}`, where `{i}` is the position of the column.
`eps`	This is the Privacy Budget used to calculate the bounds of the output PrivateDataFrame.
`output_bounds`	A dictionary of the type `{'column_name': (min_bound, max_bound)}`, containing metadata of the columns for which bounds are already known. No epsilon is spent to calculate the bounds for these columns.
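The interplay between `output_col_names`, `output_bounds` and `eps` can be pictured with the following sketch, which separates the columns whose bounds are supplied from those whose bounds would still have to be estimated with the privacy budget. The helper `split_columns_by_known_bounds` is hypothetical and only mirrors the description in the table; the actual bookkeeping inside op_opacus may differ.

```python
def split_columns_by_known_bounds(output_col_names, output_bounds=None):
    """Separate columns with user-supplied bounds from those still to be estimated.

    Hypothetical helper: returns (known_bounds, columns_to_estimate).
    """
    output_bounds = output_bounds or {}
    known = {c: output_bounds[c] for c in output_col_names if c in output_bounds}
    to_estimate = [c for c in output_col_names if c not in output_bounds]
    return known, to_estimate


cols = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
known, to_estimate = split_columns_by_known_bounds(cols, {"Iris-setosa": (0.0, 1.0)})
print(known)        # {'Iris-setosa': (0.0, 1.0)}
print(to_estimate)  # ['Iris-versicolor', 'Iris-virginica']
```

In this picture, only the columns in `to_estimate` would draw on `eps`; supplying bounds for every output column would leave nothing to estimate.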

ApplyModel.apply_model_public(
    data: DataFrame,
    dtype=None,
) -> DataFrame

The available parameters include:

Parameter	Description
`data`	Public DataFrame to be sent as input to get the predictions of the model.
`dtype`	Data type of the public data used to create tensors.

Example:

%%ag
import torch
from op_opacus import ApplyModel

test_model = ApplyModel(privacy_engine=privacy_engine)
out = test_model.apply_model_private(test_x, dtype=torch.float, output_col_names=["Iris-setosa", "Iris-versicolor", "Iris-virginica"], eps=1)
