PrivateSeries
The PrivateSeries API is based on pandas.Series, but all of its methods are differentially private. PrivateSeries is available as part of the op_pandas library in Antigranular.
Constructor
The PrivateSeries constructor is as follows:
class op_pandas.PrivateSeries(series = None, metadata = None, categorical_metadata = None)
The PrivateSeries parameters are described below:
series: pandas.Series
A pandas Series whose data consists only of strings, integers, floats, and booleans.
metadata: Tuple(float, float)
Metadata containing the bounds of the given Series, in the form (bound_low, bound_hi). If the Series contains string data, the metadata should not be provided.
categorical_metadata: List
Metadata containing information about the categorical data of the given Series. The categorical_metadata should be a list containing all the categories in the Series. All elements in the list must have the same data type.
The two examples below show distinct argument combinations for a PrivateSeries, one numerical and one categorical:
series: [10, 20, 30, 40, 10, 42, 54]
metadata: (0, 60)
categorical_metadata: None

series: ["a", "b", "a", "b", "a", "a"]
metadata: None
categorical_metadata: ["a", "b"]
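As a rough illustration, the two argument combinations can be prepared and sanity-checked with plain pandas before they are handed to the constructor. The helpers bounds_cover and categories_cover are hypothetical names introduced here, and the check itself is an assumption about good practice rather than something op_pandas is documented to require. The constructor calls appear only as comments because op_pandas is available inside Antigranular only.

```python
import pandas as pd

# Numerical example: bounds given as (bound_low, bound_hi).
numeric = pd.Series([10, 20, 30, 40, 10, 42, 54])
numeric_bounds = (0, 60)

# Categorical example: list every category; no bounds for string data.
letters = pd.Series(["a", "b", "a", "b", "a", "a"])
letter_categories = ["a", "b"]


def bounds_cover(series: pd.Series, bounds: tuple) -> bool:
    """Hypothetical helper: True if every value lies within the bounds."""
    low, high = bounds
    return bool(series.between(low, high).all())


def categories_cover(series: pd.Series, categories: list) -> bool:
    """Hypothetical helper: True if every value is a listed category."""
    return bool(series.isin(categories).all())


print(bounds_cover(numeric, numeric_bounds))
print(categories_cover(letters, letter_categories))

# Inside Antigranular these would then be passed to the constructor, e.g.:
#   op_pandas.PrivateSeries(series=numeric, metadata=numeric_bounds)
#   op_pandas.PrivateSeries(series=letters, categorical_metadata=letter_categories)
```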
General Functions
PrivateSeries provides several internal functions you can use when working with series. The PrivateSeries general functions include:
categorical_metadata
The categorical_metadata property returns the categorical metadata of the PrivateSeries.
PrivateSeries.categorical_metadata -> List
copy
The copy() method returns a copy of the PrivateSeries.
PrivateSeries.copy() -> PrivateSeries
describe
The describe() method returns a statistical description of the data in the PrivateSeries.
PrivateSeries.describe(eps, percentiles = None, include = None, exclude = None)
The available parameters of describe() are the following:
eps: float
The epsilon provided to the differentially private calculation. The eps value must be >=0.
percentiles: list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include: ‘all’, list-like of dtypes or None (default), optional
This option is ignored for Series.
exclude: list-like of dtypes or None (default), optional
A blocked list of data types to omit from the result. The available options are as follows:
- A list-like of dtypes: Excludes the provided data types from the result.
  - To exclude numeric types, submit numpy.number.
  - To exclude object columns, submit the data type numpy.object.
  - Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
  - To exclude pandas’ categorical columns, use category.
- None (default): The result will exclude nothing.
dropna
The dropna() method removes missing values from a PrivateSeries.
PrivateSeries.dropna(axis=0)
The available parameters of dropna() are the following:
axis: boolean {index (0), columns (1)}, default = 0
Not applied in Series.
dtypes
The dtypes property returns the data type information of the PrivateSeries.
PrivateSeries.dtypes
isnull
The isnull() method detects missing values for an array-like object.
PrivateSeries.isnull() -> PrivateSeries
isna
The isna() method detects missing values for an array-like object.
PrivateSeries.isna() -> PrivateSeries
isin
The isin() method checks whether each element in the PrivateSeries is contained in values.
PrivateSeries.isin(values)
The available parameters of isin() are the following:
values: PrivateDataFrame
The PrivateDataFrame against which each element in the Series is checked for containment.
make_categorical
This method makes the series categorical.
PrivateSeries.make_categorical(categories, inplace=False)
The available parameters of make_categorical() are the following:
categories: List
The categories to be used in the categorical metadata.
inplace: bool, default = False
If True, the operation will modify the data in place.
make_series_non_categorical
This method makes the series non-categorical.
PrivateSeries.make_series_non_categorical(output_bounds: tuple = None, eps: float = 0.0)
The available parameters of make_series_non_categorical() are the following:
output_bounds: tuple
When a series contains numerical values but is categorical, this parameter provides the output bounds for it. If output bounds for a numerical series aren't provided, epsilon will be spent to estimate them.
eps: float
The epsilon spent to estimate the output bounds of a numerical column.
map
This method maps the values of a PrivateSeries according to an input mapping or function.
PrivateSeries.map(arg, eps = 0, output_bounds = None, output_categories = None)
The available parameters of map() are the following:
arg: callable, mapping, pd.Series or PrivateSeries
If arg is a mapping (dictionary) and the series has categorical data, every category in the metadata must have a mapping.
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0. It is used to calculate the bounds.
output_bounds: Tuple[float, float]
The output bounds. If not provided, epsilon will be spent to estimate the bounds of the applied function's output.
output_categories: List
The output categories, if the current series is categorical. If not provided, they will be calculated using arg.
If the input is a callable, it should return a single value when applied to each element. The output of the callable should be a string, int, float, boolean, or datetime.
If the callable is a function, it will execute within an isolated environment with mypy strict mode enabled. The function must adhere to the following constraints:
- The function can accept only one argument, which is the individual element the function is being applied to.
- Proper type annotations should be present within the function definition. To use datetime and regex, import datetime and re to enable their type annotations.
For additional examples, see the Pandas quickstart guide.
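The constraints above can be sketched with a small function that takes exactly one argument and carries full type annotations. The function name extract_year and the sample data are illustrative only, and a plain pandas Series stands in for the PrivateSeries so the snippet runs outside Antigranular. Whether a given function passes the isolated mypy check is for the platform to decide.

```python
import re

import pandas as pd


def extract_year(value: str) -> int:
    """Return the first four-digit number in the string, or 0 if none."""
    match = re.search(r"\d{4}", value)
    return int(match.group()) if match else 0


# Plain pandas stand-in for the private series, used only to run the function.
dates = pd.Series(["2021-03-01", "2019-11-23", "unknown"])
print(dates.map(extract_year).tolist())  # [2021, 2019, 0]
```

With a PrivateSeries, the same function would be passed as arg, and output_bounds could be supplied so that no epsilon needs to be spent on estimating the bounds.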
metadata
The metadata property returns the metadata (bounds) of a numerical series.
PrivateSeries.metadata -> tuple
The code block below presents an example of how to use metadata:
>> train_x.metadata
(0, 60)
notnull
The notnull() method detects non-missing values for an array-like object.
PrivateSeries.notnull() -> PrivateSeries
notna
The notna() method detects existing (non-missing) values.
PrivateSeries.notna() -> PrivateSeries
one_hot_encoding
This method performs one-hot encoding on the PrivateSeries.
PrivateSeries.one_hot_encoding(prefix=None, prefix_sep="_") -> PrivateDataFrame
The available parameters of one_hot_encoding() are the following:
prefix: str, default None
Prefix to use for the column names.
prefix_sep: str, default '_'
Separator to use between the prefix and the column name.
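To show how prefix and prefix_sep typically combine into column names, here is a sketch using pandas.get_dummies as a non-private stand-in. It assumes one_hot_encoding() names its columns the same way, which this reference does not confirm.

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red"])

# One column per category, named <prefix><prefix_sep><category>.
encoded = pd.get_dummies(colors, prefix="color", prefix_sep="_")
print(list(encoded.columns))  # ['color_blue', 'color_red']
```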
rename
This method changes the name of the PrivateSeries.
PrivateSeries.rename(name:str) -> PrivateSeries
size
The size() method returns the differentially private number of elements in the PrivateSeries.
PrivateSeries.size(eps: float = 0) -> int
The available parameters of size() are the following:
eps: float
The epsilon provided to the differentially private calculation. The eps value must be >=0.
sample_with_sensitivity
The sample_with_sensitivity() method returns a random sample of items from the PrivateSeries, so that the sensitivity (how many times a user can be present in the dataset) is capped.
PrivateSeries.sample_with_sensitivity(max_sensitivity) -> PrivateSeries
The available parameters of sample_with_sensitivity() are the following:
max_sensitivity
The maximum number of times a user can be present in the dataset.
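The idea of capping how often a user appears can be sketched in plain pandas. This is not the op_pandas implementation: the helper cap_rows_per_user, the user_id column, and the seed argument are all hypothetical, and the sketch assumes the user identifier is available as an ordinary column.

```python
import pandas as pd


def cap_rows_per_user(df: pd.DataFrame, user_col: str, max_sensitivity: int,
                      seed: int = 0) -> pd.DataFrame:
    """Keep at most max_sensitivity randomly chosen rows for each user."""
    shuffled = df.sample(frac=1.0, random_state=seed)  # random row order
    return shuffled.groupby(user_col).head(max_sensitivity).sort_index()


visits = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u3", "u3"],
    "amount": [10, 20, 30, 40, 10, 42],
})
capped = cap_rows_per_user(visits, "user_id", max_sensitivity=2)
print(capped["user_id"].value_counts().max())  # 2
```

User u1 contributes three rows to the input but only two to the sample, so no single user can influence a later statistic through more than max_sensitivity rows.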
unique
The unique() method returns the unique values in the PrivateSeries.
PrivateSeries.unique() -> PrivateSeries
where
The where() method replaces the values of the rows where the condition evaluates to False.
PrivateSeries.where(cond, other = None, inplace = False, axis = None, level = None)
The available parameters of where() are the following:
cond: bool PrivateSeries/PrivateDataFrame, Series/DataFrame, array-like
Defines the condition, which should return True or False.
- If True, keep the original value.
- If False, replace it with the corresponding value from other.
other: None
Currently, changing other isn't supported.
inplace: bool, default False
Indicates whether the operation should modify the data in place.
axis: int, default None
This parameter isn't used for Series. Defaults to 0.
level: int, default None
Alignment level if needed.
The method returns a PrivateSeries with the result, or None if the inplace parameter is set to True.
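The True/False rule for cond matches that of pandas.Series.where, so a plain pandas example can illustrate it. This is a non-private stand-in, and it shows pandas behaviour only: with other left at its default, pandas fills the replaced positions with NaN. How PrivateSeries represents replaced values is not stated in this reference.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40])

# Keep values where the condition is True; replace the rest (NaN by default).
kept = values.where(values > 20)
print(kept.tolist())  # [nan, nan, 30.0, 40.0]
```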
Basic statistical methods
count
The count() method returns the number of non-missing values in the Series.
PrivateSeries.count(eps = 0)
The available parameters of count() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
mean
The mean() method returns the mean value of the Series.
PrivateSeries.mean(eps = 0)
The available parameters of mean() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
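To give a feel for how eps and the series bounds interact, here is a textbook Laplace-mechanism mean in NumPy. It is a sketch under stated assumptions, not the mechanism op_pandas uses, which this reference does not describe. The function laplace_mean is a hypothetical name, the number of elements is treated as public, and the sketch requires eps > 0 even though the API accepts eps >= 0.

```python
import numpy as np


def laplace_mean(values, bounds, eps, rng):
    """Textbook Laplace-mechanism mean for data clipped to known bounds."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    low, high = bounds
    clipped = np.clip(np.asarray(values, dtype=float), low, high)
    # Changing one value moves the mean by at most (high - low) / n.
    sensitivity = (high - low) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return float(clipped.mean() + noise)


rng = np.random.default_rng(0)
data = [10, 20, 30, 40, 10, 42, 54]
print(laplace_mean(data, (0, 60), eps=1.0, rng=rng))
```

In this sketch a smaller eps or wider bounds means more noise, which is why tight, realistic metadata bounds matter for accuracy.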
median
The median() method returns the median of the values in the Series.
PrivateSeries.median(eps = 0)
The available parameters of median() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
percentile
This method is a differentially private implementation of the percentile method.
PrivateSeries.percentile(p, eps)
The available parameters of percentile() are the following:
p: float
The percentile to compute. You must provide a value between 0 and 100.
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
quantile
This method is a differentially private implementation of the quantile method.
PrivateSeries.quantile(q, eps)
The available parameters of quantile() are the following:
q: float
The quantile to compute. You must provide a value between 0 and 1.
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
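percentile() and quantile() differ only in scale: p runs from 0 to 100 while q runs from 0 to 1, so p = 100 * q names the same cut point. The non-private NumPy equivalents below illustrate the relationship. The differentially private results would carry noise and would not match exactly.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 10, 42, 54])
q = 0.25
p = 100 * q  # the same cut point expressed as a percentile

print(np.quantile(data, q) == np.percentile(data, p))  # True
```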
standard deviation std
The std() method returns the standard deviation of the sample data.
PrivateSeries.std(eps = 0, ddof = 1)
The available parameters of std() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.
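The role of ddof in the divisor can be checked against plain pandas. The snippet is non-private and only illustrates the N - ddof divisor with the default ddof = 1.

```python
import math

import pandas as pd

data = pd.Series([10, 20, 30, 40, 10, 42, 54])
n, ddof = len(data), 1

# Sum of squared deviations from the mean, divided by N - ddof.
squared_dev = ((data - data.mean()) ** 2).sum()
manual_std = math.sqrt(squared_dev / (n - ddof))

print(math.isclose(manual_std, data.std(ddof=1)))  # True
```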
sum
The sum() method adds all values in the Series.
PrivateSeries.sum(eps = 0)
The available parameters of sum() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
variance
The var() method calculates the variance of the Series.
PrivateSeries.var(eps = 0, ddof = 1)
The available parameters of var() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.
Advanced statistical methods
The PrivateSeries advanced statistical methods include:
covariance cov
The cov() method finds the covariance of two PrivateSeries.
PrivateSeries.cov(other, eps: float, min_periods, ddof = 1)
The available parameters of cov() are the following:
other: PrivateSeries
The second PrivateSeries.
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
min_periods: int, optional
By default, 1 is used. Currently, changing min_periods is not supported.
ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.
skew
The skew() method calculates the skew for the PrivateSeries.
PrivateSeries.skew(eps, axis = 0, skipna = True, numeric_only = True)
The available parameters of skew() are the following:
eps: float, default = 0
The epsilon provided to the differentially private calculation. The eps value must be >=0.
axis: boolean {index (0), columns (1)}, default = 0
Axis for the function to be applied on.
skipna: bool, default True
Exclude NA/Null values when computing the result.
numeric_only: bool, default None
Include only float, int, and boolean columns. If axis = 0, numeric_only is always assumed to be True. Otherwise, you must specify a value.
Histograms
hist
This method draws a histogram of the PrivateSeries.
PrivateSeries.hist(eps, bins = 10)
The available parameters of hist() are the following:
eps: float
The epsilon provided to the differentially private calculation. The eps value must be >=0.
bins: int, default 10
Number of histogram bins to be used.
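A common textbook way to make a histogram differentially private is to add Laplace noise to each bin count. The sketch below shows that approach in NumPy. It is an assumption-laden illustration, not the op_pandas implementation: laplace_histogram is a hypothetical name, the bin edges are derived from known bounds, and eps must be > 0 here.

```python
import numpy as np


def laplace_histogram(values, bounds, bins, eps, rng):
    """Histogram over fixed bounds with Laplace noise added to each count."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    counts, edges = np.histogram(values, bins=bins, range=bounds)
    # Adding or removing one row changes a single bin count by 1.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / eps, size=bins)
    return noisy, edges


rng = np.random.default_rng(0)
noisy, edges = laplace_histogram([10, 20, 30, 40, 10, 42, 54], (0, 60), 10, 1.0, rng)
print(len(noisy), len(edges))  # 10 11
```

Noisy counts can come out negative or fractional. Whether and how a library post-processes them is a separate design choice.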
hist2d
This method creates a 2D histogram of two PrivateSeries.
PrivateSeries.hist2d(other, eps, bins = 10)
The available parameters of hist2d() are the following:
other: PrivateSeries
The second PrivateSeries.
eps: float
The epsilon provided to the differentially private calculation. The eps value must be >=0.
bins: int, default 10
Number of histogram bins to be used.