General Methods

This page showcases some of the most commonly used pandas methods available in op_pandas, along with their parameters.

`concat`

The concat() function concatenates pandas-style objects, such as PrivateSeries and PrivateDataFrames, along a specified axis. It can also create a hierarchical index on the concatenation axis if needed, and it resolves the indexes on the non-concatenation axes by taking either their union or their intersection.

def concat(
    objs,
    *,
    axis=0,
    join="outer",
    ignore_index=False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity=False,
    sort=False,
    copy=None,
) -> PrivateData:

Parameters:

objs : array of PrivateSeries | PrivateDataFrame

An array that includes PrivateDataFrames or PrivateSeries for concatenation. If any element within the array is None, it will be silently dropped unless all elements are None, in which case a ValueError will be raised.

axis : {0}, default 0

Specifies the axis along which to concatenate the objects. Currently, only concatenation along axis=0 is allowed.

join : {'inner', 'outer'}, default 'outer'

Dictates how to handle the indexes on the axes other than the concatenation axis.

'outer': Uses the union of indexes.
'inner': Uses the intersection of indexes.
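
The difference between the two join modes can be sketched in plain Python. This is an illustration only: it does not call op_pandas, the helper `concat_rows` and the sample rows are hypothetical, and `None` stands in for whatever missing-value marker a real frame would use.

```python
def concat_rows(frames, join="outer"):
    """Stack lists of row dicts; `join` decides which columns are kept."""
    if not frames:
        raise ValueError("no objects to concatenate")
    # Columns present in each input "frame".
    column_sets = [{col for row in frame for col in row} for frame in frames]
    if join == "outer":
        columns = set().union(*column_sets)       # union of columns
    elif join == "inner":
        columns = set.intersection(*column_sets)  # intersection of columns
    else:
        raise ValueError(f"unsupported join: {join!r}")
    ordered = sorted(columns)
    # Rows keep their input order; absent columns are filled with None.
    return [
        {col: row.get(col) for col in ordered}
        for frame in frames
        for row in frame
    ]


df1 = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
df2 = [{"id": 3, "city": "Oslo"}]

outer = concat_rows([df1, df2], join="outer")  # columns: age, city, id
inner = concat_rows([df1, df2], join="inner")  # columns: id only
```

With 'outer', every column from either input survives and gaps are filled; with 'inner', only the columns shared by all inputs remain.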

ignore_index : bool, default False

If set to True, the index values along the concatenation axis will be ignored. The resulting axis will be labeled from 0 to n - 1. This is particularly useful when the original index does not carry meaningful information for the concatenated result.

keys : sequence, default None

Used to create a hierarchical index on the concatenation axis, with the elements of the sequence forming the outermost level.

levels : list of sequences, default None

Specifies the levels to use for constructing a MultiIndex, if not inferred from the keys.

names : list, default None

Provides names for the levels in the resulting hierarchical index.

verify_integrity : bool, default False

Verification of integrity during concatenation is not supported in this function.

sort : bool, default False

Determines whether to sort the non-concatenation axis if it is not already aligned.

copy : bool, default None

The copy parameter is not supported in this version of the function.

Usage:

combined_df = op_pandas.concat([df1, df2], ignore_index=True, join='inner')

Note

All values in a given column must share the same data type across the objects being concatenated; otherwise, the concatenation is not performed.

`merge`

The merge() function facilitates the merging of PrivateDataFrame or named PrivateSeries objects, mimicking database-style joins. This function allows for various types of joins, handling indexes and columns differently based on the type of merge specified.

def merge(
    left,
    right,
    how="inner",
    on=None,
    *args, **kwargs
) -> PrivateData:

Parameters:

left : PrivateDataFrame or named PrivateSeries

The left object in the merge. A named PrivateSeries is treated as a PrivateDataFrame with a single column.

right : PrivateDataFrame or named PrivateSeries

The right object in the merge. Similarly, a named PrivateSeries is treated as a PrivateDataFrame with a single column.

how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'

Specifies the type of merge to perform:

'left': Perform a left outer join, using only keys from the left frame. The order of keys is preserved.
'right': Perform a right outer join, using only keys from the right frame. The order of keys is preserved.
'outer': Perform a full outer join, using the union of keys from both frames. Keys are sorted lexicographically.
'inner': Perform an inner join, using the intersection of keys from both frames. The order of the left keys is preserved.
'cross': Create a Cartesian product of both frames, preserving the order of the left keys. Note: No columns to merge on can be specified in a cross join.
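
To make the first four join types concrete, here is a minimal plain-Python sketch of their key-matching rules. It does not call op_pandas; `merge_rows` and the sample rows are hypothetical, `None` marks missing values, and the sketch ignores details such as 'cross' joins and overlapping non-key column names.

```python
def merge_rows(left, right, on, how="inner"):
    """Join two lists of row dicts on the key column `on`."""
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unsupported how: {how!r}")
    if how == "right":
        # A right join is a left join with the operands swapped.
        return merge_rows(right, left, on, how="left")

    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}

    # Index the right rows by key so lookups are cheap.
    right_index = {}
    for row in right:
        right_index.setdefault(row[on], []).append(row)

    result, matched = [], set()
    for lrow in left:
        partners = right_index.get(lrow[on], [])
        if partners:
            matched.add(lrow[on])
            result.extend({**lrow, **rrow} for rrow in partners)
        elif how in ("left", "outer"):
            # Keep the unmatched left row, filling right-only columns with None.
            result.append({**dict.fromkeys(right_cols), **lrow})

    if how == "outer":
        for rrow in right:
            if rrow[on] not in matched:
                result.append({**dict.fromkeys(left_cols), **rrow})
        # Outer joins return keys in sorted order.
        result.sort(key=lambda row: row[on])
    return result


left_rows = [{"key": "a", "x": 1}, {"key": "b", "x": 2}]
right_rows = [{"key": "b", "y": 20}, {"key": "c", "y": 30}]

inner_join = merge_rows(left_rows, right_rows, on="key", how="inner")
left_join = merge_rows(left_rows, right_rows, on="key", how="left")
right_join = merge_rows(left_rows, right_rows, on="key", how="right")
outer_join = merge_rows(left_rows, right_rows, on="key", how="outer")
```

Only key "b" appears in both inputs, so it is the sole row of the inner join; the left and right joins each keep their own unmatched key, and the outer join keeps all three.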

Usage:

When columns are specified for a join, index information of the PrivateDataFrames is ignored. However, when joining on indexes, whether with each other or with columns, index information is preserved, which is crucial for alignments where index continuity is necessary.

result = op_pandas.merge(left_df, right_df, how='inner', on='key_column')

`to_datetime`

The to_datetime() function converts an input scalar, array-like, PrivateSeries, or PrivateDataFrame into a pandas datetime object, handling a wide range of datetime formats and providing various options for customization and error handling.

def to_datetime(
    arg,
    errors="ignore",
    dayfirst=False,
    yearfirst=False,
    utc=False,
    format=None,
    exact=_NoDefault.no_default,
    unit=None,
    infer_datetime_format=_NoDefault.no_default,
    origin="unix",
    cache=True,
) -> PrivateData:

Parameters:

arg : PrivateSeries

The data to convert to datetime format. For DataFrames, it should contain the columns "year", "month", and "day", with years in a four-digit format.

errors : str, default 'ignore'

'ignore': If parsing fails, return the original input.
'raise': Raise an error if parsing fails.
'coerce': Set unparsable entries to NaT (Not a Time).
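
The three modes can be pictured with a small standard-library sketch. The helper `parse_dates` is hypothetical and is not part of op_pandas; it uses `None` where a real result would hold NaT.

```python
from datetime import datetime


def parse_dates(values, fmt, errors="raise"):
    """Parse strings with `fmt`, handling failures according to `errors`."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unsupported errors mode: {errors!r}")
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, fmt))
        except ValueError:
            if errors == "raise":
                raise                # propagate the parsing error
            if errors == "ignore":
                return list(values)  # give back the original input
            parsed.append(None)      # 'coerce': None stands in for NaT
    return parsed


raw = ["25/12/2023", "not a date"]
coerced = parse_dates(raw, "%d/%m/%Y", errors="coerce")
ignored = parse_dates(raw, "%d/%m/%Y", errors="ignore")
```

Here `coerced` keeps the valid date and replaces the bad entry, while `ignored` is simply the original list of strings.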

dayfirst : bool, default False

Influences parsing order if arg is string-like. If True, interprets the first number in a date string as the day (e.g., 10/11/12 becomes 2012-11-10).

yearfirst : bool, default False

Influences parsing order if arg is string-like. If True, interprets the first number in a date string as the year (e.g., 10/11/12 becomes 2010-11-12).

Note

If both dayfirst and yearfirst are True, yearfirst takes precedence, similar to the behavior in dateutil.
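
The ambiguity these two flags resolve can be seen by parsing the same string under each reading. The sketch below uses the standard library's `datetime.strptime` with explicit format strings rather than the `dayfirst` and `yearfirst` flags themselves, so it only illustrates the three interpretations.

```python
from datetime import datetime

ambiguous = "10/11/12"

# Month first (the reading when both flags are False).
month_first = datetime.strptime(ambiguous, "%m/%d/%y").date()
# Day first, as with dayfirst=True.
day_first = datetime.strptime(ambiguous, "%d/%m/%y").date()
# Year first, as with yearfirst=True.
year_first = datetime.strptime(ambiguous, "%y/%m/%d").date()

print(month_first)  # 2012-10-11
print(day_first)    # 2012-11-10
print(year_first)   # 2010-11-12
```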

utc : bool, default False

If True, returns a UTC-localized Timestamp, Series, or DatetimeIndex.
If False, returns data without timezone conversion, maintaining original time offsets where present.

format : str, default None

The format string to use for parsing dates, like %d/%m/%Y. Special options include:

'ISO8601': Parse any ISO8601 formatted string.
'mixed': Infer the format for each element individually. Antigranular recommends using this option with caution.

exact : bool, default True

If True, the format string must be precisely matched.
If False, allows the format to match anywhere in the target string.
Note: Incompatible with format='ISO8601' or format='mixed'.

unit : str, default 'ns'

Defines the unit for numeric input based on the origin. Common units include 'D' (days), 's' (seconds), 'ms' (milliseconds), etc.

infer_datetime_format : bool, default False

When True and no format is specified, attempts to infer the datetime format, potentially speeding up parsing significantly.

origin : scalar, default 'unix'

Defines the reference date for numeric inputs. Possible values:
- 'unix': Start from 1970-01-01.
- 'julian': Start from Julian Calendar day zero.
- Timestamp convertible values or numeric offsets relative to 1970-01-01.
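
How `unit` and `origin` combine for numeric input can be sketched with the standard library: the number is read as a count of `unit` steps added to the reference date. The helper `from_numeric` is hypothetical and covers only three of the units mentioned above.

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)

# Map the unit codes onto timedelta keyword arguments.
UNIT_TO_TIMEDELTA_ARG = {"D": "days", "s": "seconds", "ms": "milliseconds"}


def from_numeric(value, unit="s", origin=UNIX_EPOCH):
    """Interpret `value` as a number of `unit` steps after `origin`."""
    return origin + timedelta(**{UNIT_TO_TIMEDELTA_ARG[unit]: value})


one_day_later = from_numeric(1, unit="D")          # 1970-01-02 00:00:00
same_instant = from_numeric(86400, unit="s")       # also 1970-01-02 00:00:00
with_millis = from_numeric(1500, unit="ms")        # 1970-01-01 00:00:01.500000
custom_origin = from_numeric(3, unit="D", origin=datetime(2000, 1, 1))
```

Changing `origin` shifts every result by the same amount, which is why the last call lands on 2000-01-04.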

cache : bool, default True

Utilizes a cache for converted dates to enhance parsing speed for repeated date strings, especially those with timezone offsets. Not effective for out-of-bounds values.

Example Usage:

datetime_data = op_pandas.to_datetime(series_data, errors='coerce', dayfirst=True, format='%d/%m/%Y')

Note

If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).
The exact parameter cannot be used alongside format='ISO8601' or format='mixed'.

`train_test_split`

The train_test_split() method is used to split the PrivateDataFrame or PrivateSeries into a training set and a testing set, which is essential for training models in a manner that can evaluate their performance effectively.

def train_test_split(
    df,
    test_size=0.25,
    random_state=None,
    stratify=None
) -> Tuple[PrivateData, PrivateData]:

Parameters:

df : list | PrivateDataFrame | PrivateSeries

Accepts either a single PrivateDataFrame, a PrivateSeries, or a list of these. The list does not need to contain elements of the same size; however, if they are of the same size, they will be split in the same way in terms of indices.

test_size : float, default 0.25

This specifies the proportion of the dataset to include in the test split. It must be between 0 and 1.

random_state : int | None, default None

Provides a seed value to ensure reproducibility of the split.

stratify : None

Currently, stratification is not supported, meaning the data will be split without considering the distribution of outcomes across the training and testing sets.

Example Usage:

train_data, test_data = op_pandas.train_test_split(df, test_size=0.3, random_state=42)
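
The roles of `test_size` and `random_state`, and the idea that same-sized inputs are split on the same indices, can be sketched in plain Python. This is not the op_pandas implementation: `split_indices` is a hypothetical helper, and rounding the test-set size with `round` is an assumption made for the example.

```python
import random


def split_indices(n_rows, test_size=0.25, random_state=None):
    """Return (train_indices, test_indices) for a dataset of n_rows rows."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle reproducible.
    random.Random(random_state).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


features = [[i, i * i] for i in range(10)]
labels = [i % 2 for i in range(10)]

# One index split is reused for both inputs, so rows stay aligned.
train_idx, test_idx = split_indices(len(features), test_size=0.3, random_state=42)
X_train = [features[i] for i in train_idx]
X_test = [features[i] for i in test_idx]
y_train = [labels[i] for i in train_idx]
y_test = [labels[i] for i in test_idx]
```

Calling `split_indices` again with the same `random_state` yields the same partition, which is the reproducibility the parameter is meant to provide.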

`standard_scaler`

This function standardizes features by removing the mean and scaling to unit variance, applying differential privacy techniques to ensure the data privacy is maintained.

def standard_scaler(
    data,
    eps
) -> PrivateData:

Parameters:

data : PrivateDataFrame | PrivateSeries

This is the input data, which should be either a PrivateDataFrame or a PrivateSeries. It contains the features that need to be standardized.

eps : float

Represents the epsilon budget for differential privacy. A smaller epsilon value means stronger privacy guarantees but potentially less accuracy in the scaled data.

Returns:

PrivateData: The standardized features, as a PrivateDataFrame or PrivateSeries.

Usage:

scaled_data = op_pandas.standard_scaler(data, eps=0.1)
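
The source does not describe how op_pandas spends `eps` internally, so the following is only one possible sketch of differentially private standardization, written with the standard library. It assumes known value bounds (`lower`, `upper`, which are not parameters of `standard_scaler`), a public row count, and Laplace noise on the mean and the mean of squares, each given half of the budget. All names here are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_standardize(values, eps, lower, upper, seed=None):
    """Standardize values using noisy estimates of the mean and variance."""
    if eps <= 0 or upper <= lower or not values:
        raise ValueError("need eps > 0, upper > lower and non-empty values")
    rng = random.Random(seed)
    n = len(values)
    width = upper - lower
    # Clip to the assumed bounds and shift into [0, width].
    shifted = [min(max(v, lower), upper) - lower for v in values]

    half_eps = eps / 2  # split the budget between the two statistics
    # Changing one record moves the mean by at most width / n.
    noisy_mean = sum(shifted) / n + laplace_noise(width / (n * half_eps), rng)
    # Changing one record moves the mean of squares by at most width**2 / n.
    noisy_sq_mean = (
        sum(s * s for s in shifted) / n
        + laplace_noise(width ** 2 / (n * half_eps), rng)
    )

    variance = max(noisy_sq_mean - noisy_mean ** 2, 1e-12)  # guard against <= 0
    std = math.sqrt(variance)
    return [(s - noisy_mean) / std for s in shifted]


ages = [float(v) for v in range(1, 101)]
scaled = dp_standardize(ages, eps=1e6, lower=0.0, upper=100.0, seed=0)
```

The noise scale is inversely proportional to `eps`: with the very large budget used here the output is almost exactly standardized, while a small `eps` would perturb the estimated mean and variance much more.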

`label_encoder`

This function performs label encoding on one or more categorical columns of a PrivateDataFrame, or on a PrivateSeries. It returns a tuple containing the transformed data and a dictionary mapping the original categories to their encoded labels.

def label_encoder(
    df,
    cols=None
) -> Tuple[PrivateData, dict]:

Parameters:

df : PrivateDataFrame | PrivateSeries

This is the input data which should be of type PrivateDataFrame or PrivateSeries, containing categorical data that needs to be encoded.

cols : List | str | None

Specifies the columns to be label encoded. You can provide a single column name as a string, a list of column names, or None. If None is provided and the input is a PrivateDataFrame, no columns are considered for encoding. This parameter is ignored if the input is a PrivateSeries.

Returns:

Tuple[PrivateData, dict]: A tuple where the first element is the label-encoded data (as a PrivateDataFrame or PrivateSeries) and the second element is a dictionary that maps the original categorical values to their respective integer labels.

Usage:

encoded_data, mapping = op_pandas.label_encoder(df, cols=['category_column'])
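
The shape of the returned tuple can be illustrated with a plain-Python sketch of label encoding on a single column. The helper `label_encode` is hypothetical, and assigning codes in sorted category order is an assumption; the source does not say which order op_pandas uses.

```python
def label_encode(values):
    """Return (encoded_values, mapping) for a list of categories."""
    # Assumption: codes follow the sorted order of the distinct categories.
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    encoded = [mapping[value] for value in values]
    return encoded, mapping


colours = ["red", "green", "red", "blue"]
encoded_colours, colour_mapping = label_encode(colours)

print(encoded_colours)  # [2, 1, 2, 0]
print(colour_mapping)   # {'blue': 0, 'green': 1, 'red': 2}
```

The mapping is what lets you translate encoded labels back to the original categories later.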
