Extended DataFrame and data-processing functions¶
Submodule pyiomica.extendedDataFrame
PyIOmica DataFrame extending pandas.DataFrame with new functions
Classes:
Class based on pandas.DataFrame extending capabilities into the domain of PyIOmica
Functions:
Merge a list of DataFrames (outer join).
Calculate the Lomb-Scargle periodogram of a DataFrame.
Calculate spike cutoffs from a bootstrap of the provided data, given the significance cutoff p_cutoff.
Generate an autocorrelation null distribution from permuted data using Lomb-Scargle autocorrelation.
Generate a periodogram null distribution from permuted data using the Lomb-Scargle function.
- class DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)[source]¶
Bases:
DataFrame
Class based on pandas.DataFrame extending capabilities into the domain of PyIOmica.
Initialization parameters are identical to those of pandas.DataFrame. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html for details.
Methods:
__init__([data, index, columns, dtype, copy]): Initialization method.
filterOutAllZeroSignals([inplace]): Filter out all-zero signals from a DataFrame.
filterOutFractionZeroSignals(min_fraction_of_non_zeros[, inplace]): Filter out fraction-zero signals from a DataFrame.
filterOutFractionMissingSignals(min_fraction_of_non_missing[, inplace]): Filter out fraction-missing signals from a DataFrame.
filterOutReferencePointZeroSignals([referencePoint, inplace]): Filter out signals that are zero at the reference time point.
tagValueAsMissing([value, inplace]): Tag a given value with NaN.
tagMissingAsValue([value, inplace]): Tag NaN with a given value.
tagLowValues(cutoff, replacement[, inplace]): Tag low values with a replacement value.
removeConstantSignals(theta_cutoff[, inplace]): Remove constant signals.
boxCoxTransform([axis, inplace]): Box-Cox transform data.
modifiedZScore([axis, inplace]): Median-based z-score transform data.
normalizeSignalsToUnity([referencePoint, inplace]): Normalize signals to unity.
quantileNormalize([output_distribution, averaging, ties, inplace]): Quantile normalize signals in a DataFrame.
compareTimeSeriesToPoint([point, inplace]): Subtract a particular point of each time series (row) of a DataFrame.
compareTwoTimeSeries(df[, function, compareAllLevelsInIndex, mergeFunction]): Create a new DataFrame based on comparison of two existing DataFrames.
imputeMissingWithMedian([axis, inplace]): Impute missing values with the median.
- __init__(data=None, index=None, columns=None, dtype=None, copy=False)[source]¶
Initialization method
- filterOutAllZeroSignals(inplace=False)[source]¶
Filter out all-zero signals from a DataFrame.
- Parameters:
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.filterOutAllZeroSignals()
or
df_data.filterOutAllZeroSignals(inplace=True)
- filterOutFractionZeroSignals(min_fraction_of_non_zeros, inplace=False)[source]¶
Filter out fraction-zero signals from a DataFrame.
- Parameters:
- min_fraction_of_non_zeros: float
Minimum fraction of non-zero values required to keep a signal
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.filterOutFractionZeroSignals(0.75)
or
df_data.filterOutFractionZeroSignals(0.75, inplace=True)
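The fraction-based filter can be sketched as a standalone pandas operation. The helper below is a hypothetical re-implementation for illustration (not PyIOmica's internal code), assuming the threshold is the minimum fraction of non-zero entries per row (signal):

```python
import pandas as pd

# Illustrative sketch: keep only rows whose fraction of
# non-zero entries meets the given threshold.
def filter_fraction_zero(df, min_fraction_of_non_zeros):
    non_zero_fraction = (df != 0).sum(axis=1) / df.shape[1]
    return df[non_zero_fraction >= min_fraction_of_non_zeros]

df = pd.DataFrame([[1, 0, 0, 0],
                   [1, 2, 3, 0],
                   [1, 2, 3, 4]], index=['g1', 'g2', 'g3'])
filtered = filter_fraction_zero(df, 0.75)
print(list(filtered.index))  # ['g2', 'g3'] -- g1 is only 25% non-zero
```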
- filterOutFractionMissingSignals(min_fraction_of_non_missing, inplace=False)[source]¶
Filter out fraction-missing signals from a DataFrame.
- Parameters:
- min_fraction_of_non_missing: float
Minimum fraction of non-missing values required to keep a signal
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.filterOutFractionMissingSignals(0.75)
or
df_data.filterOutFractionMissingSignals(0.75, inplace=True)
- filterOutReferencePointZeroSignals(referencePoint=0, inplace=False)[source]¶
Filter out signals that are zero at the reference time point of a DataFrame.
- Parameters:
- referencePoint: int, Default 0
Index of the reference point
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.filterOutReferencePointZeroSignals()
or
df_data.filterOutReferencePointZeroSignals(inplace=True)
- tagValueAsMissing(value=0.0, inplace=False)[source]¶
Tag a given value (zero by default) with NaN.
- Parameters:
- value: float, Default 0.0
Value to be replaced with NaN
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.tagValueAsMissing()
or
df_data.tagValueAsMissing(inplace=True)
- tagMissingAsValue(value=0.0, inplace=False)[source]¶
Tag NaN with a given value (zero by default).
- Parameters:
- value: float, Default 0.0
Value to replace NaN with
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.tagMissingAsValue()
or
df_data.tagMissingAsValue(inplace=True)
- tagLowValues(cutoff, replacement, inplace=False)[source]¶
Tag low values with replacement value.
- Parameters:
- cutoff: float
Values below this cutoff are replaced
- replacement: float
Value used to replace entries below the cutoff
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.tagLowValues(1., 1.)
or
df_data.tagLowValues(1., 1., inplace=True)
- removeConstantSignals(theta_cutoff, inplace=False)[source]¶
Remove constant signals.
- Parameters:
- theta_cutoff: float
Parameter for filtering the signals
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.removeConstantSignals(0.3)
or
df_data.removeConstantSignals(0.3, inplace=True)
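As an illustration of constant-signal removal, here is a minimal sketch. The exact semantics of theta_cutoff inside PyIOmica are not documented here, so this example assumes a signal counts as constant when its value range (max minus min) does not exceed the cutoff:

```python
import pandas as pd

# Assumption for illustration only: a signal is "constant" when
# its range (max - min) is at most theta_cutoff.
def remove_constant_signals(df, theta_cutoff):
    signal_range = df.max(axis=1) - df.min(axis=1)
    return df[signal_range > theta_cutoff]

df = pd.DataFrame([[1.0, 1.1, 1.0],
                   [0.0, 2.0, 4.0]], index=['flat', 'varying'])
result = remove_constant_signals(df, 0.3)
print(list(result.index))  # ['varying'] -- 'flat' spans only 0.1
```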
- boxCoxTransform(axis=1, inplace=False)[source]¶
Box-Cox transform data.
- Parameters:
- axis: int, Default 1
Direction of processing, columns (1) or rows (0)
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.boxCoxTransform()
or
df_data.boxCoxTransform(inplace=True)
- modifiedZScore(axis=0, inplace=False)[source]¶
Z-score (Median-based) transform data.
- Parameters:
- axis: int, Default 0
Direction of processing, rows (1) or columns (0)
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.modifiedZScore()
or
df_data.modifiedZScore(inplace=True)
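A common definition of the median-based ("modified") z-score, which this method's name suggests (PyIOmica's exact internals may differ), scales deviations from the median by the median absolute deviation (MAD):

```python
import numpy as np

# Modified z-score: 0.6745 * (x - median) / MAD.
# The constant 0.6745 makes the MAD consistent with the standard
# deviation for normally distributed data.
def modified_z_score(values):
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return 0.6745 * (values - median) / mad

scores = modified_z_score([1.0, 2.0, 3.0, 4.0, 100.0])
print(np.round(scores, 3))  # the outlier 100.0 receives a very large score
```

Being median-based, this transform is far less sensitive to outliers than the ordinary mean/standard-deviation z-score.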
- normalizeSignalsToUnity(referencePoint=0, inplace=False)[source]¶
Normalize signals to unity.
- Parameters:
- referencePoint: int, Default 0
Index of the reference point
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.normalizeSignalsToUnity()
or
df_data.normalizeSignalsToUnity(inplace=True)
- quantileNormalize(output_distribution='original', averaging=<function mean>, ties=<function mean>, inplace=False)[source]¶
Quantile Normalize signals in a DataFrame.
Note that the dataset may contain tied (equal) values. In that case, by default, the quantile normalization implementation used here replaces the degenerate values with the mean over all the degenerate ranks. For the default option to work, the data should not contain any missing values. If output_distribution is set to ‘uniform’ or ‘normal’, scikit-learn’s quantile transformation is used instead.
- Parameters:
- output_distribution: str, Default ‘original’
Output distribution. Other options are ‘normal’ and ‘uniform’
- averaging: function, Default np.mean
With what value to replace the same-rank elements across samples. Default is to take the mean of same-rank elements
- ties: function or str, Default np.mean
Function or name of the function. How ties should be handled. Default is to replace ties with their mean. Other possible options are: ‘mean’, ‘median’, ‘prod’, ‘sum’, etc.
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = pd.DataFrame(index=['Gene 1','Gene 2','Gene 3','Gene 4'], columns=['Col 0','Col 1','Col 2'], data=np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]))
df_data = df_data.quantileNormalize()
or
df_data.quantileNormalize(inplace=True)
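Using the same toy DataFrame as the usage example, the default (‘original’) behavior can be sketched as plain rank-based quantile normalization with tie averaging. This is an illustrative re-implementation, not PyIOmica's code, and for simplicity it handles only two-way ties (half-integer ranks); wider ties would need the full rank span:

```python
import numpy as np
import pandas as pd

# Rank-based quantile normalization: sort each column, average across
# columns at every rank, then map each value back to the mean of its rank.
# Two-way ties (average rank r.5) get the mean of the two adjacent ranks.
def quantile_normalize(df):
    rank_means = pd.DataFrame(np.sort(df.values, axis=0),
                              columns=df.columns).mean(axis=1)
    rank_means.index += 1  # ranks are 1-based
    return df.rank(method='average').apply(
        lambda col: col.map(lambda r: rank_means.loc[int(r)]
                            if r == int(r)
                            else rank_means.loc[int(r):int(r) + 1].mean()))

df = pd.DataFrame(index=['Gene 1', 'Gene 2', 'Gene 3', 'Gene 4'],
                  columns=['Col 0', 'Col 1', 'Col 2'],
                  data=np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]))
result = quantile_normalize(df)
print(result)
```

After normalization every column has the same distribution of values; the tied 4s in ‘Col 1’ (ranks 3 and 4) both receive the mean of the rank-3 and rank-4 averages.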
- compareTimeSeriesToPoint(point='first', inplace=False)[source]¶
Subtract a particular point of each time series (row) of a Dataframe.
- Parameters:
- point: str, int or float, Default ‘first’
Possible options are ‘first’, ‘last’, an integer time-point index (0, 1, …), or a numeric value
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.compareTimeSeriesToPoint()
or
df_data.compareTimeSeriesToPoint(inplace=True)
- compareTwoTimeSeries(df, function=<ufunc 'subtract'>, compareAllLevelsInIndex=True, mergeFunction=<function mean>)[source]¶
Create a new Dataframe based on comparison of two existing Dataframes.
- Parameters:
- df: pandas.DataFrame
Data to compare
- function: function, Default np.subtract
Other options are np.add, np.divide, or another <ufunc>.
- compareAllLevelsInIndex: boolean, Default True
Whether to compare all levels in index. If False only “source” and “id” will be compared
- mergeFunction: function, Default np.mean
Input Dataframes are merged with this function, i.e. np.mean (default), np.median, np.max, or another <ufunc>.
- Returns:
- DataFrame or None
Processed data
- Usage:
df_data = df_dataH2.compareTwoTimeSeries(df_dataH1, function=np.subtract, compareAllLevelsInIndex=False, mergeFunction=np.median)
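The element-wise comparison step can be illustrated on two toy DataFrames that share an index. The index-merging options (compareAllLevelsInIndex, mergeFunction) are specific to PyIOmica and are not reproduced in this sketch:

```python
import numpy as np
import pandas as pd

# Two aligned time-series DataFrames are combined element-wise with a
# NumPy ufunc (np.subtract here, mirroring the default `function`).
df_h1 = pd.DataFrame({'t0': [1.0, 2.0], 't1': [3.0, 4.0]}, index=['g1', 'g2'])
df_h2 = pd.DataFrame({'t0': [0.5, 1.0], 't1': [1.0, 1.0]}, index=['g1', 'g2'])

df_diff = pd.DataFrame(np.subtract(df_h2.values, df_h1.values),
                       index=df_h1.index, columns=df_h1.columns)
print(df_diff)  # e.g. g1/t0 is 0.5 - 1.0 = -0.5
```

Any binary ufunc (np.add, np.divide, …) can be swapped in, matching the `function` parameter described above.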
- imputeMissingWithMedian(axis=1, inplace=False)[source]¶
Impute missing values with the median.
- Parameters:
- axis: int, Default 1
Axis along which to apply the transformation
- inplace: boolean, Default False
Whether to modify data in place or return a new one
- Returns:
- Dataframe or None
Processed data
- Usage:
df_data = df_data.imputeMissingWithMedian()
or
df_data.imputeMissingWithMedian(inplace=True)
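Row-wise median imputation (axis=1, the default) can be sketched with plain pandas; this is an illustrative equivalent, not the PyIOmica source:

```python
import numpy as np
import pandas as pd

# Replace NaNs in each row with that row's median of observed values.
def impute_missing_with_median(df):
    return df.apply(lambda row: row.fillna(row.median()), axis=1)

df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [4.0, 4.0, np.nan]], index=['s1', 's2'])
imputed = impute_missing_with_median(df)
print(imputed)  # s1's NaN becomes 2.0 (median of 1 and 3), s2's becomes 4.0
```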
- mergeDataframes(listOfDataframes, axis=0)[source]¶
Merge a list of Dataframes (outer join).
- Parameters:
- listOfDataframes: list
List of pandas.DataFrames
- axis: int, Default 0
Merge direction. 0 to stack vertically, 1 to stack horizontally
- Returns:
- pandas.Dataframe
Processed data
- Usage:
df_data = mergeDataframes([df_data1, df_data2])
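An outer-join merge of a list of DataFrames can be sketched with pandas.concat; axis=0 stacks vertically and axis=1 horizontally, matching the `axis` parameter described above:

```python
import pandas as pd

# Outer join: indices missing from one frame are kept and filled with NaN.
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'B': [3]}, index=['y'])

merged = pd.concat([df1, df2], axis=1, join='outer', sort=False)
print(merged)  # row 'x' gets NaN in column B
```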
- getLombScarglePeriodogramOfDataframe(df_data, NumberOfCPUs=4, parallel=True)[source]¶
Calculate Lomb-Scargle periodogram of DataFrame.
- Parameters:
- df_data: pandas.DataFrame
Data to process
- parallel: boolean, Default True
Whether to calculate in parallel mode (>1 process)
- NumberOfCPUs: int, Default 4
Number of processes to create if parallel is True
- Returns:
- pandas.Dataframe
Lomb-Scargle periodograms
- Usage:
df_periodograms = getLombScarglePeriodogramOfDataframe(df_data)
- getRandomSpikesCutoffs(df_data, p_cutoff, NumberOfRandomSamples=1000)[source]¶
Calculate spike cutoffs from a bootstrap of the provided data, given the significance cutoff p_cutoff.
- Parameters:
- df_data: pandas.DataFrame
Data where rows are normalized signals
- p_cutoff: float
p-Value cutoff, e.g. 0.01
- NumberOfRandomSamples: int, Default 1000
Size of the bootstrap distribution
- Returns:
- dictionary
Dictionary of spike cutoffs.
- Usage:
cutoffs = getRandomSpikesCutoffs(df_data, 0.01)
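Conceptually, a bootstrap cutoff of this kind can be sketched as follows; this is an assumed illustration of the general technique (resample, record the extreme "spike" statistic, take a percentile), not PyIOmica's implementation, which operates on the rows of df_data:

```python
import numpy as np

# Build a null distribution of maxima from bootstrap resamples and take
# the (1 - p_cutoff) percentile as the significance cutoff.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)          # stand-in for one normalized signal
p_cutoff = 0.01

boot_maxima = [rng.choice(data, size=data.size, replace=True).max()
               for _ in range(1000)]
cutoff = np.percentile(boot_maxima, 100 * (1 - p_cutoff))
print(round(float(cutoff), 3))  # observed spikes above this are significant
```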
- getRandomAutocorrelations(df_data, NumberOfRandomSamples=100000, NumberOfCPUs=4, fraction=0.75, referencePoint=0)[source]¶
Generate an autocorrelation null distribution from permuted data using Lomb-Scargle autocorrelation. NOTE: the input Series or DataFrame must not contain any missing or non-numeric points.
- Parameters:
- df_data: pandas.Series or pandas.Dataframe
Data to process
- NumberOfRandomSamples: int, Default 10**5
Size of the distribution to generate
- NumberOfCPUs: int, Default 4
Number of processes to run simultaneously
- Returns:
- pandas.DataFrame
Dataframe containing autocorrelations of null-distribution of data.
- Usage:
result = getRandomAutocorrelations(df_data)
- getRandomPeriodograms(df_data, NumberOfRandomSamples=100000, NumberOfCPUs=4, fraction=0.75, referencePoint=0)[source]¶
Generate a periodogram null distribution from permuted data using the Lomb-Scargle function.
- Parameters:
- df_data: pandas.Series or pandas.Dataframe
Data to process
- NumberOfRandomSamples: int, Default 10**5
Size of the distribution to generate
- NumberOfCPUs: int, Default 4
Number of processes to run simultaneously
- Returns:
- pandas.DataFrame
Dataframe containing periodograms
- Usage:
result = getRandomPeriodograms(df_data)