G. Mias Lab

    Extended DataFrame and data-processing functions

    Submodule pyiomica.extendedDataFrame

    PyIOmica Dataframe extending Pandas DataFrame with new functions

    Classes:

    DataFrame([data, index, columns, dtype, copy])

    Class based on pandas.DataFrame, extending its capabilities into the domain of PyIOmica

    Functions:

    mergeDataframes(listOfDataframes[, axis])

    Merge a list of Dataframes (outer join).

    getLombScarglePeriodogramOfDataframe(df_data)

    Calculate Lomb-Scargle periodogram of DataFrame.

    getRandomSpikesCutoffs(df_data, p_cutoff[, ...])

    Calculate spike cutoffs from a bootstrap of the provided data, given the significance cutoff p_cutoff.

    getRandomAutocorrelations(df_data[, ...])

    Generate autocorrelation null-distribution from permuted data using Lomb-Scargle Autocorrelation.

    getRandomPeriodograms(df_data[, ...])

    Generate periodograms null-distribution from permuted data using Lomb-Scargle function.

    class DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)[source]

    Bases: DataFrame

    Class based on pandas.DataFrame, extending its capabilities into the domain of PyIOmica

    Initialization parameters are identical to those of pandas.DataFrame; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html for details.

    Methods:

    __init__([data, index, columns, dtype, copy])

    Initialization method

    filterOutAllZeroSignals([inplace])

    Filter out all-zero signals from a DataFrame.

    filterOutFractionZeroSignals(...[, inplace])

    Filter out fraction-zero signals from a DataFrame.

    filterOutFractionMissingSignals(...[, inplace])

    Filter out fraction-missing signals from a DataFrame.

    filterOutReferencePointZeroSignals([...])

    Filter out signals that are zero at the reference time point.

    tagValueAsMissing([value, inplace])

    Tag zero values with NaN.

    tagMissingAsValue([value, inplace])

    Tag NaN with zero.

    tagLowValues(cutoff, replacement[, inplace])

    Tag low values with replacement value.

    removeConstantSignals(theta_cutoff[, inplace])

    Remove constant signals.

    boxCoxTransform([axis, inplace])

    Box-cox transform data.

    modifiedZScore([axis, inplace])

    Z-score (Median-based) transform data.

    normalizeSignalsToUnity([referencePoint, ...])

    Normalize signals to unity.

    quantileNormalize([output_distribution, ...])

    Quantile Normalize signals in a DataFrame.

    compareTimeSeriesToPoint([point, inplace])

    Subtract a particular point from each time series (row) of a Dataframe.

    compareTwoTimeSeries(df[, function, ...])

    Create a new Dataframe based on comparison of two existing Dataframes.

    imputeMissingWithMedian([axis, inplace])

    Impute missing values with the median.

    __init__(data=None, index=None, columns=None, dtype=None, copy=False)[source]

    Initialization method

    filterOutAllZeroSignals(inplace=False)[source]

    Filter out all-zero signals from a DataFrame.

    Parameters:
    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.filterOutAllZeroSignals()

    or

    df_data.filterOutAllZeroSignals(inplace=True)
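As a rough illustration of what all-zero filtering does, here is a minimal plain-pandas sketch with hypothetical data (not the PyIOmica implementation itself):

```python
import pandas as pd

# Hypothetical data: rows are signals, columns are time points.
df = pd.DataFrame([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 2.0]],
                  index=['all_zero', 'mixed'])

# Keep only rows that contain at least one non-zero value.
filtered = df.loc[(df != 0).any(axis=1)]
```

The `all_zero` row is dropped; `mixed` survives because it has non-zero entries.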

    filterOutFractionZeroSignals(min_fraction_of_non_zeros, inplace=False)[source]

    Filter out fraction-zero signals from a DataFrame.

    Parameters:
    min_fraction_of_non_zeros: float

    Minimum fraction of non-zero values required to keep a signal

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.filterOutFractionZeroSignals(0.75)

    or

    df_data.filterOutFractionZeroSignals(0.75, inplace=True)
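The threshold logic can be sketched in plain pandas with hypothetical data (a simplified illustration, not the PyIOmica implementation):

```python
import pandas as pd

# Hypothetical signals over 4 time points.
df = pd.DataFrame([[1.0, 0.0, 0.0, 0.0],   # 25% non-zero
                   [1.0, 2.0, 3.0, 0.0]],  # 75% non-zero
                  index=['sparse', 'dense'])

min_fraction_of_non_zeros = 0.75
# Keep rows whose fraction of non-zero entries meets the threshold.
mask = (df != 0).sum(axis=1) / df.shape[1] >= min_fraction_of_non_zeros
filtered = df.loc[mask]
```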

    filterOutFractionMissingSignals(min_fraction_of_non_missing, inplace=False)[source]

    Filter out fraction-missing signals from a DataFrame.

    Parameters:
    min_fraction_of_non_missing: float

    Minimum fraction of non-missing values required to keep a signal

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.filterOutFractionMissingSignals(0.75)

    or

    df_data.filterOutFractionMissingSignals(0.75, inplace=True)

    filterOutReferencePointZeroSignals(referencePoint=0, inplace=False)[source]

    Filter out signals that are zero at the reference time point.

    Parameters:
    referencePoint: int, Default 0

    Index of the reference point

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.filterOutReferencePointZeroSignals()

    or

    df_data.filterOutReferencePointZeroSignals(inplace=True)

    tagValueAsMissing(value=0.0, inplace=False)[source]

    Tag a given value (default zero) as missing by replacing it with NaN.

    Parameters:
    value: float, Default 0.0

    Value to be tagged as missing (replaced with NaN)

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.tagValueAsMissing()

    or

    df_data.tagValueAsMissing(inplace=True)
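In plain pandas, this kind of tagging is a value-to-NaN replacement; a minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.0, 1.5],
                   [2.0, 0.0]])

# Replace every occurrence of the tagged value (here 0.0) with NaN.
tagged = df.replace(0.0, np.nan)
```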

    tagMissingAsValue(value=0.0, inplace=False)[source]

    Tag NaN (missing) entries with a given value (default zero).

    Parameters:
    value: float, Default 0.0

    Value used to replace NaN entries

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.tagMissingAsValue()

    or

    df_data.tagMissingAsValue(inplace=True)

    tagLowValues(cutoff, replacement, inplace=False)[source]

    Tag low values with replacement value.

    Parameters:
    cutoff: float

    Threshold below which values are replaced

    replacement: float

    Value used to replace entries below the cutoff

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.tagLowValues(1., 1.)

    or

    df_data.tagLowValues(1., 1., inplace=True)

    removeConstantSignals(theta_cutoff, inplace=False)[source]

    Remove constant signals.

    Parameters:
    theta_cutoff: float

    Parameter for filtering the signals

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.removeConstantSignals(0.3)

    or

    df_data.removeConstantSignals(0.3, inplace=True)
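One way to picture constant-signal removal is as a variability filter; a minimal pandas sketch with hypothetical data (using the standard deviation as the variability measure, which may differ from PyIOmica's exact criterion):

```python
import pandas as pd

# Hypothetical signals: one constant row, one varying row.
df = pd.DataFrame([[5.0, 5.0, 5.0],
                   [1.0, 4.0, 2.0]],
                  index=['constant', 'varying'])

theta_cutoff = 0.3
# Drop rows whose standard deviation does not exceed the cutoff.
filtered = df.loc[df.std(axis=1) > theta_cutoff]
```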

    boxCoxTransform(axis=1, inplace=False)[source]

    Box-cox transform data.

    Parameters:
    axis: int, Default 1

    Direction of processing, columns (1) or rows (0)

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.boxCoxTransform()

    or

    df_data.boxCoxTransform(inplace=True)
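The underlying transform is available in SciPy; a minimal sketch on a hypothetical strictly positive signal (scipy.stats.boxcox, not the PyIOmica wrapper):

```python
import numpy as np
from scipy import stats

# Hypothetical strictly positive signal (Box-Cox requires positive data).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# SciPy estimates the optimal lambda by maximum likelihood and
# returns the transformed data together with that lambda.
transformed, lmbda = stats.boxcox(x)
```

The transform is monotone increasing for any lambda, so the ordering of the data is preserved.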

    modifiedZScore(axis=0, inplace=False)[source]

    Z-score (Median-based) transform data.

    Parameters:
    axis: int, Default 0

    Direction of processing, rows (1) or columns (0)

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.modifiedZScore()

    or

    df_data.modifiedZScore(inplace=True)
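The modified (median-based) z-score is commonly computed as 0.6745 * (x - median) / MAD; a minimal sketch on a hypothetical signal with an outlier (illustrating the idea, not the PyIOmica implementation):

```python
import pandas as pd

# Hypothetical signal with one outlier at 100.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Median-based z-score: robust to the outlier, unlike the mean-based z-score.
median = s.median()
mad = (s - median).abs().median()  # median absolute deviation
modified_z = 0.6745 * (s - median) / mad
```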

    normalizeSignalsToUnity(referencePoint=0, inplace=False)[source]

    Normalize signals to unity.

    Parameters:
    referencePoint: int, Default 0

    Index of the reference point

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.normalizeSignalsToUnity()

    or

    df_data.normalizeSignalsToUnity(inplace=True)

    quantileNormalize(output_distribution='original', averaging=<function mean>, ties=<function mean>, inplace=False)[source]

    Quantile Normalize signals in a DataFrame.

    Note that the dataset may contain equal (degenerate) values. In that case, by default, this quantile normalization implementation replaces the degenerate values with the mean over all the degenerate ranks. For the default option to work, the data must not contain missing values. If output_distribution is set to 'uniform' or 'normal', scikit-learn's Quantile Transformation is used instead.

    Parameters:
    output_distribution: str, Default ‘original’

    Output distribution. Other options are ‘normal’ and ‘uniform’

    averaging: function, Default np.mean

    With what value to replace the same-rank elements across samples. Default is to take the mean of same-rank elements

    ties: function or str, Default np.mean

    Function or name of the function. How ties should be handled. Default is to replace ties with their mean. Other possible options are: ‘mean’, ‘median’, ‘prod’, ‘sum’, etc.

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = pd.DataFrame(index=['Gene 1','Gene 2','Gene 3','Gene 4'], columns=['Col 0','Col 1','Col 2'], data=np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]))

    df_data = df_data.quantileNormalize()

    or

    df_data.quantileNormalize(inplace=True)
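The rank-mean procedure described in the note above can be sketched directly in pandas (a simplified illustration on the same example data, not the PyIOmica implementation): sort each column, average across columns rank by rank, then map each value back to the mean of its rank, replacing ties with the mean over the degenerate ranks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=['Gene 1', 'Gene 2', 'Gene 3', 'Gene 4'],
                  columns=['Col 0', 'Col 1', 'Col 2'],
                  data=np.array([[5., 4., 3.], [2., 1., 4.], [3., 4., 6.], [4., 2., 8.]]))

# Mean of each rank position across the sorted columns (1-based ranks).
rank_means = pd.Series(np.sort(df.values, axis=0).mean(axis=1),
                       index=np.arange(1, df.shape[0] + 1))

def rank_to_value(r):
    # Ties produce fractional average ranks; replace them with the mean
    # over the degenerate ranks.
    lo, hi = int(np.floor(r)), int(np.ceil(r))
    return (rank_means.loc[lo] + rank_means.loc[hi]) / 2.0

# Average ranks within each column, then map ranks back to rank means.
normalized = df.rank(method='average').apply(lambda col: col.map(rank_to_value))
```

After normalization every column has the same distribution, so the column sums are identical, and tied input values map to the same output value.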

    compareTimeSeriesToPoint(point='first', inplace=False)[source]

    Subtract a particular point from each time series (row) of a Dataframe.

    Parameters:
    point: str, int or float

    Possible options are 'first', 'last', 0, 1, ..., 10, or a value.

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.compareTimeSeriesToPoint()

    or

    df_data.compareTimeSeriesToPoint(inplace=True)
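With the default 'first' option, the operation amounts to a row-wise subtraction of the first time point; a minimal pandas sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical time series: rows are signals, columns are time points.
df = pd.DataFrame([[2.0, 3.0, 5.0],
                   [1.0, 1.0, 4.0]])

# Subtract each row's first point from the whole row.
compared = df.sub(df.iloc[:, 0], axis=0)
```

Every signal now starts at zero, which makes trajectories directly comparable.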

    compareTwoTimeSeries(df, function=<ufunc 'subtract'>, compareAllLevelsInIndex=True, mergeFunction=<function mean>)[source]

    Create a new Dataframe based on comparison of two existing Dataframes.

    Parameters:
    df: pandas.DataFrame

    Data to compare

    function: function, Default np.subtract

    Other options are np.add, np.divide, or another <ufunc>.

    compareAllLevelsInIndex: boolean, Default True

    Whether to compare all levels in index. If False only 'source' and 'id' will be compared

    mergeFunction: function, Default np.mean

    Input Dataframes are merged with this function, i.e. np.mean (default), np.median, np.max, or another <ufunc>.

    Returns:
    DataFrame or None

    Processed data

    Usage:

    df_data = df_dataH2.compareTwoTimeSeries(df_dataH1, function=np.subtract, compareAllLevelsInIndex=False, mergeFunction=np.median)

    imputeMissingWithMedian(axis=1, inplace=False)[source]

    Impute missing values with the median.

    Parameters:
    axis: int, Default 1

    Axis to apply the transformation along

    inplace: boolean, Default False

    Whether to modify data in place or return a new one

    Returns:
    Dataframe or None

    Processed data

    Usage:

    df_data = df_data.imputeMissingWithMedian()

    or

    df_data.imputeMissingWithMedian(inplace=True)
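Median imputation can be sketched in plain pandas with hypothetical data (filling each row's NaNs with that row's median; the axis convention here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [np.nan, 2.0, 4.0]])

# Fill NaNs in each row with that row's median.
imputed = df.apply(lambda row: row.fillna(row.median()), axis=1)
```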

    mergeDataframes(listOfDataframes, axis=0)[source]

    Merge a list of Dataframes (outer join).

    Parameters:
    listOfDataframes: list

    List of pandas.DataFrames

    axis: int, Default 0

    Merge direction. 0 to stack vertically, 1 to stack horizontally

    Returns:
    pandas.Dataframe

    Processed data

    Usage:

    df_data = mergeDataframes([df_data1, df_data2])
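An outer-join merge of DataFrames corresponds to pandas concatenation; a minimal sketch with hypothetical data (pd.concat, not the PyIOmica function itself):

```python
import pandas as pd

# Two hypothetical DataFrames with partially overlapping columns.
df1 = pd.DataFrame({'A': [1.0, 2.0]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [3.0], 'B': [4.0]}, index=['z'])

# Outer join, axis=0 stacks vertically: the result keeps the union
# of columns, filling absent entries with NaN.
merged = pd.concat([df1, df2], axis=0, join='outer', sort=False)
```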

    getLombScarglePeriodogramOfDataframe(df_data, NumberOfCPUs=4, parallel=True)[source]

    Calculate Lomb-Scargle periodogram of DataFrame.

    Parameters:
    df_data: pandas.DataFrame

    Data to process

    parallel: boolean, Default True

    Whether to calculate in parallel mode (>1 process)

    NumberOfCPUs: int, Default 4

    Number of processes to create if parallel is True

    Returns:
    pandas.Dataframe

    Lomb-Scargle periodograms

    Usage:

    df_periodograms = getLombScarglePeriodogramOfDataframe(df_data)
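A Lomb-Scargle periodogram for a single unevenly sampled signal can be computed with SciPy; a minimal sketch with simulated data (scipy.signal.lombscargle, not the PyIOmica wrapper, which processes a whole DataFrame):

```python
import numpy as np
from scipy.signal import lombscargle

# Hypothetical unevenly sampled signal: a sine at angular frequency 2.0.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 20.0, 100))
y = np.sin(2.0 * t)

# Angular frequencies at which to evaluate the periodogram.
freqs = np.linspace(0.5, 5.0, 200)
power = lombscargle(t, y, freqs)

peak_freq = freqs[np.argmax(power)]  # expected near the true frequency 2.0
```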

    getRandomSpikesCutoffs(df_data, p_cutoff, NumberOfRandomSamples=1000)[source]

    Calculate spike cutoffs from a bootstrap of the provided data, given the significance cutoff p_cutoff.

    Parameters:
    df_data: pandas.DataFrame

    Data where rows are normalized signals

    p_cutoff: float

    p-Value cutoff, e.g. 0.01

    NumberOfRandomSamples: int, Default 1000

    Size of the bootstrap distribution

    Returns:
    dictionary

    Dictionary of spike cutoffs.

    Usage:

    cutoffs = getRandomSpikesCutoffs(df_data, 0.01)
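The cutoff itself is a tail quantile of a bootstrap null distribution; a minimal sketch of that idea with simulated null values (getRandomSpikesCutoffs bootstraps from df_data instead):

```python
import numpy as np

# Hypothetical null distribution of spike heights, here just simulated.
rng = np.random.default_rng(0)
null_spikes = rng.normal(size=1000)

p_cutoff = 0.01
# Spikes above the (1 - p_cutoff) quantile of the null are significant.
cutoff = np.quantile(null_spikes, 1.0 - p_cutoff)
```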

    getRandomAutocorrelations(df_data, NumberOfRandomSamples=100000, NumberOfCPUs=4, fraction=0.75, referencePoint=0)[source]

    Generate autocorrelation null-distribution from permuted data using Lomb-Scargle Autocorrelation. Note: the input Series or Dataframe must not contain missing or non-numeric values.

    Parameters:

    df_data: pandas.Series or pandas.Dataframe

    Data to process

    NumberOfRandomSamples: int, Default 10**5

    Size of the distribution to generate

    NumberOfCPUs: int, Default 4

    Number of processes to run simultaneously

    Returns:
    pandas.DataFrame

    Dataframe containing autocorrelations of null-distribution of data.

    Usage:

    result = getRandomAutocorrelations(df_data)

    getRandomPeriodograms(df_data, NumberOfRandomSamples=100000, NumberOfCPUs=4, fraction=0.75, referencePoint=0)[source]

    Generate periodograms null-distribution from permuted data using Lomb-Scargle function.

    Parameters:

    df_data: pandas.Series or pandas.Dataframe

    Data to process

    NumberOfRandomSamples: int, Default 10**5

    Size of the distribution to generate

    NumberOfCPUs: int, Default 4

    Number of processes to run simultaneously

    Returns:
    pandas.DataFrame

    Dataframe containing periodograms

    Usage:

    result = getRandomPeriodograms(df_data)