Frequency Based Subject Match

    Submodule pyiomica.frequencySubjectMatch

    Functions:

    bootstrapGeneral(df, N[, shuffling])

    To generate bootstrap samples

    calculateLinksBetweenSubjectsByDistance(df1, ...)

    To calculate the linked time series/genes from two dataframes based on the Euclidean distance

    calculateLinksBetweenSubjectsByCorrelation(...)

    To calculate the linked time series/genes from two dataframes based on the Pearson correlation

    getCommunityStructure(cs)

    To change community structure from {node1:community1, node2:community2,...} to {community1:[node1, node2,...], community2:[node3, node4,...]}

    getCommunityGenesDict(community_structure, ...)

    To get the list of gene IDs for each community, restricted to the selected categories of individuals

    splitGenes(community_gene_dict)

    Split gene IDs to separate the gene names from their attached labels

    getCommunityTopGenesByNumber(...[, ...])

    To get the top ranking genes of each community

    getCommunityTopGenesByFrequencyRanking(...)

    To get the top frequency genes of each community

    optimizeK(df, rangeK[, saveFig])

    To optimize the value of k for k-means clustering

    bootstrapGeneral(df, N, shuffling=True)[source]

    To generate bootstrap samples

    Parameters:
    df: pandas dataframe

    the source dataframe used to generate the bootstrap samples

    N: integer

    the size of bootstrap samples

    shuffling: boolean

    whether to shuffle the data. The default is True.

    Returns:
    bootstrapDF: pandas dataframe

    the bootstrap samples
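
    For orientation, the idea of drawing a bootstrap sample of size N from a dataframe can be sketched with plain pandas. This is a minimal illustration of row resampling with replacement, not the implementation of bootstrapGeneral; the helper name bootstrap_rows, the seed argument and the toy data are assumptions made for the example, and the effect of the shuffling flag is not modelled.

```python
import pandas as pd


def bootstrap_rows(df, n, seed=None):
    """Hypothetical helper: draw n rows from df with replacement."""
    return df.sample(n=n, replace=True, random_state=seed)


# Toy data: three genes measured at two time points (illustrative values only).
df = pd.DataFrame(
    {"t1": [0.1, 0.5, 0.9], "t2": [1.0, 0.2, 0.4]},
    index=["geneA", "geneB", "geneC"],
)

sample = bootstrap_rows(df, 5, seed=0)
print(sample.shape)  # 5 rows, same columns as df
```

    Because rows are drawn with replacement, the sample can be larger than the source dataframe and the same gene can appear more than once.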

    calculateLinksBetweenSubjectsByDistance(df1, df2, cutoff)[source]

    To calculate the linked time series/genes from two dataframes based on the Euclidean distance

    Parameters:
    df1: pandas dataframe

    the first time series from df1

    df2: pandas dataframe

    the second time series from df2

    cutoff: float

    if the distance between two time series is less than cutoff, the two time series are considered linked

    Returns:
    numlinkedGenes: integer/float

    number of linked time series

    commonGenes: integer/float

    number of common time series in df1 and df2

    linkedGenes: list of string

    the ids/names of linked time series
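
    The linking rule described above can be sketched as follows. This is a minimal illustration, assuming both dataframes are indexed by unique gene IDs and share the same time-point columns; the helper name links_by_distance and the toy values are hypothetical, and the library's own function may preprocess its inputs differently.

```python
import numpy as np
import pandas as pd


def links_by_distance(df1, df2, cutoff):
    """Hypothetical helper: link genes whose row-wise Euclidean distance is below cutoff."""
    # Only genes present in both dataframes can be compared.
    common = df1.index.intersection(df2.index)
    diff = df1.loc[common].to_numpy() - df2.loc[common].to_numpy()
    dist = np.linalg.norm(diff, axis=1)  # one distance per common gene
    linked = [gene for gene, d in zip(common, dist) if d < cutoff]
    return len(linked), len(common), linked


cols = ["t1", "t2", "t3"]
df1 = pd.DataFrame([[0, 0, 0], [1, 1, 1], [2, 2, 2]],
                   index=["geneA", "geneB", "geneC"], columns=cols, dtype=float)
df2 = pd.DataFrame([[0, 0, 0.1], [3, 3, 3], [5, 5, 5]],
                   index=["geneA", "geneB", "geneD"], columns=cols, dtype=float)

num_linked, num_common, linked = links_by_distance(df1, df2, cutoff=1.0)
print(num_linked, num_common, linked)  # 1 2 ['geneA']
```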

    calculateLinksBetweenSubjectsByCorrelation(df1, df2, cutoff)[source]

    To calculate the linked time series/genes from two dataframes based on the Pearson correlation

    Parameters:
    df1: pandas dataframe

    the first time series from df1

    df2: pandas dataframe

    the second time series from df2

    cutoff: float

    if the Pearson correlation between two time series is less than cutoff, the two time series are considered linked

    Returns:
    numlinkedGenes: integer/float

    number of linked time series

    commonGenes: integer/float

    number of common time series in df1 and df2

    linkedGenes: list of string

    the ids/names of linked time series
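
    The quantity being thresholded here is a per-gene Pearson correlation between the two subjects. The sketch below only computes those correlations for the genes common to both dataframes and leaves the comparison against cutoff to the caller. It assumes unique gene IDs in the index and matching time-point columns; per_gene_pearson and the toy data are hypothetical names and values, not part of the library.

```python
import numpy as np
import pandas as pd


def per_gene_pearson(df1, df2):
    """Hypothetical helper: Pearson correlation of each common gene's two time series."""
    common = df1.index.intersection(df2.index)
    return pd.Series(
        {
            gene: np.corrcoef(df1.loc[gene].to_numpy(), df2.loc[gene].to_numpy())[0, 1]
            for gene in common
        }
    )


cols = ["t1", "t2", "t3"]
df1 = pd.DataFrame([[1, 2, 3], [1, 2, 3]],
                   index=["geneA", "geneB"], columns=cols, dtype=float)
df2 = pd.DataFrame([[2, 4, 6], [3, 2, 1]],
                   index=["geneA", "geneB"], columns=cols, dtype=float)

corr = per_gene_pearson(df1, df2)
print(corr.round(3).to_dict())  # {'geneA': 1.0, 'geneB': -1.0}
```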

    getCommunityStructure(cs)[source]

    To change community structure from {node1:community1, node2:community2,…} to {community1:[node1, node2,…], community2:[node3, node4,…]}

    Parameters:
    cs: dictionary

    the community structure as {node1:community1, node2:community2,…}

    Returns:
    community_structure: dictionary

    the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}
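
    The conversion described above is a dictionary inversion. A minimal sketch of the same transformation, using a hypothetical helper name (invert_partition) and made-up node labels, might look like this:

```python
from collections import defaultdict


def invert_partition(cs):
    """Hypothetical helper: turn {node: community} into {community: [nodes]}."""
    communities = defaultdict(list)
    for node, community in cs.items():
        communities[community].append(node)
    return dict(communities)


cs = {"s1": 0, "s2": 0, "s3": 1}
community_structure = invert_partition(cs)
print(community_structure)  # {0: ['s1', 's2'], 1: ['s3']}
```

    The input form is the node-to-community mapping that community detection routines commonly return; the output form groups nodes by community, which is the layout the other functions in this submodule expect.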

    getCommunityGenesDict(community_structure, genelist, endwithString)[source]

    To get the list of gene IDs for each community, restricted to the selected categories of individuals

    Parameters:
    community_structure: dictionary

    the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}

    genelist: dictionary

    the gene list of each individual, keyed by the individual's id

    endwithString: list of string

    the selected categories of individuals, which are attached to the end of the individual ids

    Returns:
    community_genes_dict: dictionary

    the genes list of each community
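
    One way to read the parameters above is sketched below: for every community, pool the gene lists of the members whose id ends with one of the selected category strings. This is an illustration under that assumption only; the helper name community_genes, the id format ("p1_H") and the data are invented for the example, and the library function may combine or label the genes differently.

```python
def community_genes(community_structure, genelist, suffixes):
    """Hypothetical helper: pool gene lists of community members in the selected categories."""
    selected = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    result = {}
    for community, nodes in community_structure.items():
        genes = []
        for node in nodes:
            if node.endswith(selected):
                genes.extend(genelist.get(node, []))
        result[community] = genes
    return result


community_structure = {0: ["p1_H", "p2_D"], 1: ["p3_H"]}
genelist = {"p1_H": ["g1", "g2"], "p2_D": ["g2", "g3"], "p3_H": ["g4"]}

pooled = community_genes(community_structure, genelist, ["_H"])
print(pooled)  # {0: ['g1', 'g2'], 1: ['g4']}
```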

    splitGenes(community_gene_dict)[source]

    Split gene IDs to separate the gene names from their attached labels

    Parameters:
    community_gene_dict: dictionary

    the list of gene IDs for each community

    Returns:
    new_dict: dictionary

    the list of gene names for each community

    getCommunityTopGenesByNumber(community_structure, genelist, endwithString, numberOfTopGenes=500)[source]

    To get the top ranking genes of each community

    Parameters:
    community_structure: dictionary

    the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}

    genelist: dictionary

    the genes list of each community

    endwithString: list of string

    the selected categories of individuals, which are attached to the end of the individual ids

    numberOfTopGenes: integer, optional

    the number of top ranking genes. The default is 500.

    Returns:
    community_genes_dict: dictionary

    the top ranking genes of each community
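
    The ranking criterion is not spelled out above. As a rough sketch, assuming genes are ranked by how often they occur across a community's members, the top genes of one community could be picked with collections.Counter. The helper name top_genes_by_count and the pooled list are hypothetical.

```python
from collections import Counter


def top_genes_by_count(genes, number_of_top_genes=500):
    """Hypothetical helper: the most frequently occurring genes in a pooled list."""
    counts = Counter(genes)
    return [gene for gene, _ in counts.most_common(number_of_top_genes)]


# Pooled gene list for one community (illustrative values only).
pooled_genes = ["g1", "g2", "g2", "g3", "g2", "g3"]

top_two = top_genes_by_count(pooled_genes, number_of_top_genes=2)
print(top_two)  # ['g2', 'g3']
```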

    getCommunityTopGenesByFrequencyRanking(community_structure, genelist, endwithString, frequencyPercentage=50)[source]

    To get the top frequency genes of each community

    Parameters:
    community_structure: dictionary

    the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}

    genelist: dictionary

    the genes list of each community

    endwithString: list of string

    the selected categories of individuals, which are attached to the end of the individual ids

    frequencyPercentage: float, optional

    the top frequency percentage used to choose genes. The default is 50.

    Returns:
    community_genes_dict: dictionary

    the top percentage frequency genes of each community

    optimizeK(df, rangeK, saveFig=False, **kargs)[source]

    To optimize the value of k for k-means clustering

    Parameters:
    df: pandas dataframe

    the data to cluster with k-means

    rangeK: python range, e.g. rangeK = range(0,10)

    the K value range

    saveFig: boolean, optional

    whether to save the figure. The default is False.

    **kargs: figure name

    if saveFig is True, **kargs supplies the figure name

    Returns:
    optimizek: integer

    the optimized K value