Frequency Based Subject Match
Submodule pyiomica.frequencySubjectMatch
Functions:
bootstrapGeneral | To generate bootstrap samples
calculateLinksBetweenSubjectsByDistance | To calculate the linked time series/genes from two dataframes based on the Euclidean distance
calculateLinksBetweenSubjectsByCorrelation | To calculate the linked time series/genes from two dataframes based on the Pearson correlation
getCommunityStructure | To change community structure from {node1:community1, node2:community2,...} to {community1:[node1, node2,...], community2:[node3, node4,...]}
getCommunityGenesDict | To get the list of gene IDs of each community within the selected categories of individuals
splitGenes | To split gene ids, separating the gene names from attached labels
getCommunityTopGenesByNumber | To get the top ranking genes of each community
getCommunityTopGenesByFrequencyRanking | To get the top frequency genes of each community
optimizeK | To optimize the k value of k-means clustering
- bootstrapGeneral(df, N, shuffling=True)
To generate bootstrap samples
- Parameters:
- df: pandas dataframe
the source dataframe used to generate bootstrap samples
- N: integer
the size of bootstrap samples
- shuffling: boolean, optional
whether to shuffle the data. The default is True.
- Returns:
- bootstrapDF: pandas dataframe
the bootstrap samples
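As a rough illustration of the resampling idea only, the sketch below draws N rows with replacement. It is not pyiomica's implementation: a plain dictionary stands in for the dataframe, `bootstrap_rows` and `seed` are hypothetical names, and the `shuffling` option is not modelled because its exact behaviour is not described here.

```python
import random


def bootstrap_rows(rows, n, seed=None):
    """Draw n rows with replacement from a mapping of row id -> values.

    Hypothetical helper for illustration; not part of pyiomica.
    """
    rng = random.Random(seed)
    row_ids = list(rows)
    # Sampling with replacement: the same row may be drawn more than once.
    drawn = rng.choices(row_ids, k=n)
    # Copy the values so the sample does not alias the source rows.
    return [(row_id, list(rows[row_id])) for row_id in drawn]


source = {
    "geneA": [0.1, 0.4, 0.9],
    "geneB": [1.2, 0.8, 0.3],
    "geneC": [0.5, 0.5, 0.6],
}
sample = bootstrap_rows(source, n=5, seed=0)
print(len(sample))
```

The sample can be larger than the source, since rows are drawn with replacement.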
- calculateLinksBetweenSubjectsByDistance(df1, df2, cutoff)
To calculate the linked time series/genes from two dataframes based on the Euclidean distance
- Parameters:
- df1: pandas dataframe
the first set of time series
- df2: pandas dataframe
the second set of time series
- cutoff: float
if the distance between two time series is less than cutoff, the two time series are considered linked
- Returns:
- numlinkedGenes: integer/float
number of linked time series
- commonGenes: integer/float
number of common time series in df1 and df2
- linkedGenes: list of string
the ids/names of linked time series
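The linking rule described above can be sketched in plain Python. This is an assumption-laden illustration, not the library's code: dictionaries of equal-length lists stand in for the two dataframes, only ids present in both are compared, and `links_by_distance` is a hypothetical name.

```python
import math


def links_by_distance(series1, series2, cutoff):
    """Return (number linked, number common, linked ids) for two id -> series maps.

    Hypothetical helper; assumes series with the same id have equal length.
    """
    # Only ids present in both inputs can be compared.
    common = sorted(set(series1) & set(series2))
    # A pair is linked when its Euclidean distance is below the cutoff.
    linked = [g for g in common if math.dist(series1[g], series2[g]) < cutoff]
    return len(linked), len(common), linked


s1 = {"geneA": [0.0, 1.0, 2.0], "geneB": [1.0, 1.0, 1.0], "geneC": [3.0, 2.0, 1.0]}
s2 = {"geneA": [0.0, 1.0, 2.5], "geneB": [4.0, 5.0, 1.0], "geneD": [0.0, 0.0, 0.0]}

num_linked, num_common, linked = links_by_distance(s1, s2, cutoff=1.0)
print(num_linked, num_common, linked)  # 1 2 ['geneA']
```

Here geneA is linked (distance 0.5), geneB is not (distance 5.0), and geneC and geneD are ignored because each appears in only one input.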
- calculateLinksBetweenSubjectsByCorrelation(df1, df2, cutoff)
To calculate the linked time series/genes from two dataframes based on the Pearson correlation
- Parameters:
- df1: pandas dataframe
the first set of time series
- df2: pandas dataframe
the second set of time series
- cutoff: float
if the Pearson correlation between two time series is less than cutoff, the two time series are considered linked
- Returns:
- numlinkedGenes: integer/float
number of linked time series
- commonGenes: integer/float
number of common time series in df1 and df2
- linkedGenes: list of string
the ids/names of linked time series
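For orientation, the sketch below computes the Pearson coefficient for every id shared by two inputs. It is a plain-Python stand-in under stated assumptions, not pyiomica's code: dictionaries of equal-length lists replace the dataframes, and `pearson` and `correlations_of_common` are hypothetical names. The sketch stops at the coefficients; the cutoff comparison from the parameter description would then be applied to them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlations_of_common(series1, series2):
    """Map each id present in both inputs to its Pearson coefficient."""
    common = sorted(set(series1) & set(series2))
    return {g: pearson(series1[g], series2[g]) for g in common}


t1 = {"geneA": [1.0, 2.0, 3.0], "geneB": [1.0, 2.0, 3.0], "geneC": [0.0, 1.0, 0.0]}
t2 = {"geneA": [2.0, 4.0, 6.0], "geneB": [3.0, 2.0, 1.0]}

coefficients = correlations_of_common(t1, t2)
print(coefficients)
```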
- getCommunityStructure(cs)
To change community structure from {node1:community1, node2:community2,…} to {community1:[node1, node2,…], community2:[node3, node4,…]}
- Parameters:
- cs: dictionary
the community structure as {node1:community1, node2:community2,…}
- Returns:
- community_structure: dictionary
the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}
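The conversion described above is a dictionary inversion. A minimal sketch, with `invert_community_map` as a hypothetical name rather than the library function:

```python
from collections import defaultdict


def invert_community_map(node_to_community):
    """Turn {node: community} into {community: [nodes]}."""
    community_to_nodes = defaultdict(list)
    for node, community in node_to_community.items():
        # Nodes are appended in the order they appear in the input.
        community_to_nodes[community].append(node)
    return dict(community_to_nodes)


cs = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 0}
community_structure = invert_community_map(cs)
print(community_structure)  # {0: ['s1', 's2', 's5'], 1: ['s3', 's4']}
```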
- getCommunityGenesDict(community_structure, genelist, endwithString)
To get the list of gene IDs of each community within the selected categories of individuals
- Parameters:
- community_structure: dictionary
the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}
- genelist: dictionary
the gene list of each individual, keyed by the id of the individual
- endwithString: list of string
the selected categories of individuals, which are attached to the end of the individual ids
- Returns:
- community_genes_dict: dictionary
the gene list of each community
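One way to read the parameters above is sketched here. The assumptions are that community nodes are individual ids, that a category is selected by matching the end of the id, and that the genes of the selected individuals are simply concatenated (duplicates kept). `community_genes` and the sample ids are hypothetical.

```python
def community_genes(community_structure, genelist, endwith_strings):
    """Collect the genes of the selected individuals in each community."""
    # str.endswith accepts a tuple of suffixes.
    suffixes = tuple(endwith_strings)
    result = {}
    for community, nodes in community_structure.items():
        genes = []
        for node in nodes:
            if node.endswith(suffixes):
                # Duplicates are kept so that later steps can count occurrences.
                genes.extend(genelist.get(node, []))
        result[community] = genes
    return result


structure = {0: ["p1_healthy", "p2_sick"], 1: ["p3_healthy"]}
genes = {"p1_healthy": ["G1", "G2"], "p2_sick": ["G2", "G3"], "p3_healthy": ["G4"]}

selected = community_genes(structure, genes, ["_healthy"])
print(selected)  # {0: ['G1', 'G2'], 1: ['G4']}
```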
- splitGenes(community_gene_dict)
To split gene ids, separating the gene names from attached labels
- Parameters:
- community_gene_dict: dictionary
the list of gene ids of each community
- Returns:
- new_dict: dictionary
the list of gene names of each community
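The page does not say how labels are attached to gene ids, so the following is only a sketch under an assumed format: the gene name comes first and is followed by a separator and a label. The `"_"` separator, the sample ids, and `split_gene_labels` are all hypothetical.

```python
def split_gene_labels(community_gene_dict, separator="_"):
    """Keep the part of each gene id before the first separator.

    The separator and the name-first layout are assumptions.
    """
    return {
        community: [gene_id.split(separator, 1)[0] for gene_id in gene_ids]
        for community, gene_ids in community_gene_dict.items()
    }


labelled = {0: ["TP53_up", "BRCA1_down"], 1: ["MYC_up"]}
names = split_gene_labels(labelled)
print(names)  # {0: ['TP53', 'BRCA1'], 1: ['MYC']}
```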
- getCommunityTopGenesByNumber(community_structure, genelist, endwithString, numberOfTopGenes=500)
To get the top ranking genes of each community
- Parameters:
- community_structure: dictionary
the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}
- genelist: dictionary
the gene list of each community
- endwithString: list of string
the selected categories of individuals, which are attached to the end of the individual ids
- numberOfTopGenes: integer, optional
the number of top ranking genes. The default is 500.
- Returns:
- community_genes_dict: dictionary
the top ranking genes of each community
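The ranking criterion is not spelled out here. Assuming genes are ranked by how often they occur among a community's pooled gene lists, the selection step could look like the sketch below; `top_genes_by_count` and the pooled input format are hypothetical.

```python
from collections import Counter


def top_genes_by_count(pooled_genes, number_of_top_genes=500):
    """Keep the most frequent genes of each community.

    pooled_genes maps a community to a list of gene names with repeats.
    """
    return {
        community: [
            gene for gene, _ in Counter(genes).most_common(number_of_top_genes)
        ]
        for community, genes in pooled_genes.items()
    }


pooled = {0: ["G1", "G2", "G2", "G3", "G2", "G3"]}
top = top_genes_by_count(pooled, number_of_top_genes=2)
print(top)  # {0: ['G2', 'G3']}
```

If a community has fewer distinct genes than the requested number, all of them are returned.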
- getCommunityTopGenesByFrequencyRanking(community_structure, genelist, endwithString, frequencyPercentage=50)
To get the top frequency genes of each community
- Parameters:
- community_structure: dictionary
the community structure as {community1:[node1, node2,…], community2:[node3, node4,…]}
- genelist: dictionary
the gene list of each community
- endwithString: list of string
the selected categories of individuals, which are attached to the end of the individual ids
- frequencyPercentage: float, optional
the top frequency percentage of the chosen genes. The default is 50.
- Returns:
- community_genes_dict: dictionary
the top percentage frequency genes of each community
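The exact meaning of the percentage is not stated here. One plausible reading, used only as an assumption in this sketch, is that a gene is kept when it occurs in at least `frequencyPercentage` percent of a community's selected individuals. `genes_above_frequency` and its input format are hypothetical.

```python
from collections import Counter


def genes_above_frequency(per_individual_genes, frequency_percentage=50):
    """Keep genes present in at least the given percentage of individuals.

    per_individual_genes is a list with one gene list per individual.
    """
    n_individuals = len(per_individual_genes)
    # Count each gene at most once per individual.
    counts = Counter(g for genes in per_individual_genes for g in set(genes))
    threshold = n_individuals * frequency_percentage / 100
    return sorted(g for g, c in counts.items() if c >= threshold)


members = [["G1", "G2"], ["G2", "G3"], ["G2", "G1"], ["G4"]]
frequent = genes_above_frequency(members, frequency_percentage=50)
print(frequent)  # ['G1', 'G2']
```

With four individuals and a 50 percent threshold, a gene needs at least two occurrences: G1 (2) and G2 (3) pass, G3 and G4 (1 each) do not.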
- optimizeK(df, rangeK, saveFig=False, **kargs)
To optimize the k value of k-means clustering
- Parameters:
- df: pandas dataframe
the data source for k-means clustering
- rangeK: python range, e.g. rangeK = range(0,10)
the K value range
- saveFig: boolean, optional
whether to save the figure. The default is False.
- **kargs: figure name
if saveFig is True, **kargs supplies the figure name
- Returns:
- optimizek: integer
the optimized K value