scranpy.feature_set_enrichment package#

Submodules#

scranpy.feature_set_enrichment.hypergeometric_test module#

class scranpy.feature_set_enrichment.hypergeometric_test.HypergeometricTestOptions(log=False, upper_tail=True, num_threads=1)[source]#

Bases: object

Options for hypergeometric_test().

log#

Whether to report log-transformed p-values.

upper_tail#

Whether to compute the upper tail of the hypergeometric distribution, i.e., test for overrepresentation.

num_threads#

Number of threads to use.
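To illustrate how log and upper_tail change the reported value, here is a minimal pure-Python sketch. It is not scranpy's implementation: the helper tail_probability is hypothetical, and it assumes that both tails include the probability mass at the observed count and that log refers to the natural logarithm. Neither detail is stated in this reference.

```python
import math


def tail_probability(markers_in_set, set_size, total_markers, total_genes,
                     log=False, upper_tail=True):
    """Hypothetical stand-in that mirrors the `log` and `upper_tail` options."""
    # Range of marker counts that are possible inside the set.
    lowest = max(0, total_markers - (total_genes - set_size))
    highest = min(set_size, total_markers)

    if upper_tail:
        # Overrepresentation: P(X >= markers_in_set) (assumed inclusive).
        counts = range(max(markers_in_set, lowest), highest + 1)
    else:
        # Underrepresentation: P(X <= markers_in_set) (assumed inclusive).
        counts = range(lowest, min(markers_in_set, highest) + 1)

    ways = sum(
        math.comb(set_size, k) * math.comb(total_genes - set_size, total_markers - k)
        for k in counts
    )
    p = ways / math.comb(total_genes, total_markers)

    if not log:
        return p
    return math.log(p) if p > 0 else float("-inf")


# 10 genes, a set of 5 genes, 3 markers overall, all 3 inside the set.
upper = tail_probability(3, 5, 3, 10)
lower = tail_probability(3, 5, 3, 10, upper_tail=False)
log_upper = tail_probability(3, 5, 3, 10, log=True)
```

With these inputs the upper tail is 10/120, while the lower tail covers every possible count and is therefore 1.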


log: bool = False#
num_threads: int = 1#
upper_tail: bool = True#
scranpy.feature_set_enrichment.hypergeometric_test.hypergeometric_test(markers_in_set, set_size, total_markers, total_genes, options=HypergeometricTestOptions(log=False, upper_tail=True, num_threads=1))[source]#

Run the hypergeometric test to identify enrichment of interesting (usually marker) genes in a feature set or pathway.

Parameters:
  • markers_in_set (Union[int, Sequence[int]]) – Array containing the number of markers inside the feature sets. Alternatively, a single integer containing this number.

  • set_size (Union[int, Sequence[int]]) – Array containing the sizes of the feature sets. Alternatively, a single integer containing this number.

  • total_markers (Union[int, Sequence[int]]) – Array containing the total number of markers. Alternatively, a single integer containing this number.

  • total_genes (Union[int, Sequence[int]]) – Array containing the total number of genes in the analysis. Alternatively, a single integer containing this number.

  • options (HypergeometricTestOptions) – Further options.

Return type:

ndarray

Returns:

Array of p-values of length equal to the length of the input arrays (or 1, if all inputs were scalars).

Each array input is expected to be 1-dimensional and of the same length, where a hypergeometric test is applied on the corresponding values across all arrays. However, any of the arguments may be integers, in which case they are recycled to the length of the arrays for testing.
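The recycling rule described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: recycle, upper_tail and hypergeometric_sketch are hypothetical names, only the upper tail is computed, the tail is assumed to include the observed count, and a list is returned instead of an ndarray.

```python
from math import comb
from typing import List, Sequence, Union

IntOrSeq = Union[int, Sequence[int]]


def recycle(values: IntOrSeq, length: int) -> List[int]:
    """Repeat a scalar to `length`; pass arrays through after a length check."""
    if isinstance(values, int):
        return [values] * length
    if len(values) != length:
        raise ValueError("array inputs must all have the same length")
    return list(values)


def upper_tail(k: int, set_size: int, total_markers: int, total_genes: int) -> float:
    """P(X >= k) for a hypergeometric X (assumed to include the mass at k)."""
    highest = min(set_size, total_markers)
    ways = sum(
        comb(set_size, i) * comb(total_genes - set_size, total_markers - i)
        for i in range(max(k, 0), highest + 1)
    )
    return ways / comb(total_genes, total_markers)


def hypergeometric_sketch(markers_in_set: IntOrSeq, set_size: IntOrSeq,
                          total_markers: IntOrSeq, total_genes: IntOrSeq) -> List[float]:
    args = (markers_in_set, set_size, total_markers, total_genes)
    # The output length is the shared array length, or 1 if all inputs are scalars.
    lengths = {len(a) for a in args if not isinstance(a, int)}
    if len(lengths) > 1:
        raise ValueError("array inputs must all have the same length")
    n = lengths.pop() if lengths else 1
    columns = [recycle(a, n) for a in args]
    # One test per position, taking the corresponding value from each input.
    return [upper_tail(*row) for row in zip(*columns)]


# Two feature sets of size 5 tested against the same 3 markers out of 10 genes.
p_values = hypergeometric_sketch([3, 1], [5, 5], 3, 10)
```

Here total_markers and total_genes are scalars, so each is reused for both feature sets.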

scranpy.feature_set_enrichment.score_feature_set module#

class scranpy.feature_set_enrichment.score_feature_set.ScoreFeatureSetOptions(block=None, scale=False, assay_type='logcounts', num_threads=1)[source]#

Bases: object

Options to pass to score_feature_set().

block#

Block assignment for each cell. Scores are computed within each block to avoid inflated variances from inter-block differences.

If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.

scale#

Whether to scale the features to unit variance before computing the scores.

assay_type#

Assay to use from input if it is a SummarizedExperiment.

num_threads#

Number of threads to use.
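The shape expected for block can be illustrated with a small helper. This is a sketch only: group_cells_by_block is a hypothetical name, and how scranpy uses the blocks internally is not shown in this reference.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def group_cells_by_block(block: Optional[Sequence[Hashable]],
                         num_cells: int) -> Dict[Hashable, List[int]]:
    """Map each block label to the indices of the cells assigned to it."""
    if block is None:
        # No blocking: every cell belongs to one shared block.
        return {None: list(range(num_cells))}
    if len(block) != num_cells:
        raise ValueError("block must have one entry per cell")
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for cell, label in enumerate(block):
        # Cells share a block if and only if their labels are equal.
        groups[label].append(cell)
    return dict(groups)


# Three cells, where the first and third come from the same batch.
groups = group_cells_by_block(["batch1", "batch2", "batch1"], 3)
```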


assay_type: Union[int, str] = 'logcounts'#
block: Optional[Sequence] = None#
num_threads: int = 1#
scale: bool = False#
scranpy.feature_set_enrichment.score_feature_set.score_feature_set(input, subset, options=ScoreFeatureSetOptions(block=None, scale=False, assay_type='logcounts', num_threads=1))[source]#

Compute a score for the activity of a feature set in each cell. This uses a slightly modified version of the GSDecon algorithm: a PCA is performed to obtain the rank-1 reconstruction of the feature set’s expression values across all cells. The mean of the reconstructed values in each cell serves as that cell’s score, and the rotation vector is reported as the weights on the features involved.

Parameters:
  • input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object containing cells in columns and features in rows, typically with log-normalized expression data. This should be a matrix class that can be converted into a TatamiNumericPointer. Developers may also provide the TatamiNumericPointer itself.

  • subset (Sequence) – Array of integer indices specifying the rows of input that belong to the feature subset. Alternatively, an array of length equal to the number of rows in input, containing booleans that specify whether the corresponding row belongs to the subset.

  • options (ScoreFeatureSetOptions) – Optional parameters.

Return type:

Tuple[ndarray, ndarray]

Returns:

Tuple where the first array is of length equal to the number of columns of input and contains the feature set score for each cell. The second array is of length equal to the number of features in subset and contains the weight for each feature.
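The rank-1 reconstruction idea can be sketched in dependency-free Python. This is a plain rank-1 PCA and not the modified GSDecon variant that scranpy uses; it ignores block and scale, and the names subset_to_indices and rank1_scores, as well as the use of power iteration, are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def subset_to_indices(subset: Sequence, num_rows: int) -> List[int]:
    """Accept either integer row indices or a boolean mask over all rows."""
    if len(subset) == num_rows and all(isinstance(s, bool) for s in subset):
        return [i for i, keep in enumerate(subset) if keep]
    return [int(i) for i in subset]


def rank1_scores(matrix: Sequence[Sequence[float]], subset: Sequence,
                 iterations: int = 100) -> Tuple[List[float], List[float]]:
    """Return (score per cell, weight per feature) from a rank-1 PCA."""
    rows = [list(matrix[i]) for i in subset_to_indices(subset, len(matrix))]
    num_features, num_cells = len(rows), len(rows[0])

    # Center each feature (row) across cells (columns).
    means = [sum(r) / num_cells for r in rows]
    centered = [[v - m for v in r] for r, m in zip(rows, means)]

    def project(w: List[float]) -> List[float]:
        # Coordinate of each cell along the direction w.
        return [sum(w[f] * centered[f][c] for f in range(num_features))
                for c in range(num_cells)]

    # Power iteration for the first rotation vector (seeded, so deterministic).
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    for _ in range(iterations):
        pcs = project(weights)
        updated = [sum(centered[f][c] * pcs[c] for c in range(num_cells))
                   for f in range(num_features)]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:
            break  # no variance left to explain
        weights = [u / norm for u in updated]

    # Rank-1 reconstruction: feature mean + weight * component, averaged per cell.
    pcs = project(weights)
    scores = [sum(means[f] + weights[f] * pcs[c] for f in range(num_features))
              / num_features for c in range(num_cells)]
    return scores, weights


# Features in rows, cells in columns; the set consists of the first two rows.
expression = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [9.0, 9.0, 9.0]]
scores, weights = rank1_scores(expression, [0, 1])
```

The two selected rows are perfectly correlated, so the rank-1 reconstruction reproduces them exactly and each score equals the mean of those rows in that cell. The sign of the weights is arbitrary in this sketch.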

Module contents#