scranpy.feature_selection package#
Submodules#
scranpy.feature_selection.choose_hvgs module#
- class scranpy.feature_selection.choose_hvgs.ChooseHvgsOptions(number=2500)[source]#
Bases: object
Optional arguments for choose_hvgs().
- number#
Number of HVGs to retain. Larger values preserve more biological structure at the cost of increasing computational work and random noise from less-variable genes.
Defaults to 2500.
- __annotations__ = {'number': <class 'int'>}#
- scranpy.feature_selection.choose_hvgs.choose_hvgs(stat, options=ChooseHvgsOptions(number=2500))[source]#
Choose highly variable genes for high-dimensional downstream steps such as run_pca(). This ensures that those steps focus on interesting biology, under the assumption that biological variation is larger than random noise.
- Parameters:
stat (ndarray) – Array of variance modelling statistics, where larger values correspond to higher variability. This usually contains the residuals of the fitted mean-variance trend from model_gene_variances().
options (ChooseHvgsOptions) – Optional parameters.
- Return type:
- Returns:
Array of booleans of length equal to stat, specifying whether a given gene is considered to be highly variable.
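The selection described above amounts to keeping the genes with the largest statistics and reporting them as a boolean mask. The following is a minimal NumPy sketch of that idea, not scranpy's implementation; the helper name `top_n_mask` is hypothetical, and how scranpy itself handles ties or missing values is not covered here.

```python
import numpy as np


def top_n_mask(stat, number=2500):
    """Return a boolean mask marking the `number` largest entries of `stat`.

    Hypothetical helper for illustration only; it is not part of scranpy.
    """
    stat = np.asarray(stat, dtype=float)
    mask = np.zeros(stat.shape[0], dtype=bool)
    keep = min(number, stat.shape[0])  # cannot keep more genes than exist
    if keep > 0:
        # Sort in decreasing order of the statistic and keep the first `keep`.
        top = np.argsort(-stat, kind="stable")[:keep]
        mask[top] = True
    return mask


# Toy residuals for five genes; keep the two most variable ones.
residuals = np.array([0.5, -0.1, 2.0, 0.3, 1.2])
mask = top_n_mask(residuals, number=2)
print(mask)  # [False False  True False  True]
```

The mask has the same length as the input, so it can be used directly to subset the rows of an expression matrix before a step such as PCA.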
scranpy.feature_selection.model_gene_variances module#
- class scranpy.feature_selection.model_gene_variances.ModelGeneVariancesOptions(block=None, span=0.3, assay_type='logcounts', feature_names=None, num_threads=1)[source]#
Bases: object
Optional arguments for model_gene_variances().
- block#
Block assignment for each cell. Variance modelling is performed within each block to avoid interference from inter-block differences.
If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.
- span#
Span to use for the LOWESS trend fitting. Larger values yield a smoother curve and reduce the risk of overfitting, at the cost of being less responsive to local variations. Defaults to 0.3.
- assay_type#
Assay to use from input if it is a SummarizedExperiment. Defaults to "logcounts".
- feature_names#
Sequence of feature names of length equal to the number of rows in input. If provided, this is used as the row names of the output data frames.
- num_threads#
Number of threads to use. Defaults to 1.
- __annotations__ = {'assay_type': typing.Union[int, str], 'block': typing.Optional[typing.Sequence], 'feature_names': typing.Optional[typing.Sequence[str]], 'num_threads': <class 'int'>, 'span': <class 'float'>}#
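To illustrate why the block option matters, the sketch below computes per-gene variances within each block and then combines them, so that a shift in mean expression between blocks does not inflate the variance. It is a plain NumPy illustration rather than scranpy code: the function name `blockwise_variances` is hypothetical, and the unweighted mean across blocks is an assumption about how blocks might be combined.

```python
import numpy as np


def blockwise_variances(x, block=None):
    """Per-gene variances computed within blocks, then averaged across blocks.

    Hypothetical helper for illustration only; it is not part of scranpy.
    `x` has genes in rows and cells in columns.
    """
    x = np.asarray(x, dtype=float)
    ncells = x.shape[1]
    if block is None:
        # No blocking: every cell belongs to the same block.
        block = np.zeros(ncells, dtype=int)
    block = np.asarray(block)
    if block.shape[0] != ncells:
        raise ValueError("'block' must have one entry per cell")

    # Cells share a code if and only if they share a block label.
    levels, codes = np.unique(block, return_inverse=True)
    per_block = {}
    for b, level in enumerate(levels):
        per_block[level.item()] = x[:, codes == b].var(axis=1, ddof=1)

    # Assumption: combine blocks with a simple unweighted mean.
    average = np.mean(list(per_block.values()), axis=0)
    return average, per_block


# One gene, six cells, two blocks that differ only by a shift in the mean.
x = np.array([[1.0, 2.0, 3.0, 11.0, 12.0, 13.0]])
block = ["A", "A", "A", "B", "B", "B"]
average, per_block = blockwise_variances(x, block)
pooled = x.var(axis=1, ddof=1)
print(average)  # within-block variance, unaffected by the shift
print(pooled)   # pooled variance, inflated by the inter-block difference
```

In this toy case the within-block variance is 1 for both blocks, while the pooled variance is much larger because it also captures the difference between block means.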
- scranpy.feature_selection.model_gene_variances.model_gene_variances(input, options=ModelGeneVariancesOptions(block=None, span=0.3, assay_type='logcounts', feature_names=None, num_threads=1))[source]#
Compute gene variances and model them with a trend to account for non-trivial mean-variance relationships in count data. The residual from the trend can then be used to identify highly variable genes, e.g., with choose_hvgs().
- Parameters:
input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object where rows are features and columns are cells, typically containing log-normalized expression values from log_norm_counts(). This should be a matrix class that can be converted into a TatamiNumericPointer.
Alternatively, a SummarizedExperiment containing such a matrix in its assays.
Developers may also provide a TatamiNumericPointer directly.
options (ModelGeneVariancesOptions) – Optional parameters.
- Return type:
- Returns:
Data frame with variance modelling results for each gene, specifically the mean log-expression, the variance, the fitted value of the mean-variance trend and the residual from the trend. Each row of the data frame corresponds to a row of input.
For multiple blocks, the data frame’s columns will represent the average across blocks. An extra per_block column will also be present containing a nested BiocFrame with the same per-block statistics.
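The four per-gene statistics listed above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not scranpy's implementation: the helpers `lowess_sketch` and `model_variances_sketch` are hypothetical, the trend is a single-pass local linear fit with tricube weights (LOWESS without robustness iterations), and the library's numerical details may differ.

```python
import numpy as np


def lowess_sketch(x, y, span=0.3):
    """Single-pass local linear smoother with tricube weights.

    Hypothetical stand-in for a LOWESS trend; `span` is the fraction of
    points used in each local fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(2, int(np.ceil(span * n))))  # points per local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.partition(dist, k - 1)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = (1.0 - np.clip(dist / h, 0.0, 1.0) ** 3) ** 3
        else:
            w = (dist == 0).astype(float)  # all neighbours share the same x
        # Weighted linear regression, falling back to a weighted mean
        # when the window has no spread in x.
        sw = w.sum()
        xm = (w * x).sum() / sw
        ym = (w * y).sum() / sw
        sxx = (w * (x - xm) ** 2).sum()
        slope = (w * (x - xm) * (y - ym)).sum() / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted


def model_variances_sketch(logcounts, span=0.3):
    """Mean, variance, fitted trend and residual for each gene (row)."""
    logcounts = np.asarray(logcounts, dtype=float)
    means = logcounts.mean(axis=1)
    variances = logcounts.var(axis=1, ddof=1)
    fitted = lowess_sketch(means, variances, span=span)
    return {
        "means": means,
        "variances": variances,
        "fitted": fitted,
        "residuals": variances - fitted,
    }


# Simulated matrix with 200 genes (rows) and 50 cells (columns).
rng = np.random.default_rng(0)
logcounts = rng.gamma(shape=2.0, scale=1.0, size=(200, 50))
stats = model_variances_sketch(logcounts)
print({name: values.shape for name, values in stats.items()})
```

Genes with large positive residuals sit above the trend, which is why the residuals are a natural input to a top-N selection step. A larger span widens each local window and gives a smoother fitted curve.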