scranpy.normalization package#

Submodules#

scranpy.normalization.center_size_factors module#

class scranpy.normalization.center_size_factors.CenterSizeFactorsOptions(block=None, in_place=False, allow_zeros=False, allow_non_finite=False)[source]#

Bases: object

Optional arguments for center_size_factors().

block#

Block assignment for each cell. This is used to adjust the centering of size factors so that higher-coverage blocks are scaled down.

If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.

in_place#

Whether to modify the size factors in place. If False, a new array is returned. This argument is ignored if the input size_factors are not double-precision, in which case a new array is always returned.

allow_zeros#

Whether to gracefully handle zero size factors. If True, zero size factors are automatically set to the smallest non-zero size factor. If False, an error is raised. Defaults to False.

allow_non_finite#

Whether to gracefully handle missing or infinite size factors. If True, infinite size factors are automatically set to the largest non-zero size factor, while missing values are automatically set to 1. If False, an error is raised.

allow_non_finite: bool = False#
allow_zeros: bool = False#
block: Optional[Sequence] = None#
in_place: bool = False#
scranpy.normalization.center_size_factors.center_size_factors(size_factors, options=CenterSizeFactorsOptions(block=None, in_place=False, allow_zeros=False, allow_non_finite=False))[source]#

Center size factors before computing normalized values from the count matrix. This ensures that the normalized values are on the same scale as the original counts for easier interpretation.

Parameters:
  • size_factors (ndarray) – Floating-point array of size factors, one per cell.

  • options (CenterSizeFactorsOptions) – Optional parameters.

Raises:

TypeError, ValueError – If arguments don’t meet expectations.

Return type:

ndarray

Returns:

Array containing centered size factors.
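To make the behaviour concrete, here is a plain-NumPy sketch of the centering described above (a simplified illustration, not scranpy's implementation; blocking and in-place modification are omitted, and `center_sketch` is a hypothetical helper):

```python
import numpy as np

def center_sketch(size_factors, allow_zeros=False):
    """Scale size factors so that their mean is 1 (hypothetical sketch)."""
    sf = np.asarray(size_factors, dtype=np.float64).copy()
    if np.any(sf == 0):
        if not allow_zeros:
            raise ValueError("zero size factors detected")
        # Per the docs: zeros are replaced with the smallest non-zero factor.
        sf[sf == 0] = sf[sf > 0].min()
    return sf / sf.mean()

centered = center_sketch([0.5, 1.0, 2.5, 4.0])
print(centered.mean())  # 1.0
```

Because the mean is forced to 1, normalized values computed with these factors stay on roughly the same scale as the original counts.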

scranpy.normalization.grouped_size_factors module#

class scranpy.normalization.grouped_size_factors.GroupedSizeFactorsOptions(rank=25, groups=None, block=None, initial_factors=None, assay_type=0, num_threads=1)[source]#

Bases: object

Options to pass to grouped_size_factors().

groups#

Sequence of group assignments, of length equal to the number of cells.

rank#

Number of principal components to obtain in the low-dimensional representation prior to clustering. Only used if groups is None.

block#

Sequence of block assignments, where PCA and clustering is performed within each block. Only used if groups is None.

initial_factors#

Array of initial size factors to obtain a log-normalized matrix prior to PCA and clustering. Only used if groups is None.

assay_type#

Assay containing the count matrix, if input is a SummarizedExperiment.

num_threads#

Number of threads to use for the various calculations.

assay_type: Union[int, str] = 0#
block: Optional[Sequence] = None#
groups: Optional[Sequence] = None#
initial_factors: Optional[Sequence] = None#
num_threads: int = 1#
rank: int = 25#
scranpy.normalization.grouped_size_factors.grouped_size_factors(input, options=GroupedSizeFactorsOptions(rank=25, groups=None, block=None, initial_factors=None, assay_type=0, num_threads=1))[source]#

Compute grouped size factors to remove composition biases between groups of cells. This sums all cells from the same group into a pseudo-cell, applies median-based normalization between pseudo-cells, and propagates the pseudo-cell size factors back to each cell via library size scaling.

Parameters:
  • input – Matrix-like object containing cells in columns and features in rows, typically with count data. Alternatively, a SummarizedExperiment containing such a matrix in its assays.

  • options (GroupedSizeFactorsOptions) – Optional parameters.

Return type:

ndarray

Returns:

Array of size factors for each cell in input.
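The three steps described above (pseudo-cells, median-based normalization, library-size propagation) can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not scranpy's actual implementation; blocking, PCA-based clustering, and sparse inputs are ignored:

```python
import numpy as np

def grouped_factors_sketch(counts, groups):
    """Hypothetical sketch of grouped size factors for a dense genes-x-cells matrix."""
    counts = np.asarray(counts, dtype=np.float64)
    groups = np.asarray(groups)
    uniq = np.unique(groups)

    # Step 1: sum all cells of each group into a pseudo-cell.
    pseudo = np.column_stack([counts[:, groups == g].sum(axis=1) for g in uniq])

    # Step 2: median-based normalization of each pseudo-cell against the
    # average pseudo-cell, restricted to genes with non-zero reference values.
    ref = pseudo.mean(axis=1)
    keep = ref > 0
    pseudo_sf = np.array(
        [np.median(pseudo[keep, i] / ref[keep]) for i in range(len(uniq))]
    )

    # Step 3: propagate each pseudo-cell factor back to its cells,
    # scaling by each cell's library size relative to its group mean.
    libsize = counts.sum(axis=0)
    sf = np.empty(counts.shape[1])
    for sf_g, g in zip(pseudo_sf, uniq):
        mask = groups == g
        sf[mask] = sf_g * libsize[mask] / libsize[mask].mean()
    return sf
```

Summing into pseudo-cells suppresses per-cell zeroes, so the median-of-ratios step (which removes composition biases between groups) becomes stable even for sparse single-cell data.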

scranpy.normalization.log_norm_counts module#

class scranpy.normalization.log_norm_counts.LogNormCountsOptions(block=None, size_factors=None, center=True, center_size_factors_options=<factory>, delayed=True, with_size_factors=False, assay_type=0, num_threads=1)[source]#

Bases: object

Optional arguments for log_norm_counts().

size_factors#

Size factors for each cell. Defaults to None, in which case the library sizes are used.

delayed#

Whether to force the log-normalization to be delayed. This reduces memory usage by avoiding unnecessary copies of the count matrix.

center#

Whether to center the size factors. Defaults to True.

center_size_factors_options#

Optional arguments to pass to center_size_factors() if center = True.

with_size_factors#

Whether to return the (possibly centered) size factors in the output.

assay_type#

Assay to use from input if it is a SummarizedExperiment.

num_threads#

Number of threads to use to compute size factors, if none are provided in size_factors. Defaults to 1.

assay_type: Union[str, int] = 0#
block: Optional[Sequence] = None#
center: bool = True#
center_size_factors_options: CenterSizeFactorsOptions#
delayed: bool = True#
num_threads: int = 1#
size_factors: Optional[ndarray] = None#
with_size_factors: bool = False#
scranpy.normalization.log_norm_counts.log_norm_counts(input, options=LogNormCountsOptions(block=None, size_factors=None, center=True, center_size_factors_options=CenterSizeFactorsOptions(block=None, in_place=False, allow_zeros=False, allow_non_finite=False), delayed=True, with_size_factors=False, assay_type=0, num_threads=1))[source]#

Compute log-transformed normalized values. The normalization removes uninteresting per-cell differences due to sequencing efficiency and library size. The subsequent log-transformation ensures that any differences in the log-values represent log-fold changes in downstream analysis steps; these relative changes in expression are more relevant than absolute changes.

Parameters:
  • input

    Matrix-like object containing cells in columns and features in rows, typically with count data. This should be a matrix class that can be converted into a TatamiNumericPointer.

    Alternatively, a SummarizedExperiment containing such a matrix in its assays.

    Developers may also provide a TatamiNumericPointer directly.

  • options (LogNormCountsOptions) – Optional parameters.

Raises:

TypeError, ValueError – If arguments don’t meet expectations.

Returns:

If options.with_size_factors = False, the log-normalized matrix is directly returned. This is either a TatamiNumericPointer, if input is also a TatamiNumericPointer; as a DelayedArray, if input is array-like and delayed = True; or otherwise, an object of the same type as input.

If options.with_size_factors = True, a 2-tuple is returned containing the log-normalized matrix and an array of (possibly centered) size factors.
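For intuition, the core transformation can be sketched in NumPy, assuming the conventional log2(count / size_factor + 1) recipe with a pseudo-count of 1; this sketch ignores delayed operations, blocking, and SummarizedExperiment handling, and `log_norm_sketch` is a hypothetical helper rather than scranpy's implementation:

```python
import numpy as np

def log_norm_sketch(counts, size_factors=None):
    """Scale each cell (column) by its centered size factor, then log2(x + 1)."""
    counts = np.asarray(counts, dtype=np.float64)  # genes x cells
    if size_factors is None:
        size_factors = counts.sum(axis=0)  # default: library sizes
    sf = np.asarray(size_factors, dtype=np.float64)
    sf = sf / sf.mean()  # centering keeps values on the scale of the counts
    return np.log2(counts / sf + 1)
```

With the default library-size factors, this reproduces the common "log-normalized expression" matrix used by downstream steps such as PCA and clustering.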

Module contents#