scranpy.batch_correction package#
Submodules#
scranpy.batch_correction.mnn_correct module#
- class scranpy.batch_correction.mnn_correct.MnnCorrectOptions(k=15, approximate=True, order=None, reference_policy='max-rss', num_mads=3, mass_cap=-1, num_threads=1)[source]#
Bases:
object
Options to pass to mnn_correct().
- k#
Number of neighbors for detecting mutual nearest neighbors.
- approximate#
Whether to perform an approximate nearest neighbor search.
- order#
Ordering of batches to correct. The first entry is used as the initial reference, and all subsequent batches are merged and added to the reference in the specified order. This should contain all unique levels in the batch argument supplied to mnn_correct(). If None, an appropriate ordering is automatically determined.
- reference_policy#
Policy to use for choosing the initial reference batch. This can be one of “max-rss” (maximum residual sum of squares within the batch, which is the default), “max-variance” (maximum variance within the batch), “max-size” (maximum number of cells), or “input” (using the supplied order of levels in batch). Only used if order is not supplied.
- num_mads#
Number of median absolute deviations, used to define the threshold for outliers when computing the center of mass for each cell involved in an MNN pair. Larger values reduce “kissing” (where batches are brought into contact at their surfaces but not fully merged) but may incorporate inappropriately distant subpopulations in a cell’s center of mass.
- mass_cap#
Cap on the number of observations used to compute the center of mass for each MNN-involved observation. The dataset is effectively downsampled to at most mass_cap observations for this calculation, which improves speed at the cost of some precision.
- num_threads#
Number of threads to use for the various MNN calculations.
- __annotations__ = {'approximate': <class 'bool'>, 'k': <class 'int'>, 'mass_cap': <class 'int'>, 'num_mads': <class 'int'>, 'num_threads': <class 'int'>, 'order': typing.Optional[typing.Sequence], 'reference_policy': <class 'str'>}#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
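The reference_policy choices described above can be illustrated with a small pure-Python sketch. This is not scranpy's implementation: the function name choose_reference is hypothetical, "max-rss" is assumed to mean the sum of squared deviations from the batch centroid over all dimensions, "max-variance" is assumed to be that sum divided by the number of cells minus one, and "input" is assumed to pick the first batch level encountered.

```python
from collections import defaultdict


def choose_reference(x, batch, policy="max-rss"):
    """Pick an initial reference batch (illustrative only, not scranpy's code).

    x: list of per-cell coordinate lists; batch: one batch label per cell.
    """
    groups = defaultdict(list)
    for coords, label in zip(x, batch):
        groups[label].append(coords)  # dicts keep first-seen order of labels

    if policy == "input":
        return next(iter(groups))  # assumption: first level encountered

    def rss(cells):
        # Sum of squared deviations from the batch centroid, over all dimensions.
        ndim = len(cells[0])
        center = [sum(c[d] for c in cells) / len(cells) for d in range(ndim)]
        return sum((c[d] - center[d]) ** 2 for c in cells for d in range(ndim))

    def score(cells):
        if policy == "max-size":
            return len(cells)
        if policy == "max-rss":
            return rss(cells)
        if policy == "max-variance":
            # Assumption: RSS divided by (number of cells - 1).
            return rss(cells) / max(len(cells) - 1, 1)
        raise ValueError(f"unknown policy: {policy}")

    return max(groups, key=lambda label: score(groups[label]))


# Batch A is small but spread out; batch B is larger but tightly clustered.
x = [[0.0, 0.0], [4.0, 0.0], [1.0, 1.0], [1.2, 1.0], [0.8, 1.0], [1.0, 1.1]]
batch = ["A", "A", "B", "B", "B", "B"]

by_rss = choose_reference(x, batch, "max-rss")
by_size = choose_reference(x, batch, "max-size")
```

In this toy input the spread-out batch wins under "max-rss" while the larger batch wins under "max-size", which shows why the choice of policy can change the initial reference.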
- class scranpy.batch_correction.mnn_correct.MnnCorrectResult(corrected=None, merge_order=None, num_pairs=None)[source]#
Bases:
object
Results from mnn_correct().
- corrected#
Matrix of corrected coordinates for each cell (row) and dimension (column). Rows and columns should be in the same order as the input x in mnn_correct().
- merge_order#
Order of batches used for merging. The first batch is used as the initial reference. The length of this list is equal to the number of batches.
- num_pairs#
Number of MNN pairs detected at each merge step. This has length one less than the number of batches.
- __annotations__ = {'corrected': typing.Optional[numpy.ndarray], 'merge_order': typing.Optional[list], 'num_pairs': typing.Optional[numpy.ndarray]}#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
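The relationships between the three result fields can be checked with a short helper. The sketch below is an illustration under assumptions: ResultStandIn is a hypothetical stand-in that only mirrors the field names of MnnCorrectResult, and check_result is not part of scranpy. It relies only on len(), so it should work equally with nested lists or NumPy arrays.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ResultStandIn:
    """Hypothetical stand-in with the same three fields as MnnCorrectResult."""
    corrected: Optional[Sequence[Sequence[float]]] = None
    merge_order: Optional[list] = None
    num_pairs: Optional[Sequence[int]] = None


def check_result(res, x, batch):
    """Return a list of violated expectations (empty if all of them hold)."""
    problems = []
    levels = set(batch)
    # corrected should have one row per cell and one column per dimension of x.
    if len(res.corrected) != len(x) or any(len(r) != len(x[0]) for r in res.corrected):
        problems.append("corrected does not match the shape of x")
    # merge_order should list every batch exactly once.
    if len(res.merge_order) != len(levels) or set(res.merge_order) != levels:
        problems.append("merge_order does not list each batch exactly once")
    # There is one merge step per batch after the initial reference.
    if len(res.num_pairs) != len(levels) - 1:
        problems.append("num_pairs should have one entry per merge step")
    return problems


x = [[0.0, 1.0], [0.5, 1.5], [3.0, 4.0], [3.5, 4.5]]
batch = ["A", "A", "B", "B"]
good = ResultStandIn(corrected=[row[:] for row in x], merge_order=["A", "B"], num_pairs=[3])
bad = ResultStandIn(corrected=[row[:] for row in x], merge_order=["A", "B"], num_pairs=[3, 1])
```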
- scranpy.batch_correction.mnn_correct.mnn_correct(x, batch, options=MnnCorrectOptions(k=15, approximate=True, order=None, reference_policy='max-rss', num_mads=3, mass_cap=-1, num_threads=1))[source]#
Identify mutual nearest neighbors (MNNs) to correct batch effects in a low-dimensional embedding.
- Parameters:
x (ndarray) – Numeric matrix where rows are cells and columns are dimensions, typically generated from run_pca().
batch (Sequence) – Sequence of length equal to the number of cells (i.e., rows of x), specifying the batch for each cell.
options (MnnCorrectOptions) – Optional parameters.
- Return type:
MnnCorrectResult
- Returns:
The corrected coordinates for each cell, along with some diagnostics about the MNNs involved.
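To make the idea of a mutual nearest neighbor concrete, the sketch below finds MNN pairs between two small batches using an exact brute-force search. It is an illustration only: the helper names nearest and find_mnn_pairs are hypothetical, the search is exact rather than approximate, and the sketch stops at pair detection without computing any correction, so it should not be read as how mnn_correct() is implemented.

```python
import math


def nearest(query, targets, k):
    """Indices of the k targets closest to `query` (exact, brute force)."""
    order = sorted(range(len(targets)), key=lambda j: math.dist(query, targets[j]))
    return set(order[:k])


def find_mnn_pairs(ref, target, k):
    """Return (i, j) pairs where ref[i] and target[j] are each among
    the other's k nearest neighbors in the opposite batch."""
    ref_to_target = [nearest(cell, target, k) for cell in ref]
    target_to_ref = [nearest(cell, ref, k) for cell in target]
    return [
        (i, j)
        for i in range(len(ref))
        for j in sorted(ref_to_target[i])
        if i in target_to_ref[j]
    ]


# Two toy batches in a 2-dimensional embedding.
ref = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
target = [[1.0, 0.0], [6.0, 5.0], [20.0, 20.0]]

pairs = find_mnn_pairs(ref, target, k=1)
print(pairs)  # [(1, 0), (2, 1)]
```

The distant target cell at (20, 20) has a nearest neighbor in the reference, but that reference cell prefers a closer target, so no pair is formed. With the real function, the equivalent knob is the k field of MnnCorrectOptions, passed as the options argument per the signature above.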