scranpy.batch_correction package

Submodules

scranpy.batch_correction.mnn_correct module

class scranpy.batch_correction.mnn_correct.MnnCorrectOptions(k=15, approximate=True, order=None, reference_policy='max-rss', num_mads=3, mass_cap=-1, num_threads=1)

Bases: object

Options to pass to mnn_correct().

k

Number of neighbors for detecting mutual nearest neighbors.
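To make the role of k concrete, here is a minimal brute-force sketch of mutual nearest neighbor detection between two batches. It is not scranpy's implementation: the helper names (nearest, mnn_pairs) and the toy coordinates are illustrative assumptions, and the search is exact rather than approximate.

```python
import math

def nearest(query, targets, k):
    """For each point in `query`, return the set of indices of its k nearest points in `targets`."""
    found = []
    for q in query:
        ranked = sorted(range(len(targets)), key=lambda j: math.dist(q, targets[j]))
        found.append(set(ranked[:k]))
    return found

def mnn_pairs(a, b, k):
    """Return sorted (i, j) pairs where b[j] is a k-nearest neighbor of a[i] and vice versa."""
    a_to_b = nearest(a, b, k)
    b_to_a = nearest(b, a, k)
    return sorted(
        (i, j)
        for i in range(len(a))
        for j in a_to_b[i]
        if i in b_to_a[j]
    )

# Hypothetical two-dimensional coordinates for two small batches.
batch_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
batch_b = [(0.2, 0.1), (1.1, 0.1), (-5.0, -5.0)]

pairs = mnn_pairs(batch_a, batch_b, k=1)
```

With k=1 only the two closely matched cells in each batch form pairs; the distant cell in each batch has a nearest neighbor in the other batch, but the relationship is not mutual. Raising k relaxes this requirement and yields more pairs.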

approximate

Whether to perform an approximate nearest neighbor search.

order

Ordering of batches to correct. The first entry is used as the initial reference, and each subsequent batch is merged into the reference in the specified order. This should contain all unique levels in the batch argument supplied to mnn_correct(). If None, an appropriate ordering is determined automatically.

reference_policy

Policy for choosing the initial reference batch. This can be one of "max-rss" (maximum residual sum of squares within the batch, the default), "max-variance" (maximum variance within the batch), "max-size" (maximum number of cells), or "input" (the supplied order of levels in batch). Only used if order is not supplied.

num_mads

Number of median absolute deviations used to define the outlier threshold when computing the center of mass for each cell involved in an MNN pair. Larger values reduce kissing but may incorporate inappropriately distant subpopulations in a cell’s center of mass.
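The following sketch illustrates how a MAD-based threshold behaves as num_mads changes. The exact rule used by the library is not stated here, so the form below (median plus num_mads times the MAD of the distances, with no scaling constant) and the helper name mad_threshold are assumptions for illustration only.

```python
from statistics import median

def mad_threshold(distances, num_mads=3):
    """Upper distance limit: median + num_mads * MAD (assumed form, no scaling constant)."""
    med = median(distances)
    mad = median([abs(d - med) for d in distances])
    return med + num_mads * mad

# Hypothetical distances from a cell to candidate neighbors; 6.0 is a distant subpopulation.
distances = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0]

limit = mad_threshold(distances, num_mads=3)
kept = [d for d in distances if d <= limit]
```

Under this assumed rule, the default of 3 excludes the distant observation, while a much larger value such as 50 raises the limit enough to include it.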

mass_cap

Cap on the number of observations used to compute the center of mass for each MNN-involved observation. The dataset is effectively downsampled to mass_cap observations for this calculation, which improves speed at the cost of some precision.

num_threads

Number of threads to use for the various MNN calculations.

__eq__(other)

Return self==value.

__hash__ = None
__repr__()

Return repr(self).

approximate: bool = True
k: int = 15
mass_cap: int = -1
num_mads: int = 3
num_threads: int = 1
order: Optional[Sequence] = None
reference_policy: str = 'max-rss'
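Because the options object is an ordinary dataclass, the standard dataclasses helpers apply to it. The sketch below uses a local stand-in class (OptionsStandIn, a hypothetical name) that copies the documented fields and defaults, so it runs without scranpy installed; the same pattern should carry over to MnnCorrectOptions itself.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence

@dataclass
class OptionsStandIn:
    """Local stand-in with the documented fields and defaults; not the scranpy class."""
    k: int = 15
    approximate: bool = True
    order: Optional[Sequence] = None
    reference_policy: str = "max-rss"
    num_mads: int = 3
    mass_cap: int = -1
    num_threads: int = 1

defaults = OptionsStandIn()

# Derive a modified copy rather than mutating an instance that may be shared.
custom = replace(defaults, k=20, order=["batch2", "batch1"], num_threads=4)
```

Deriving a copy with replace() is a cautious choice here, since mnn_correct() takes an options instance as its default argument and mutating a shared instance in place would affect later calls that rely on it.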
class scranpy.batch_correction.mnn_correct.MnnCorrectResult(corrected=None, merge_order=None, num_pairs=None)

Bases: object

Results from mnn_correct().

corrected

Matrix of corrected coordinates for each cell (row) and dimension (column). Rows and columns should be in the same order as the input x in mnn_correct().

merge_order

Order of batches used for merging. The first batch is used as the initial reference. The length of this list is equal to the number of batches.

num_pairs

Number of MNN pairs detected at each merge step. This has length one less than the number of batches.
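The length relationships described above can be checked with a small helper. This is a sketch only: the function name summarize_merge_steps is hypothetical, the example values are made up, and pairing num_pairs[i] with the batch at merge_order[i + 1] is an assumption about how the steps line up.

```python
def summarize_merge_steps(merge_order, num_pairs, batch):
    """Validate the documented lengths and pair each merged batch with its MNN count."""
    n_batches = len(set(batch))
    if len(merge_order) != n_batches:
        raise ValueError("merge_order should have one entry per batch")
    if len(num_pairs) != n_batches - 1:
        raise ValueError("num_pairs should have one entry per merge step")
    # Assumed alignment: step i merges merge_order[i + 1] into the growing reference.
    return list(zip(merge_order[1:], num_pairs))

# Hypothetical diagnostics for three batches, with "B" chosen as the initial reference.
batch = ["A", "A", "B", "C", "B"]
steps = summarize_merge_steps(["B", "A", "C"], [120, 45], batch)
```

A merge step with very few pairs may be worth inspecting, since the correction for that batch then rests on little shared structure.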

__eq__(other)

Return self==value.

__hash__ = None
__repr__()

Return repr(self).

corrected: Optional[ndarray] = None
merge_order: Optional[list] = None
num_pairs: Optional[ndarray] = None
scranpy.batch_correction.mnn_correct.mnn_correct(x, batch, options=MnnCorrectOptions(k=15, approximate=True, order=None, reference_policy='max-rss', num_mads=3, mass_cap=-1, num_threads=1))

Identify mutual nearest neighbors (MNNs) to correct batch effects in a low-dimensional embedding.

Parameters:
  • x (ndarray) – Numeric matrix where rows are cells and columns are dimensions, typically generated from run_pca().

  • batch (Sequence) – Sequence of length equal to the number of cells (i.e., rows of x), specifying the batch for each cell.

  • options (MnnCorrectOptions) – Optional parameters.

Return type:

MnnCorrectResult

Returns:

The corrected coordinates for each cell, along with some diagnostics about the MNNs involved.
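A call might look like the sketch below, which follows the signature documented above. It assumes scranpy and NumPy are installed; the toy embedding, the batch labels, and the wrapper name correct_toy_embedding are made up for illustration, and in practice x would typically come from run_pca(). Whether a C-ordered float64 array is the preferred input layout is also an assumption.

```python
CELLS_PER_BATCH = 100
NUM_DIMS = 5

# One label per cell (row of x): the first half is batch "A", the second half batch "B".
batch = ["A"] * CELLS_PER_BATCH + ["B"] * CELLS_PER_BATCH

def correct_toy_embedding(num_threads=1):
    """Run mnn_correct() on a simulated embedding with an artificial batch shift."""
    # Imported lazily so that this module loads even where scranpy is unavailable.
    import numpy as np
    from scranpy.batch_correction.mnn_correct import MnnCorrectOptions, mnn_correct

    rng = np.random.default_rng(0)
    x = rng.normal(size=(len(batch), NUM_DIMS))
    x[CELLS_PER_BATCH:] += 2.0  # simulated batch effect on the "B" cells

    options = MnnCorrectOptions(k=15, num_threads=num_threads)
    return mnn_correct(x, batch, options=options)
```

The returned object exposes corrected, merge_order and num_pairs as described for MnnCorrectResult, for example `result = correct_toy_embedding()` followed by `result.corrected`.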

Module contents