scranpy.quality_control package#

Submodules#

scranpy.quality_control.create_adt_qc_filter module#

class scranpy.quality_control.create_adt_qc_filter.CreateAdtQcFilterOptions(block=None)[source]#

Bases: object

Optional arguments for create_adt_qc_filter().

block#

Block assignment for each cell. This should be the same as that used in suggest_adt_qc_filters().

block: Optional[Sequence] = None#
scranpy.quality_control.create_adt_qc_filter.create_adt_qc_filter(metrics, thresholds, options=CreateAdtQcFilterOptions(block=None))[source]#

Defines a filtering vector based on the ADT-derived per-cell quality control (QC) metrics and thresholds.

Parameters:
  • metrics (BiocFrame) – Data frame of metrics, see per_cell_adt_qc_metrics() for the expected format.

  • thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_adt_qc_filters() for the expected format.

  • options (CreateAdtQcFilterOptions) – Optional parameters.

Return type:

ndarray

Returns:

A boolean array where True entries mark the cells to be discarded.
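The logic described above can be illustrated with a small pure-Python sketch. It is not the scranpy implementation: the helper name sketch_qc_filter, the plain dictionary layout for thresholds, the "all" key for the unblocked case, and the use of strict comparisons are all assumptions made for illustration. It only shows how per-block thresholds might turn per-cell metrics into a discard vector.

```python
from typing import Dict, List, Optional, Sequence


def sketch_qc_filter(
    detected: Sequence[float],
    subset_total: Sequence[float],
    thresholds: Dict[str, Dict[str, float]],
    block: Optional[Sequence[str]] = None,
) -> List[bool]:
    """Return True for each cell that should be discarded.

    `thresholds` maps a block name to {"detected": lower, "subset_total": upper}.
    With block=None, a single entry keyed by "all" is expected; this key is a
    convention of the sketch only.
    """
    discard = []
    for i in range(len(detected)):
        key = "all" if block is None else block[i]
        limits = thresholds[key]
        # Too few detected tags: below the lower threshold of the cell's block.
        too_few_tags = detected[i] < limits["detected"]
        # Too much isotype control signal: above the upper threshold.
        too_much_isotype = subset_total[i] > limits["subset_total"]
        discard.append(too_few_tags or too_much_isotype)
    return discard


detected = [40, 5, 38, 42]
isotype_total = [10, 12, 500, 9]
block = ["A", "A", "B", "B"]
thresholds = {
    "A": {"detected": 20, "subset_total": 100},
    "B": {"detected": 25, "subset_total": 120},
}
discard = sketch_qc_filter(detected, isotype_total, thresholds, block)
print(discard)  # [False, True, True, False]
```

Each cell is compared only against the thresholds of its own block, which is why the block assignment has to match the one used when the thresholds were suggested.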

scranpy.quality_control.create_crispr_qc_filter module#

class scranpy.quality_control.create_crispr_qc_filter.CreateCrisprQcFilterOptions(block=None)[source]#

Bases: object

Optional arguments for create_crispr_qc_filter().

block#

Block assignment for each cell. This should be the same as that used in suggest_crispr_qc_filters().

block: Optional[Sequence] = None#
scranpy.quality_control.create_crispr_qc_filter.create_crispr_qc_filter(metrics, thresholds, options=CreateCrisprQcFilterOptions(block=None))[source]#

Defines a filtering vector based on the CRISPR-derived per-cell quality control (QC) metrics and thresholds.

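Parameters:
  • metrics (BiocFrame) – Data frame of metrics, see per_cell_crispr_qc_metrics() for the expected format.

  • thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_crispr_qc_filters() for the expected format.

  • options (CreateCrisprQcFilterOptions) – Optional parameters.
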
Return type:

ndarray

Returns:

A boolean array where True entries mark the cells to be discarded.

scranpy.quality_control.create_rna_qc_filter module#

class scranpy.quality_control.create_rna_qc_filter.CreateRnaQcFilterOptions(block=None)[source]#

Bases: object

Optional arguments for create_rna_qc_filter().

block#

Block assignment for each cell. This should be the same as that used in suggest_rna_qc_filters().

block: Optional[Sequence] = None#
scranpy.quality_control.create_rna_qc_filter.create_rna_qc_filter(metrics, thresholds, options=CreateRnaQcFilterOptions(block=None))[source]#

Defines a filtering vector based on the RNA-derived per-cell quality control (QC) metrics and thresholds.

Parameters:
  • metrics (BiocFrame) – Data frame of metrics, see per_cell_rna_qc_metrics() for the expected format.

  • thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_rna_qc_filters() for the expected format.

  • options (CreateRnaQcFilterOptions) – Optional parameters.

Return type:

ndarray

Returns:

A boolean array where True entries mark the cells to be discarded.

scranpy.quality_control.filter_cells module#

class scranpy.quality_control.filter_cells.FilterCellsOptions(discard=True, intersect=False, with_retain_vector=False, delayed=True)[source]#

Bases: object

Optional arguments for filter_cells().

discard#

Whether to discard the cells listed in filter. If False, the specified cells are retained instead, and all other cells are discarded. Defaults to True.

intersect#

Whether to take the intersection (True) or union (False) of multiple filter arrays when creating the combined filtering array. Note that this is orthogonal to discard. Defaults to False.

with_retain_vector#

Whether to return a vector specifying which cells are to be retained. Defaults to False.

delayed#

Whether to force the filtering operation to be delayed. This reduces memory usage by avoiding unnecessary copies of the count matrix. Defaults to True.

delayed: bool = True#
discard: bool = True#
intersect: bool = False#
with_retain_vector: bool = False#
scranpy.quality_control.filter_cells.filter_cells(input, filter, options=FilterCellsOptions(discard=True, intersect=False, with_retain_vector=False, delayed=True))[source]#

Filter out low-quality cells, usually based on metrics and filter thresholds defined from the data, e.g., create_rna_qc_filter().

Parameters:
  • input – Matrix-like object containing cells in columns and features in rows. This should be a matrix class that can be converted into a TatamiNumericPointer. Developers may also provide the TatamiNumericPointer itself.

  • filter (Union[Sequence[int], Sequence[bool], tuple]) –

    Array of integers containing indices to the columns of input to keep/discard.

    Alternatively, an array of booleans of length equal to the number of cells, specifying the columns of input to keep/discard.

    Alternatively, a tuple of such arrays, to be combined into a single filtering vector according to options.intersect.

  • options (FilterCellsOptions) – Optional parameters.

Returns:

If options.with_retain_vector = False, the filtered matrix is returned directly: as a TatamiNumericPointer if input is a TatamiNumericPointer; as a DelayedArray if input is array-like and options.delayed = True; or as an object of the same type as input otherwise.

If options.with_retain_vector = True, a tuple is returned containing the filtered matrix and a NumPy integer array containing the column indices of input for the cells that were retained.
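The interaction between intersect, discard and the retained-column vector can be sketched in plain Python. This is an illustration under stated assumptions, not the scranpy code path: the helper names combine_filters, retained_columns and subset_columns are hypothetical, only boolean filter arrays are handled (integer index filters are omitted), and the matrix is a simple list of rows rather than a TatamiNumericPointer or DelayedArray.

```python
from typing import List, Sequence


def combine_filters(filters: Sequence[Sequence[bool]], intersect: bool = False) -> List[bool]:
    """Merge several boolean filter arrays into one.

    intersect=True flags a cell only if every array flags it;
    intersect=False flags a cell if any array flags it.
    """
    combine = all if intersect else any
    return [combine(flags) for flags in zip(*filters)]


def retained_columns(combined: Sequence[bool], discard: bool = True) -> List[int]:
    """Column indices that survive filtering.

    discard=True removes the flagged cells; discard=False keeps only them.
    """
    return [i for i, flagged in enumerate(combined) if flagged != discard]


def subset_columns(matrix: Sequence[Sequence[int]], keep: Sequence[int]) -> List[List[int]]:
    """Keep the selected columns (cells) of a features-by-cells matrix."""
    return [[row[j] for j in keep] for row in matrix]


rna_discard = [True, False, False, True]
adt_discard = [True, True, False, False]

union = combine_filters((rna_discard, adt_discard), intersect=False)
both = combine_filters((rna_discard, adt_discard), intersect=True)

keep_union = retained_columns(union)  # [2]
keep_both = retained_columns(both)    # [1, 2, 3]

matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
filtered = subset_columns(matrix, keep_both)  # [[2, 3, 4], [6, 7, 8]]
```

With discard filters, the union is the stricter choice because a cell is removed as soon as any modality flags it, whereas the intersection removes a cell only when every modality agrees.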

scranpy.quality_control.guess_mito_from_symbols module#

scranpy.quality_control.guess_mito_from_symbols.guess_mito_from_symbols(symbols, prefix)[source]#

Guess mitochondrial genes from their gene symbols.

Parameters:
  • symbols (Sequence[str]) – List of gene symbols.

  • prefix (str) – Case-insensitive prefix that identifies mitochondrial gene symbols.

Return type:

Sequence[int]

Returns:

List of integer indices for the guessed mitochondrial genes.
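A case-insensitive prefix match of this kind can be sketched as follows. The helper name sketch_guess_mito and the example prefix "mt-" are assumptions chosen for illustration, and the handling of missing symbols (skipped here) is a guess rather than documented behaviour.

```python
from typing import List, Optional, Sequence


def sketch_guess_mito(symbols: Sequence[Optional[str]], prefix: str) -> List[int]:
    """Indices of symbols that start with `prefix`, ignoring case."""
    lowered = prefix.lower()
    return [
        i
        for i, symbol in enumerate(symbols)
        # Missing symbols are skipped rather than treated as matches.
        if symbol is not None and symbol.lower().startswith(lowered)
    ]


symbols = ["Actb", "mt-Nd1", "MT-CO1", "Gapdh", "Mtor"]
mito = sketch_guess_mito(symbols, "mt-")
print(mito)  # [1, 2]
```

Including the hyphen in the prefix keeps genes such as "Mtor" out of the result. The returned indices can then be used as a subset for the per-cell RNA metrics.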

scranpy.quality_control.per_cell_adt_qc_metrics module#

class scranpy.quality_control.per_cell_adt_qc_metrics.PerCellAdtQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1)[source]#

Bases: object

Optional arguments for per_cell_adt_qc_metrics().

subsets#

Dictionary of feature subsets. Each key is the name of the subset and each value is an array.

Each array may contain integer indices to the rows of input belonging to the subset. Alternatively, each array is of length equal to the number of rows in input and contains booleans specifying that the corresponding row belongs to the subset.

Defaults to None, which is treated as an empty dictionary, i.e., no subsets.

assay_type#

Assay to use from input if it is a SummarizedExperiment.

cell_names#

Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.

num_threads#

Number of threads to use. Defaults to 1.

assay_type: Union[int, str] = 0#
cell_names: Optional[Sequence[str]] = None#
num_threads: int = 1#
subsets: Optional[Mapping] = None#
scranpy.quality_control.per_cell_adt_qc_metrics.per_cell_adt_qc_metrics(input, options=PerCellAdtQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1))[source]#

Compute per-cell quality control metrics for ADT data. This includes the number of detected tags per cell, where low values are indicative of problems with transcript capture; and the total count for particular tag subsets, typically isotype controls where high values are indicative of protein aggregates. We also report the total count for each cell for diagnostic purposes.

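Parameters:
  • input – Matrix-like object containing cells in columns and features in rows. A SummarizedExperiment may also be supplied, in which case the assay is chosen by options.assay_type.

  • options (PerCellAdtQcMetricsOptions) – Optional parameters.
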
Raises:

TypeError – If input is not an expected matrix type.

Return type:

BiocFrame

Returns:

A data frame containing one row per cell and the following fields - "sums", the total count for each cell; "detected", the number of detected features for each cell; and "subset_totals", a nested BiocFrame where each column is named after an entry in subsets and contains the total count in that subset.

scranpy.quality_control.per_cell_crispr_qc_metrics module#

class scranpy.quality_control.per_cell_crispr_qc_metrics.PerCellCrisprQcMetricsOptions(assay_type=0, cell_names=None, num_threads=1)[source]#

Bases: object

Optional arguments for per_cell_crispr_qc_metrics().

assay_type#

Assay to use from input if it is a SummarizedExperiment.

cell_names#

Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.

num_threads#

Number of threads to use. Defaults to 1.

assay_type: Union[int, str] = 0#
cell_names: Optional[Sequence[str]] = None#
num_threads: int = 1#
scranpy.quality_control.per_cell_crispr_qc_metrics.per_cell_crispr_qc_metrics(input, options=PerCellCrisprQcMetricsOptions(assay_type=0, cell_names=None, num_threads=1))[source]#

Compute per-cell quality control metrics for CRISPR data. This includes the total count for each cell, where low values are indicative of unsuccessful transfection or problems with library preparation or sequencing; the number of detected guides per cell, where high values represent multiple transfections; and the proportion of counts in the most abundant guide construct, where low values indicate that the cell was transfected with multiple guides. The identity of the most abundant guide is also reported.

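Parameters:
  • input – Matrix-like object containing cells in columns and features in rows. A SummarizedExperiment may also be supplied, in which case the assay is chosen by options.assay_type.

  • options (PerCellCrisprQcMetricsOptions) – Optional parameters.
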
Raises:

TypeError – If input is not an expected matrix type.

Return type:

BiocFrame

Returns:

A data frame containing one row per cell and the following fields - "sums", the total count for each cell; "detected", the number of detected features for each cell; "max_proportion", the proportion of counts in the most abundant guide; and "max_index", the row index of the most abundant guide.
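The four fields above can be reproduced on a toy matrix with a short sketch. It is not the scranpy implementation: the helper name sketch_crispr_metrics is hypothetical, the input is a plain list of rows (guides) by columns (cells), ties for the most abundant guide are resolved to the first row, and cells with no counts receive NaN as their proportion. The last two choices are assumptions.

```python
from typing import Dict, List, Sequence


def sketch_crispr_metrics(counts: Sequence[Sequence[int]]) -> Dict[str, List]:
    """Per-cell metrics from a guides-by-cells count matrix."""
    num_cells = len(counts[0])
    out: Dict[str, List] = {"sums": [], "detected": [], "max_proportion": [], "max_index": []}
    for j in range(num_cells):
        column = [row[j] for row in counts]
        total = sum(column)
        # Row index of the most abundant guide; the first row wins on ties.
        top = max(range(len(column)), key=column.__getitem__)
        out["sums"].append(total)
        out["detected"].append(sum(1 for c in column if c > 0))
        out["max_proportion"].append(column[top] / total if total > 0 else float("nan"))
        out["max_index"].append(top)
    return out


counts = [
    [90, 10, 0],  # guide 0
    [5, 10, 0],   # guide 1
    [5, 0, 4],    # guide 2
]
metrics = sketch_crispr_metrics(counts)
print(metrics["max_proportion"])  # [0.9, 0.5, 1.0]
print(metrics["max_index"])       # [0, 0, 2]
```

The second cell splits its counts evenly between two guides, so its maximum proportion drops to 0.5, which is the pattern the description associates with multiple transfections.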

scranpy.quality_control.per_cell_rna_qc_metrics module#

class scranpy.quality_control.per_cell_rna_qc_metrics.PerCellRnaQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1)[source]#

Bases: object

Optional arguments for per_cell_rna_qc_metrics().

subsets#

Dictionary of feature subsets. Each key is the name of the subset and each value is an array.

Each array may contain integer indices to the rows of input belonging to the subset. Alternatively, each array is of length equal to the number of rows in input and contains booleans specifying that the corresponding row belongs to the subset.

Defaults to None, which is treated as an empty dictionary, i.e., no subsets.

assay_type#

Assay to use from input if it is a SummarizedExperiment.

cell_names#

Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.

num_threads#

Number of threads to use. Defaults to 1.

assay_type: Union[str, int] = 0#
cell_names: Optional[Sequence[str]] = None#
num_threads: int = 1#
subsets: Optional[Mapping] = None#
scranpy.quality_control.per_cell_rna_qc_metrics.per_cell_rna_qc_metrics(input, options=PerCellRnaQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1))[source]#

Compute per-cell quality control metrics for RNA data. This includes the total count for each cell, where low values are indicative of problems with library preparation or sequencing; the number of detected features per cell, where low values are indicative of problems with transcript capture; and the proportion of counts in particular feature subsets, typically mitochondrial genes where high values are indicative of cell damage.

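Parameters:
  • input – Matrix-like object containing cells in columns and features in rows. A SummarizedExperiment may also be supplied, in which case the assay is chosen by options.assay_type.

  • options (PerCellRnaQcMetricsOptions) – Optional parameters.
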
Raises:

TypeError – If input is not an expected matrix type.

Return type:

BiocFrame

Returns:

A data frame containing one row per cell and the following fields - "sums", the total count for each cell; "detected", the number of detected features for each cell; and "subset_proportions", a nested BiocFrame where each column is named after an entry in subsets and contains the proportion of counts in that subset.
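As a rough illustration of what these fields contain, the sketch below computes them for a tiny features-by-cells matrix in plain Python. The helper name sketch_rna_metrics is hypothetical, subsets are given as integer row indices only, and returning NaN for cells with a zero total is an assumption. The real function operates on matrix objects and returns a BiocFrame.

```python
from typing import Dict, List, Mapping, Sequence


def sketch_rna_metrics(
    counts: Sequence[Sequence[int]],
    subsets: Mapping[str, Sequence[int]],
) -> Dict[str, object]:
    """Per-cell sums, detected features and subset proportions."""
    num_cells = len(counts[0])
    sums = [sum(row[j] for row in counts) for j in range(num_cells)]
    detected = [sum(1 for row in counts if row[j] > 0) for j in range(num_cells)]
    proportions: Dict[str, List[float]] = {}
    for name, rows in subsets.items():
        totals = [sum(counts[r][j] for r in rows) for j in range(num_cells)]
        # Proportion of each cell's counts that fall in the subset.
        proportions[name] = [t / s if s > 0 else float("nan") for t, s in zip(totals, sums)]
    return {"sums": sums, "detected": detected, "subset_proportions": proportions}


counts = [
    [10, 0, 3],  # gene 0
    [5, 0, 0],   # gene 1
    [5, 2, 1],   # gene 2, treated as mitochondrial in this example
]
metrics = sketch_rna_metrics(counts, {"mito": [2]})
print(metrics["sums"])                        # [20, 2, 4]
print(metrics["detected"])                    # [3, 1, 2]
print(metrics["subset_proportions"]["mito"])  # [0.25, 1.0, 0.25]
```

The second cell has a low total, a single detected gene and all of its counts in the mitochondrial subset, which is the combination that the three metrics are meant to expose.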

scranpy.quality_control.suggest_adt_qc_filters module#

class scranpy.quality_control.suggest_adt_qc_filters.SuggestAdtQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#

Bases: object

Optional arguments for suggest_adt_qc_filters().

block#

Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.

If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.

num_mads#

Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.

custom_thresholds#

Data frame containing one or more columns with the same names as those in the return value of suggest_adt_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.

If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.

block: Optional[Sequence] = None#
custom_thresholds: Optional[BiocFrame] = None#
num_mads: int = 3#
scranpy.quality_control.suggest_adt_qc_filters.suggest_adt_qc_filters(metrics, options=SuggestAdtQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#

Suggest filter thresholds for ADT-based per-cell quality control (QC) metrics. This identifies outliers on the relevant tail of the distribution of each relevant QC metric (namely the number of detected tags and the isotype subset totals; total counts per cell are diagnostic and are not used here). Outlier cells are considered to be low-quality and should be removed before further analysis.

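Parameters:
  • metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_adt_qc_metrics() for the expected format.

  • options (SuggestAdtQcFiltersOptions) – Optional parameters.
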
Raises:

ValueError, TypeError – If provided inputs are incorrect type or do not contain expected metrics.

Return type:

BiocFrame

Returns:

A data frame containing one row per block and the following fields - "detected", the suggested (lower) threshold on the number of detected features for each cell; and "subset_totals", a nested BiocFrame where each column is named after an entry in subsets and contains the suggested (upper) threshold on the total count in that subset.

If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.

scranpy.quality_control.suggest_crispr_qc_filters module#

class scranpy.quality_control.suggest_crispr_qc_filters.SuggestCrisprQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#

Bases: object

Optional arguments for suggest_crispr_qc_filters().

block#

Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.

If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.

num_mads#

Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.

custom_thresholds#

Data frame containing one or more columns with the same names as those in the return value of suggest_crispr_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.

If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.

block: Optional[Sequence] = None#
custom_thresholds: Optional[BiocFrame] = None#
num_mads: int = 3#
scranpy.quality_control.suggest_crispr_qc_filters.suggest_crispr_qc_filters(metrics, options=SuggestCrisprQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#

Suggest filter thresholds for CRISPR-based per-cell quality control (QC) metrics. This identifies outliers on the low tail of the distribution of the count for the most abundant guide across cells, aiming to remove cells that have low counts due to failed transfection. (Multiple transfections are not considered undesirable at this point.)

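Parameters:
  • metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_crispr_qc_metrics() for the expected format.

  • options (SuggestCrisprQcFiltersOptions) – Optional parameters.
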
Raises:

ValueError, TypeError – If provided inputs are incorrect type or do not contain expected metrics.

Return type:

BiocFrame

Returns:

A data frame containing one row per block and the following field - "max_count", the suggested (lower) threshold on the maximum count.

If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.

scranpy.quality_control.suggest_rna_qc_filters module#

class scranpy.quality_control.suggest_rna_qc_filters.SuggestRnaQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#

Bases: object

Optional arguments for suggest_rna_qc_filters().

block#

Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.

If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.

num_mads#

Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.

custom_thresholds#

Data frame containing one or more columns with the same names as those in the return value of suggest_rna_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.

If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.

block: Optional[Sequence] = None#
custom_thresholds: Optional[BiocFrame] = None#
num_mads: int = 3#
scranpy.quality_control.suggest_rna_qc_filters.suggest_rna_qc_filters(metrics, options=SuggestRnaQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#

Suggest filter thresholds for RNA-based per-cell quality control (QC) metrics. This identifies outliers on the relevant tail of the distribution of each QC metric. Outlier cells are considered to be low-quality and should be removed before further analysis.

Parameters:
  • metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_rna_qc_metrics() for the expected format.

  • options (SuggestRnaQcFiltersOptions) – Optional parameters.

Raises:

ValueError, TypeError – If provided inputs are incorrect type or do not contain expected metrics.

Return type:

BiocFrame

Returns:

A data frame containing one row per block and the following fields - "sums", the suggested (lower) threshold on the total count for each cell; "detected", the suggested (lower) threshold on the number of detected features for each cell; and "subset_proportions", a nested BiocFrame where each column is named after an entry in subsets and contains the suggested (upper) threshold on the proportion of counts in that subset.

If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.
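The effect of num_mads and block can be illustrated with a minimal median-and-MAD sketch. This is an assumption-laden illustration, not the library's algorithm: the helper names mad_bounds and bounds_by_block are hypothetical, the MAD is used unscaled on the raw values, and the real implementation may transform the metrics or scale the MAD before computing thresholds.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple


def mad_bounds(values: Sequence[float], num_mads: float = 3.0) -> Tuple[float, float]:
    """Lower and upper outlier bounds: median -/+ num_mads * MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center - num_mads * mad, center + num_mads * mad


def bounds_by_block(
    values: Sequence[float],
    block: Optional[Sequence[str]] = None,
    num_mads: float = 3.0,
) -> Dict[str, Tuple[float, float]]:
    """Compute bounds separately within each block.

    With block=None every cell goes into one block named "all"; that name
    is a convention of this sketch only.
    """
    if block is None:
        block = ["all"] * len(values)
    grouped: Dict[str, List[float]] = {}
    for value, name in zip(values, block):
        grouped.setdefault(name, []).append(value)
    return {name: mad_bounds(vals, num_mads) for name, vals in grouped.items()}


sums = [100, 110, 90, 1000, 1100, 900]
block = ["A", "A", "A", "B", "B", "B"]

per_block = bounds_by_block(sums, block)
pooled = bounds_by_block(sums)

print(per_block)  # {'A': (70.0, 130.0), 'B': (700.0, 1300.0)}
print(pooled)     # {'all': (-725.0, 1735.0)}
```

Pooling the two blocks inflates the MAD so much that the lower bound becomes negative and no cell could ever be flagged, whereas the per-block bounds stay close to each block's own distribution. A larger num_mads widens the bounds in the same way, giving a less stringent filter.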

Module contents#