scranpy.quality_control package#
Submodules#
scranpy.quality_control.create_adt_qc_filter module#
- class scranpy.quality_control.create_adt_qc_filter.CreateAdtQcFilterOptions(block=None)[source]#
Bases: object
Optional arguments for create_adt_qc_filter().
- block#
Block assignment for each cell. This should be the same as that used in suggest_adt_qc_filters().
- __annotations__ = {'block': typing.Optional[typing.Sequence]}#
- scranpy.quality_control.create_adt_qc_filter.create_adt_qc_filter(metrics, thresholds, options=CreateAdtQcFilterOptions(block=None))[source]#
Defines a filtering vector based on the ADT-derived per-cell quality control (QC) metrics and thresholds.
- Parameters:
metrics (BiocFrame) – Data frame of metrics, see per_cell_adt_qc_metrics() for the expected format.
thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_adt_qc_filters() for the expected format.
options (CreateAdtQcFilterOptions) – Optional parameters.
- Return type:
- Returns:
A boolean array where True entries mark the cells to be discarded.
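As a rough illustration of how such a discard vector could be derived, the sketch below compares each cell's metrics with the thresholds of its block. It is plain Python, not the scranpy implementation: the function name `adt_discard_vector`, the dictionary layout used for the thresholds, and the strict inequalities are all assumptions made for this example.

```python
def adt_discard_vector(detected, subset_totals, thresholds, block=None):
    """Return a list of booleans, True where a cell should be discarded.

    detected: number of detected tags, one entry per cell.
    subset_totals: dict mapping subset name -> per-cell totals.
    thresholds: dict mapping block id ->
        {"detected": lower bound, "subset_totals": {name: upper bound}}.
    block: optional block id per cell; None means all cells share one block.
    """
    n = len(detected)
    if block is None:
        # Single block: every cell uses the only row of thresholds.
        only = next(iter(thresholds))
        block = [only] * n

    discard = []
    for i in range(n):
        thr = thresholds[block[i]]
        bad = detected[i] < thr["detected"]  # too few detected tags
        for name, upper in thr["subset_totals"].items():
            # Too much signal in a control subset (e.g. isotype controls).
            bad = bad or subset_totals[name][i] > upper
        discard.append(bad)
    return discard


# Hypothetical toy data: four cells in two blocks.
detected = [50, 8, 47, 52]
subset_totals = {"igg": [3.0, 2.0, 40.0, 4.0]}
thresholds = {
    "A": {"detected": 20, "subset_totals": {"igg": 10.0}},
    "B": {"detected": 20, "subset_totals": {"igg": 50.0}},
}
block = ["A", "A", "B", "A"]

print(adt_discard_vector(detected, subset_totals, thresholds, block))
# [False, True, False, False]
```

The third cell has a large isotype total but is kept because its block ("B") has a more permissive upper threshold, which is why the same block assignment should be used when suggesting and applying thresholds.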
scranpy.quality_control.create_crispr_qc_filter module#
- class scranpy.quality_control.create_crispr_qc_filter.CreateCrisprQcFilterOptions(block=None)[source]#
Bases: object
Optional arguments for create_crispr_qc_filter().
- block#
Block assignment for each cell. This should be the same as that used in suggest_crispr_qc_filters().
- __annotations__ = {'block': typing.Optional[typing.Sequence]}#
- scranpy.quality_control.create_crispr_qc_filter.create_crispr_qc_filter(metrics, thresholds, options=CreateCrisprQcFilterOptions(block=None))[source]#
Defines a filtering vector based on the CRISPR-derived per-cell quality control (QC) metrics and thresholds.
- Parameters:
metrics (BiocFrame) – Data frame of metrics, see per_cell_crispr_qc_metrics() for the expected format.
thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_crispr_qc_filters() for the expected format.
options (CreateCrisprQcFilterOptions) – Optional parameters.
- Return type:
- Returns:
A boolean array where True entries mark the cells to be discarded.
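For CRISPR data the threshold applies to the count of the most abundant guide, which is not stored directly in the metrics. The sketch below assumes that this count can be recovered as `sums * max_proportion`; that derivation, the function name `crispr_discard_vector`, and the handling of cells with zero total count are assumptions for illustration rather than documented library behaviour.

```python
def crispr_discard_vector(sums, max_proportion, max_count_threshold):
    """Return True for each cell whose top guide count is below the threshold."""
    discard = []
    for total, prop in zip(sums, max_proportion):
        # Recover the count of the most abundant guide. Cells without any
        # counts are treated as having a maximum count of zero (assumption).
        max_count = total * prop if total > 0 else 0.0
        discard.append(max_count < max_count_threshold)
    return discard


# Hypothetical toy data: three cells, single block, lower threshold of 10.
sums = [25, 0, 8]
max_proportion = [0.8, float("nan"), 0.75]

print(crispr_discard_vector(sums, max_proportion, 10.0))
# [False, True, True]
```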
scranpy.quality_control.create_rna_qc_filter module#
- class scranpy.quality_control.create_rna_qc_filter.CreateRnaQcFilterOptions(block=None)[source]#
Bases: object
Optional arguments for create_rna_qc_filter().
- block#
Block assignment for each cell. This should be the same as that used in suggest_rna_qc_filters().
- __annotations__ = {'block': typing.Optional[typing.Sequence]}#
- scranpy.quality_control.create_rna_qc_filter.create_rna_qc_filter(metrics, thresholds, options=CreateRnaQcFilterOptions(block=None))[source]#
Defines a filtering vector based on the RNA-derived per-cell quality control (QC) metrics and thresholds.
- Parameters:
metrics (BiocFrame) – Data frame of metrics, see per_cell_rna_qc_metrics() for the expected format.
thresholds (BiocFrame) – Data frame of filter thresholds, see suggest_rna_qc_filters() for the expected format.
options (CreateRnaQcFilterOptions) – Optional parameters.
- Return type:
- Returns:
A boolean array where True entries mark the cells to be discarded.
scranpy.quality_control.filter_cells module#
- class scranpy.quality_control.filter_cells.FilterCellsOptions(discard=True, intersect=False, with_retain_vector=False, delayed=True)[source]#
Bases: object
Optional arguments for filter_cells().
- discard#
Whether to discard the cells listed in filter. If False, the specified cells are retained instead, and all other cells are discarded. Defaults to True.
- intersect#
Whether to take the intersection or union of multiple filter arrays to create a combined filtering array. Note that this is orthogonal to discard.
- with_retain_vector#
Whether to return a vector specifying which cells are to be retained.
- delayed#
Whether to force the filtering operation to be delayed. This reduces memory usage by avoiding unnecessary copies of the count matrix.
- __annotations__ = {'delayed': <class 'bool'>, 'discard': <class 'bool'>, 'intersect': <class 'bool'>, 'with_retain_vector': <class 'bool'>}#
- scranpy.quality_control.filter_cells.filter_cells(input, filter, options=FilterCellsOptions(discard=True, intersect=False, with_retain_vector=False, delayed=True))[source]#
Filter out low-quality cells, usually based on metrics and filter thresholds defined from the data, e.g., create_rna_qc_filter().
- Parameters:
input – Matrix-like object containing cells in columns and features in rows. This should be a matrix class that can be converted into a TatamiNumericPointer. Developers may also provide the TatamiNumericPointer itself.
filter (Union[Sequence[int], Sequence[bool], tuple]) – Array of integers containing indices to the columns of input to keep/discard.
Alternatively, an array of booleans of length equal to the number of cells, specifying the columns of input to keep/discard.
Alternatively, a tuple of such arrays, to be combined into a single filtering vector according to options.intersect.
options (FilterCellsOptions) – Optional parameters.
- Returns:
If options.with_retain_vector = False, the filtered matrix is returned directly: as a TatamiNumericPointer if input is also a TatamiNumericPointer; as a DelayedArray if input is array-like and options.delayed = True; or as an object of the same type as input otherwise.
If options.with_retain_vector = True, a tuple is returned containing the filtered matrix and a NumPy integer array containing the column indices of input for the cells that were retained.
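The interaction between integer filters, boolean filters, `intersect` and `discard` can be sketched in plain Python as follows. This is only an illustration of the semantics described above; `retained_columns` is a hypothetical helper and not part of scranpy, and it returns the retained column indices rather than a filtered matrix.

```python
def retained_columns(num_cells, filters, discard=True, intersect=False):
    """Combine one or more filters and return the column indices to keep."""
    if not isinstance(filters, tuple):
        filters = (filters,)

    masks = []
    for f in filters:
        f = list(f)
        if f and all(isinstance(x, bool) for x in f):
            # Boolean filter: one entry per cell.
            if len(f) != num_cells:
                raise ValueError("boolean filter must have one entry per cell")
            masks.append(f)
        else:
            # Integer filter: convert column indices into a boolean mask.
            chosen = set(f)
            masks.append([i in chosen for i in range(num_cells)])

    # Intersection keeps a cell in the combined filter only if every
    # filter lists it; union needs just one filter to list it.
    combine = all if intersect else any
    selected = [combine(m[i] for m in masks) for i in range(num_cells)]

    # With discard=True the selected cells are dropped; with discard=False
    # the selected cells are the ones that are kept.
    return [i for i in range(num_cells) if selected[i] != discard]


low_counts = [True, False, False, True, False]  # boolean filter
high_mito = [3, 4]                              # integer filter

print(retained_columns(5, (low_counts, high_mito)))                  # [1, 2]
print(retained_columns(5, (low_counts, high_mito), intersect=True))  # [0, 1, 2, 4]
print(retained_columns(5, (low_counts, high_mito), discard=False))   # [0, 3, 4]
```

Because `intersect` only controls how the filters are combined and `discard` only controls how the combined filter is interpreted, the two options can be set independently.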
scranpy.quality_control.guess_mito_from_symbols module#
scranpy.quality_control.per_cell_adt_qc_metrics module#
- class scranpy.quality_control.per_cell_adt_qc_metrics.PerCellAdtQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1)[source]#
Bases: object
Optional arguments for per_cell_adt_qc_metrics().
- subsets#
Dictionary of feature subsets. Each key is the name of the subset and each value is an array.
Each array may contain integer indices to the rows of input belonging to the subset. Alternatively, each array is of length equal to the number of rows in input and contains booleans specifying that the corresponding row belongs to the subset.
Defaults to {}.
- assay_type#
Assay to use from input if it is a SummarizedExperiment.
- cell_names#
Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.
- num_threads#
Number of threads to use. Defaults to 1.
- __annotations__ = {'assay_type': typing.Union[int, str], 'cell_names': typing.Optional[typing.Sequence[str]], 'num_threads': <class 'int'>, 'subsets': typing.Optional[typing.Mapping]}#
- scranpy.quality_control.per_cell_adt_qc_metrics.per_cell_adt_qc_metrics(input, options=PerCellAdtQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1))[source]#
Compute per-cell quality control metrics for ADT data. This includes the number of detected tags per cell, where low values are indicative of problems with transcript capture; and the total count for particular tag subsets, typically isotype controls where high values are indicative of protein aggregates. We also report the total count for each cell for diagnostic purposes.
- Parameters:
input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object where rows are features and columns are cells, typically containing expression values of some kind. This should be a matrix class that can be converted into a TatamiNumericPointer.
Alternatively, a SummarizedExperiment containing such a matrix in its assays.
Developers may also provide a TatamiNumericPointer directly.
options (PerCellAdtQcMetricsOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected matrix type.
- Return type:
- Returns:
A data frame containing one row per cell and the following fields: "sums", the total count for each cell; "detected", the number of detected features for each cell; and "subset_totals", a nested BiocFrame where each column is named after an entry in subsets and contains the total count in that subset.
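To make the three ADT metrics concrete, the following sketch computes them for a tiny count matrix stored as nested lists. It is an illustration only: `adt_qc_metrics` is a hypothetical function, the tag names in the comments are invented, and treating any count above zero as "detected" is an assumption. Following the function description, the subset entries hold total counts.

```python
def adt_qc_metrics(counts, subsets):
    """counts: list of rows (tags), each a list of per-cell counts.
    subsets: dict mapping subset name -> list of row indices."""
    num_cells = len(counts[0]) if counts else 0

    # Total count per cell (column sums).
    sums = [sum(row[j] for row in counts) for j in range(num_cells)]

    # Number of tags with a non-zero count in each cell.
    detected = [sum(1 for row in counts if row[j] > 0) for j in range(num_cells)]

    # Total count per cell restricted to the rows of each subset.
    subset_totals = {
        name: [sum(counts[r][j] for r in rows) for j in range(num_cells)]
        for name, rows in subsets.items()
    }
    return {"sums": sums, "detected": detected, "subset_totals": subset_totals}


counts = [
    [10, 0, 5],   # CD3
    [0, 0, 7],    # CD19
    [1, 2, 30],   # IgG1 isotype control
    [0, 1, 25],   # IgG2a isotype control
]
metrics = adt_qc_metrics(counts, {"isotype": [2, 3]})

print(metrics["sums"])                      # [11, 3, 67]
print(metrics["detected"])                  # [2, 2, 4]
print(metrics["subset_totals"]["isotype"])  # [1, 3, 55]
```

In this toy data the third cell has a much larger isotype total than the others, which is the kind of pattern associated with protein aggregates.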
scranpy.quality_control.per_cell_crispr_qc_metrics module#
- class scranpy.quality_control.per_cell_crispr_qc_metrics.PerCellCrisprQcMetricsOptions(assay_type=0, cell_names=None, num_threads=1)[source]#
Bases: object
Optional arguments for per_cell_crispr_qc_metrics().
- assay_type#
Assay to use from input if it is a SummarizedExperiment.
- cell_names#
Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.
- num_threads#
Number of threads to use. Defaults to 1.
- __annotations__ = {'assay_type': typing.Union[int, str], 'cell_names': typing.Optional[typing.Sequence[str]], 'num_threads': <class 'int'>}#
- scranpy.quality_control.per_cell_crispr_qc_metrics.per_cell_crispr_qc_metrics(input, options=PerCellCrisprQcMetricsOptions(assay_type=0, cell_names=None, num_threads=1))[source]#
Compute per-cell quality control metrics for CRISPR data. This includes the total count for each cell, where low values are indicative of unsuccessful transfection or problems with library preparation or sequencing; the number of detected guides per cell, where high values represent multiple transfections; and the proportion of counts in the most abundant guide construct, where low values indicate that the cell was transfected with multiple guides. The identity of the most abundant guide is also reported.
- Parameters:
input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object where rows are features and columns are cells, typically containing expression values of some kind. This should be a matrix class that can be converted into a TatamiNumericPointer.
Alternatively, a SummarizedExperiment containing such a matrix in its assays.
Developers may also provide a TatamiNumericPointer directly.
options (PerCellCrisprQcMetricsOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected matrix type.
- Return type:
- Returns:
A data frame containing one row per cell and the following fields: "sums", the total count for each cell; "detected", the number of detected features for each cell; "max_proportion", the proportion of counts in the most abundant guide; and "max_index", the row index of the most abundant guide.
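The CRISPR metrics can be illustrated with a small pure-Python sketch. The helper `crispr_qc_metrics` is hypothetical, and two behaviours are assumptions made for this example rather than documented library behaviour: ties are resolved in favour of the first guide, and cells with no counts get a `max_proportion` of NaN.

```python
def crispr_qc_metrics(counts):
    """counts: list of rows (guides), each a list of per-cell counts."""
    num_cells = len(counts[0]) if counts else 0
    out = {"sums": [], "detected": [], "max_proportion": [], "max_index": []}

    for j in range(num_cells):
        column = [row[j] for row in counts]
        total = sum(column)
        # Row index of the most abundant guide (first one wins on ties).
        top = max(range(len(column)), key=column.__getitem__)

        out["sums"].append(total)
        out["detected"].append(sum(1 for x in column if x > 0))
        out["max_proportion"].append(column[top] / total if total else float("nan"))
        out["max_index"].append(top)
    return out


counts = [
    [20, 0, 2],  # guide 0
    [0, 0, 6],   # guide 1
    [5, 0, 0],   # guide 2
]
m = crispr_qc_metrics(counts)

print(m["sums"])            # [25, 0, 8]
print(m["detected"])        # [2, 0, 2]
print(m["max_index"])       # [0, 0, 1]
print(m["max_proportion"])  # [0.8, nan, 0.75]
```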
scranpy.quality_control.per_cell_rna_qc_metrics module#
- class scranpy.quality_control.per_cell_rna_qc_metrics.PerCellRnaQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1)[source]#
Bases: object
Optional arguments for per_cell_rna_qc_metrics().
- subsets#
Dictionary of feature subsets. Each key is the name of the subset and each value is an array.
Each array may contain integer indices to the rows of input belonging to the subset. Alternatively, each array is of length equal to the number of rows in input and contains booleans specifying that the corresponding row belongs to the subset.
Defaults to {}.
- assay_type#
Assay to use from input if it is a SummarizedExperiment.
- cell_names#
Sequence of cell names of length equal to the number of columns in input. If provided, this is used as the row names of the output data frames.
- num_threads#
Number of threads to use. Defaults to 1.
- __annotations__ = {'assay_type': typing.Union[str, int], 'cell_names': typing.Optional[typing.Sequence[str]], 'num_threads': <class 'int'>, 'subsets': typing.Optional[typing.Mapping]}#
- scranpy.quality_control.per_cell_rna_qc_metrics.per_cell_rna_qc_metrics(input, options=PerCellRnaQcMetricsOptions(subsets=None, assay_type=0, cell_names=None, num_threads=1))[source]#
Compute per-cell quality control metrics for RNA data. This includes the total count for each cell, where low values are indicative of problems with library preparation or sequencing; the number of detected features per cell, where low values are indicative of problems with transcript capture; and the proportion of counts in particular feature subsets, typically mitochondrial genes where high values are indicative of cell damage.
- Parameters:
input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object where rows are features and columns are cells, typically containing expression values of some kind. This should be a matrix class that can be converted into a TatamiNumericPointer.
Alternatively, a SummarizedExperiment containing such a matrix in its assays.
Developers may also provide a TatamiNumericPointer directly.
options (PerCellRnaQcMetricsOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected matrix type.
- Return type:
- Returns:
A data frame containing one row per cell and the following fields: "sums", the total count for each cell; "detected", the number of detected features for each cell; and "subset_proportions", a nested BiocFrame where each column is named after an entry in subsets and contains the proportion of counts in that subset.
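The two accepted subset encodings (integer row indices or a boolean per row) lead to the same proportions. The sketch below demonstrates this in plain Python; `resolve_subset` and `subset_proportions` are hypothetical helpers, the gene names in the comments are only examples, and returning NaN for cells with zero total count is an assumption.

```python
def resolve_subset(subset, num_rows):
    """Convert a boolean-per-row or integer-index subset into row indices."""
    subset = list(subset)
    if subset and all(isinstance(x, bool) for x in subset):
        if len(subset) != num_rows:
            raise ValueError("boolean subset must have one entry per row")
        return [i for i, keep in enumerate(subset) if keep]
    return [int(i) for i in subset]


def subset_proportions(counts, subsets):
    """counts: list of rows (genes), each a list of per-cell counts."""
    num_rows = len(counts)
    num_cells = len(counts[0]) if counts else 0
    sums = [sum(row[j] for row in counts) for j in range(num_cells)]

    out = {}
    for name, subset in subsets.items():
        rows = resolve_subset(subset, num_rows)
        # Fraction of each cell's total count that falls in the subset.
        out[name] = [
            sum(counts[r][j] for r in rows) / sums[j] if sums[j] else float("nan")
            for j in range(num_cells)
        ]
    return out


counts = [
    [90, 10],  # ACTB
    [5, 30],   # MT-CO1
    [5, 60],   # MT-ND1
]
props = subset_proportions(
    counts,
    {"mito_bool": [False, True, True], "mito_idx": [1, 2]},
)

print(props["mito_bool"])  # [0.1, 0.9]
print(props["mito_idx"])   # [0.1, 0.9]
```

The second cell devotes 90% of its counts to the mitochondrial subset, the kind of value that the description above associates with cell damage.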
scranpy.quality_control.suggest_adt_qc_filters module#
- class scranpy.quality_control.suggest_adt_qc_filters.SuggestAdtQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#
Bases: object
Optional arguments for suggest_adt_qc_filters().
- block#
Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.
If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.
- num_mads#
Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.
- custom_thresholds#
Data frame containing one or more columns with the same names as those in the return value of suggest_adt_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.
If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.
- __annotations__ = {'block': typing.Optional[typing.Sequence], 'custom_thresholds': typing.Optional[biocframe.BiocFrame.BiocFrame], 'num_mads': <class 'int'>}#
- scranpy.quality_control.suggest_adt_qc_filters.suggest_adt_qc_filters(metrics, options=SuggestAdtQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#
Suggest filter thresholds for ADT-based per-cell quality control (QC) metrics. This identifies outliers on the relevant tail of the distribution of each relevant QC metric (namely the number of detected tags and the isotype subset totals; total counts per cell are diagnostic and are not used here). Outlier cells are considered to be low-quality and should be removed before further analysis.
- Parameters:
metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_adt_qc_metrics() for the expected format.
options (SuggestAdtQcFiltersOptions) – Optional parameters.
- Raises:
ValueError, TypeError – If the provided inputs are of an incorrect type or do not contain the expected metrics.
- Return type:
- Returns:
A data frame containing one row per block and the following fields: "detected", the suggested (lower) threshold on the number of detected features for each cell; and "subset_totals", a nested BiocFrame where each column is named after an entry in subsets and contains the suggested (upper) threshold on the total count in that subset.
If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.
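The override behaviour of `custom_thresholds` can be pictured as a column-wise replacement of the suggested values. The sketch below uses plain dictionaries in place of BiocFrame objects and flattens the nested subset columns for brevity; `apply_custom_thresholds` and its validation rules are assumptions for illustration, not the library's code.

```python
def apply_custom_thresholds(suggested, custom):
    """Both arguments map column name -> {block id -> threshold}.

    Columns present in `custom` replace the suggested column entirely;
    all other suggested columns are kept unchanged.
    """
    final = {name: dict(per_block) for name, per_block in suggested.items()}
    for name, per_block in custom.items():
        if name not in final:
            raise KeyError(f"unknown threshold column: {name}")
        if set(per_block) != set(final[name]):
            # One custom row per block, identified by block name.
            raise ValueError(f"custom thresholds for {name!r} must cover every block")
        final[name] = dict(per_block)
    return final


# Hypothetical suggested thresholds for two blocks.
suggested = {
    "detected": {"A": 18.0, "B": 25.0},
    "igg": {"A": 12.0, "B": 40.0},
}
custom = {"detected": {"A": 20.0, "B": 20.0}}

final = apply_custom_thresholds(suggested, custom)
print(final["detected"])  # {'A': 20.0, 'B': 20.0}
print(final["igg"])       # {'A': 12.0, 'B': 40.0}
```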
scranpy.quality_control.suggest_crispr_qc_filters module#
- class scranpy.quality_control.suggest_crispr_qc_filters.SuggestCrisprQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#
Bases: object
Optional arguments for suggest_crispr_qc_filters().
- block#
Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.
If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.
- num_mads#
Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.
- custom_thresholds#
Data frame containing one or more columns with the same names as those in the return value of suggest_crispr_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.
If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.
- __annotations__ = {'block': typing.Optional[typing.Sequence], 'custom_thresholds': typing.Optional[biocframe.BiocFrame.BiocFrame], 'num_mads': <class 'int'>}#
- scranpy.quality_control.suggest_crispr_qc_filters.suggest_crispr_qc_filters(metrics, options=SuggestCrisprQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#
Suggest filter thresholds for CRISPR-based per-cell quality control (QC) metrics. This identifies outliers on the low tail of the distribution of the count for the most abundant guide across cells, aiming to remove cells that have low counts due to failed transfection. (Multiple transfections are not considered undesirable at this point.)
- Parameters:
metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_crispr_qc_metrics() for the expected format.
options (SuggestCrisprQcFiltersOptions) – Optional parameters.
- Raises:
ValueError, TypeError – If the provided inputs are of an incorrect type or do not contain the expected metrics.
- Return type:
- Returns:
A data frame containing one row per block and the following fields: "max_count", the suggested (lower) threshold on the maximum count.
If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.
scranpy.quality_control.suggest_rna_qc_filters module#
- class scranpy.quality_control.suggest_rna_qc_filters.SuggestRnaQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None)[source]#
Bases: object
Optional arguments for suggest_rna_qc_filters().
- block#
Block assignment for each cell. Thresholds are computed within each block to avoid inflated variances from inter-block differences.
If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.
- num_mads#
Number of median absolute deviations for computing an outlier threshold. Larger values will result in a less stringent threshold. Defaults to 3.
- custom_thresholds#
Data frame containing one or more columns with the same names as those in the return value of suggest_rna_qc_filters(). If a column is present, it should contain custom thresholds for the corresponding metric and will override any suggested thresholds in the final BiocFrame.
If block = None, this data frame should contain one row. Otherwise, the number of rows should be equal to the number of blocks, where each row contains a block-specific threshold for the relevant metrics. The identity of each block should be stored in the row names.
- __annotations__ = {'block': typing.Optional[typing.Sequence], 'custom_thresholds': typing.Optional[biocframe.BiocFrame.BiocFrame], 'num_mads': <class 'int'>}#
- scranpy.quality_control.suggest_rna_qc_filters.suggest_rna_qc_filters(metrics, options=SuggestRnaQcFiltersOptions(block=None, num_mads=3, custom_thresholds=None))[source]#
Suggest filter thresholds for RNA-based per-cell quality control (QC) metrics. This identifies outliers on the relevant tail of the distribution of each QC metric. Outlier cells are considered to be low-quality and should be removed before further analysis.
- Parameters:
metrics (BiocFrame) – A data frame containing QC metrics for each cell, see the output of per_cell_rna_qc_metrics() for the expected format.
options (SuggestRnaQcFiltersOptions) – Optional parameters.
- Raises:
ValueError, TypeError – If the provided inputs are of an incorrect type or do not contain the expected metrics.
- Return type:
- Returns:
A data frame containing one row per block and the following fields: "sums", the suggested (lower) threshold on the total count for each cell; "detected", the suggested (lower) threshold on the number of detected features for each cell; and "subset_proportions", a nested BiocFrame where each column is named after an entry in subsets and contains the suggested (upper) threshold on the proportion of counts in that subset.
If options.block is None, all cells are assumed to belong to a single block, and the output BiocFrame contains a single row.
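A generic MAD-based outlier threshold, computed per block, can be sketched with the standard library as shown below. This is not the scranpy implementation: the helper names are hypothetical, the 1.4826 scaling constant (which makes the MAD comparable to a standard deviation for normal data) is an assumption, and any transformation the library may apply to the metrics beforehand is not reproduced.

```python
from statistics import median


def mad_threshold(values, num_mads=3, lower=True):
    """Median minus (or plus) num_mads scaled median absolute deviations."""
    center = median(values)
    mad = median(abs(v - center) for v in values) * 1.4826
    delta = num_mads * mad
    return center - delta if lower else center + delta


def suggest_thresholds(values, block=None, num_mads=3, lower=True):
    """Return {block id: threshold}; a single "all" block if block is None."""
    if block is None:
        return {"all": mad_threshold(values, num_mads, lower)}
    groups = {}
    for v, b in zip(values, block):
        groups.setdefault(b, []).append(v)
    return {b: mad_threshold(vs, num_mads, lower) for b, vs in groups.items()}


# Hypothetical total counts from two batches with very different depth.
sums = [1, 2, 3, 101, 102, 103]
batch = ["a", "a", "a", "b", "b", "b"]

print(suggest_thresholds(sums))         # one very loose threshold for all cells
print(suggest_thresholds(sums, batch))  # one tighter threshold per batch
```

Without blocking, the difference between the two batches inflates the MAD, so the lower threshold ends up far below every observed value; computing thresholds within each batch avoids this. Passing `lower=False` gives the upper-tail threshold that would apply to metrics such as subset proportions.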