singler package
Submodules
singler.annotate_integrated module
- singler.annotate_integrated.annotate_integrated(test_data, ref_data_list, test_features=None, ref_labels_list=None, ref_features_list=None, test_assay_type=0, test_check_missing=True, ref_assay_type='logcounts', ref_check_missing=True, build_single_args={}, classify_single_args={}, build_integrated_args={}, classify_integrated_args={}, num_threads=1)
Annotate a single-cell expression dataset based on the correlation of each cell to profiles in multiple labelled references, where the annotation from each reference is then integrated across references.
- Parameters:
test_data (Any) – A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used. Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
test_features (Union[Sequence, str, None]) – Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row. Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
ref_data_list (Sequence[Union[Any, str]]) – Sequence consisting of one or more of the following:
- A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()).
- A SummarizedExperiment object containing such a matrix in its assays.
- A string that can be passed as name to fetch_github_reference(). This will use the specified dataset as the reference.
ref_labels_list (Union[str, None, Sequence[Union[Sequence, str]]]) – Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:
- If ref_data_list[i] is a matrix-like object, ref_labels_list[i] should be a sequence of length equal to the number of columns of ref_data_list[i], containing the label associated with each column.
- If ref_data_list[i] is a string, ref_labels_list[i] should be a string specifying the label type to use, e.g., “main”, “fine”, “ont”. If a single string is supplied, it is recycled for all entries of ref_data_list.
- If ref_data_list[i] is a SummarizedExperiment, ref_labels_list[i] may be a string specifying the column name in column_data that contains the labels. It can also be set to None, to use the column_names of the experiment as labels.
ref_features_list (Union[str, None, Sequence[Union[Sequence, str]]]) – Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:
- If ref_data_list[i] is a matrix-like object, ref_features_list[i] should be a sequence of length equal to the number of rows of ref_data_list[i], containing the feature identifier associated with each row.
- If ref_data_list[i] is a string, ref_features_list[i] should be a string specifying the feature type to use, e.g., “ensembl”, “symbol”. If a single string is supplied, it is recycled for all entries of ref_data_list.
- If ref_data_list[i] is a SummarizedExperiment, ref_features_list[i] may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
test_assay_type (Union[str, int]) – Assay of test_data containing the expression matrix, if test_data is a SummarizedExperiment.
test_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the test dataset.
ref_assay_type (Union[str, int]) – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.
ref_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the reference datasets.
build_single_args (dict) – Further arguments to pass to build_single_reference().
classify_single_args (dict) – Further arguments to pass to classify_single_reference().
build_integrated_args (dict) – Further arguments to pass to build_integrated_references().
classify_integrated_args (dict) – Further arguments to pass to classify_integrated_references().
num_threads (int) – Number of threads to use for the various steps.
- Return type:
- Returns:
Tuple where the first element contains per-reference results (i.e., a list of BiocFrame outputs equivalent to running annotate_single() on each reference) and the second element contains integrated results across references (i.e., a BiocFrame from classify_integrated_references()).
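The recycling rule described for ref_labels_list and ref_features_list can be illustrated with a small sketch. The helper recycle_per_reference below is hypothetical and is not part of singler; it only mirrors the documented behaviour that a single string is reused for every entry of ref_data_list. Treating None the same way and rejecting a length mismatch are assumptions made for the illustration.

```python
from typing import Any, List, Sequence, Union


def recycle_per_reference(
    value: Union[str, None, Sequence[Any]], num_refs: int
) -> List[Any]:
    """Expand a per-reference argument to one entry per reference.

    Hypothetical helper for illustration only; not a singler function.
    """
    # A single string (or None, by assumption) applies to every reference.
    if value is None or isinstance(value, str):
        return [value] * num_refs
    expanded = list(value)
    # Assumption: a sequence must already hold one entry per reference.
    if len(expanded) != num_refs:
        raise ValueError(f"expected {num_refs} entries, got {len(expanded)}")
    return expanded


# Two references identified by placeholder names, sharing one label type.
ref_data_list = ["reference_a", "reference_b"]
labels = recycle_per_reference("main", len(ref_data_list))
features = recycle_per_reference(["ensembl", "symbol"], len(ref_data_list))
print(labels)    # ['main', 'main']
print(features)  # ['ensembl', 'symbol']
```

The expanded lists line up index by index with ref_data_list, which is the shape the per-entry rules above are written against.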
singler.annotate_single module
- singler.annotate_single.annotate_single(test_data, ref_data, ref_labels, test_features=None, ref_features=None, build_args={}, classify_args={}, num_threads=1)
Annotate a single-cell expression dataset based on the correlation of each cell to profiles in a labelled reference.
- Parameters:
test_data (Any) – A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used. Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in classify_args.
test_features (Union[Sequence, str, None]) – Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row. Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
ref_data (Any) – A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()). Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in build_args.
ref_labels (Union[Sequence, str, None]) – If ref_data is a matrix-like object, ref_labels should be a sequence of length equal to the number of columns of ref_data, containing the label associated with each column. Alternatively, if ref_data is a SummarizedExperiment, ref_labels may be a string specifying the label type to use, e.g., “main”, “fine”, “ont”. It can also be set to None, to use the column_names of the experiment as labels.
ref_features (Union[Sequence, str, None]) – If ref_data is a matrix-like object, ref_features should be a sequence of length equal to the number of rows of ref_data, containing the feature identifier associated with each row. Alternatively, if ref_data is a SummarizedExperiment, ref_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
build_args (dict) – Further arguments to pass to build_single_reference().
classify_args (dict) – Further arguments to pass to classify_single_reference().
num_threads (int) – Number of threads to use for the various steps.
- Return type:
- Returns:
A data frame containing the labelling results; see classify_single_reference() for details. The metadata also contains a markers dictionary, specifying the markers that were used for each pairwise comparison between labels, and a list of unique_markers across all labels.
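The statement that only the ranking within each column of test_data is used can be checked with a short, library-free sketch. The column_ranks helper is hypothetical; it assumes no tied values and shows that a monotonic transformation such as log1p leaves the within-column ranks unchanged.

```python
import math


def column_ranks(column):
    """Return the 0-based rank of each value in one column (no ties assumed)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0] * len(column)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


counts = [0.0, 12.0, 3.0, 150.0, 7.0]         # one cell, five features
log_counts = [math.log1p(x) for x in counts]  # monotonic transformation

print(column_ranks(counts))      # [0, 3, 1, 4, 2]
print(column_ranks(log_counts))  # [0, 3, 1, 4, 2]
```

This applies to the test dataset only. As the parameter descriptions note, reference values are usually expected to be log-transformed.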
singler.build_integrated_references module
- class singler.build_integrated_references.IntegratedReferences(ptr, ref_names, ref_labels, test_features)
Bases:
object
Object containing integrated references, typically constructed by build_integrated_references().
- property reference_labels: list
List of lists containing the names of the labels for each reference.
Each entry corresponds to a reference in reference_names, if reference_names is not None.
- singler.build_integrated_references.build_integrated_references(test_features, ref_data_list, ref_labels_list, ref_features_list, ref_prebuilt_list, ref_names=None, assay_type='logcounts', check_missing=True, num_threads=1)
Build a set of integrated references for classification of a test dataset.
- Parameters:
test_features (Sequence) – Sequence of features for the test dataset.
ref_data_list (dict) – List of reference datasets, where each entry is equivalent to ref_data in build_single_reference().
ref_labels_list (list[Sequence]) – List of reference labels, where each entry is equivalent to ref_labels in build_single_reference().
ref_features_list (list[Sequence]) – List of reference features, where each entry is equivalent to ref_features in build_single_reference().
ref_prebuilt_list (list[SinglePrebuiltReference]) – List of prebuilt references, typically created by calling build_single_reference() on the corresponding elements of ref_data_list, ref_labels_list and ref_features_list.
ref_names (Optional[Sequence[str]]) – Sequence of names for the references. If None, these are automatically generated.
assay_type – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from each entry of ref_data_list.
num_threads (int) – Number of threads.
- Return type:
- Returns:
Integrated references for classification with classify_integrated_references().
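Because ref_data_list, ref_labels_list and ref_features_list are parallel inputs, it can help to confirm that their shapes agree before building anything. The check below is a hypothetical pre-flight helper, not part of singler. It assumes each reference is a plain list of rows and reuses the shape rules stated for matrix-like references (one label per column, one feature identifier per row).

```python
def check_reference_inputs(ref_data_list, ref_labels_list, ref_features_list):
    """Hypothetical consistency check; each matrix is a list of rows here."""
    if not (len(ref_data_list) == len(ref_labels_list) == len(ref_features_list)):
        raise ValueError("all reference lists must have the same length")
    for i, (matrix, labels, features) in enumerate(
        zip(ref_data_list, ref_labels_list, ref_features_list)
    ):
        num_rows = len(matrix)
        num_cols = len(matrix[0]) if matrix else 0
        if len(features) != num_rows:
            raise ValueError(f"reference {i}: expected {num_rows} features")
        if len(labels) != num_cols:
            raise ValueError(f"reference {i}: expected {num_cols} labels")
    return len(ref_data_list)


# Two toy references: 2 features x 3 samples, and 3 features x 2 samples.
ref_a = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9]]
ref_b = [[4.0, 1.0], [2.0, 2.5], [0.0, 3.0]]
num_refs = check_reference_inputs(
    [ref_a, ref_b],
    [["B", "B", "T"], ["NK", "T"]],
    [["g1", "g2"], ["g1", "g2", "g3"]],
)
print(num_refs)  # 2
```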
singler.build_single_reference module
- class singler.build_single_reference.SinglePrebuiltReference(ptr, labels, features, markers)
Bases:
object
A prebuilt reference object, typically created by build_single_reference(). This is intended for advanced users only and should not be serialized.
- property features: Sequence
Returns: The universe of features known to this reference, usually as strings.
- marker_subset(indices_only=False)
- Parameters:
indices_only (bool) – Whether to return the markers as indices into features, or as a list of feature identifiers.
- Return type:
- Returns:
If indices_only = False, a list of feature identifiers for the markers.
If indices_only = True, a NumPy array containing the integer indices of features in features that were chosen as markers.
- property markers: dict[Any, dict[Any, Sequence]]
Returns: Markers for every pairwise comparison between labels.
- num_markers()
- Return type:
- Returns:
Number of markers to be used for classification. This is the same as the size of the array from marker_subset().
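The relationship between markers, features, marker_subset() and num_markers() can be pictured with a plain-Python sketch. This is not the library's implementation: it assumes the marker subset is the union of all pairwise marker lists, reported in the order of features, and it returns a list where the real method is documented to return a NumPy array of indices.

```python
def sketch_marker_subset(markers, features, indices_only=False):
    """Union of all pairwise markers, in the order of `features` (assumed)."""
    chosen = set()
    for inner in markers.values():
        for genes in inner.values():
            chosen.update(genes)
    indices = [i for i, feature in enumerate(features) if feature in chosen]
    if indices_only:
        return indices
    return [features[i] for i in indices]


features = ["g1", "g2", "g3", "g4"]
markers = {
    "A": {"B": ["g1", "g3"]},  # upregulated in A compared to B
    "B": {"A": ["g3", "g4"]},  # upregulated in B compared to A
}

subset_ids = sketch_marker_subset(markers, features)
subset_idx = sketch_marker_subset(markers, features, indices_only=True)
print(subset_ids)       # ['g1', 'g3', 'g4']
print(subset_idx)       # [0, 2, 3]
print(len(subset_idx))  # 3, the analogue of num_markers()
```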
- singler.build_single_reference.build_single_reference(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, markers=None, marker_method='classic', marker_args={}, approximate=True, num_threads=1)
Build a single reference dataset in preparation for classification.
- Parameters:
ref_data (Any) – A matrix-like object where rows are features, columns are reference profiles, and each entry is the expression value. If markers is not provided, expression should be normalized and log-transformed in preparation for marker prioritization via differential expression analyses. Otherwise, any expression values are acceptable as only the ranking within each column is used. Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
ref_labels – Sequence of labels for each reference profile, i.e., column in ref_data.
ref_features – Sequence of identifiers for each feature, i.e., row in ref_data.
assay_type (Union[str, int]) – Assay containing the expression matrix, if ref_data is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from ref_data.
restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.
markers (Optional[dict[Any, dict[Any, Sequence]]]) – Upregulated markers for each pairwise comparison between labels. Specifically, markers[a][b] should be a sequence of features that are upregulated in a compared to b. All such features should be present in ref_features, and all labels in ref_labels should have keys in the inner and outer dictionaries.
marker_method (Literal['classic']) – Method to identify markers for each pairwise comparison between labels in ref_data. If “classic”, we call get_classic_markers(). Only used if markers is not supplied.
marker_args (dict) – Further arguments to pass to the chosen marker detection method. Only used if markers is not supplied.
approximate (bool) – Whether to use an approximate neighbor search to compute scores during classification.
num_threads (int) – Number of threads to use for reference building.
- Return type:
- Returns:
The pre-built reference, ready for use in downstream methods like classify_single_reference().
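When supplying markers directly, the constraints above (every marker is a known feature, every label has keys in the dictionaries) can be checked up front. The validator below is a hypothetical sketch, not a singler function. It assumes that self-comparisons such as markers[a][a] are optional, which the description above does not settle.

```python
def validate_markers(markers, labels, features):
    """Hypothetical check of a user-supplied markers dictionary."""
    known = set(features)
    unique_labels = set(labels)
    for a in unique_labels:
        if a not in markers:
            raise KeyError(f"label {a!r} missing from outer dictionary")
        for b in unique_labels:
            if a == b:
                continue  # assumption: self-comparisons are not required
            if b not in markers[a]:
                raise KeyError(f"label {b!r} missing from markers[{a!r}]")
            unknown = [g for g in markers[a][b] if g not in known]
            if unknown:
                raise ValueError(f"unknown features in markers[{a!r}][{b!r}]: {unknown}")
    return True


features = ["g1", "g2", "g3"]
labels = ["A", "A", "B"]
markers = {"A": {"B": ["g1"]}, "B": {"A": ["g2", "g3"]}}
print(validate_markers(markers, labels, features))  # True
```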
singler.classify_integrated_references module
- singler.classify_integrated_references.classify_integrated_references(test_data, results, integrated_prebuilt, assay_type=0, quantile=0.8, num_threads=1)
Integrate classification results across multiple references for a single test dataset.
- Parameters:
test_data (Any) – A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and/or transformed expression values are also acceptable as only the ranking is used within this function. Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
results (list[Union[BiocFrame, Sequence]]) – List of classification results generated by running classify_single_reference() on test_data with each reference. This may be either the full data frame or just the "best" column. References should be ordered as in integrated_prebuilt.reference_names.
integrated_prebuilt (IntegratedReferences) – Integrated reference object, constructed with build_integrated_references().
assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.
quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.
num_threads (int) – Number of threads to use during classification.
- Return type:
- Returns:
A data frame containing the best_label across all references, defined as the assigned label in the best reference; the identity of the best_reference, either as a name string or an integer index; the scores for each reference, as a nested BiocFrame; and the delta from the best to the second-best reference. Each row corresponds to a column of test_data.
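For a single test cell, the returned columns can be pictured as follows. This is a minimal sketch of the bookkeeping only, with hypothetical names and made-up scores; how the per-reference scores are actually computed is not shown, and returning NaN as the delta when there is only one reference is an assumption.

```python
def integrate_one_cell(scores, per_reference_labels, reference_names=None):
    """Pick the best reference for one cell from precomputed scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    # Assumption: delta is undefined (NaN) when there is only one reference.
    delta = scores[best] - scores[ranked[1]] if len(ranked) > 1 else float("nan")
    return {
        # Label assigned by the best reference.
        "best_label": per_reference_labels[best],
        # Name if names are available, otherwise the integer index.
        "best_reference": reference_names[best] if reference_names is not None else best,
        "delta": delta,
    }


scores = [0.42, 0.61, 0.55]                      # one score per reference
labels = ["T cell", "CD4+ T cell", "lymphocyte"]  # label from each reference
named = integrate_one_cell(scores, labels, ["ref_a", "ref_b", "ref_c"])
unnamed = integrate_one_cell(scores, labels)
print(named["best_label"], named["best_reference"])  # CD4+ T cell ref_b
print(unnamed["best_reference"])                     # 1
```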
singler.classify_single_reference module
- singler.classify_single_reference.classify_single_reference(test_data, test_features, ref_prebuilt, assay_type=0, check_missing=True, quantile=0.8, use_fine_tune=True, fine_tune_threshold=0.05, num_threads=1)
Classify a test dataset against a reference by assigning labels from the latter to each column of the former using the SingleR algorithm.
- Parameters:
test_data (Any) – A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and transformed expression values are also acceptable as only the ranking is used within this function. Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
test_features (Sequence) – Sequence of identifiers for each feature in the test dataset, i.e., row in test_data. If test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. Alternatively, it can be set to None, to use the row_names of the experiment as features.
ref_prebuilt (SinglePrebuiltReference) – A pre-built reference created with build_single_reference().
assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from test_data.
quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.
use_fine_tune (bool) – Whether fine-tuning should be performed. This improves accuracy for distinguishing between similar labels but requires more computational work.
fine_tune_threshold (float) – Maximum difference from the maximum correlation to use in fine-tuning. All labels with scores within this difference of the maximum are used for another round of fine-tuning.
num_threads (int) – Number of threads to use during classification.
- Return type:
- Returns:
A data frame containing the best label, the scores for each label (as a nested BiocFrame), and the delta from the best to the second-best label. Each row corresponds to a column of test_data.
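The roles of quantile and fine_tune_threshold can be sketched for one test cell, starting from correlations that are assumed to have been computed already. The functions and values below are hypothetical; the linear-interpolation quantile is one common definition and may differ from what the library uses, and the candidate rule assumes labels are kept when their score lies within fine_tune_threshold of the maximum.

```python
import math


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list (one common definition)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def score_labels(correlations_by_label, q=0.8, fine_tune_threshold=0.05):
    """Score each label by a quantile of its correlations for one test cell."""
    scores = {label: quantile(c, q) for label, c in correlations_by_label.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    delta = scores[best] - scores[ranked[1]] if len(ranked) > 1 else float("nan")
    # Assumed rule: labels close enough to the top score go to fine-tuning.
    cutoff = scores[best] - fine_tune_threshold
    candidates = [label for label in ranked if scores[label] >= cutoff]
    return {"best": best, "scores": scores, "delta": delta,
            "fine_tune_candidates": candidates}


# Made-up correlations between one cell and the reference samples of each label.
correlations = {
    "B cell": [0.70, 0.80, 0.90, 0.85, 0.75, 0.95],
    "T cell": [0.60, 0.88, 0.86, 0.70, 0.65, 0.84],
    "NK": [0.20, 0.30, 0.25, 0.35, 0.40, 0.10],
}
result = score_labels(correlations)
print(result["best"])                  # B cell
print(result["fine_tune_candidates"])  # ['B cell', 'T cell']
```

A higher quantile makes the score depend on the best-matching reference samples of a label, while a lower one moves it toward the label's typical sample, which is the trade-off described for the quantile parameter.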
singler.get_classic_markers module
- singler.get_classic_markers.get_classic_markers(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, num_de=None, num_threads=1)
Compute markers from a reference using the classic SingleR algorithm. This is typically done for reference datasets derived from replicated bulk transcriptomic experiments.
- Parameters:
ref_data (Union[Any, list[Any]]) – A matrix-like object containing the log-normalized expression values of a reference dataset. Each column is a sample and each row is a feature. Alternatively, this can be a SummarizedExperiment containing a matrix-like object in one of its assays. Alternatively, a list of such matrices or SummarizedExperiment objects, typically for multiple batches of the same reference; it is assumed that different batches exhibit at least some overlap in their features and labels.
ref_labels (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of columns of ref_data, containing a label (usually a string) for each column. Alternatively, a list of such sequences of length equal to that of a list ref_data; each sequence should have length equal to the number of columns of the corresponding entry of ref_data.
ref_features (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of rows of ref_data, containing the feature name (usually a string) for each row. Alternatively, a list of such sequences of length equal to that of a list ref_data; each sequence should have length equal to the number of rows of the corresponding entry of ref_data.
assay_type (Union[str, int]) – Name or index of the assay containing the expression matrix, if ref_data is or contains SummarizedExperiment objects.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values in the reference matrices. This can be set to False if it is known that no NaN values exist.
restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.
num_de (Optional[int]) – Number of differentially expressed genes to use as markers for each pairwise comparison between labels. If None, an appropriate number of genes is automatically determined.
num_threads (int) – Number of threads to use for the calculations.
- Return type:
- Returns:
A dictionary of dictionaries of lists containing the markers for each pairwise comparison between labels, i.e., markers[a][b] contains the upregulated markers for label a over label b.
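The shape of the returned markers[a][b] structure can be illustrated with a simplified, library-free sketch. It assumes markers are ranked by the difference in per-label median log-expression and that only positive differences count as upregulated; this is an approximation for illustration, not the library's implementation, and it requires an explicit num_de rather than choosing one automatically.

```python
from statistics import median


def sketch_classic_markers(matrix, labels, features, num_de):
    """Rank features by difference in per-label medians (simplified sketch).

    matrix[i][j] is the log-expression of feature i in sample j.
    """
    columns_by_label = {}
    for j, label in enumerate(labels):
        columns_by_label.setdefault(label, []).append(j)
    medians = {
        label: [median(row[j] for j in cols) for row in matrix]
        for label, cols in columns_by_label.items()
    }
    markers = {}
    for a in medians:
        markers[a] = {}
        for b in medians:
            if a == b:
                markers[a][b] = []  # nothing is upregulated against itself
                continue
            diffs = [(medians[a][i] - medians[b][i], i) for i in range(len(features))]
            upregulated = sorted((d for d in diffs if d[0] > 0), key=lambda d: -d[0])
            markers[a][b] = [features[i] for _, i in upregulated[:num_de]]
    return markers


features = ["g1", "g2", "g3"]
labels = ["A", "A", "B", "B"]
matrix = [
    [5.0, 6.0, 1.0, 1.0],  # g1: higher in A
    [1.0, 1.0, 4.0, 5.0],  # g2: higher in B
    [2.0, 2.0, 2.0, 3.0],  # g3: slightly higher in B
]
top1 = sketch_classic_markers(matrix, labels, features, num_de=1)
top2 = sketch_classic_markers(matrix, labels, features, num_de=2)
print(top1["A"]["B"])  # ['g1']
print(top2["B"]["A"])  # ['g2', 'g3']
```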