singler package¶

Submodules¶

singler.annotate_integrated module¶

singler.annotate_integrated.annotate_integrated(test_data, ref_data_list, test_features=None, ref_labels_list=None, ref_features_list=None, test_assay_type=0, test_check_missing=True, ref_assay_type='logcounts', ref_check_missing=True, build_single_args={}, classify_single_args={}, build_integrated_args={}, classify_integrated_args={}, num_threads=1)[source]¶

Annotate a single-cell expression dataset based on the correlation of each cell to profiles in multiple labelled references, where the annotation from each reference is then integrated across references.

Parameters:

test_data (Any) –
A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used.

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
test_features (Union[Sequence, str, None]) –
Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row.

Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
ref_data_list (Sequence[Union[Any, str]]) –
Sequence consisting of one or more of the following:
- A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()).
- A SummarizedExperiment object containing such a matrix in its assays.
- A string that can be passed as name to fetch_github_reference(). This will use the specified dataset as the reference.
ref_labels_list (Union[str, None, Sequence[Union[Sequence, str]]]) –
Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:
- If ref_data_list[i] is a matrix-like object, ref_labels_list[i] should be a sequence of length equal to the number of columns of ref_data_list[i], containing the label associated with each column.
- If ref_data_list[i] is a string, ref_labels_list[i] should be a string specifying the label type to use, e.g., “main”, “fine”, “ont”. If a single string is supplied, it is recycled for all entries of ref_data_list.
- If ref_data_list[i] is a SummarizedExperiment, ref_labels_list[i] may be a string specifying the column name in column_data that contains the labels. It can also be set to None, to use the column_names of the experiment as labels.
ref_features_list (Union[str, None, Sequence[Union[Sequence, str]]]) –
Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:
- If ref_data_list[i] is a matrix-like object, ref_features_list[i] should be a sequence of length equal to the number of rows of ref_data_list[i], containing the feature identifier associated with each row.
- If ref_data_list[i] is a string, ref_features_list[i] should be a string specifying the feature type to use, e.g., “ensembl”, “symbol”. If a single string is supplied, it is recycled for all entries of ref_data_list.
- If ref_data_list[i] is a SummarizedExperiment, ref_features_list[i] may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
test_assay_type (Union[str, int]) – Assay of test_data containing the expression matrix, if test_data is a SummarizedExperiment.
test_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the test dataset.
ref_assay_type (Union[str, int]) – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.
ref_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the reference datasets.
build_single_args (dict) – Further arguments to pass to build_single_reference().
classify_single_args (dict) – Further arguments to pass to classify_single_reference().
build_integrated_args (dict) – Further arguments to pass to build_integrated_references().
classify_integrated_args (dict) – Further arguments to pass to classify_integrated_references().
num_threads (int) – Number of threads to use for the various steps.

Return type:

Tuple[list[BiocFrame], BiocFrame]

Returns:

Tuple where the first element contains per-reference results (i.e. a list of BiocFrame outputs equivalent to running annotate_single() on each reference) and the second element contains integrated results across references (i.e., a BiocFrame from classify_integrated_references()).
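The parallel-list layout of ref_data_list, ref_labels_list and ref_features_list can be sketched as follows. This only illustrates the recycling rule described in the parameter list and is not the library's implementation: the recycle() helper and the reference names "reference_a" and "reference_b" are hypothetical, and treating None like a single recycled value is an assumption.

```python
def recycle(value, num_refs):
    """Expand one string (or None) into one entry per reference.

    Hypothetical helper: it mimics the documented rule that a single
    string is recycled for every entry of ref_data_list.
    """
    if value is None or isinstance(value, str):
        return [value] * num_refs
    value = list(value)
    if len(value) != num_refs:
        raise ValueError(
            f"expected {num_refs} entries (one per reference), got {len(value)}"
        )
    return value


# Placeholder names; real names would be whatever fetch_github_reference() accepts.
ref_data_list = ["reference_a", "reference_b"]

# One label type and one feature type, shared by both references.
ref_labels_list = recycle("main", len(ref_data_list))
ref_features_list = recycle("ensembl", len(ref_data_list))

for data, labels, features in zip(ref_data_list, ref_labels_list, ref_features_list):
    print(data, labels, features)
```

Keeping the three lists the same length means that entry i of each list always describes the same reference.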

singler.annotate_single module¶

singler.annotate_single.annotate_single(test_data, ref_data, ref_labels, test_features=None, ref_features=None, build_args={}, classify_args={}, num_threads=1)[source]¶

Annotate a single-cell expression dataset based on the correlation of each cell to profiles in a labelled reference.

Parameters:

test_data (Any) –
A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used.

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in classify_args.
test_features (Union[Sequence, str, None]) –
Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row.

Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
ref_data (Any) –
A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()).

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in build_args.
ref_labels (Union[Sequence, str, None]) –
If ref_data is a matrix-like object, ref_labels should be a sequence of length equal to the number of columns of ref_data, containing the label associated with each column.

Alternatively, if ref_data is a SummarizedExperiment, ref_labels may be a string specifying the column name in column_data that contains the labels, e.g., “main”, “fine”, “ont”. It can also be set to None, to use the column_names of the experiment as labels.
ref_features (Union[Sequence, str, None]) –
If ref_data is a matrix-like object, ref_features should be a sequence of length equal to the number of rows of ref_data, containing the feature identifier associated with each row.

Alternatively, if ref_data is a SummarizedExperiment, ref_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.
build_args (dict) – Further arguments to pass to build_single_reference().
classify_args (dict) – Further arguments to pass to classify_single_reference().
num_threads (int) – Number of threads to use for the various steps.

Return type:

BiocFrame

Returns:

A data frame containing the labelling results, see classify_single_reference() for details. The metadata also contains a markers dictionary, specifying the markers that were used for each pairwise comparison between labels; and a list of unique_markers across all labels.
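A call to annotate_single() with plain matrices might look like the sketch below. The toy data, the make_matrix() helper and the run_annotate_single() wrapper are all hypothetical; the call itself uses only the keyword arguments listed in the signature above. The imports are deferred into the wrapper so the input preparation can be read and run without numpy or singler installed. Random values like these carry no biological signal, so the example only demonstrates the expected shapes.

```python
import random


def make_matrix(num_features, num_samples, seed):
    """Build a features-by-samples matrix of toy expression values as nested lists."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_samples)] for _ in range(num_features)]


# Hypothetical toy inputs: 50 features shared by the test and reference datasets.
features = [f"gene_{i}" for i in range(50)]
ref_matrix = make_matrix(50, 6, seed=1)      # rows are features, columns are reference profiles
ref_labels = ["A", "A", "B", "B", "C", "C"]  # one label per reference column
test_matrix = make_matrix(50, 4, seed=2)     # rows are features, columns are cells


def run_annotate_single():
    """Call annotate_single() on the toy inputs; requires numpy and singler."""
    import numpy as np
    from singler.annotate_single import annotate_single

    return annotate_single(
        test_data=np.array(test_matrix),
        test_features=features,
        ref_data=np.array(ref_matrix),
        ref_labels=ref_labels,
        ref_features=features,
        num_threads=1,
    )
```

The lengths must line up: ref_labels has one entry per column of ref_matrix, and features has one entry per row of both matrices.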

singler.build_integrated_references module¶

class singler.build_integrated_references.IntegratedReferences(ptr, ref_names, ref_labels, test_features)[source]¶

Bases: object

Object containing integrated references, typically constructed by build_integrated_references().

property reference_labels: list¶

List of lists containing the names of the labels for each reference.

Each entry corresponds to a reference in reference_names, if reference_names is not None.

property reference_names: Sequence[str] | None¶: Sequence containing the names of the references. Alternatively None, if no names were supplied.

property test_features: list[str]¶: Sequence containing the names of the test features.

singler.build_integrated_references.build_integrated_references(test_features, ref_data_list, ref_labels_list, ref_features_list, ref_prebuilt_list, ref_names=None, assay_type='logcounts', check_missing=True, num_threads=1)[source]¶

Build a set of integrated references for classification of a test dataset.

Parameters:

test_features (Sequence) – Sequence of features for the test dataset.
ref_data_list (dict) – List of reference datasets, where each entry is equivalent to ref_data in build_single_reference().
ref_labels_list (list[Sequence]) – List of reference labels, where each entry is equivalent to ref_labels in build_single_reference().
ref_features_list (list[Sequence]) – List of reference features, where each entry is equivalent to ref_features in build_single_reference().
ref_prebuilt_list (list[SinglePrebuiltReference]) – List of prebuilt references, typically created by calling build_single_reference() on the corresponding elements of ref_data_list, ref_labels_list and ref_features_list.
ref_names (Optional[Sequence[str]]) – Sequence of names for the references. If None, these are automatically generated.
assay_type – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from each entry of ref_data_list.
num_threads (int) – Number of threads.

Return type:

IntegratedReferences

Returns:

Integrated references for classification with classify_integrated_references().
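The way the lower-level functions chain together can be sketched as below. This is an assumption-laden outline rather than the body of annotate_integrated(), which may apply further steps: the integrate_manually() wrapper is hypothetical, and it relies only on the positional parameters shown in the signatures on this page. Imports are deferred into the function so the sketch loads without singler installed.

```python
def integrate_manually(test_data, test_features, ref_data_list,
                       ref_labels_list, ref_features_list, ref_names=None):
    """Classify against each reference, then integrate across references."""
    from singler.build_single_reference import build_single_reference
    from singler.classify_single_reference import classify_single_reference
    from singler.build_integrated_references import build_integrated_references
    from singler.classify_integrated_references import classify_integrated_references

    prebuilt_list = []
    single_results = []
    for data, labels, features in zip(ref_data_list, ref_labels_list, ref_features_list):
        # Build each reference once and classify the test data against it.
        prebuilt = build_single_reference(data, labels, features)
        prebuilt_list.append(prebuilt)
        single_results.append(
            classify_single_reference(test_data, test_features, prebuilt)
        )

    # The prebuilt references are passed in the same order as the raw references.
    integrated = build_integrated_references(
        test_features,
        ref_data_list,
        ref_labels_list,
        ref_features_list,
        prebuilt_list,
        ref_names=ref_names,
    )

    # Per-reference results must be ordered like the references themselves.
    combined = classify_integrated_references(test_data, single_results, integrated)
    return single_results, combined
```

The returned pair mirrors the documented output of annotate_integrated(): a list of per-reference results followed by the integrated result.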

singler.build_single_reference module¶

class singler.build_single_reference.SinglePrebuiltReference(ptr, labels, features, markers)[source]¶

Bases: object

A prebuilt reference object, typically created by build_single_reference(). This is intended for advanced users only and should not be serialized.

property features: Sequence¶: Returns: The universe of features known to this reference, usually as strings.

property labels: Sequence¶: Returns: Unique labels in this reference.

marker_subset(indices_only=False)[source]¶

Parameters:

indices_only (bool) – Whether to return the markers as indices into features, or as a list of feature identifiers.

Return type:

Union[ndarray, list]

Returns:

If indices_only = False, a list of feature identifiers for the markers.

If indices_only = True, a NumPy array containing the integer indices of features in features that were chosen as markers.

property markers: dict[Any, dict[Any, Sequence]]¶: Returns: Markers for every pairwise comparison between labels.

num_labels()[source]¶

Return type: int
Returns: Number of unique labels in this reference.

num_markers()[source]¶

Return type: int
Returns: Number of markers to be used for classification. This is the same as the size of the array from marker_subset().

singler.build_single_reference.build_single_reference(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, markers=None, marker_method='classic', marker_args={}, approximate=True, num_threads=1)[source]¶

Build a single reference dataset in preparation for classification.

Parameters:

ref_data (Any) –
A matrix-like object where rows are features, columns are reference profiles, and each entry is the expression value. If markers is not provided, expression should be normalized and log-transformed in preparation for marker prioritization via differential expression analyses. Otherwise, any expression values are acceptable as only the ranking within each column is used.

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
ref_labels – Sequence of labels for each reference profile, i.e., column in ref_data.
ref_features – Sequence of identifiers for each feature, i.e., row in ref_data.
assay_type (Union[str, int]) – Assay containing the expression matrix, if ref_data is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from ref_data.
restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.
markers (Optional[dict[Any, dict[Any, Sequence]]]) – Upregulated markers for each pairwise comparison between labels. Specifically, markers[a][b] should be a sequence of features that are upregulated in a compared to b. All such features should be present in ref_features, and all labels in ref_labels should have keys in the inner and outer dictionaries.
marker_method (Literal['classic']) – Method to identify markers from each pairwise comparison between labels in ref_data. If “classic”, we call get_classic_markers(). Only used if markers is not supplied.
marker_args (dict) – Further arguments to pass to the chosen marker detection method. Only used if markers is not supplied.
approximate (bool) – Whether to use an approximate neighbor search to compute scores during classification.
num_threads (int) – Number of threads to use for reference building.

Return type:

SinglePrebuiltReference

Returns:

The pre-built reference, ready for use in downstream methods like classify_single_reference().
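The requirements on a user-supplied markers dictionary can be checked before calling build_single_reference(). The check_markers() helper below is hypothetical and not part of singler; it only encodes the two conditions stated for the markers argument (every label appears as an outer and an inner key, and every marker is a known feature).

```python
def check_markers(markers, ref_labels, ref_features):
    """Return a list of problems found in a markers[a][b] dictionary."""
    unique_labels = sorted(set(ref_labels))
    known_features = set(ref_features)
    problems = []
    for a in unique_labels:
        if a not in markers:
            problems.append(f"label {a!r} is missing from the outer dictionary")
            continue
        for b in unique_labels:
            if b not in markers[a]:
                problems.append(f"label {b!r} is missing from markers[{a!r}]")
                continue
            unknown = [f for f in markers[a][b] if f not in known_features]
            if unknown:
                problems.append(f"markers[{a!r}][{b!r}] has unknown features {unknown}")
    return problems


# Toy example with two labels and three features.
toy_markers = {
    "A": {"A": [], "B": ["gene_1"]},
    "B": {"A": ["gene_2"], "B": []},
}
print(check_markers(toy_markers, ["A", "A", "B"], ["gene_1", "gene_2", "gene_3"]))
```

An empty list means the dictionary satisfies both documented conditions.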

singler.classify_integrated_references module¶

singler.classify_integrated_references.classify_integrated_references(test_data, results, integrated_prebuilt, assay_type=0, quantile=0.8, num_threads=1)[source]¶

Integrate classification results across multiple references for a single test dataset.

Parameters:

test_data (Any) –
A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and/or transformed expression values are also acceptable as only the ranking is used within this function.

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
results (list[Union[BiocFrame, Sequence]]) – List of classification results generated by running classify_single_reference() on test_data with each reference. This may be either the full data frame or just the "best" column. References should be ordered as in integrated_prebuilt.reference_names.
integrated_prebuilt (IntegratedReferences) – Integrated reference object, constructed with build_integrated_references().
assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.
quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.
num_threads (int) – Number of threads to use during classification.

Return type:

BiocFrame

Returns:

A data frame containing the best_label across all references, defined as the assigned label in the best reference; the identity of the best_reference, either as a name string or an integer index; the scores for each reference, as a nested BiocFrame; and the delta from the best to the second-best reference. Each row corresponds to a column of test_data.
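How the output columns relate to each other for one cell can be illustrated in plain Python. This is a simplified picture of the returned values, not the library's scoring code: summarize_cell() is a hypothetical helper, the scores are made up, and returning None for delta when only one reference exists is a choice of this sketch.

```python
def summarize_cell(scores_by_reference, label_by_reference):
    """Pick the best reference for one cell and report its label and margin."""
    ranked = sorted(scores_by_reference, key=scores_by_reference.get, reverse=True)
    best = ranked[0]
    if len(ranked) > 1:
        # Margin between the best and the second-best reference.
        delta = scores_by_reference[best] - scores_by_reference[ranked[1]]
    else:
        delta = None  # no second reference to compare against
    return {
        "best_label": label_by_reference[best],
        "best_reference": best,
        "delta": delta,
    }


# Made-up scores for a single cell against two references.
scores = {"ref_a": 0.62, "ref_b": 0.71}
labels = {"ref_a": "T cell", "ref_b": "CD4 T cell"}
print(summarize_cell(scores, labels))
```

A small delta indicates that two references fit the cell almost equally well, so the integrated label is less certain.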

singler.classify_single_reference module¶

singler.classify_single_reference.classify_single_reference(test_data, test_features, ref_prebuilt, assay_type=0, check_missing=True, quantile=0.8, use_fine_tune=True, fine_tune_threshold=0.05, num_threads=1)[source]¶

Classify a test dataset against a reference by assigning labels from the latter to each column of the former using the SingleR algorithm.

Parameters:

test_data (Any) –
A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and transformed expression values are also acceptable as only the ranking is used within this function.

Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.
test_features (Sequence) –
Sequence of identifiers for each feature in the test dataset, i.e., row in test_data.

If test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. Alternatively, it can be set to None, to use the row_names of the experiment as features.
ref_prebuilt (SinglePrebuiltReference) – A pre-built reference created with build_single_reference().
assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from test_data.
quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.
use_fine_tune (bool) – Whether fine-tuning should be performed. This improves accuracy for distinguishing between similar labels but requires more computational work.
fine_tune_threshold (float) – Maximum difference from the maximum correlation to use in fine-tuning. All labels whose scores lie within this difference of the maximum are carried into another round of fine-tuning.
num_threads (int) – Number of threads to use during classification.

Return type:

BiocFrame

Returns:

A data frame containing the best label, the scores for each label (as a nested BiocFrame), and the delta from the best to the second-best label. Each row corresponds to a column of test_data.
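The roles of quantile and fine_tune_threshold can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the library's implementation: it uses a rank-based (Spearman) correlation over all features rather than over selected markers, a linearly interpolated quantile, and only the first selection step of fine-tuning. All function names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            result[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def spearman(x, y):
    """Pearson correlation of the ranks, so only the ordering of values matters."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def score_labels(test_column, ref_columns, ref_labels, q=0.8):
    """Score each label by a quantile of its per-profile correlations."""
    per_label = {}
    for column, label in zip(ref_columns, ref_labels):
        per_label.setdefault(label, []).append(spearman(test_column, column))
    return {label: quantile(cors, q) for label, cors in per_label.items()}


def fine_tune_candidates(scores, threshold=0.05):
    """Labels whose score lies within `threshold` of the best score."""
    top = max(scores.values())
    return sorted(label for label, s in scores.items() if s >= top - threshold)


cell = [1, 2, 3, 4, 5]
profiles = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [5, 4, 3, 2, 1]]
scores = score_labels(cell, profiles, ["A", "A", "B"])
print(scores, fine_tune_candidates(scores))
```

A higher quantile lets a label score well as soon as a few of its profiles match the cell closely, which is the sensitivity trade-off described for the quantile parameter.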

singler.get_classic_markers module¶

singler.get_classic_markers.get_classic_markers(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, num_de=None, num_threads=1)[source]¶

Compute markers from a reference using the classic SingleR algorithm. This is typically done for reference datasets derived from replicated bulk transcriptomic experiments.

Parameters:

ref_data (Union[Any, list[Any]]) –
A matrix-like object containing the log-normalized expression values of a reference dataset. Each column is a sample and each row is a feature.

Alternatively, this can be a SummarizedExperiment containing a matrix-like object in one of its assays.

Alternatively, a list of such matrices or SummarizedExperiment objects, typically for multiple batches of the same reference; it is assumed that different batches exhibit at least some overlap in their features and labels.
ref_labels (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of columns of ref_data, containing a label (usually a string) for each column. Alternatively, if ref_data is a list, a list of such sequences of the same length as ref_data; each sequence should have length equal to the number of columns of the corresponding entry of ref_data.
ref_features (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of rows of ref_data, containing the feature name (usually a string) for each row. Alternatively, if ref_data is a list, a list of such sequences of the same length as ref_data; each sequence should have length equal to the number of rows of the corresponding entry of ref_data.
assay_type (Union[str, int]) – Name or index of the assay containing the expression matrix, if ref_data is or contains SummarizedExperiment objects.
check_missing (bool) – Whether to check for and remove rows with missing (NaN) values in the reference matrices. This can be set to False if it is known that no NaN values exist.
restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.
num_de (Optional[int]) – Number of differentially expressed genes to use as markers for each pairwise comparison between labels. If None, an appropriate number of genes is automatically determined.
num_threads (int) – Number of threads to use for the calculations.

Return type:

dict[Any, dict[Any, list]]

Returns:

A dictionary of dictionaries of lists containing the markers for each pairwise comparison between labels, i.e., markers[a][b] contains the upregulated markers for label a over label b.
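The shape of the returned dictionary can be illustrated with a small stand-in. The sketch below ranks features by the difference in per-label medians, which is an assumption about the general idea and should not be read as the library's exact procedure; sketch_pairwise_markers() and the toy data are hypothetical.

```python
from statistics import median


def sketch_pairwise_markers(matrix, labels, features, num_de=2):
    """Build markers[a][b] from a features-by-samples matrix of log-expression values."""
    unique_labels = sorted(set(labels))

    # Median expression of every feature within each label.
    medians = {
        label: [
            median(row[j] for j, current in enumerate(labels) if current == label)
            for row in matrix
        ]
        for label in unique_labels
    }

    markers = {}
    for a in unique_labels:
        markers[a] = {}
        for b in unique_labels:
            if a == b:
                markers[a][b] = []
                continue
            diffs = [
                (medians[a][i] - medians[b][i], features[i])
                for i in range(len(features))
            ]
            # Keep only features that are higher in a than in b, largest difference first.
            upregulated = sorted((d for d in diffs if d[0] > 0), key=lambda d: -d[0])
            markers[a][b] = [name for _, name in upregulated[:num_de]]
    return markers


toy_matrix = [
    [5, 5, 1, 1],  # gene_1: high in A
    [1, 1, 4, 4],  # gene_2: high in B
    [2, 2, 2, 2],  # gene_3: uninformative
]
toy_markers = sketch_pairwise_markers(
    toy_matrix, ["A", "A", "B", "B"], ["gene_1", "gene_2", "gene_3"]
)
print(toy_markers)
```

Note that the dictionary is not symmetric: markers["A"]["B"] and markers["B"]["A"] list different features.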

singler.get_classic_markers.number_of_classic_markers(num_labels)[source]¶

Compute the number of markers to detect for a given number of labels, using the classic SingleR marker detection algorithm.

Parameters: num_labels (int) – Number of labels.
Returns: Number of markers.
Return type: int
