singler package

Submodules

singler.annotate_integrated module

singler.annotate_integrated.annotate_integrated(test_data, ref_data_list, test_features=None, ref_labels_list=None, ref_features_list=None, test_assay_type=0, test_check_missing=True, ref_assay_type='logcounts', ref_check_missing=True, build_single_args={}, classify_single_args={}, build_integrated_args={}, classify_integrated_args={}, num_threads=1)

Annotate a single-cell expression dataset based on the correlation of each cell to profiles in multiple labelled references, where the annotation from each reference is then integrated across references.

Parameters:
  • test_data (Any) –

    A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used.

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.

  • test_features (Union[Sequence, str, None]) –

    Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row.

    Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.

  • ref_data_list (Sequence[Union[Any, str]]) –

    Sequence consisting of one or more of the following:

    • A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()).

    • A SummarizedExperiment object containing such a matrix in its assays.

    • A string that can be passed as name to fetch_github_reference(). This will use the specified dataset as the reference.

  • ref_labels_list (Union[str, None, Sequence[Union[Sequence, str]]]) –

    Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:

    • If ref_data_list[i] is a matrix-like object, ref_labels_list[i] should be a sequence of length equal to the number of columns of ref_data_list[i], containing the label associated with each column.

    • If ref_data_list[i] is a string, ref_labels_list[i] should be a string specifying the label type to use, e.g., “main”, “fine”, “ont”. If a single string is supplied, it is recycled for all entries of ref_data_list.

    • If ref_data_list[i] is a SummarizedExperiment, ref_labels_list[i] may be a string specifying the column name in column_data that contains the labels. It can also be set to None, to use the column_names of the experiment as labels.

  • ref_features_list (Union[str, None, Sequence[Union[Sequence, str]]]) –

    Sequence of the same length as ref_data_list, where the contents depend on the type of value in the corresponding entry of ref_data_list:

    • If ref_data_list[i] is a matrix-like object, ref_features_list[i] should be a sequence of length equal to the number of rows of ref_data_list[i], containing the feature identifier associated with each row.

    • If ref_data_list[i] is a string, ref_features_list[i] should be a string specifying the feature type to use, e.g., “ensembl”, “symbol”. If a single string is supplied, it is recycled for all entries of ref_data_list.

    • If ref_data_list[i] is a SummarizedExperiment, ref_features_list[i] may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.

  • test_assay_type (Union[str, int]) – Assay of test_data containing the expression matrix, if test_data is a SummarizedExperiment.

  • test_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the test dataset.

  • ref_assay_type (Union[str, int]) – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.

  • ref_check_missing (bool) – Whether to check for and remove missing (i.e., NaN) values from the reference datasets.

  • build_single_args (dict) – Further arguments to pass to build_single_reference().

  • classify_single_args (dict) – Further arguments to pass to classify_single_reference().

  • build_integrated_args (dict) – Further arguments to pass to build_integrated_references().

  • classify_integrated_args (dict) – Further arguments to pass to classify_integrated_references().

  • num_threads (int) – Number of threads to use for the various steps.

Return type:

Tuple[list[BiocFrame], BiocFrame]

Returns:

Tuple where the first element contains the per-reference results (i.e., a list of BiocFrame outputs equivalent to running annotate_single() on each reference) and the second element contains the integrated results across references (i.e., a BiocFrame from classify_integrated_references()).
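The recycling rule described for ref_labels_list and ref_features_list can be pictured with a small sketch. The helper recycle_per_reference is hypothetical and not part of singler; it only illustrates how a single string might be expanded to one entry per reference, under the assumption that None is recycled in the same way.

```python
from typing import Any, List, Sequence, Union


def recycle_per_reference(value: Union[str, None, Sequence[Any]], num_refs: int) -> List[Any]:
    """Expand a per-reference argument to exactly one entry per reference.

    Hypothetical helper for illustration only; not part of singler.
    """
    # A lone string (or None, by assumption) applies to every reference.
    if value is None or isinstance(value, str):
        return [value] * num_refs

    # Otherwise the caller must supply one entry per reference.
    entries = list(value)
    if len(entries) != num_refs:
        raise ValueError(f"expected {num_refs} entries, got {len(entries)}")
    return entries


print(recycle_per_reference("main", 3))
print(recycle_per_reference(["main", ["B", "T", "B"]], 2))
```

A mixed list such as the second call pairs a label type for a named reference with explicit per-column labels for a matrix-like reference.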

singler.annotate_single module

singler.annotate_single.annotate_single(test_data, ref_data, ref_labels, test_features=None, ref_features=None, build_args={}, classify_args={}, num_threads=1)

Annotate a single-cell expression dataset based on the correlation of each cell to profiles in a labelled reference.

Parameters:
  • test_data (Any) –

    A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries should be expression values; only the ranking within each column will be used.

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in classify_args.

  • test_features (Union[Sequence, str, None]) –

    Sequence of length equal to the number of rows in test_data, containing the feature identifier for each row.

    Alternatively, if test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.

  • ref_data (Any) –

    A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the ref_data argument in build_single_reference()).

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays. Non-default assay types can be specified in build_args.

  • ref_labels (Union[Sequence, str, None]) –

    If ref_data is a matrix-like object, ref_labels should be a sequence of length equal to the number of columns of ref_data, containing the label associated with each column.

    Alternatively, if ref_data is a SummarizedExperiment, ref_labels may be a string specifying the column name in column_data that contains the labels, e.g., “main”, “fine”, “ont”. It can also be set to None, to use the column_names of the experiment as labels.

  • ref_features (Union[Sequence, str, None]) –

    If ref_data is a matrix-like object, ref_features should be a sequence of length equal to the number of rows of ref_data, containing the feature identifier associated with each row.

    Alternatively, if ref_data is a SummarizedExperiment, ref_features may be a string specifying the column name in row_data that contains the features. It can also be set to None, to use the row_names of the experiment as features.

  • build_args (dict) – Further arguments to pass to build_single_reference().

  • classify_args (dict) – Further arguments to pass to classify_single_reference().

  • num_threads (int) – Number of threads to use for the various steps.

Return type:

BiocFrame

Returns:

A data frame containing the labelling results; see classify_single_reference() for details. The metadata also contains a markers dictionary, specifying the markers that were used for each pairwise comparison between labels, and a list of unique_markers across all labels.
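Because only the ranking within each column of test_data is used, any strictly increasing transformation of the test values should leave the classification unchanged. The sketch below checks this on a single toy column; column_ranks is a hypothetical helper, not a singler function, and it assumes there are no tied values. Note that this applies to the test data; the reference data is usually expected to be log-transformed when markers are detected from it.

```python
import math


def column_ranks(values):
    """Return the 0-based rank of each value within one column (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


# One cell measured across four features.
raw_counts = [0.0, 3.0, 10.0, 1.0]
# log1p is strictly increasing, so it preserves the ordering of the values.
log_counts = [math.log1p(x) for x in raw_counts]

print(column_ranks(raw_counts))
print(column_ranks(log_counts))
```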

singler.build_integrated_references module

class singler.build_integrated_references.IntegratedReferences(ptr, ref_names, ref_labels, test_features)

Bases: object

Object containing integrated references, typically constructed by build_integrated_references().

property reference_labels: list

List of lists containing the names of the labels for each reference.

Each entry corresponds to a reference in reference_names, if reference_names is not None.

property reference_names: Sequence[str] | None

Sequence containing the names of the references. Alternatively None, if no names were supplied.

property test_features: list[str]

Sequence containing the names of the test features.

singler.build_integrated_references.build_integrated_references(test_features, ref_data_list, ref_labels_list, ref_features_list, ref_prebuilt_list, ref_names=None, assay_type='logcounts', check_missing=True, num_threads=1)

Build a set of integrated references for classification of a test dataset.

Parameters:
  • test_features (Sequence) – Sequence of features for the test dataset.

  • ref_data_list (dict) – List of reference datasets, where each entry is equivalent to ref_data in build_single_reference().

  • ref_labels_list (list[Sequence]) – List of reference labels, where each entry is equivalent to ref_labels in build_single_reference().

  • ref_features_list (list[Sequence]) – List of reference features, where each entry is equivalent to ref_features in build_single_reference().

  • ref_prebuilt_list (list[SinglePrebuiltReference]) – List of prebuilt references, typically created by calling build_single_reference() on the corresponding elements of ref_data_list, ref_labels_list and ref_features_list.

  • ref_names (Optional[Sequence[str]]) – Sequence of names for the references. If None, these are automatically generated.

  • assay_type – Assay containing the expression matrix for any entry of ref_data_list that is a SummarizedExperiment.

  • check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from each entry of ref_data_list.

  • num_threads (int) – Number of threads.

Return type:

IntegratedReferences

Returns:

Integrated references for classification with classify_integrated_references().
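The staged workflow implied by these parameters (build each reference, classify against each, then integrate) might look like the sketch below. It uses only the function names, module paths and positional arguments documented on this page; annotate_in_stages is a hypothetical wrapper, and it does not attempt to reproduce any extra handling that annotate_integrated() may perform internally.

```python
def annotate_in_stages(test_data, test_features, ref_data_list, ref_labels_list,
                       ref_features_list, num_threads=1):
    """Sketch: per-reference classification followed by integration."""
    # Imported lazily so that the sketch can be read without singler installed.
    from singler.build_single_reference import build_single_reference
    from singler.classify_single_reference import classify_single_reference
    from singler.build_integrated_references import build_integrated_references
    from singler.classify_integrated_references import classify_integrated_references

    # Stage 1: build and classify against each reference separately.
    prebuilt = [
        build_single_reference(data, labels, features, num_threads=num_threads)
        for data, labels, features in zip(ref_data_list, ref_labels_list, ref_features_list)
    ]
    results = [
        classify_single_reference(test_data, test_features, ref, num_threads=num_threads)
        for ref in prebuilt
    ]

    # Stage 2: combine the references, then choose the best reference per cell.
    integrated = build_integrated_references(
        test_features, ref_data_list, ref_labels_list, ref_features_list, prebuilt,
        num_threads=num_threads,
    )
    combined = classify_integrated_references(
        test_data, results, integrated, num_threads=num_threads
    )
    return results, combined
```

The returned pair mirrors the shape described for annotate_integrated(): a list of per-reference results and one integrated result.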

singler.build_single_reference module

class singler.build_single_reference.SinglePrebuiltReference(ptr, labels, features, markers)

Bases: object

A prebuilt reference object, typically created by build_single_reference(). This is intended for advanced users only and should not be serialized.

property features: Sequence

Returns: The universe of features known to this reference, usually as strings.

property labels: Sequence

Returns: Unique labels in this reference.

marker_subset(indices_only=False)
Parameters:

indices_only (bool) – Whether to return the markers as indices into features, or as a list of feature identifiers.

Return type:

Union[ndarray, list]

Returns:

If indices_only = False, a list of feature identifiers for the markers.

If indices_only = True, a NumPy array containing the integer indices into features of the entries that were chosen as markers.

property markers: dict[Any, dict[Any, Sequence]]

Returns: Markers for every pairwise comparison between labels.

num_labels()
Return type:

int

Returns:

Number of unique labels in this reference.

num_markers()
Return type:

int

Returns:

Number of markers to be used for classification. This is the same as the size of the array from marker_subset().
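The relationship between markers, marker_subset() and num_markers() can be sketched in plain Python. The function marker_subset_sketch is hypothetical; it assumes the subset is the union of all pairwise marker lists, reported in the order of features, and it returns a plain list of integers rather than a NumPy array.

```python
def marker_subset_sketch(features, markers, indices_only=False):
    """Union of all pairwise markers, as indices into features or as identifiers."""
    positions = {feature: i for i, feature in enumerate(features)}

    # Collect the position of every feature that is a marker in any comparison.
    chosen = set()
    for inner in markers.values():
        for marker_list in inner.values():
            chosen.update(positions[m] for m in marker_list)

    indices = sorted(chosen)
    if indices_only:
        return indices
    return [features[i] for i in indices]


features = ["g1", "g2", "g3", "g4"]
markers = {
    "A": {"A": [], "B": ["g3", "g1"]},
    "B": {"A": ["g3", "g4"], "B": []},
}

print(marker_subset_sketch(features, markers, indices_only=True))
print(marker_subset_sketch(features, markers))
```

Under these assumptions, the length of either result plays the role of num_markers().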

singler.build_single_reference.build_single_reference(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, markers=None, marker_method='classic', marker_args={}, approximate=True, num_threads=1)

Build a single reference dataset in preparation for classification.

Parameters:
  • ref_data (Any) –

    A matrix-like object where rows are features, columns are reference profiles, and each entry is the expression value. If markers is not provided, expression should be normalized and log-transformed in preparation for marker prioritization via differential expression analyses. Otherwise, any expression values are acceptable as only the ranking within each column is used.

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.

  • ref_labels – Sequence of labels for each reference profile, i.e., column in ref_data.

  • ref_features – Sequence of identifiers for each feature, i.e., row in ref_data.

  • assay_type (Union[str, int]) – Assay containing the expression matrix, if ref_data is a SummarizedExperiment.

  • check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from ref_data.

  • restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.

  • markers (Optional[dict[Any, dict[Any, Sequence]]]) – Upregulated markers for each pairwise comparison between labels. Specifically, markers[a][b] should be a sequence of features that are upregulated in a compared to b. All such features should be present in ref_features, and all labels in ref_labels should have keys in the inner and outer dictionaries.

  • marker_method (Literal['classic']) – Method to identify markers from each pairwise comparison between labels in ref_data. If “classic”, we call get_classic_markers(). Only used if markers is not supplied.

  • marker_args (dict) – Further arguments to pass to the chosen marker detection method. Only used if markers is not supplied.

  • approximate (bool) – Whether to use an approximate neighbor search to compute scores during classification.

  • num_threads (int) – Number of threads to use for reference building.

Return type:

SinglePrebuiltReference

Returns:

The pre-built reference, ready for use in downstream methods like classify_single_reference().
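The requirements on a user-supplied markers dictionary can be checked before calling build_single_reference(). The sketch below is one reading of those requirements; find_marker_problems is a hypothetical helper, and it assumes that each label must also appear as an inner key under itself (typically with an empty list).

```python
def find_marker_problems(markers, ref_labels, ref_features):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    labels = sorted(set(ref_labels), key=str)
    known = set(ref_features)
    problems = []

    for a in labels:
        if a not in markers:
            problems.append(f"missing outer key {a!r}")
            continue
        for b in labels:
            if b not in markers[a]:
                problems.append(f"missing inner key {b!r} under {a!r}")
                continue
            # Every marker must be one of the reference's features.
            for feature in markers[a][b]:
                if feature not in known:
                    problems.append(f"unknown feature {feature!r} in markers[{a!r}][{b!r}]")
    return problems


good = {"A": {"A": [], "B": ["g1"]}, "B": {"A": ["g2"], "B": []}}
print(find_marker_problems(good, ["A", "A", "B"], ["g1", "g2", "g3"]))
```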

singler.classify_integrated_references module

singler.classify_integrated_references.classify_integrated_references(test_data, results, integrated_prebuilt, assay_type=0, quantile=0.8, num_threads=1)

Integrate classification results across multiple references for a single test dataset.

Parameters:
  • test_data (Any) –

    A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and/or transformed expression values are also acceptable as only the ranking is used within this function.

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.

  • results (list[Union[BiocFrame, Sequence]]) – List of classification results generated by running classify_single_reference() on test_data with each reference. This may be either the full data frame or just the "best" column. References should be ordered as in integrated_prebuilt.reference_names.

  • integrated_prebuilt (IntegratedReferences) – Integrated reference object, constructed with build_integrated_references().

  • assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.

  • quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.

  • num_threads (int) – Number of threads to use during classification.

Return type:

BiocFrame

Returns:

A data frame containing the best_label across all references, defined as the assigned label in the best reference; the identity of the best_reference, either as a name string or an integer index; the scores for each reference, as a nested BiocFrame; and the delta from the best to the second-best reference. Each row corresponds to a column of test_data.
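How the returned columns relate to each other can be shown for a single cell. The sketch below takes the score that each reference achieved for its own assigned label and derives best_reference, best_label and delta from them; pick_best_reference and the reference names are hypothetical, and returning None for delta when only one reference exists is an assumption.

```python
def pick_best_reference(best_labels, ref_scores, ref_names=None):
    """Derive the integrated result for one cell from per-reference labels and scores."""
    # Order references from the highest to the lowest score.
    order = sorted(range(len(ref_scores)), key=lambda i: ref_scores[i], reverse=True)
    best = order[0]

    # Gap between the best and the second-best reference, if there is a second one.
    delta = ref_scores[best] - ref_scores[order[1]] if len(order) > 1 else None

    return {
        "best_label": best_labels[best],
        # A name if names were supplied, otherwise an integer index.
        "best_reference": ref_names[best] if ref_names is not None else best,
        "delta": delta,
    }


print(pick_best_reference(["B cell", "T cell"], [0.5, 0.75], ["ref_a", "ref_b"]))
```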

singler.classify_single_reference module

singler.classify_single_reference.classify_single_reference(test_data, test_features, ref_prebuilt, assay_type=0, check_missing=True, quantile=0.8, use_fine_tune=True, fine_tune_threshold=0.05, num_threads=1)

Classify a test dataset against a reference by assigning labels from the latter to each column of the former using the SingleR algorithm.

Parameters:
  • test_data (Any) –

    A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Normalized and transformed expression values are also acceptable as only the ranking is used within this function.

    Alternatively, a SummarizedExperiment containing such a matrix in one of its assays.

  • test_features (Sequence) –

    Sequence of identifiers for each feature in the test dataset, i.e., row in test_data.

    If test_data is a SummarizedExperiment, test_features may be a string specifying the column name in row_data that contains the features. Alternatively, it can be set to None, to use the row_names of the experiment as features.

  • ref_prebuilt (SinglePrebuiltReference) – A pre-built reference created with build_single_reference().

  • assay_type (Union[str, int]) – Assay containing the expression matrix, if test_data is a SummarizedExperiment.

  • check_missing (bool) – Whether to check for and remove rows with missing (NaN) values from test_data.

  • quantile (float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.

  • use_fine_tune (bool) – Whether fine-tuning should be performed. This improves accuracy for distinguishing between similar labels but requires more computational work.

  • fine_tune_threshold (float) – Maximum difference from the maximum correlation to use in fine-tuning. All labels with scores within this difference of the maximum are carried into another round of fine-tuning.

  • num_threads (int) – Number of threads to use during classification.

Return type:

BiocFrame

Returns:

A data frame containing the best label, the scores for each label (as a nested BiocFrame), and the delta from the best to the second-best label. Each row corresponds to a column of test_data.
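The roles of quantile and fine_tune_threshold can be sketched for one test cell. Both helpers below are hypothetical: quantile_score assumes linear interpolation between order statistics, and fine_tune_candidates assumes that a label is retained when its score lies within fine_tune_threshold of the best score. The library's internal details may differ.

```python
def quantile_score(correlations, quantile=0.8):
    """Score for one label: a quantile of the cell's correlations to that label's profiles."""
    ordered = sorted(correlations)
    if not ordered:
        raise ValueError("need at least one correlation")
    # Fractional position of the requested quantile among the sorted values.
    position = quantile * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def fine_tune_candidates(scores, fine_tune_threshold=0.05):
    """Labels close enough to the best score to enter another fine-tuning round."""
    best = max(scores.values())
    return [label for label, score in scores.items() if score >= best - fine_tune_threshold]


scores = {"B": 0.90, "T": 0.87, "NK": 0.60}
print(fine_tune_candidates(scores))
print(quantile_score([0.2, 0.4, 0.6, 0.8, 1.0], quantile=0.5))
```

A higher quantile lets a few well-matching reference profiles dominate the score, which is the trade-off described for the quantile parameter.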

singler.get_classic_markers module

singler.get_classic_markers.get_classic_markers(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, restrict_to=None, num_de=None, num_threads=1)

Compute markers from a reference using the classic SingleR algorithm. This is typically done for reference datasets derived from replicated bulk transcriptomic experiments.

Parameters:
  • ref_data (Union[Any, list[Any]]) –

    A matrix-like object containing the log-normalized expression values of a reference dataset. Each column is a sample and each row is a feature.

    Alternatively, this can be a SummarizedExperiment containing a matrix-like object in one of its assays.

    Alternatively, a list of such matrices or SummarizedExperiment objects, typically for multiple batches of the same reference; it is assumed that different batches exhibit at least some overlap in their features and labels.

  • ref_labels (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of columns of ref_data, containing a label (usually a string) for each column. Alternatively, if ref_data is a list, a list of such sequences of the same length as ref_data; each sequence should have length equal to the number of columns of the corresponding entry of ref_data.

  • ref_features (Union[Sequence, list[Sequence]]) – A sequence of length equal to the number of rows of ref_data, containing the feature name (usually a string) for each row. Alternatively, if ref_data is a list, a list of such sequences of the same length as ref_data; each sequence should have length equal to the number of rows of the corresponding entry of ref_data.

  • assay_type (Union[str, int]) – Name or index of the assay containing the expression matrix, if ref_data is or contains SummarizedExperiment objects.

  • check_missing (bool) – Whether to check for and remove rows with missing (NaN) values in the reference matrices. This can be set to False if it is known that no NaN values exist.

  • restrict_to (Union[set, dict, None]) – Subset of available features to restrict to. Only features in restrict_to will be used in the reference building. If None, no restriction is performed.

  • num_de (Optional[int]) – Number of differentially expressed genes to use as markers for each pairwise comparison between labels. If None, an appropriate number of genes is automatically determined.

  • num_threads (int) – Number of threads to use for the calculations.

Return type:

dict[Any, dict[Any, list]]

Returns:

A dictionary of dictionaries of lists containing the markers for each pairwise comparison between labels, i.e., markers[a][b] contains the upregulated markers for label a over label b.
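The shape of the returned markers[a][b] structure can be illustrated with a toy marker detector. The sketch below ranks features by the difference in per-label median expression and keeps the top num_de with a positive difference. This is a simplified stand-in written for illustration, not the library's implementation; toy_classic_markers is a hypothetical name, and batch handling, restrict_to and tie-breaking are not modelled.

```python
from statistics import median


def toy_classic_markers(matrix, labels, features, num_de):
    """matrix[i][j] is the log-expression of feature i in reference sample j."""
    # Group the column indices by label.
    columns = {}
    for col, label in enumerate(labels):
        columns.setdefault(label, []).append(col)

    # Median expression of each feature within each label.
    medians = {
        label: [median(row[c] for c in cols) for row in matrix]
        for label, cols in columns.items()
    }

    markers = {}
    for a in columns:
        markers[a] = {}
        for b in columns:
            if a == b:
                markers[a][b] = []
                continue
            # Keep features that are higher in a than in b, largest difference first.
            up = [
                (medians[a][i] - medians[b][i], i)
                for i in range(len(features))
                if medians[a][i] > medians[b][i]
            ]
            up.sort(key=lambda pair: (-pair[0], pair[1]))
            markers[a][b] = [features[i] for _, i in up[:num_de]]
    return markers


matrix = [
    [5, 6, 1, 1],  # g1: high in A
    [1, 1, 4, 5],  # g2: high in B
    [2, 2, 2, 2],  # g3: uninformative
]
result = toy_classic_markers(matrix, ["A", "A", "B", "B"], ["g1", "g2", "g3"], num_de=2)
print(result)
```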

singler.get_classic_markers.number_of_classic_markers(num_labels)

Compute the number of markers to detect for a given number of labels, using the classic SingleR marker detection algorithm.

Parameters:

num_labels (int) – Number of labels.

Returns:

Number of markers.

Return type:

int
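For intuition about how the default marker count might scale with the number of labels, the sketch below uses the rule 500 * (2/3)^log2(num_labels), rounded to the nearest integer. This formula is an assumption based on my understanding of the original SingleR method and is not stated on this page; classic_marker_count_sketch is a hypothetical name, and number_of_classic_markers() remains the authoritative source. Python's round() also resolves exact ties differently from some other languages.

```python
import math


def classic_marker_count_sketch(num_labels: int) -> int:
    """Assumed rule: fewer markers per comparison as the number of labels grows."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    return round(500 * (2 / 3) ** math.log2(num_labels))


for n in (1, 2, 4, 8):
    print(n, classic_marker_count_sketch(n))
```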

Module contents