scranpy.dimensionality_reduction package#
Submodules#
scranpy.dimensionality_reduction.combine_embeddings module#
- class scranpy.dimensionality_reduction.combine_embeddings.CombineEmbeddingsOptions(neighbors=20, approximate=True, weights=None, num_threads=True)[source]#
Bases: object
Options for combine_embeddings().
- neighbors#
Number of neighbors to use for approximating the relative variance.
- approximate#
Whether to perform an approximate neighbor search.
- weights#
Weights to apply to each entry of embeddings. If None, all embeddings receive equal weight. If any weight is zero, the corresponding embedding is omitted from the return value.
- num_threads#
Number of threads to use for the neighbor search.
- __annotations__ = {'approximate': <class 'bool'>, 'neighbors': <class 'int'>, 'num_threads': <class 'int'>, 'weights': typing.Optional[typing.List[float]]}#
- __dataclass_fields__ = {'approximate': Field(name='approximate',type=<class 'bool'>,default=True,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'neighbors': Field(name='neighbors',type=<class 'int'>,default=20,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'num_threads': Field(name='num_threads',type=<class 'int'>,default=True,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'weights': Field(name='weights',type=typing.Optional[typing.List[float]],default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- scranpy.dimensionality_reduction.combine_embeddings.combine_embeddings(embeddings, options=CombineEmbeddingsOptions(neighbors=20, approximate=True, weights=None, num_threads=True))[source]#
Combine multiple embeddings for the same set of cells (e.g., from multi-modal datasets) for integrated downstream analyses like clustering and visualization. This is done after adjusting for differences in local variance between embeddings.
- Parameters:
embeddings (List[ndarray]) – List of embeddings to be combined. Each embedding should be a row-major matrix where rows are cells and columns are dimensions. All embeddings should have the same number of rows.
options (CombineEmbeddingsOptions) – Optional parameters.
- Return type:
ndarray
- Returns:
Array containing the combined embedding, where rows are cells and columns are the dimensions from all embeddings with non-zero weight.
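For example, a minimal usage sketch, assuming two per-cell embeddings from different modalities are already available (the random matrices below are placeholders for real data such as PCs from run_pca()):

```python
import numpy
from scranpy.dimensionality_reduction.combine_embeddings import (
    CombineEmbeddingsOptions,
    combine_embeddings,
)

# Placeholder embeddings: 100 cells with 25 and 10 dimensions respectively.
rna_pcs = numpy.random.rand(100, 25)
adt_pcs = numpy.random.rand(100, 10)

# Downweight the second embedding; both weights are non-zero, so both
# sets of dimensions appear in the output.
combined = combine_embeddings(
    [rna_pcs, adt_pcs],
    options=CombineEmbeddingsOptions(neighbors=20, weights=[1.0, 0.5]),
)
# 'combined' has 100 rows (cells) and 35 columns (25 + 10 dimensions).
```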
scranpy.dimensionality_reduction.run_pca module#
- class scranpy.dimensionality_reduction.run_pca.PcaResult(principal_components, variance_explained)#
Bases: tuple
Named tuple of results from run_pca().
- principal_components:
Matrix of principal component (PC) coordinates, where the rows are cells and columns are PCs.
- variance_explained:
Array of length equal to the number of PCs, containing the percentage of variance explained by each PC.
- __getnewargs__()#
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, principal_components, variance_explained)#
Create new instance of PcaResult(principal_components, variance_explained)
- __repr__()#
Return a nicely formatted representation string
- __slots__ = ()#
- principal_components#
Alias for field number 0
- variance_explained#
Alias for field number 1
- class scranpy.dimensionality_reduction.run_pca.RunPcaOptions(rank=25, subset=None, block=None, scale=False, block_method='project', block_weights=True, num_threads=1, assay_type='logcounts')[source]#
Bases: object
Optional arguments for run_pca().
- rank#
Number of top PCs to compute. Larger values capture more biological structure at the cost of increasing computational work and absorbing more random noise. Defaults to 25.
- subset#
Array specifying which features should be used in the PCA (e.g., highly variable genes from choose_hvgs()). This may contain integer indices or booleans. Defaults to None, in which case all features are used.
- block#
Block assignment for each cell. This can be used to reduce the effect of inter-block differences on the PCA (see block_method for more details).
If provided, this should have length equal to the number of cells, where cells have the same value if and only if they are in the same block. Defaults to None, indicating all cells are part of the same block.
- scale#
Whether to scale each feature to unit variance. This improves robustness (i.e., reduces sensitivity) to a small number of highly variable features. Defaults to False.
- block_method#
How to adjust the PCA for the blocking factor.
"regress" will regress out the factor, effectively performing a PCA on the residuals. This only makes sense in limited cases, e.g., where inter-block differences are linear and the composition of each block is the same.
"project" will compute the rotation vectors from the residuals but will project the cells onto the PC space. This focuses the PCA on within-block variance while avoiding any assumptions about the nature of the inter-block differences. Any removal of block effects should be performed separately.
"none" will ignore any blocking factor, i.e., as if block = None. Any inter-block differences will both contribute to the determination of the rotation vectors and also be preserved in the PC space. Any removal of block effects should be performed separately.
This option is only used if block is not None. Defaults to "project".
- block_weights#
Whether to weight each block so that it contributes the same number of effective observations to the covariance matrix. Defaults to True.
- num_threads#
Number of threads to use. Defaults to 1.
- assay_type#
Assay to use from input if it is a SummarizedExperiment.
- Raises:
ValueError – If block_method is not an expected value.
- __annotations__ = {'assay_type': typing.Union[int, str], 'block': typing.Optional[typing.Sequence], 'block_method': typing.Literal['none', 'project', 'regress'], 'block_weights': <class 'bool'>, 'num_threads': <class 'int'>, 'rank': <class 'int'>, 'scale': <class 'bool'>, 'subset': typing.Optional[numpy.ndarray]}#
- __dataclass_fields__ = {'assay_type': Field(name='assay_type',type=typing.Union[int, str],default='logcounts',default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'block': Field(name='block',type=typing.Optional[typing.Sequence],default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'block_method': Field(name='block_method',type=typing.Literal['none', 'project', 'regress'],default='project',default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'block_weights': Field(name='block_weights',type=<class 'bool'>,default=True,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'num_threads': Field(name='num_threads',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'rank': Field(name='rank',type=<class 'int'>,default=25,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'scale': Field(name='scale',type=<class 'bool'>,default=False,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'subset': Field(name='subset',type=typing.Optional[numpy.ndarray],default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- scranpy.dimensionality_reduction.run_pca.run_pca(input, options=RunPcaOptions(rank=25, subset=None, block=None, scale=False, block_method='project', block_weights=True, num_threads=1, assay_type='logcounts'))[source]#
Perform a principal component analysis (PCA) to retain the top PCs. This is used to denoise and compact a dataset by removing later PCs associated with random noise, under the assumption that interesting biological heterogeneity is the major source of variation in the dataset.
- Parameters:
input (Union[TatamiNumericPointer, SummarizedExperiment]) – Matrix-like object where rows are features and columns are cells, typically containing log-normalized values. This should be a matrix class that can be converted into a TatamiNumericPointer.
Alternatively, a SummarizedExperiment containing such a matrix in its assays.
Developers may also provide the TatamiNumericPointer itself.
options (RunPcaOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected type.
ValueError – If options.block does not match the number of cells.
- Return type:
PcaResult
- Returns:
Object containing the PC coordinates and the variance explained by each PC. The number of PCs is determined by options.rank, unless this is larger than the smallest dimension of input, in which case the number of PCs is equal to the smallest dimension instead.
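For example, a minimal sketch, assuming a dense matrix of log-normalized values with features in rows and cells in columns; the random matrix and the two-batch block assignment below are placeholders for real data:

```python
import numpy
from scranpy.dimensionality_reduction.run_pca import RunPcaOptions, run_pca

logcounts = numpy.random.rand(1000, 200)      # 1000 features x 200 cells
block = ["batch1"] * 100 + ["batch2"] * 100   # one label per cell

res = run_pca(
    logcounts,
    options=RunPcaOptions(rank=25, block=block, block_method="project"),
)

print(res.principal_components.shape)  # (200, 25): cells x PCs
print(res.variance_explained[:5])      # percentage of variance for the top PCs
```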
scranpy.dimensionality_reduction.run_tsne module#
- class scranpy.dimensionality_reduction.run_tsne.InitializeTsneOptions(perplexity=30, seed=42, num_threads=1)[source]#
Bases: object
Optional arguments for initialize_tsne().
- perplexity#
Perplexity to use when computing neighbor probabilities. Larger values cause the embedding to focus more on broad structure instead of local structure. Defaults to 30.
- num_threads#
Number of threads to use for the neighbor search and t-SNE iterations. Defaults to 1.
- seed#
Seed to use for random initialization of the t-SNE coordinates. Defaults to 42.
- __annotations__ = {'num_threads': <class 'int'>, 'perplexity': <class 'int'>, 'seed': <class 'int'>}#
- __dataclass_fields__ = {'num_threads': Field(name='num_threads',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'perplexity': Field(name='perplexity',type=<class 'int'>,default=30,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'seed': Field(name='seed',type=<class 'int'>,default=42,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- class scranpy.dimensionality_reduction.run_tsne.RunTsneOptions(max_iterations=500, initialize_tsne=<factory>)[source]#
Bases: object
Optional arguments for run_tsne().
- max_iterations#
Maximum number of iterations. Larger numbers improve convergence at the cost of compute time. Defaults to 500.
- initialize_tsne#
Optional arguments for initialize_tsne().
- __annotations__ = {'initialize_tsne': <class 'scranpy.dimensionality_reduction.run_tsne.InitializeTsneOptions'>, 'max_iterations': <class 'int'>}#
- __dataclass_fields__ = {'initialize_tsne': Field(name='initialize_tsne',type=<class 'scranpy.dimensionality_reduction.run_tsne.InitializeTsneOptions'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<class 'scranpy.dimensionality_reduction.run_tsne.InitializeTsneOptions'>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'max_iterations': Field(name='max_iterations',type=<class 'int'>,default=500,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- initialize_tsne: InitializeTsneOptions#
- class scranpy.dimensionality_reduction.run_tsne.TsneEmbedding(x, y)#
Bases: tuple
Named tuple of t-SNE coordinates.
- x:
A NumPy view of length equal to the number of cells, containing the coordinate on the first dimension for each cell.
- y:
A NumPy view of length equal to the number of cells, containing the coordinate on the second dimension for each cell.
- __getnewargs__()#
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, x, y)#
Create new instance of TsneEmbedding(x, y)
- __repr__()#
Return a nicely formatted representation string
- __slots__ = ()#
- x#
Alias for field number 0
- y#
Alias for field number 1
- class scranpy.dimensionality_reduction.run_tsne.TsneStatus(ptr, coordinates)[source]#
Bases: object
Status of a t-SNE run.
This should not be constructed manually but should be returned by initialize_tsne().
- clone()[source]#
Create a deep copy of the current state.
- Return type:
TsneStatus
- Returns:
Copy of the current state.
- extract()[source]#
Extract the t-SNE coordinates for each cell at the current iteration.
- Return type:
TsneEmbedding
- Returns:
‘x’ and ‘y’ t-SNE coordinates for all cells.
- iteration()[source]#
Get the current iteration number.
- Return type:
int
- Returns:
The current iteration number.
- run(iteration)[source]#
Run the t-SNE algorithm up to the specified number of iterations.
- Parameters:
iteration (int) – Number of iterations to run to. This should be greater than the current iteration number in iteration().
- scranpy.dimensionality_reduction.run_tsne.initialize_tsne(input, options=InitializeTsneOptions(perplexity=30, seed=42, num_threads=1))[source]#
Initialize the t-SNE algorithm. This is useful for fine-tuned control over the progress of the algorithm, e.g., to pause/resume the optimization of the coordinates.
- Parameters:
input (Union[NeighborIndex, NeighborResults, ndarray]) – Object containing per-cell nearest neighbor results or data that can be used to derive them.
This may be a 2-dimensional ndarray containing per-cell coordinates, where rows are cells and columns are dimensions. This is most typically the result of run_pca().
Alternatively, input may be a pre-built neighbor search index (NeighborIndex) for the dataset, typically constructed from the PC coordinates for all cells.
Alternatively, input may be pre-computed neighbor search results (NeighborResults) for all cells in the dataset. The number of neighbors should be consistent with the perplexity provided in InitializeTsneOptions (see also tsne_perplexity_to_neighbors()).
options (InitializeTsneOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected type.
- Return type:
TsneStatus
- Returns:
A t-SNE status object for further iterations.
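For example, a minimal pause/resume sketch, assuming a cells-by-dimensions matrix of PC coordinates (the random matrix below is a placeholder for the output of run_pca()):

```python
import numpy
from scranpy.dimensionality_reduction.run_tsne import (
    InitializeTsneOptions,
    initialize_tsne,
)

pcs = numpy.random.rand(200, 25)  # placeholder per-cell PC coordinates

status = initialize_tsne(pcs, options=InitializeTsneOptions(perplexity=30))
status.run(100)            # advance the optimization to iteration 100
snapshot = status.clone()  # keep a deep copy of the intermediate state
status.run(500)            # continue the original run to iteration 500

embedding = status.extract()
print(status.iteration())              # 500
print(embedding.x[:3], embedding.y[:3])
```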
- scranpy.dimensionality_reduction.run_tsne.run_tsne(input, options=RunTsneOptions(max_iterations=500, initialize_tsne=InitializeTsneOptions(perplexity=30, seed=42, num_threads=1)))[source]#
Compute a two-dimensional t-SNE embedding for the cells. Neighboring cells in high-dimensional space are placed next to each other on the embedding for intuitive visualization. This function is a wrapper around initialize_tsne() with invocations of the run() method up to the specified number of iterations.
- Parameters:
input (Union[NeighborResults, NeighborIndex, ndarray]) – Object containing per-cell nearest neighbor results or data that can be used to derive them.
This may be a 2-dimensional ndarray containing per-cell coordinates, where rows are cells and columns are features/dimensions. This is most typically the result of run_pca().
Alternatively, input may be a pre-built neighbor search index (NeighborIndex) for the dataset, typically constructed from the PC coordinates for all cells.
Alternatively, input may be pre-computed neighbor search results (NeighborResults) for all cells in the dataset. The number of neighbors should be consistent with the perplexity provided in InitializeTsneOptions (see also tsne_perplexity_to_neighbors()).
options (RunTsneOptions) – Optional parameters.
- Return type:
TsneEmbedding
- Returns:
Result containing the first two dimensions of the embedding.
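For example, a minimal sketch, assuming a cells-by-dimensions matrix of PC coordinates (the random matrix below is a placeholder for real data):

```python
import numpy
from scranpy.dimensionality_reduction.run_tsne import (
    InitializeTsneOptions,
    RunTsneOptions,
    run_tsne,
)

pcs = numpy.random.rand(200, 25)

embedding = run_tsne(
    pcs,
    options=RunTsneOptions(
        max_iterations=500,
        initialize_tsne=InitializeTsneOptions(perplexity=30, seed=42),
    ),
)
# embedding.x and embedding.y each contain one coordinate per cell.
```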
- scranpy.dimensionality_reduction.run_tsne.tsne_perplexity_to_neighbors(perplexity)[source]#
Convert the t-SNE perplexity to the required number of neighbors. This is typically used to perform a separate call to find_nearest_neighbors() before passing the nearest neighbor results to the t-SNE functions.
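For example, a minimal sketch of how the conversion fits into a precomputed-neighbors workflow; the find_nearest_neighbors() call is only described in a comment here, as its exact signature lives outside this module:

```python
from scranpy.dimensionality_reduction.run_tsne import tsne_perplexity_to_neighbors

perplexity = 30
k = tsne_perplexity_to_neighbors(perplexity)
# Use 'k' as the number of neighbors in a separate find_nearest_neighbors()
# call on the PC coordinates, then pass the resulting NeighborResults to
# initialize_tsne() or run_tsne() with the same perplexity.
print(k)
```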
scranpy.dimensionality_reduction.run_umap module#
- class scranpy.dimensionality_reduction.run_umap.InitializeUmapOptions(min_dist=0.1, num_neighbors=15, num_epochs=500, seed=42, num_threads=1)[source]#
Bases: object
Optional arguments for initialize_umap().
- Parameters:
min_dist (float) – Minimum distance between points. Larger values yield more inflated clumps of cells. Defaults to 0.1.
num_neighbors (int) – Number of neighbors to use in the UMAP algorithm. Larger values focus more on global structure than local structure. Ignored if input is a NeighborResults object. Defaults to 15.
num_epochs (int) – Number of epochs to run. Larger values improve convergence at the cost of compute time. Defaults to 500.
num_threads (int) – Number of threads to use for neighbor detection and the UMAP initialization. Defaults to 1.
seed (int) – Seed to use for random number generation. Defaults to 42.
- __annotations__ = {'min_dist': <class 'float'>, 'num_epochs': <class 'int'>, 'num_neighbors': <class 'int'>, 'num_threads': <class 'int'>, 'seed': <class 'int'>}#
- __dataclass_fields__ = {'min_dist': Field(name='min_dist',type=<class 'float'>,default=0.1,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'num_epochs': Field(name='num_epochs',type=<class 'int'>,default=500,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'num_neighbors': Field(name='num_neighbors',type=<class 'int'>,default=15,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'num_threads': Field(name='num_threads',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), 'seed': Field(name='seed',type=<class 'int'>,default=42,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- class scranpy.dimensionality_reduction.run_umap.RunUmapOptions(initialize_umap=<factory>)[source]#
Bases: object
Optional arguments for run_umap().
- initialize_umap#
Optional arguments for initialize_umap().
- __annotations__ = {'initialize_umap': <class 'scranpy.dimensionality_reduction.run_umap.InitializeUmapOptions'>}#
- __dataclass_fields__ = {'initialize_umap': Field(name='initialize_umap',type=<class 'scranpy.dimensionality_reduction.run_umap.InitializeUmapOptions'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<class 'scranpy.dimensionality_reduction.run_umap.InitializeUmapOptions'>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD)}#
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)#
- __eq__(other)#
Return self==value.
- __hash__ = None#
- __repr__()#
Return repr(self).
- initialize_umap: InitializeUmapOptions#
- class scranpy.dimensionality_reduction.run_umap.UmapEmbedding(x, y)#
Bases: tuple
Named tuple of UMAP coordinates.
- x:
A NumPy view of length equal to the number of cells, containing the coordinate on the first dimension for each cell.
- y:
A NumPy view of length equal to the number of cells, containing the coordinate on the second dimension for each cell.
- __getnewargs__()#
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, x, y)#
Create new instance of UmapEmbedding(x, y)
- __repr__()#
Return a nicely formatted representation string
- __slots__ = ()#
- x#
Alias for field number 0
- y#
Alias for field number 1
- class scranpy.dimensionality_reduction.run_umap.UmapStatus(ptr, coordinates)[source]#
Bases: object
Status of a UMAP run.
This should not be constructed manually but should be returned by initialize_umap().
- clone()[source]#
Create a deep copy of the current state.
- Return type:
UmapStatus
- Returns:
Copy of the current state.
- extract()[source]#
Extract the UMAP coordinates for each cell at the current epoch.
- Return type:
UmapEmbedding
- Returns:
x and y UMAP coordinates for all cells.
- scranpy.dimensionality_reduction.run_umap.initialize_umap(input, options=InitializeUmapOptions(min_dist=0.1, num_neighbors=15, num_epochs=500, seed=42, num_threads=1))[source]#
Initialize the UMAP algorithm. This is useful for fine-tuned control over the progress of the algorithm, e.g., to pause/resume the optimization of the coordinates.
input is either a pre-built neighbor search index for the dataset (NeighborIndex), or a pre-computed set of neighbor search results for all cells (NeighborResults). If input is a matrix (numpy.ndarray), we compute the nearest neighbors for each cell, assuming that the matrix contains the coordinates for each cell, usually from a PCA step (run_pca()).
- Parameters:
input (Union[NeighborResults, NeighborIndex, ndarray]) – Object containing per-cell nearest neighbor results or data that can be used to derive them.
This may be a 2-dimensional ndarray containing per-cell coordinates, where rows are cells and columns are dimensions. This is most typically the result of run_pca().
Alternatively, input may be a pre-built neighbor search index (NeighborIndex) for the dataset, typically constructed from the PC coordinates for all cells.
Alternatively, input may be pre-computed neighbor search results (NeighborResults) for all cells in the dataset.
options (InitializeUmapOptions) – Optional parameters.
- Raises:
TypeError – If input is not an expected type.
- Return type:
UmapStatus
- Returns:
A UMAP status object for iteration through the epochs.
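For example, a minimal sketch, assuming a cells-by-dimensions matrix of PC coordinates; the no-argument run() call below is an assumption about how to advance the epochs (check the run() documentation for its exact signature), so prefer run_umap() for routine use:

```python
import numpy
from scranpy.dimensionality_reduction.run_umap import (
    InitializeUmapOptions,
    initialize_umap,
)

pcs = numpy.random.rand(200, 25)  # placeholder per-cell PC coordinates

status = initialize_umap(
    pcs,
    options=InitializeUmapOptions(min_dist=0.1, num_neighbors=15),
)
status.run()                  # assumed: advance the optimization through all epochs
embedding = status.extract()  # per-cell x and y coordinates at the current epoch
```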
- scranpy.dimensionality_reduction.run_umap.run_umap(input, options=RunUmapOptions(initialize_umap=InitializeUmapOptions(min_dist=0.1, num_neighbors=15, num_epochs=500, seed=42, num_threads=1)))[source]#
Compute a two-dimensional UMAP embedding for the cells. Neighboring cells in high-dimensional space are placed next to each other on the embedding for intuitive visualization. This function is a wrapper around initialize_umap() with invocations of the run() method up to the maximum number of epochs.
- Parameters:
input (Union[NeighborResults, NeighborIndex, ndarray]) – Object containing per-cell nearest neighbor results or data that can be used to derive them.
This may be a 2-dimensional ndarray containing per-cell coordinates, where rows are cells and columns are features/dimensions. This is most typically the result of run_pca().
Alternatively, input may be a pre-built neighbor search index (NeighborIndex) for the dataset, typically constructed from the PC coordinates for all cells.
Alternatively, input may be pre-computed neighbor search results (NeighborResults) for all cells in the dataset.
options (RunUmapOptions) – Optional parameters.
- Return type:
UmapEmbedding
- Returns:
Result containing the first two dimensions.
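For example, a minimal end-to-end sketch from log-normalized values to a UMAP embedding (the random matrix is a placeholder for real data):

```python
import numpy
from scranpy.dimensionality_reduction.run_pca import run_pca
from scranpy.dimensionality_reduction.run_umap import (
    InitializeUmapOptions,
    RunUmapOptions,
    run_umap,
)

logcounts = numpy.random.rand(1000, 200)  # features x cells
pca = run_pca(logcounts)                  # default options, 25 PCs

embedding = run_umap(
    pca.principal_components,
    options=RunUmapOptions(
        initialize_umap=InitializeUmapOptions(min_dist=0.1, num_neighbors=15),
    ),
)
# embedding.x and embedding.y each contain one coordinate per cell.
```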