summarizedexperiment package¶
Submodules¶
summarizedexperiment.BaseSE module¶
- class summarizedexperiment.BaseSE.BaseSE(assays=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Bases: object
Base class for SummarizedExperiment. This class provides common properties and methods that can be utilized across all derived classes.
This container represents genomic experiment data in the form of assays, features in row_data, sample data in column_data, and any other relevant metadata.
If row_names are not provided, the row names of row_data are used as the experiment's row names. Similarly, if column_names are not provided, the row names of column_data are used as the experiment's column names.
- __getitem__(args)[source]¶
Subset a SummarizedExperiment.
- Parameters:
args (Union[int, str, Sequence, tuple]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted, see normalize_subscript().
Alternatively, a tuple of length 1. The first entry specifies the rows to retain based on their names or indices.
Alternatively, a tuple of length 2. The first entry specifies the rows to retain, while the second entry specifies the columns to retain, based on their names or indices.
- Raises:
ValueError – If too many or too few slices are provided.
- Return type:
- Returns:
Same type as caller with the sliced rows and columns.
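The tuple handling described above can be sketched in plain Python. This only illustrates the documented behavior and is not the library's implementation; `split_subscript` is a hypothetical helper name, and using None to mean "keep everything" is an assumption of the sketch.

```python
from typing import Any, Optional, Tuple


def split_subscript(args: Any) -> Tuple[Optional[Any], Optional[Any]]:
    """Split a subscript into (rows, columns); None means 'keep all'."""
    if not isinstance(args, tuple):
        # A bare index, name, slice, or sequence selects rows only.
        return args, None
    if len(args) == 1:
        return args[0], None
    if len(args) == 2:
        return args[0], args[1]
    # Empty tuples and tuples longer than 2 are rejected.
    raise ValueError("Expected a tuple of length 1 or 2.")


print(split_subscript([0, 2]))
print(split_subscript((slice(0, 2), ["sample_a"])))
```

With this shape, `se[rows]`, `se[rows,]`, and `se[rows, columns]` would all reduce to one (rows, columns) pair before any assay is touched.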
- __init__(assays=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Initialize an instance of BaseSE.
- Parameters:
assays – A dictionary containing matrices, with assay names as keys and 2-dimensional matrices represented as either ndarray or spmatrix.
Alternatively, you may use any 2-dimensional matrix that has the shape property and implements the slice operation using the __getitem__ dunder method.
All matrices in assays must be 2-dimensional and have the same shape (number of rows, number of columns).
row_data (Optional[BiocFrame]) – Features, must be the same length as the number of rows of the matrices in assays.
Feature information is coerced to a BiocFrame. Defaults to None.
column_data (Optional[BiocFrame]) – Sample data, must be the same length as the number of columns of the matrices in assays.
Sample information is coerced to a BiocFrame. Defaults to None.
row_names (Optional[List[str]]) – A list of strings, with the same length as the number of rows.
If row_names are not provided, these are inferred from row_data.
Defaults to None.
column_names (Optional[List[str]]) – A list of strings, with the same length as the number of columns.
If column_names are not provided, these are inferred from column_data.
Defaults to None.
metadata (Optional[dict]) – Additional experimental metadata describing the methods. Defaults to None.
validate (bool) – Internal use only.
- property assay_names: List[str]¶
Alias for get_assay_names.
- property assays: Dict[str, Any]¶
Alias for get_assays().
- property col_names: Names | None¶
Alias for get_column_names, provided for back-compatibility.
- property colnames: Names | None¶
Alias for get_column_names, provided for back-compatibility.
- property column_names: Names | None¶
Alias for get_column_names, provided for back-compatibility.
- property columnnames: Names | None¶
Alias for get_column_names, provided for back-compatibility.
- copy()[source]¶
Alias for __copy__().
- get_assay(assay)[source]¶
Convenience method to access an assay in assays by name or index.
- Parameters:
assay (Union[int, str]) – Name or index position of the assay.
- Raises:
AttributeError – If the assay name does not exist.
IndexError – If index is greater than the number of assays.
- Return type:
- Returns:
Experiment data.
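The name-or-index lookup and its two documented error cases can be sketched with an ordinary dictionary. This is a hedged illustration, not the library's code: `lookup_assay` is a hypothetical function, and rejecting negative indices and raising TypeError for other argument types are assumptions of the sketch.

```python
from typing import Any, Dict, Union


def lookup_assay(assays: Dict[str, Any], assay: Union[int, str]) -> Any:
    """Return one assay by name or by position in insertion order."""
    if isinstance(assay, str):
        if assay not in assays:
            raise AttributeError(f"Assay '{assay}' does not exist.")
        return assays[assay]
    if isinstance(assay, int):
        names = list(assays.keys())
        # Assumption: negative positions are treated as out of range.
        if assay < 0 or assay >= len(names):
            raise IndexError("Assay index is out of range.")
        return assays[names[assay]]
    # Assumption: any other argument type is rejected.
    raise TypeError("'assay' must be a string or an integer.")


assays = {"counts": [[1, 2], [3, 4]], "lognorm": [[0.0, 0.7], [1.1, 1.4]]}
print(lookup_assay(assays, "counts"))
print(lookup_assay(assays, 1))
```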
- get_row_data(replace_row_names=True)[source]¶
Get features. If replace_row_names is True, the row names of row_data are replaced by the row names of the experiment.
- get_slice(rows, columns)[source]¶
Alias for __getitem__, for back-compatibility.
- Return type:
- property metadata: dict¶
Alias for get_metadata.
- property row_names: Names | None¶
Alias for get_row_names, provided for back-compatibility.
- property rownames: Names | None¶
Alias for get_row_names, provided for back-compatibility.
- set_assay(name, assay, in_place=False)[source]¶
Add or replace an assay in assays.
- Parameters:
name (str) – New or existing assay name.
assay (Any) – A 2-dimensional matrix represented as either ndarray or spmatrix.
Alternatively, you may use any 2-dimensional matrix that has the shape property and implements the slice operation using the __getitem__ dunder method.
Dimensions of the matrix must match the shape of the current experiment (number of rows, number of columns).
in_place (bool) – Whether to modify the BaseSE in place.
- Return type:
- Returns:
A modified BaseSE object, either as a copy of the original or as a reference to the (in-place-modified) original.
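The in_place contract shared by the setter methods (return a modified copy by default, or modify and return the original) can be sketched as follows. `TinyExperiment` is a hypothetical stand-in class written for this illustration; it is not part of the package and performs none of the shape validation described above.

```python
import copy
from typing import Any, Dict


class TinyExperiment:
    """Hypothetical stand-in for an experiment that holds named assays."""

    def __init__(self, assays: Dict[str, Any]):
        self.assays = dict(assays)

    def set_assay(self, name: str, assay: Any, in_place: bool = False) -> "TinyExperiment":
        # Work on the original when in_place=True, otherwise on a shallow copy.
        target = self if in_place else copy.copy(self)
        if not in_place:
            # Give the copy its own dict so the original stays untouched.
            target.assays = dict(self.assays)
        target.assays[name] = assay
        return target


se = TinyExperiment({"counts": [[1, 2]]})
modified = se.set_assay("lognorm", [[0.0, 0.7]])
print(sorted(se.assays), sorted(modified.assays))

same = se.set_assay("lognorm", [[0.0, 0.7]], in_place=True)
print(same is se, sorted(se.assays))
```

Returning the object in both modes lets calls be chained regardless of which mode the caller picks.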
- set_column_data(cols, replace_column_names=False, in_place=False)[source]¶
Set sample data.
- Parameters:
- Return type:
- Returns:
A modified BaseSE object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_column_names(names, in_place=False)[source]¶
Set new column names.
- Parameters:
- Return type:
- Returns:
A modified BaseSE object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_row_data(rows, replace_row_names=False, in_place=False)[source]¶
Set new feature information.
- Parameters:
- Return type:
- Returns:
A modified BaseSE object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_row_names(names, in_place=False)[source]¶
Set new row names.
- Parameters:
- Return type:
- Returns:
A modified BaseSE object, either as a copy of the original or as a reference to the (in-place-modified) original.
- subset_assays(rows, columns)[source]¶
Subset all assays by the slice defined by rows and columns.
If both rows and columns are None, a shallow copy of the current assays is returned.
- Parameters:
rows (Union[str, int, bool, Sequence, None]) – Row indices to subset.
Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted, see normalize_subscript().
columns (Union[str, int, bool, Sequence, None]) – Column indices to subset.
Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted, see normalize_subscript().
- Return type:
- Returns:
Sliced experiment data.
- summarizedexperiment.BaseSE.SliceResult¶
alias of SlicerResult
summarizedexperiment.RangedSummarizedExperiment module¶
- class summarizedexperiment.RangedSummarizedExperiment.RangedSummarizedExperiment(assays=None, row_ranges=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Bases: SummarizedExperiment
RangedSummarizedExperiment class to represent genomic experiment data, genomic features as GenomicRanges or GenomicRangesList, sample data, and any additional experimental metadata.
Note: If row_ranges is empty, None, or not a genomicranges.GenomicRanges.GenomicRanges object, use a SummarizedExperiment instead.
- __annotations__ = {}¶
- __deepcopy__(memo=None, _nil=[])[source]¶
- Returns:
A deep copy of the current RangedSummarizedExperiment.
- __init__(assays=None, row_ranges=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Initialize a RangedSummarizedExperiment (RSE) object.
- Parameters:
assays – A dictionary containing matrices, with assay names as keys and 2-dimensional matrices represented as either ndarray or spmatrix.
Alternatively, you may use any 2-dimensional matrix that has the shape property and implements the slice operation using the __getitem__ dunder method.
All matrices in assays must be 2-dimensional and have the same shape (number of rows, number of columns).
row_ranges (Union[GenomicRanges, GenomicRangesList, None]) – Genomic features, must be the same length as the number of rows of the matrices in assays.
row_data (Optional[BiocFrame]) – Features, must be the same length as the number of rows of the matrices in assays.
Feature information is coerced to a BiocFrame. Defaults to None.
column_data (Optional[BiocFrame]) – Sample data, must be the same length as the number of columns of the matrices in assays.
Sample information is coerced to a BiocFrame. Defaults to None.
row_names (Optional[List[str]]) – A list of strings, with the same length as the number of rows.
If row_names are not provided, these are inferred from row_data.
Defaults to None.
column_names (Optional[List[str]]) – A list of strings, with the same length as the number of columns.
If column_names are not provided, these are inferred from column_data.
Defaults to None.
metadata (Optional[dict]) – Additional experimental metadata describing the methods. Defaults to None.
validate (bool) – Internal use only.
- combine_columns(*other)[source]¶
Wrapper around combine_columns().
- Return type:
- combine_rows(*other)[source]¶
Wrapper around combine_rows().
- Return type:
- copy()[source]¶
Alias for __copy__().
- coverage(shift=0, width=None, weight=1)[source]¶
Calculate coverage for each chromosome.
- Parameters:
- Return type:
- Returns:
A dictionary with chromosome names as keys and the coverage vector as value.
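The shape of the return value (one coverage vector per chromosome) can be illustrated with a small stand-alone sketch. It assumes 0-based, half-open (seqname, start, end) tuples with non-negative starts, implements only the weight argument, and omits shift and width; `coverage_by_chromosome` is a hypothetical name, not the package's method.

```python
from typing import Dict, List, Tuple


def coverage_by_chromosome(
    ranges: List[Tuple[str, int, int]], weight: int = 1
) -> Dict[str, List[int]]:
    """Sum the weight of every range covering each position, per chromosome."""
    result: Dict[str, List[int]] = {}
    for seqname, start, end in ranges:
        vector = result.setdefault(seqname, [])
        # Grow the vector so it reaches the end of this range.
        if len(vector) < end:
            vector.extend([0] * (end - len(vector)))
        for position in range(start, end):
            vector[position] += weight
    return result


ranges = [("chr1", 0, 3), ("chr1", 2, 5), ("chr2", 1, 2)]
print(coverage_by_chromosome(ranges))
```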
- property end: ndarray¶
Get genomic end positions for each feature or row in experimental data.
- Returns:
A numpy.ndarray of end positions.
- find_overlaps(query, query_type='any', select='all', max_gap=-1, min_overlap=1, ignore_strand=False)[source]¶
Find overlaps between subject (self) and query ranges.
- Parameters:
query (Union[GenomicRanges, GenomicRangesList, RangedSummarizedExperiment]) – Query intervals to find nearest positions.
query may be a GenomicRanges or a RangedSummarizedExperiment object.
query_type (str) – Overlap query type, must be one of
"any": Any overlap is good
"start": Overlap at the beginning of the intervals
"end": Must overlap at the end of the intervals
"within": Fully contain the query interval
Defaults to "any".
select (Literal['all', 'first', 'last', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in subject.
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 1.
ignore_strand (bool) – Whether to ignore strands. Defaults to False.
- Raises:
TypeError – If query is not a RangedSummarizedExperiment or GenomicRanges.
- Return type:
- Returns:
A list with the same length as query, containing hits to overlapping indices.
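To make the return shape concrete (one list of subject indices per query interval), here is a minimal sketch of the "any" query type with a min_overlap threshold. It assumes half-open integer intervals, min_overlap of at least 1, and ignores strand, sequence names, max_gap, and select; `find_any_overlaps` is a hypothetical function and uses a quadratic scan rather than whatever indexing the library relies on.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def find_any_overlaps(
    subject: List[Interval], query: List[Interval], min_overlap: int = 1
) -> List[List[int]]:
    """For each query interval, list indices of overlapping subject intervals."""
    hits: List[List[int]] = []
    for q_start, q_end in query:
        matched = []
        for index, (s_start, s_end) in enumerate(subject):
            # Number of positions shared by the two intervals (negative if disjoint).
            shared = min(q_end, s_end) - max(q_start, s_start)
            if shared >= min_overlap:
                matched.append(index)
        hits.append(matched)
    return hits


subject = [(0, 10), (20, 30), (25, 40)]
query = [(5, 8), (28, 35), (50, 60)]
print(find_any_overlaps(subject, query))
```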
- flank(width, start=True, both=False, ignore_strand=False, in_place=False)[source]¶
Compute flanking ranges for each range.
Refer to either flank() or the Bioconductor documentation for more details.
- Parameters:
width (int) – Width to flank by. May be negative.
start (bool) – Whether to only flank starts. Defaults to True.
both (bool) – Whether to flank both starts and ends. Defaults to False.
ignore_strand (bool) – Whether to ignore strands. Defaults to False.
in_place (bool) – Whether to modify the GenomicRanges object in place.
- Return type:
- Returns:
A new RangedSummarizedExperiment object with the flanked ranges, either as a copy of the original or as a reference to the (in-place-modified) original.
- follow(query, select='all', ignore_strand=False)[source]¶
Search nearest positions only upstream that overlap with each range in query.
- Parameters:
query (Union[GenomicRanges, GenomicRangesList, RangedSummarizedExperiment]) – Query intervals to find nearest positions.
query may be a GenomicRanges or a RangedSummarizedExperiment object.
select (Literal['all', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in query.
ignore_strand (bool) – Whether to ignore strand. Defaults to False.
- Raises:
TypeError – If query is not a RangedSummarizedExperiment or GenomicRanges.
- Return type:
- Returns:
A list with the same length as query, containing hits to nearest indices.
- get_row_ranges()[source]¶
Get genomic feature information.
- Return type:
- Returns:
Genomic feature information.
- narrow(start=None, width=None, end=None, in_place=False)[source]¶
Narrow genomic positions by the provided start, width and end parameters.
Important: these parameters are relative shifts in position for each range.
- Parameters:
start (Union[int, List[int], ndarray, None]) – Relative start position. Defaults to None.
width (Union[int, List[int], ndarray, None]) – Relative width of the interval. Defaults to None.
end (Union[int, List[int], ndarray, None]) – Relative end position. Defaults to None.
in_place (bool) – Whether to modify the GenomicRanges object in place.
- Return type:
- Returns:
A new RangedSummarizedExperiment object with narrow positions, either as a copy of the original or as a reference to the (in-place-modified) original.
- nearest(query, select='all', ignore_strand=False)[source]¶
Search nearest positions both upstream and downstream that overlap with each range in query.
- Parameters:
query (Union[GenomicRanges, GenomicRangesList, RangedSummarizedExperiment]) – Query intervals to find nearest positions.
query may be a GenomicRanges or a RangedSummarizedExperiment object.
select (Literal['all', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in query.
ignore_strand (bool) – Whether to ignore strand. Defaults to False.
- Raises:
TypeError – If query is not a RangedSummarizedExperiment or GenomicRanges.
- Return type:
- Returns:
A list with the same length as query, containing hits to nearest indices.
- precede(query, select='all', ignore_strand=False)[source]¶
Search nearest positions only downstream that overlap with each range in query.
- Parameters:
query (Union[GenomicRanges, GenomicRangesList, RangedSummarizedExperiment]) – Query intervals to find nearest positions.
query may be a GenomicRanges or a RangedSummarizedExperiment object.
select (Literal['all', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in query.
ignore_strand (bool) – Whether to ignore strand. Defaults to False.
- Raises:
TypeError – If query is not a RangedSummarizedExperiment or GenomicRanges.
- Return type:
- Returns:
A list with the same length as query, containing hits to nearest indices.
- promoters(upstream=2000, downstream=200, in_place=False)[source]¶
Extend intervals to promoter regions.
- Parameters:
- Return type:
- Returns:
A new RangedSummarizedExperiment object with the extended ranges for promoter regions, either as a copy of the original or as a reference to the (in-place-modified) original.
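As a rough illustration of what extending a range to its promoter region means, the sketch below applies the Bioconductor definition of promoters translated to 0-based, half-open coordinates. Whether the package uses the same coordinate convention is an assumption, and `promoter_region` is a hypothetical helper working on a single range rather than on row_ranges.

```python
from typing import Tuple


def promoter_region(
    start: int, end: int, strand: str, upstream: int = 2000, downstream: int = 200
) -> Tuple[int, int]:
    """Return the half-open promoter interval around a range's 5' end."""
    if strand == "-":
        # On the reverse strand the 5' end of the range is its end coordinate.
        return end - downstream, end + upstream
    # Forward or unstranded ranges are anchored at their start coordinate.
    return start - upstream, start + downstream


print(promoter_region(5000, 6000, "+"))
print(promoter_region(5000, 6000, "-"))
```

In both cases the resulting interval spans upstream + downstream positions; only the anchor changes with the strand.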
- relaxed_combine_columns(*other)[source]¶
Wrapper around relaxed_combine_columns().
- Return type:
- relaxed_combine_rows(*other)[source]¶
Wrapper around relaxed_combine_rows().
- Return type:
- resize(width, fix='start', ignore_strand=False, in_place=False)[source]¶
Resize ranges to the specified width where either the start, end, or center is used as an anchor.
- Parameters:
width (Union[int, List[int], ndarray]) – Width to resize to. Cannot be negative.
fix (Literal['start', 'end', 'center']) – Fix positions by "start", "end", or "center". Defaults to "start".
ignore_strand (bool) – Whether to ignore strands. Defaults to False.
in_place (bool) – Whether to modify the GenomicRanges object in place.
- Raises:
ValueError – If fix is not one of start, center, or end.
- Return type:
- Returns:
A new RangedSummarizedExperiment object with the resized ranges, either as a copy of the original or as a reference to the (in-place-modified) original.
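The three anchors can be compared on a single interval with the sketch below. It assumes half-open integer coordinates, ignores strand entirely (comparable to ignore_strand=True), and rounds the center case down as Bioconductor's resize does; `resize_interval` is a hypothetical helper, not the package's method.

```python
from typing import Tuple


def resize_interval(start: int, end: int, width: int, fix: str = "start") -> Tuple[int, int]:
    """Resize a half-open interval [start, end) to `width`, keeping one anchor fixed."""
    if width < 0:
        raise ValueError("'width' cannot be negative.")
    if fix == "start":
        return start, start + width
    if fix == "end":
        return end - width, end
    if fix == "center":
        # Shift the start by half of the width change, rounding down.
        new_start = start + (end - start - width) // 2
        return new_start, new_start + width
    raise ValueError("'fix' must be one of 'start', 'end', or 'center'.")


print(resize_interval(100, 200, 20))
print(resize_interval(100, 200, 20, fix="end"))
print(resize_interval(100, 200, 20, fix="center"))
```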
- restrict(start=None, end=None, keep_all_ranges=False, in_place=False)[source]¶
Restrict ranges to the given start and end positions.
- Parameters:
start (Union[int, List[int], ndarray, None]) – Start position. Defaults to None.
end (Union[int, List[int], ndarray, None]) – End position. Defaults to None.
keep_all_ranges (bool) – Whether to keep intervals that do not overlap with start and end. Defaults to False.
in_place (bool) – Whether to modify the GenomicRanges object in place.
- Return type:
- Returns:
A new RangedSummarizedExperiment object with restricted intervals, either as a copy of the original or as a reference to the (in-place-modified) original.
- property row_ranges: GenomicRanges | GenomicRangesList¶
Alias for get_row_ranges().
- property seq_info: SeqInfo¶
Get sequence information object (if available).
- Returns:
Sequence information.
- property seqnames: List[str]¶
Get sequence or chromosome names.
- Returns:
List of all chromosome names.
- set_row_ranges(row_ranges, in_place=False)[source]¶
Set new genomic features.
- Parameters:
row_ranges (Union[GenomicRanges, GenomicRangesList, None]) – Genomic features, must be the same length as the number of rows of the matrices in assays.
in_place (bool) – Whether to modify the RangedSummarizedExperiment in place.
- Return type:
- Returns:
A modified RangedSummarizedExperiment object, either as a copy of the original or as a reference to the (in-place-modified) original.
- shift(shift=0, in_place=False)[source]¶
Shift all intervals. shift may be negative.
- Parameters:
- Return type:
- Returns:
A new RangedSummarizedExperiment object with the shifted ranges, either as a copy of the original or as a reference to the (in-place-modified) original.
- sort(decreasing=False, in_place=False)[source]¶
Sort by ranges.
- Parameters:
- Return type:
- Returns:
A new sorted RangedSummarizedExperiment object.
- property start: ndarray¶
Get genomic start positions for each feature or row in experimental data.
- Returns:
A numpy.ndarray of start positions.
- property strand: ndarray¶
Get strand information.
- Returns:
A numpy.ndarray of strand information.
- subset_by_overlaps(query, query_type='any', max_gap=-1, min_overlap=1, ignore_strand=False)[source]¶
Subset a RangedSummarizedExperiment by feature overlaps.
- Parameters:
query (Union[GenomicRanges, GenomicRangesList, RangedSummarizedExperiment]) – Query GenomicRanges.
query may be a GenomicRanges or a RangedSummarizedExperiment object.
query_type (str) – Overlap query type, must be one of
"any": Any overlap is good
"start": Overlap at the beginning of the intervals
"end": Must overlap at the end of the intervals
"within": Fully contain the query interval
Defaults to "any".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 1.
ignore_strand (bool) – Whether to ignore strands. Defaults to False.
- Raises:
TypeError – If query is not a RangedSummarizedExperiment or GenomicRanges.
- Return type:
- Returns:
A new RangedSummarizedExperiment object. None if there are no indices to slice.
- property width: ndarray¶
Get widths of row_ranges.
- Returns:
A numpy.ndarray of widths for each interval.
- summarizedexperiment.RangedSummarizedExperiment.combine_columns(*x)[source]¶
Combine multiple RangedSummarizedExperiment objects by column.
All experiments must contain the same assay names. If you need a flexible combine operation, check out relaxed_combine_columns().
- Return type:
- Returns:
A combined RangedSummarizedExperiment.
- summarizedexperiment.RangedSummarizedExperiment.combine_rows(*x)[source]¶
Combine multiple RangedSummarizedExperiment objects by row.
All experiments must contain the same assay names. If you need a flexible combine operation, check out relaxed_combine_rows().
- Return type:
- Returns:
A combined RangedSummarizedExperiment.
- summarizedexperiment.RangedSummarizedExperiment.relaxed_combine_columns(*x)[source]¶
A relaxed version of the combine_columns() method for RangedSummarizedExperiment objects. Whereas combine_columns expects that all objects have the same rows, relaxed_combine_columns allows for different rows. Absent rows in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x (RangedSummarizedExperiment) – One or more RangedSummarizedExperiment objects, possibly with differences in the number and identity of their rows.
- Return type:
- Returns:
A RangedSummarizedExperiment that combines all experiments along their columns and contains the union of all rows. Rows absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.
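The union-of-rows behavior can be sketched on plain dictionaries, where each table maps a row name to its values (one per column). This is an illustration of the idea only: `relaxed_combine_columns_sketch` is a hypothetical function, None is used as the only placeholder (no masked NumPy arrays), and keeping rows in first-seen order is an assumption.

```python
from typing import Dict, List, Optional

Table = Dict[str, List[Optional[float]]]  # row name -> values, one per column


def relaxed_combine_columns_sketch(*tables: Table) -> Table:
    """Join tables side by side over the union of their row names."""
    # Collect row names in first-seen order.
    row_names: List[str] = []
    for table in tables:
        for name in table:
            if name not in row_names:
                row_names.append(name)

    combined: Table = {name: [] for name in row_names}
    for table in tables:
        # Every row of a table is assumed to have the same number of columns.
        n_columns = len(next(iter(table.values()))) if table else 0
        for name in row_names:
            # Rows missing from this table contribute one placeholder per column.
            combined[name].extend(table.get(name, [None] * n_columns))
    return combined


first = {"gene_a": [1.0, 2.0], "gene_b": [3.0, 4.0]}
second = {"gene_b": [5.0], "gene_c": [6.0]}
print(relaxed_combine_columns_sketch(first, second))
```

The row-wise variant is the same idea with the roles of rows and columns exchanged.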
- summarizedexperiment.RangedSummarizedExperiment.relaxed_combine_rows(*x)[source]¶
A relaxed version of the combine_rows() method for RangedSummarizedExperiment objects. Whereas combine_rows expects that all objects have the same columns, relaxed_combine_rows allows for different columns. Absent columns in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x (RangedSummarizedExperiment) – One or more RangedSummarizedExperiment objects, possibly with differences in the number and identity of their columns.
- Return type:
- Returns:
A RangedSummarizedExperiment that combines all experiments along their rows and contains the union of all columns. Columns absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.
summarizedexperiment.SummarizedExperiment module¶
- class summarizedexperiment.SummarizedExperiment.SummarizedExperiment(assays=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Bases: BaseSE
Container to represent genomic experiment data (assays), features (row_data), sample data (column_data) and any other metadata.
SummarizedExperiment follows the R/Bioconductor specification; rows are features, columns are samples.
- __annotations__ = {}¶
- __init__(assays=None, row_data=None, column_data=None, row_names=None, column_names=None, metadata=None, validate=True)[source]¶
Initialize a SummarizedExperiment (SE).
- Parameters:
assays – A dictionary containing matrices, with assay names as keys and 2-dimensional matrices represented as either ndarray or spmatrix.
Alternatively, you may use any 2-dimensional matrix that has the shape property and implements the slice operation using the __getitem__ dunder method.
All matrices in assays must be 2-dimensional and have the same shape (number of rows, number of columns).
row_data (Optional[BiocFrame]) – Features, must be the same length as the number of rows of the matrices in assays.
Feature information is coerced to a BiocFrame. Defaults to None.
column_data (Optional[BiocFrame]) – Sample data, must be the same length as the number of columns of the matrices in assays.
Sample information is coerced to a BiocFrame. Defaults to None.
row_names (Optional[List[str]]) – A list of strings, with the same length as the number of rows.
If row_names are not provided, these are inferred from row_data.
Defaults to None.
column_names (Optional[List[str]]) – A list of strings, with the same length as the number of columns.
If column_names are not provided, these are inferred from column_data.
Defaults to None.
metadata (Optional[dict]) – Additional experimental metadata describing the methods. Defaults to None.
validate (bool) – Internal use only.
- combine_columns(*other)[source]¶
Wrapper around combine_columns().
- Return type:
- combine_rows(*other)[source]¶
Wrapper around combine_rows().
- Return type:
- relaxed_combine_columns(*other)[source]¶
Wrapper around relaxed_combine_columns().
- Return type:
- relaxed_combine_rows(*other)[source]¶
Wrapper around relaxed_combine_rows().
- Return type:
- summarizedexperiment.SummarizedExperiment.combine_columns(*x)[source]¶
Combine multiple SummarizedExperiment objects by column.
All experiments must contain the same assay names. If you need a flexible combine operation, check out relaxed_combine_columns().
- Return type:
- Returns:
A combined SummarizedExperiment.
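The strict requirement (every experiment carries the same assay names) can be sketched on dictionaries of list-based matrices. This is a hedged illustration rather than the package's code: `combine_columns_sketch` is a hypothetical function, the ValueError types are assumptions, and row and column annotations are left out entirely.

```python
from typing import Dict, List

Matrix = List[List[float]]  # a list of rows


def combine_columns_sketch(*experiments: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Concatenate same-named assays column-wise across experiments."""
    if not experiments:
        raise ValueError("At least one experiment is required.")
    names = set(experiments[0])
    for assays in experiments[1:]:
        if set(assays) != names:
            raise ValueError("All experiments must contain the same assay names.")
    combined: Dict[str, Matrix] = {}
    for name in experiments[0]:
        first = experiments[0][name]
        # Copy the rows so the inputs are not modified.
        combined[name] = [list(row) for row in first]
        for assays in experiments[1:]:
            matrix = assays[name]
            if len(matrix) != len(first):
                raise ValueError("All assays must have the same number of rows.")
            # Append each experiment's columns to the matching row.
            for row, extra in zip(combined[name], matrix):
                row.extend(extra)
    return combined


a = {"counts": [[1, 2], [3, 4]]}
b = {"counts": [[5], [6]]}
print(combine_columns_sketch(a, b))
```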
- summarizedexperiment.SummarizedExperiment.combine_rows(*x)[source]¶
Combine multiple SummarizedExperiment objects by row.
All experiments must contain the same assay names. If you need a flexible combine operation, check out relaxed_combine_rows().
- Return type:
- Returns:
A combined SummarizedExperiment.
- summarizedexperiment.SummarizedExperiment.relaxed_combine_columns(*x)[source]¶
A relaxed version of the combine_columns() method for SummarizedExperiment objects. Whereas combine_columns expects that all objects have the same rows, relaxed_combine_columns allows for different rows. Absent rows in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x (SummarizedExperiment) – One or more SummarizedExperiment objects, possibly with differences in the number and identity of their rows.
- Return type:
- Returns:
A SummarizedExperiment that combines all experiments along their columns and contains the union of all rows. Rows absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.
- summarizedexperiment.SummarizedExperiment.relaxed_combine_rows(*x)[source]¶
A relaxed version of the combine_rows() method for SummarizedExperiment objects. Whereas combine_rows expects that all objects have the same columns, relaxed_combine_rows allows for different columns. Absent columns in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x (SummarizedExperiment) – One or more SummarizedExperiment objects, possibly with differences in the number and identity of their columns.
- Return type:
- Returns:
A SummarizedExperiment that combines all experiments along their rows and contains the union of all columns. Columns absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.