iranges package
Submodules
iranges.IRanges module
- class iranges.IRanges.IRanges(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)
Bases: object
A collection of integer ranges, equivalent to the IRanges class from the Bioconductor package of the same name.
This holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.
- __copy__()
Shallow copy of the object.
- Return type:
- Returns:
Same type as the caller, a shallow copy of this object.
- __deepcopy__(memo)
Deep copy of the object.
- Parameters:
memo – Passed to internal deepcopy() calls.
- Return type:
- Returns:
Same type as the caller, a deep copy of this object.
- __init__(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)
- Parameters:
start (Sequence[int]) – Sequence of integers containing the start position for each range. All values should fall within the range that can be represented by a 32-bit signed integer.
width (Sequence[int]) – Sequence of integers containing the width for each range. This should be of the same length as start. All values should be non-negative and fall within the range that can be represented by a 32-bit signed integer. Similarly, start + width should not exceed the range of a 32-bit signed integer.
names (Optional[Sequence[str]]) – Sequence of strings containing the name for each range. This should have length equal to start and should only contain strings. If no names are present, None may be supplied instead.
mcols (Optional[BiocFrame]) – A data frame containing additional metadata columns for each range. This should have number of rows equal to the length of start. If None, defaults to a zero-column data frame.
metadata (Optional[dict]) – Additional metadata. If None, defaults to an empty dictionary.
validate (bool) – Whether to validate the arguments. Internal use only.
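The constraints on start and width listed above can be sketched as a standalone check. This is only an illustration of the documented rules; `check_ranges` is a hypothetical helper, not the validator the library runs internally.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def check_ranges(start, width):
    """Hypothetical helper mirroring the documented constraints (not library code)."""
    if len(start) != len(width):
        raise ValueError("'start' and 'width' must have the same length")
    for s, w in zip(start, width):
        # Starts may be zero or negative, but must fit in a 32-bit signed integer.
        if not INT32_MIN <= s <= INT32_MAX:
            raise ValueError(f"start {s} does not fit in a 32-bit signed integer")
        # Widths must be non-negative and fit in a 32-bit signed integer.
        if not 0 <= w <= INT32_MAX:
            raise ValueError(f"width {w} must be non-negative and fit in 32 bits")
        # The derived end (start + width) must not overflow either.
        if s + w > INT32_MAX:
            raise ValueError(f"start + width = {s + w} exceeds the 32-bit range")
    return True
```

For example, `check_ranges([1, -5, 0], [10, 0, 3])` passes, while a start of `INT32_MAX` with a width of 1 is rejected because the end would overflow.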
- __setitem__(args, value)
Add or update positions (in-place operation).
- Parameters:
subset – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be replaced, see normalize_subscript().
value (IRanges) – An IRanges object of length equal to the number of ranges to be replaced, as specified by subset.
- Returns:
Specified ranges are replaced by value in the current object.
- clip_intervals(shift=0, width=None, adjust_width_by_shift=False)
Clip intervals. Starts are always clipped to positive interval ranges (1, Inf).
If width is specified, the intervals are clipped to (1, width).
- Parameters:
- Return type:
- Returns:
An IRanges object with the clipped intervals.
- count_overlaps(query, query_type='any', max_gap=-1, min_overlap=1, delete_index=True)
Count number of overlaps with query IRanges object.
- Parameters:
query (IRanges) – Query IRanges.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the intervals
"end": Must overlap at the end of the intervals
"within": Fully contain the query interval
Defaults to "any".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 1.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not an IRanges object.
- Return type:
- Returns:
NumPy vector with the number of overlaps for each range in query.
- coverage(shift=0, width=None, weight=1)
Calculate coverage: for each position, count the number of intervals that cover it.
- Parameters:
- Raises:
TypeError – If weight is not a number. If width is not an expected type.
- Return type:
- Returns:
A NumPy array with the coverage vector.
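A minimal pure-Python sketch of the coverage idea follows. It assumes 1-based positions, exclusive ends (start + width) and starts of at least 1, and it ignores the shift and width arguments. `coverage_sketch` is a hypothetical name used for illustration only; the library itself returns a NumPy array.

```python
def coverage_sketch(starts, widths, weight=1):
    """Count, for each 1-based position, how many ranges cover it (illustrative only)."""
    ends = [s + w for s, w in zip(starts, widths)]  # exclusive ends
    length = max(ends, default=1) - 1  # last covered 1-based position
    cov = [0] * max(length, 0)
    for s, e in zip(starts, ends):
        # Positions below 1 are skipped in this sketch.
        for pos in range(max(s, 1), e):
            cov[pos - 1] += weight
    return cov


# Two ranges: positions 1..4 and positions 3..5.
print(coverage_sketch([1, 3], [4, 3]))  # [1, 1, 2, 2, 1]
```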
- classmethod empty()
Create a zero-length IRanges object.
- Returns:
Same type as the caller, in this case an IRanges.
- property end: ndarray
Get all end positions (read-only).
- Returns:
NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.
- find_overlaps(query, query_type='any', select='all', max_gap=-1, min_overlap=1, delete_index=True)
Find overlaps with query IRanges object.
- Parameters:
query (IRanges) – Query IRanges.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the intervals
"end": Must overlap at the end of the intervals
"within": Fully contain the query interval
Defaults to "any".
select (Literal['all', 'first', 'last', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in subject.
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 1.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not an IRanges object.
- Return type:
- Returns:
A list with the same length as the number of intervals in query. Each element is a list of indices that overlap, or None if there are no overlaps.
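The shape of the result for the "any" query type can be illustrated with a brute-force sketch. It assumes ranges are given as (start, width) pairs with exclusive ends, and it compares every pair directly rather than using the NCLS index mentioned above. `find_overlaps_sketch` is a hypothetical helper, not the library's implementation.

```python
def find_overlaps_sketch(subject, query, min_overlap=1):
    """Brute-force 'any' overlap search over (start, width) pairs (illustrative only)."""
    hits = []
    for q_start, q_width in query:
        q_end = q_start + q_width  # exclusive end
        matched = []
        for i, (s_start, s_width) in enumerate(subject):
            s_end = s_start + s_width
            # Number of positions shared by the two ranges (non-positive if disjoint).
            shared = min(q_end, s_end) - max(q_start, s_start)
            if shared >= min_overlap:
                matched.append(i)
        # One entry per query range: a list of subject indices, or None.
        hits.append(matched if matched else None)
    return hits


subject = [(1, 5), (10, 5), (4, 3)]
query = [(5, 2), (20, 1)]
print(find_overlaps_sketch(subject, query))  # [[0, 2], None]
```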
- flank(width, start=True, both=False, in_place=False)
Compute flanking ranges for each range. The logic is from the IRanges package.
If start is True for a given range, the flanking occurs at the start, otherwise the end. The widths of the flanks are given by the width parameter. width can be negative, in which case the flanking region is reversed so that it represents a prefix or suffix of the range.
Usage:
ir.flank(3, True), where "x" indicates a range in ir and "-" indicates the resulting flanking region:
---xxxxxxx
If start were False, the result would be:
xxxxxxx---
For negative width, i.e. ir.flank(-3, False), where "*" indicates the overlap between "x" and the result:
xxxx***
If both is True, then, for all ranges in ir, the flanking regions are extended into (or out of, if width is negative) the range, so that the result straddles the given endpoint and has twice the width given by width.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the flanked intervals. Otherwise, the current object is directly modified and a reference to it is returned.
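The flanking rules described above can be sketched for a single range as follows. This follows the prose description and the diagrams, assuming an exclusive end (start + width). `flank_sketch` and its parameter names are hypothetical and are not part of the library.

```python
def flank_sketch(start, width, flank_width, at_start=True, both=False):
    """Return (start, width) of the flank of one range (illustrative only)."""
    end = start + width  # exclusive end of the original range
    size = abs(flank_width)
    if both:
        # The flank straddles the chosen endpoint and has twice the given width.
        anchor = start if at_start else end
        return anchor - size, 2 * size
    if at_start:
        # Positive width: region just before the start. Negative: prefix of the range.
        new_start = start - flank_width if flank_width >= 0 else start
    else:
        # Positive width: region just after the end. Negative: suffix of the range.
        new_start = end if flank_width >= 0 else end + flank_width
    return new_start, size


# A range covering positions 10..16 (start=10, width=7).
print(flank_sketch(10, 7, 3, at_start=True))    # (7, 3)
print(flank_sketch(10, 7, 3, at_start=False))   # (17, 3)
print(flank_sketch(10, 7, -3, at_start=False))  # (14, 3)
```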
- follow(query, select='all', delete_index=True)
Search nearest positions only upstream that overlap with each range in query.
- Parameters:
- Raises:
TypeError – If query is not of type IRanges.
- Return type:
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.
- classmethod from_pandas(input)
Create an IRanges from a DataFrame object.
- Parameters:
input (pandas.DataFrame) – Input data must contain columns 'start' and 'width'.
- Return type:
IRanges
- Returns:
An IRanges object.
- classmethod from_polars(input)
Create an IRanges from a DataFrame object.
- Parameters:
input (polars.DataFrame) – Input data must contain columns 'start' and 'width'.
- Return type:
IRanges
- Returns:
An IRanges object.
- gaps(start=None, end=None)
Returns an IRanges object representing the set of integers that remain after the intervals are removed from the interval specified by the start and end arguments.
- gaps_numpy(start=None, end=None)
Returns an IRanges object representing the set of integers that remain after the intervals are removed from the interval specified by the start and end arguments.
This function uses a vectorized approach using NumPy vectors. The normal gaps() method performs better in most cases.
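The idea of a gap computation can be sketched in plain Python. The sketch takes ranges as (start, width) pairs and treats the bounding interval as half-open, [lower, upper). That convention is an assumption made for this illustration and may differ from how the library interprets its start and end arguments. `gaps_sketch` is a hypothetical helper.

```python
def gaps_sketch(ranges, lower, upper):
    """Return the uncovered (start, width) pieces of [lower, upper) (illustrative only)."""
    covered_until = lower  # everything in [lower, covered_until) is accounted for
    out = []
    for s, w in sorted(ranges):
        e = s + w  # exclusive end
        if s > covered_until:
            # Uncovered stretch between the previous coverage and this range.
            out.append((covered_until, min(s, upper) - covered_until))
        covered_until = max(covered_until, e)
        if covered_until >= upper:
            break
    if covered_until < upper:
        # Trailing gap up to the upper bound.
        out.append((covered_until, upper - covered_until))
    return out


print(gaps_sketch([(2, 3), (8, 2)], lower=1, upper=12))  # [(1, 1), (5, 3), (10, 2)]
```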
- get_end()
Get all end positions.
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.
- get_mcols()
Get metadata about ranges.
- Return type:
- Returns:
Data frame containing additional metadata columns for all ranges.
- get_metadata()
Get additional metadata.
- Return type:
- Returns:
Dictionary containing additional metadata.
- get_row(index_or_name)
Access a row by index or row name.
- Parameters:
index_or_name (Union[str, int]) – Integer index of the row to access.
Alternatively, you may provide a string specifying the row name to access, only if names are available.
- Raises:
ValueError – If index_or_name is not in row names. If the integer index is greater than the number of rows.
TypeError – If index_or_name is neither a string nor an integer.
- Returns:
A sliced IRanges object.
- Return type:
- get_start()
Get all start positions.
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- get_width()
Get width of each interval.
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the widths for all ranges.
- intersect_ncls(other, delete_index=True)
Find intersecting intervals with other. Uses the NCLS index.
- property mcols: BiocFrame
Get metadata about ranges.
- Returns:
Data frame containing additional metadata columns for all ranges.
- property metadata: dict
Get additional metadata.
- Returns:
Dictionary containing additional metadata.
- property names: Names | None
Get all names.
- Returns:
List containing the names for all ranges, or None if no names are available.
- narrow(start=None, width=None, end=None, in_place=False)
Narrow genomic positions by the provided start, width and end parameters.
Important: these arguments are relative shifts in position for each range.
- Parameters:
start (Union[int, List[int], ndarray, None]) – Relative start position. Defaults to None.
width (Union[int, List[int], ndarray, None]) – Width of each interval position. Defaults to None.
end (Union[int, List[int], ndarray, None]) – Relative end position. Defaults to None.
in_place (bool) – Whether to modify the object in place. Defaults to False.
- Raises:
ValueError – If width is provided without either start or end, or if all three of start, end and width are provided. Supply at most two of the three parameters.
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the narrowed intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- nearest(query, select='all', delete_index=True)
Search nearest positions both upstream and downstream that overlap with each range in query.
- Parameters:
- Raises:
TypeError – If query is not of type IRanges.
- Return type:
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.
- precede(query, select='all', delete_index=True)
Search nearest positions only downstream that overlap with each range in query.
- Parameters:
- Raises:
TypeError – If query is not of type IRanges.
- Return type:
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.
- promoters(upstream=2000, downstream=200, in_place=False)
Extend intervals to promoter regions.
Generates promoter ranges relative to the transcription start site (TSS), where TSS is start(x). The promoter range is expanded around the TSS according to the upstream and downstream arguments. Upstream represents the number of nucleotides in the 5' direction and downstream the number in the 3' direction. The full range is defined as (start(x) - upstream) to (start(x) + downstream - 1).
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the promoter intervals. Otherwise, the current object is directly modified and a reference to it is returned.
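The promoter formula above works out as follows for a single range. The sketch treats the stated upper bound (start + downstream - 1) as inclusive, so the resulting width is upstream + downstream. `promoter_sketch` is a hypothetical helper used only to show the arithmetic.

```python
def promoter_sketch(start, upstream=2000, downstream=200):
    """Return (start, width) of the promoter region around a TSS (illustrative only)."""
    new_start = start - upstream
    new_end_inclusive = start + downstream - 1
    # Inclusive bounds, so the width is upstream + downstream.
    width = new_end_inclusive - new_start + 1
    return new_start, width


print(promoter_sketch(5000))        # (3000, 2200)
print(promoter_sketch(100, 10, 5))  # (90, 15)
```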
- range()
Concatenate all intervals.
- Return type:
- Returns:
A new IRanges instance with a single range, spanning from the minimum of all start positions to the maximum of all end positions.
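As a small illustration of the spanning range, assuming (start, width) pairs with exclusive ends and a non-empty input, the result can be computed as below. `range_sketch` is a hypothetical name.

```python
def range_sketch(ranges):
    """Return the single (start, width) range spanning all inputs (illustrative only)."""
    lo = min(s for s, _ in ranges)      # smallest start
    hi = max(s + w for s, w in ranges)  # largest exclusive end
    return lo, hi - lo


print(range_sketch([(5, 3), (1, 2), (10, 4)]))  # (1, 13)
```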
- reduce(with_reverse_map=False, drop_empty_ranges=False, min_gap_width=1)
Reduce orders the ranges, then merges overlapping or adjacent ranges.
- Parameters:
- Return type:
- Returns:
A new IRanges object with reduced intervals.
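The order-then-merge step can be sketched in a few lines. The sketch assumes (start, width) pairs with exclusive ends and merges two ranges whenever the gap between them is smaller than min_gap_width, so the default of 1 merges overlapping and adjacent ranges. It ignores with_reverse_map and drop_empty_ranges. `reduce_sketch` is a hypothetical helper, not the library's code.

```python
def reduce_sketch(ranges, min_gap_width=1):
    """Sort ranges, then merge those separated by less than min_gap_width (illustrative only)."""
    merged = []  # list of [start, exclusive_end]
    for s, w in sorted(ranges):
        e = s + w
        if merged and s - merged[-1][1] < min_gap_width:
            # Overlapping or close enough: extend the previous merged range.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [(s, e - s) for s, e in merged]


print(reduce_sketch([(10, 2), (1, 3), (11, 5), (4, 2)]))  # [(1, 5), (10, 6)]
```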
- reflect(bounds, in_place=False)
Reverses each range in x relative to the corresponding range in bounds.
Reflection preserves the width of a range, but shifts it such that the distance from the left bound to the start of the range becomes the distance from the end of the range to the right bound. This is illustrated below, where x represents a range in x and [ and ] indicate the bounds:
[..xxx.....] becomes [.....xxx..]
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the reflected intervals. Otherwise, the current object is directly modified and a reference to it is returned.
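The reflection rule can be written as simple arithmetic on one range and its bounding range. The sketch assumes both are given as start and width with exclusive ends. `reflect_sketch` is a hypothetical helper that mirrors the diagram above.

```python
def reflect_sketch(start, width, bound_start, bound_width):
    """Reflect one range inside its bounds, preserving the width (illustrative only)."""
    end = start + width
    bound_end = bound_start + bound_width
    # The distance from the range's end to the right bound becomes the
    # new distance from the left bound to the range's start.
    new_start = bound_start + (bound_end - end)
    return new_start, width


# [..xxx.....] with bounds of width 10 starting at 0.
print(reflect_sketch(2, 3, 0, 10))  # (5, 3)
```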
- resize(width, fix='start', in_place=False)
Resize ranges to the specified width where either the start, end, or center is used as an anchor.
- Parameters:
width (Union[int, List[int], ndarray]) – Width to resize to; must be non-negative.
fix (Union[Literal['start', 'end', 'center'], List[Literal['start', 'end', 'center']]]) – Fix positions by "start", "end", or "center".
Alternatively, fix may be a list with the same size as this IRanges object, denoting what to use as an anchor for each interval.
Defaults to "start".
in_place (bool) – Whether to modify the object in place. Defaults to False.
- Raises:
ValueError – If parameter fix is neither start, end, nor center. If width is negative.
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the resized intervals. Otherwise, the current object is directly modified and a reference to it is returned.
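A sketch of the three anchors for a single range is given below. The "start" and "end" cases follow directly from the description. The rounding used for "center" (floor division of the width difference) is an assumption made for this illustration and may not match the library exactly. `resize_sketch` is a hypothetical helper.

```python
def resize_sketch(start, width, new_width, fix="start"):
    """Return (start, width) after resizing one range around an anchor (illustrative only)."""
    if new_width < 0:
        raise ValueError("'width' must be non-negative")
    if fix == "start":
        new_start = start  # start stays put
    elif fix == "end":
        new_start = start + width - new_width  # exclusive end stays put
    elif fix == "center":
        # Assumed rounding: shift by half of the width difference, rounded down.
        new_start = start + (width - new_width) // 2
    else:
        raise ValueError("'fix' must be 'start', 'end' or 'center'")
    return new_start, new_width


print(resize_sketch(10, 6, 2, fix="start"))   # (10, 2)
print(resize_sketch(10, 6, 2, fix="end"))     # (14, 2)
print(resize_sketch(10, 6, 2, fix="center"))  # (12, 2)
```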
- restrict(start=None, end=None, keep_all_ranges=False)
Restrict ranges to the given start and end positions.
- Parameters:
- Return type:
- Returns:
A new IRanges with the restricted intervals.
- set_mcols(mcols, in_place=False)
Set new metadata about ranges.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata columns. Otherwise, the current object is directly modified and a reference to it is returned.
- set_metadata(metadata, in_place=False)
Set or replace metadata.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata. Otherwise, the current object is directly modified and a reference to it is returned.
- set_names(names, in_place=False)
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified names. Otherwise, the current object is directly modified and a reference to it is returned.
- set_start(start, in_place=False)
Modify start positions.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified start positions. Otherwise, the current object is directly modified and a reference to it is returned.
- set_width(width, in_place=False)
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified widths. Otherwise, the current object is directly modified and a reference to it is returned.
- shift(shift, in_place=False)
Shifts all the intervals by the amount specified by the shift argument.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the shifted intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- sort(decreasing=False, in_place=False)
Sort the intervals.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the sorted intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- property start: ndarray
Get all start positions.
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- subset_by_overlaps(query, query_type='any', max_gap=-1, min_overlap=1, delete_index=True)
Subset by overlapping intervals in query.
- Parameters:
query (IRanges) – Query IRanges object.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the intervals
"end": Must overlap at the end of the intervals
"within": Fully contain the query interval
Defaults to "any".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 1.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not of type IRanges.
- Return type:
- Returns:
A new IRanges object containing ranges that overlap with query.
- to_pandas()
Convert this IRanges object into a DataFrame.
- Return type:
pandas.DataFrame
- Returns:
A DataFrame object.
- to_polars()
Convert this IRanges object into a DataFrame.
- Return type:
polars.DataFrame
- Returns:
A DataFrame object.
iranges.interval module
- iranges.interval.calc_gap_and_overlap(first, second)
Calculate gap and/or overlap between two intervals.
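The input and return formats of this function are not documented here, so the following is only a sketch of the underlying arithmetic under stated assumptions: each interval is a (start, end) tuple with an exclusive end, and the result is a (gap, overlap) pair in which the value that does not apply is None. `gap_and_overlap_sketch` is a hypothetical name and is not the library function.

```python
def gap_and_overlap_sketch(first, second):
    """Return (gap, overlap) for two (start, end) intervals (illustrative only)."""
    lo = max(first[0], second[0])  # later of the two starts
    hi = min(first[1], second[1])  # earlier of the two exclusive ends
    if hi > lo:
        # The intervals share hi - lo positions.
        return None, hi - lo
    # Otherwise lo - hi positions lie between them (0 when they are adjacent).
    return lo - hi, None


print(gap_and_overlap_sketch((1, 5), (3, 8)))   # (None, 2)
print(gap_and_overlap_sketch((1, 5), (8, 10)))  # (3, None)
```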