iranges package¶
Submodules¶
iranges.IRanges module¶
- class iranges.IRanges.IRanges(start: Sequence[int] = [], width: Sequence[int] = [], names: Sequence[str] | None = None, mcols: BiocFrame | None = None, metadata: dict | None = None, validate: bool = True)[source]¶
Bases:
object
A collection of integer ranges, equivalent to the IRanges class from the Bioconductor package of the same name.
This holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.
- __copy__() IRanges [source]¶
Shallow copy of the object.
- Returns:
Same type as the caller, a shallow copy of this object.
- __deepcopy__(memo) IRanges [source]¶
Deep copy of the object.
- Parameters:
memo – Passed to internal deepcopy() calls.
- Returns:
Same type as the caller, a deep copy of this object.
- __getitem__(subset: Sequence | int | str | bool | slice | range) IRanges [source]¶
Subset the IRanges.
- Parameters:
subset – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted, see normalize_subscript().
- Returns:
A new IRanges object containing the ranges of interest.
- __init__(start: Sequence[int] = [], width: Sequence[int] = [], names: Sequence[str] | None = None, mcols: BiocFrame | None = None, metadata: dict | None = None, validate: bool = True)[source]¶
- Parameters:
start – Sequence of integers containing the start position for each range. All values should fall within the range that can be represented by a 32-bit signed integer.
width – Sequence of integers containing the width for each range. This should be of the same length as start. All values should be non-negative and fall within the range that can be represented by a 32-bit signed integer. Similarly, start + width should not exceed the range of a 32-bit signed integer.
names – Sequence of strings containing the name for each range. This should have length equal to start and should only contain strings. If no names are present, None may be supplied instead.
mcols – A data frame containing additional metadata columns for each range. This should have number of rows equal to the length of start. If None, defaults to a zero-column data frame.
metadata – Additional metadata. If None, defaults to an empty dictionary.
validate – Whether to validate the arguments, internal use only.
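The constraints on start and width listed above can be sketched in plain Python. This is only an illustration of the documented rules, not the library's own validation code; the function name check_ranges and its error messages are hypothetical.

```python
from typing import Sequence

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def check_ranges(start: Sequence[int], width: Sequence[int]) -> None:
    """Raise ValueError if the inputs break the documented constraints.

    Hypothetical helper: it mirrors the parameter descriptions only.
    """
    if len(start) != len(width):
        raise ValueError("'start' and 'width' must have the same length")
    for s, w in zip(start, width):
        if not INT32_MIN <= s <= INT32_MAX:
            raise ValueError(f"start {s} does not fit in a 32-bit signed integer")
        if w < 0 or w > INT32_MAX:
            raise ValueError(f"width {w} must be non-negative and fit in 32 bits")
        if s + w > INT32_MAX:
            raise ValueError(f"start + width ({s + w}) exceeds the 32-bit range")


check_ranges([1, 5, 10], [3, 0, 7])  # valid input, returns None
```

Zero-width ranges pass this check, since the documentation only requires widths to be non-negative.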
- __iter__() IRangesIter [source]¶
Iterator over intervals.
- __setitem__(args: Sequence | int | str | bool | slice | range, value: IRanges)[source]¶
Add or update positions (in-place operation).
- Parameters:
subset – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be replaced, see normalize_subscript().
value – An IRanges object of length equal to the number of ranges to be replaced, as specified by subset.
- Returns:
Specified ranges are replaced by value in the current object.
- clip_intervals(shift: int | List[int] | ndarray = 0, width: int | List[int] | ndarray | None = None, adjust_width_by_shift: bool = False) IRanges [source]¶
Clip intervals. Starts are always clipped to positive interval ranges (1, Inf).
If width is specified, the intervals are clipped to (1, width).
- Parameters:
shift – Shift all starts before clipping. Defaults to 0.
width – Clip width of each interval. Defaults to None.
adjust_width_by_shift – Whether to adjust the width based on shift. Defaults to False.
- Returns:
An IRanges object with the clipped intervals.
- count_overlaps(query: IRanges, query_type: Literal['any', 'start', 'end', 'within'] = 'any', max_gap: int = -1, min_overlap: int = 1, delete_index: bool = True) ndarray [source]¶
Count the number of overlaps with the query IRanges object.
- Parameters:
query – Query IRanges.
query_type –
Overlap query type, must be one of
“any”: Any overlap is good
“start”: Overlap at the beginning of the intervals
“end”: Must overlap at the end of the intervals
“within”: Fully contain the query interval
Defaults to “any”.
max_gap – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap – Minimum overlap with query. Defaults to 1.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not an IRanges object.
- Returns:
NumPy vector with the number of overlaps for each range in query.
- coverage(shift: int | List[int] | ndarray = 0, width: int | List[int] | ndarray | None = None, weight: int | float = 1) ndarray [source]¶
Calculate coverage: for each position, count the number of intervals that cover it.
- Parameters:
shift – Shift all intervals. Defaults to 0.
width – Restrict the width of all intervals. Defaults to None.
weight – Weight to use. Defaults to 1.
- Raises:
TypeError – If ‘weight’ is not a number. If ‘width’ is not an expected type.
- Returns:
A numpy array with the coverage vector.
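As a rough illustration of what a coverage vector contains, the sketch below counts covering intervals per position with a difference array. It assumes 1-based starts and non-inclusive ends (start + width), ignores the shift and width arguments, and returns a plain list; coverage_sketch is a hypothetical name and this is not the library's implementation.

```python
from typing import List, Sequence


def coverage_sketch(
    starts: Sequence[int], widths: Sequence[int], weight: float = 1
) -> List[float]:
    """Coverage for positions 1..last, stored at list indices 0..last-1.

    Assumption: every start is >= 1 and ends are start + width (non-inclusive).
    """
    if not starts:
        return []
    last = max(s + w for s, w in zip(starts, widths)) - 1  # last covered position
    diff = [0] * (last + 2)  # difference array indexed by position
    for s, w in zip(starts, widths):
        diff[s] += weight      # coverage rises at the start
        diff[s + w] -= weight  # and falls just past the end
    result, running = [], 0
    for pos in range(1, last + 1):
        running += diff[pos]
        result.append(running)
    return result


# Ranges covering positions 1-4 and 3-5.
print(coverage_sketch([1, 3], [4, 3]))  # [1, 1, 2, 2, 1]
```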
- disjoin(with_reverse_map: bool = False) IRanges [source]¶
Calculate disjoint intervals.
- Parameters:
with_reverse_map – Whether to return a map of indices back to the original object. Defaults to False.
- Returns:
A new IRanges containing disjoint intervals.
- distance(query: IRanges) ndarray [source]¶
Calculate the pair-wise distance with intervals in query.
- Parameters:
query – Query IRanges.
- Returns:
Numpy vector containing distances for each interval in query.
- classmethod empty()[source]¶
Create a zero-length IRanges object.
- Returns:
Same type as the caller, in this case an IRanges.
- property end: ndarray¶
Get all end positions (read-only).
- Returns:
NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.
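Because the end position is not inclusive, it equals start + width and points one past the last covered position. A small plain-Python illustration (the variable names are only for this example):

```python
starts = [1, 5, 10]
widths = [3, 0, 7]

# Non-inclusive end: the first position after each range.
ends = [s + w for s, w in zip(starts, widths)]

# Positions covered by each range under this half-open convention.
covered = [list(range(s, e)) for s, e in zip(starts, ends)]

print(ends)        # [4, 5, 17]
print(covered[0])  # [1, 2, 3]
print(covered[1])  # [] (a zero-width range covers nothing)
```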
- find_overlaps(query: IRanges, query_type: Literal['any', 'start', 'end', 'within'] = 'any', select: Literal['all', 'first', 'last', 'arbitrary'] = 'all', max_gap: int = -1, min_overlap: int = 1, delete_index: bool = True) List[List[int]] [source]¶
Find overlaps with the query IRanges object.
- Parameters:
query – Query IRanges.
query_type –
Overlap query type, must be one of
“any”: Any overlap is good
“start”: Overlap at the beginning of the intervals
“end”: Must overlap at the end of the intervals
“within”: Fully contain the query interval
Defaults to “any”.
select – Determine what hit to choose when there are multiple hits for an interval in subject.
max_gap – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap – Minimum overlap with query. Defaults to 1.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not an IRanges object.
- Returns:
A list with the same length as the number of intervals in query. Each element is a list of indices that overlap, or None if there are no overlaps.
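To make the shape of the result concrete, here is a brute-force sketch of an “any” overlap search over (start, width) tuples. It is an assumption-laden illustration: the name find_overlaps_sketch is hypothetical, it compares every pair instead of using the NCLS index mentioned above, it ignores max_gap and select, and it returns an empty list (not None) when a query range has no hits.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, width); end = start + width, non-inclusive


def find_overlaps_sketch(
    subject: List[Range], query: List[Range], min_overlap: int = 1
) -> List[List[int]]:
    """For each query range, list the indices of subject ranges it overlaps."""
    hits = []
    for q_start, q_width in query:
        q_end = q_start + q_width
        row = []
        for i, (s_start, s_width) in enumerate(subject):
            # Number of positions shared by the two half-open ranges.
            shared = min(q_end, s_start + s_width) - max(q_start, s_start)
            if shared >= min_overlap:
                row.append(i)
        hits.append(row)
    return hits


subject = [(1, 5), (10, 5), (3, 10)]
query = [(4, 3), (20, 2)]
print(find_overlaps_sketch(subject, query))  # [[0, 2], []]
```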
- flank(width: int, start: bool = True, both: bool = False, in_place: bool = False) IRanges [source]¶
Compute flanking ranges for each range. The logic is from the Bioconductor IRanges package.
If start is True for a given range, the flanking occurs at the start, otherwise the end. The widths of the flanks are given by the width parameter. width can be negative, in which case the flanking region is reversed so that it represents a prefix or suffix of the range.
Usage:
ir.flank(3, True), where “x” indicates a range in ir and “-” indicates the resulting flanking region:
---xxxxxxx
If start were False, the result becomes:
xxxxxxx---
For negative width, i.e. ir.flank(-3, False), where “*” indicates the overlap between “x” and the result:
xxxx***
If both is True, then, for all ranges in ir, the flanking regions are extended into (or out of, if width is negative) the range, so that the result straddles the given endpoint and has twice the width given by width.
- Parameters:
width – Width to flank by. May be negative.
start – Whether to only flank starts. Defaults to True.
both – Whether to flank both starts and ends. Defaults to False.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
If in_place = False, a new IRanges is returned with the flanked intervals. Otherwise, the current object is directly modified and a reference to it is returned.
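The flanking rules described above can be sketched for a single range as follows. This follows the Bioconductor logic as I understand it, expressed with a non-inclusive end; flank_sketch and its argument names are hypothetical and the library's own code may differ in detail.

```python
from typing import Tuple


def flank_sketch(
    start: int, width: int, flank_width: int,
    at_start: bool = True, both: bool = False,
) -> Tuple[int, int]:
    """Return (new_start, new_width) of the flank of one range."""
    end = start + width  # non-inclusive end
    size = abs(flank_width)
    if both:
        # Straddle the chosen endpoint with twice the flank width.
        anchor = start if at_start else end
        return anchor - size, 2 * size
    if at_start:
        # Negative widths give a prefix of the range itself.
        new_start = start if flank_width < 0 else start - flank_width
    else:
        # Negative widths give a suffix of the range itself.
        new_start = end + flank_width if flank_width < 0 else end
    return new_start, size


# A range covering positions 10..16 (start=10, width=7).
print(flank_sketch(10, 7, 3))          # (7, 3)   ---xxxxxxx
print(flank_sketch(10, 7, 3, False))   # (17, 3)  xxxxxxx---
print(flank_sketch(10, 7, -3, False))  # (14, 3)  xxxx***
```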
- follow(query: IRanges, select: Literal['all', 'last'] = 'all', delete_index: bool = True) List[List[int]] [source]¶
Search nearest positions only upstream that overlap with each range in
query.
- Parameters:
query – Query IRanges to find nearest positions.
select – Determine what hit to choose when there are multiple hits for an interval in query.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not of type IRanges.
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval or None if there are no nearest intervals.
- classmethod from_pandas(input: pandas.DataFrame) IRanges [source]¶
Create an IRanges from a DataFrame object.
- Parameters:
input – Input data must contain columns ‘start’ and ‘width’.
- Returns:
An IRanges object.
- classmethod from_polars(input: polars.DataFrame) IRanges [source]¶
Create an IRanges from a DataFrame object.
- Parameters:
input – Input data must contain columns ‘start’ and ‘width’.
- Returns:
An IRanges object.
- gaps(start: int | None = None, end: int | None = None) IRanges [source]¶
Return an IRanges object representing the set of integers that remain after the intervals are removed from the interval specified by the start and end arguments.
- Parameters:
start – Restrict start position. Defaults to 1.
end – Restrict end position. Defaults to None.
- Returns:
A new IRanges with the gap regions.
- gaps_numpy(start: int | None = None, end: int | None = None) IRanges [source]¶
Return an IRanges object representing the set of integers that remain after the intervals are removed from the interval specified by the start and end arguments.
This function uses a vectorized approach using numpy vectors. The normal gaps() method performs better in most cases.
- Parameters:
start – Restrict start position. Defaults to 1.
end – Restrict end position. Defaults to None.
- Returns:
A new IRanges with the gap regions.
- get_end() ndarray [source]¶
Get all end positions.
- Returns:
NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.
- get_mcols() BiocFrame [source]¶
Get metadata about ranges.
- Returns:
Data frame containing additional metadata columns for all ranges.
- get_metadata() dict [source]¶
Get additional metadata.
- Returns:
Dictionary containing additional metadata.
- get_names() Names | None [source]¶
Get all names.
- Returns:
List containing the names for all ranges, or None if no names are present.
- get_row(index_or_name: str | int) IRanges [source]¶
Access a row by index or row name.
- Parameters:
index_or_name –
Integer index of the row to access.
Alternatively, you may provide a string specifying the row name to access, only if names are available.
- Raises:
ValueError – If index_or_name is not in row names. If the integer index is greater than the number of rows.
TypeError – If index_or_name is neither a string nor an integer.
- Returns:
A sliced IRanges object.
- Return type:
IRanges
- get_start() ndarray [source]¶
Get all start positions.
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- get_width() ndarray [source]¶
Get width of each interval.
- Returns:
NumPy array of 32-bit signed integers containing the widths for all ranges.
- intersect(other: IRanges) IRanges [source]¶
Find intersecting intervals with other.
- Parameters:
other – An IRanges object.
- Raises:
TypeError – If other is not an IRanges object.
- Returns:
A new IRanges object with all intersecting intervals.
- intersect_ncls(other: IRanges, delete_index: bool = True) IRanges [source]¶
Find intersecting intervals with other. Uses the NCLS index.
- Parameters:
other – An IRanges object.
- Raises:
TypeError – If other is not an IRanges object.
- Returns:
A new IRanges object with all intersecting intervals.
- property mcols: BiocFrame¶
Get metadata about ranges.
- Returns:
Data frame containing additional metadata columns for all ranges.
- property metadata: dict¶
Get additional metadata.
- Returns:
Dictionary containing additional metadata.
- property names: Names | None¶
Get all names.
- Returns:
List containing the names for all ranges, or None if no names are available.
- narrow(start: int | List[int] | ndarray | None = None, width: int | List[int] | ndarray | None = None, end: int | List[int] | ndarray | None = None, in_place: bool = False) IRanges [source]¶
Narrow genomic positions by the provided start, width and end parameters.
Important: These arguments are relative shifts in position for each range.
- Parameters:
start – Relative start position. Defaults to None.
width – Width of each interval position. Defaults to None.
end – Relative end position. Defaults to None.
in_place – Whether to modify the object in place. Defaults to False.
- Raises:
ValueError – If width is provided without either start or end, or if all three of start, end and width are provided.
- Returns:
If in_place = False, a new IRanges is returned with the narrow intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- nearest(query: IRanges, select: Literal['all', 'arbitrary'] = 'all', delete_index: bool = True) List[List[int]] [source]¶
Search nearest positions both upstream and downstream that overlap with each range in query.
- Parameters:
query – Query IRanges to find nearest positions.
select – Determine what hit to choose when there are multiple hits for an interval in query.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not of type IRanges.
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval or None if there are no nearest intervals.
- order(decreasing: bool = False) ndarray [source]¶
Get the order of indices for sorting.
- Parameters:
decreasing – Whether to sort in descending order. Defaults to False.
- Returns:
NumPy vector containing index positions in the sorted order.
- overlap_indices(start: int | None = None, end: int | None = None) ndarray [source]¶
Find overlaps with the start and end positions.
- Parameters:
start – Start position. Defaults to None.
end – End position. Defaults to None.
- Returns:
Numpy vector containing indices that overlap with the given range.
- precede(query: IRanges, select: Literal['all', 'first'] = 'all', delete_index: bool = True) List[List[int]] [source]¶
Search nearest positions only downstream that overlap with each range in query.
- Parameters:
query – Query IRanges to find nearest positions.
select – Determine what hit to choose when there are multiple hits for an interval in query.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not of type IRanges.
- Returns:
A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval or None if there are no nearest intervals.
- promoters(upstream: int = 2000, downstream: int = 200, in_place: bool = False) IRanges [source]¶
Extend intervals to promoter regions.
Generates promoter ranges relative to the transcription start site (TSS), where TSS is start(x). The promoter range is expanded around the TSS according to the upstream and downstream arguments. Upstream represents the number of nucleotides in the 5’ direction and downstream the number in the 3’ direction. The full range is defined as, (start(x) - upstream) to (start(x) + downstream - 1).
- Parameters:
upstream – Number of positions to extend in the 5’ direction. Defaults to 2000.
downstream – Number of positions to extend in the 3’ direction. Defaults to 200.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
If in_place = False, a new IRanges is returned with the promoter intervals. Otherwise, the current object is directly modified and a reference to it is returned.
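The promoter formula above, (start(x) - upstream) to (start(x) + downstream - 1), translates into a new start and width as sketched below for a single range. The helper name promoter_sketch is hypothetical; it only restates the arithmetic from the description.

```python
from typing import Tuple


def promoter_sketch(
    start: int, upstream: int = 2000, downstream: int = 200
) -> Tuple[int, int]:
    """Return (new_start, new_width) of the promoter around a TSS at `start`."""
    new_start = start - upstream
    # Inclusive last position is start + downstream - 1, so the width is:
    new_width = upstream + downstream
    return new_start, new_width


new_start, new_width = promoter_sketch(5000)
print(new_start, new_width)       # 3000 2200
print(new_start + new_width - 1)  # 5199, i.e. 5000 + 200 - 1
```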
- range() IRanges [source]¶
Concatenate all intervals.
- Returns:
A new IRanges instance with a single range, spanning from the minimum of all start positions to the maximum of all end positions.
- reduce(with_reverse_map: bool = False, drop_empty_ranges: bool = False, min_gap_width: int = 1) IRanges [source]¶
Reduce orders the ranges, then merges overlapping or adjacent ranges.
- Parameters:
with_reverse_map – Whether to return map of indices back to original object. Defaults to False.
drop_empty_ranges – Whether to drop empty ranges. Defaults to False.
min_gap_width – Ranges separated by a gap of at least min_gap_width positions are not merged. Defaults to 1.
- Returns:
A new IRanges object with reduced intervals.
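The order-then-merge behaviour, including the effect of min_gap_width, can be sketched on (start, width) tuples as below. This is a simplified illustration under the non-inclusive end convention; reduce_sketch is a hypothetical name, and it does not handle with_reverse_map or drop_empty_ranges.

```python
from typing import List, Tuple


def reduce_sketch(
    ranges: List[Tuple[int, int]], min_gap_width: int = 1
) -> List[Tuple[int, int]]:
    """Sort (start, width) ranges, then merge overlapping or adjacent ones."""
    merged: List[List[int]] = []  # each entry is [start, end), end non-inclusive
    for start, width in sorted(ranges):
        end = start + width
        # Merge when the gap to the previous range is smaller than min_gap_width.
        if merged and start - merged[-1][1] < min_gap_width:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


ranges = [(10, 2), (1, 3), (4, 2)]
print(reduce_sketch(ranges))                   # [(1, 5), (10, 2)]
print(reduce_sketch(ranges, min_gap_width=5))  # [(1, 11)]
```

With the default min_gap_width of 1, adjacent ranges (gap of 0) are merged while ranges separated by one or more positions are kept apart.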
- reflect(bounds: IRanges, in_place: bool = False) IRanges [source]¶
Reverses each range relative to the corresponding range in bounds.
Reflection preserves the width of a range, but shifts it such that the distance from the left bound to the start of the range becomes the distance from the end of the range to the right bound. This is illustrated below, where x represents a range and [ and ] indicate the bounds:
[..xxx.....] becomes [.....xxx..]
- Parameters:
bounds – IRanges with the same length as the current object specifying the bounds.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
If in_place = False, a new IRanges is returned with the reflected intervals. Otherwise, the current object is directly modified and a reference to it is returned.
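The reflection rule can be written out for a single range and its bound. The sketch below assumes non-inclusive ends; reflect_sketch and its argument names are hypothetical, and it reproduces the diagram in the description rather than the library's code.

```python
from typing import Tuple


def reflect_sketch(
    start: int, width: int, bound_start: int, bound_width: int
) -> Tuple[int, int]:
    """Return (new_start, width) of a range mirrored inside its bound."""
    left_gap = start - bound_start          # distance from left bound to range
    bound_end = bound_start + bound_width   # non-inclusive right bound
    # After reflection the same gap separates the range end from the right bound.
    new_start = bound_end - left_gap - width
    return new_start, width


# [..xxx.....] with the bound covering positions 0..9 and x covering 2..4.
print(reflect_sketch(2, 3, 0, 10))  # (5, 3), i.e. [.....xxx..]
```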
- resize(width: int | List[int] | ndarray, fix: Literal['start', 'end', 'center'] | List[Literal['start', 'end', 'center']] = 'start', in_place: bool = False) IRanges [source]¶
Resize ranges to the specified width where either the start, end, or center is used as an anchor.
- Parameters:
width – Width to resize to; must be non-negative.
fix –
Fix positions by “start”, “end”, or “center”.
Alternatively, fix may be a list with the same size as this IRanges object, denoting what to use as an anchor for each interval.
Defaults to “start”.
in_place – Whether to modify the object in place. Defaults to False.
- Raises:
ValueError – If parameter fix is neither start, end, nor center. If width is negative.
- Returns:
If in_place = False, a new IRanges is returned with the resized intervals. Otherwise, the current object is directly modified and a reference to it is returned.
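How the anchor affects the new start can be sketched for one range as follows. The "center" branch uses floor division, which is an assumption borrowed from the Bioconductor behaviour; resize_sketch is a hypothetical name and not the library's implementation.

```python
from typing import Tuple


def resize_sketch(
    start: int, width: int, new_width: int, fix: str = "start"
) -> Tuple[int, int]:
    """Return (new_start, new_width) for one range, anchored at `fix`."""
    if new_width < 0:
        raise ValueError("'width' must be non-negative")
    if fix == "start":
        return start, new_width
    if fix == "end":
        # Keep the non-inclusive end (start + width) where it was.
        return start + width - new_width, new_width
    if fix == "center":
        # Assumption: split the change in width evenly, rounding down.
        return start + (width - new_width) // 2, new_width
    raise ValueError("'fix' must be 'start', 'end' or 'center'")


print(resize_sketch(10, 6, 2, "start"))   # (10, 2)
print(resize_sketch(10, 6, 2, "end"))     # (14, 2)
print(resize_sketch(10, 6, 2, "center"))  # (12, 2)
```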
- restrict(start: int | List[int] | ndarray | None = None, end: int | List[int] | ndarray | None = None, keep_all_ranges: bool = False) IRanges [source]¶
Restrict ranges to the given start and end positions.
- Parameters:
start – Start position. Defaults to None.
end – End position. Defaults to None.
keep_all_ranges – Whether to keep intervals that do not overlap with start and end. Defaults to False.
- Returns:
A new IRanges with the restricted intervals.
- set_mcols(mcols: BiocFrame | None, in_place: bool = False) IRanges [source]¶
Set new metadata about ranges.
- Parameters:
mcols – Data frame of additional columns, see the constructor for details.
in_place – Whether to modify the object in place.
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata columns. Otherwise, the current object is directly modified and a reference to it is returned.
- set_metadata(metadata: dict | None, in_place: bool = False) IRanges [source]¶
Set or replace metadata.
- Parameters:
metadata – Additional metadata.
in_place – Whether to modify the object in place.
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata. Otherwise, the current object is directly modified and a reference to it is returned.
- set_names(names: Sequence[str] | None, in_place: bool = False) IRanges [source]¶
- Parameters:
names – Sequence of names or None, see the constructor for details.
in_place – Whether to modify the object in place.
- Returns:
If in_place = False, a new IRanges is returned with the modified names. Otherwise, the current object is directly modified and a reference to it is returned.
- set_start(start: Sequence[int], in_place: bool = False) IRanges [source]¶
Modify start positions.
- Parameters:
start – Sequence of start positions, see the constructor for details.
in_place – Whether to modify the object in place.
- Returns:
If in_place = False, a new IRanges is returned with the modified start positions. Otherwise, the current object is directly modified and a reference to it is returned.
- set_width(width: Sequence[int], in_place: bool = False) IRanges [source]¶
- Parameters:
width – Sequence of widths, see the constructor for details.
in_place – Whether to modify the object in place.
- Returns:
If in_place = False, a new IRanges is returned with the modified widths. Otherwise, the current object is directly modified and a reference to it is returned.
- setdiff(other: IRanges) IRanges [source]¶
Find set difference with other.
- Parameters:
other – An IRanges object.
- Raises:
TypeError – If other is not an IRanges object.
- Returns:
A new IRanges object.
- shift(shift: int | List[int] | ndarray, in_place: bool = False) IRanges [source]¶
Shifts all the intervals by the amount specified by the shift argument.
- Parameters:
shift – Amount to shift by.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
If in_place = False, a new IRanges is returned with the shifted intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- sort(decreasing: bool = False, in_place: bool = False) IRanges [source]¶
Sort the intervals.
- Parameters:
decreasing – Whether to sort in descending order. Defaults to False.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
If in_place = False, a new IRanges is returned with the sorted intervals. Otherwise, the current object is directly modified and a reference to it is returned.
- property start: ndarray¶
Get all start positions.
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- subset_by_overlaps(query: IRanges, query_type: Literal['any', 'start', 'end', 'within'] = 'any', max_gap: int = -1, min_overlap: int = 1, delete_index: bool = True) IRanges [source]¶
Subset by overlapping intervals in query.
- Parameters:
query – Query IRanges object.
query_type –
Overlap query type, must be one of
“any”: Any overlap is good
“start”: Overlap at the beginning of the intervals
“end”: Must overlap at the end of the intervals
“within”: Fully contain the query interval
Defaults to “any”.
max_gap – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap – Minimum overlap with query. Defaults to 1.
delete_index – Delete the cached ncls index. Internal use only.
- Raises:
TypeError – If query is not of type IRanges.
- Returns:
A new IRanges object containing ranges that overlap with query.
- to_pandas() pandas.DataFrame [source]¶
Convert this IRanges object into a DataFrame.
- Returns:
A DataFrame object.
- to_polars() polars.DataFrame [source]¶
Convert this IRanges object into a DataFrame.
- Returns:
A DataFrame object.
iranges.interval module¶
- iranges.interval.calc_gap_and_overlap(first: Tuple[int, int], second: Tuple[int, int]) Tuple[int | None, int | None] [source]¶
Calculate gap and/or overlap between two intervals.
- Parameters:
first – Interval containing start and end positions. end is non-inclusive.
second – Interval containing start and end positions. end is non-inclusive.
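The return value is not described above, so the following is only a guess at the intended semantics based on the signature: for two (start, end) tuples with non-inclusive ends, report the overlap when the intervals share positions and the gap otherwise, leaving the other element as None. The name gap_and_overlap_sketch is hypothetical.

```python
from typing import Optional, Tuple


def gap_and_overlap_sketch(
    first: Tuple[int, int], second: Tuple[int, int]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (gap, overlap); exactly one of the two is None.

    Assumption: intervals are (start, end) with a non-inclusive end.
    """
    shared = min(first[1], second[1]) - max(first[0], second[0])
    if shared > 0:
        return None, shared  # the intervals share `shared` positions
    return -shared, None     # otherwise they are `-shared` positions apart


print(gap_and_overlap_sketch((1, 5), (3, 8)))   # (None, 2)
print(gap_and_overlap_sketch((1, 5), (8, 10)))  # (3, None)
print(gap_and_overlap_sketch((1, 5), (5, 7)))   # (0, None), adjacent intervals
```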
- iranges.interval.create_np_interval_vector(intervals: IRanges, with_reverse_map: bool = False, force_size: int | None = None, dont_sum: bool = False, value: int | float = 1) Tuple[ndarray, List | None] [source]¶
Represent intervals and calculate coverage.
- Parameters:
intervals – Input intervals.
with_reverse_map – Return map of indices? Defaults to False.
force_size – Force size of the array.
dont_sum – Do not sum. Defaults to False.
value – Default value to increment. Defaults to 1.
- Returns:
A numpy array representing coverage from the intervals and optionally the index map.