IRanges: Interval arithmetic¶
Python implementation of the IRanges Bioconductor package.
An IRanges holds a start position and a width, and is typically used to represent coordinates along a genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values, e.g., for circular genomes. Ends are inclusive.
IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations.
These classes follow a functional paradigm for accessing or setting properties, with further details discussed in the functional paradigm section.
Installation¶
To get started, install the package from PyPI:
pip install iranges
The descriptions for some of these methods come from the Bioconductor documentation.
Construct IRanges¶
An IRanges holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. Ends are inclusive.
from iranges import IRanges
starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
ir = IRanges(starts, widths)
print(ir)
IRanges object with 8 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] -2 2 5
[1] 6 5 0
[2] 9 14 6
[3] -4 -4 1
[4] 1 4 4
[5] 0 2 3
[6] -6 -5 2
[7] 10 12 3
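Because ends are inclusive, each end in the output above equals start + width - 1, which is why the zero-width range at row [1] has an end one position before its start. The plain-Python sketch below reproduces that arithmetic; `inclusive_ends` is a hypothetical helper written for illustration and is not part of the iranges API.

```python
def inclusive_ends(starts, widths):
    """Return the inclusive end of each range: start + width - 1."""
    if len(starts) != len(widths):
        raise ValueError("starts and widths must have the same length")
    return [s + w - 1 for s, w in zip(starts, widths)]


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
ends = inclusive_ends(starts, widths)
print(ends)  # [2, 5, 14, -4, 4, 2, -5, 12]
```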
Accessing properties¶
Properties can be accessed directly from the object:
print("Number of intervals:", len(ir))
print("start positions:", ir.get_start())
print("width of each interval:", ir.get_width())
print("end positions:", ir.get_end())
Number of intervals: 8
start positions: [-2 6 9 -4 1 0 -6 10]
width of each interval: [5 0 6 1 4 3 2 3]
end positions: [ 2 5 14 -4 4 2 -5 12]
Just like BiocFrame, these classes offer both functional-style and property-based getters and setters.
print("start positions:", ir.start)
print("width of each interval:", ir.width)
print("end positions:", ir.end)
start positions: [-2 6 9 -4 1 0 -6 10]
width of each interval: [5 0 6 1 4 3 2 3]
end positions: [ 2 5 14 -4 4 2 -5 12]
Reduced ranges (Normality)¶
The reduce method merges the intervals into an IRanges where the intervals are:
not empty
not overlapping
ordered from left to right
not even adjacent (i.e., there must be a non-empty gap between two consecutive ranges).
reduced = ir.reduce()
print(reduced)
IRanges object with 4 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] -6 -4 3
[1] -2 4 7
[2] 6 5 0
[3] 9 14 6
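The merging step can be sketched in plain Python as a sort followed by a single left-to-right sweep. This is only an illustration of the idea, not the library's implementation; `reduce_ranges` is a hypothetical name, and the sketch keeps a zero-width range whenever it does not touch a neighbor, which happens to match the row at start 6 in the output above.

```python
def reduce_ranges(starts, widths):
    """Merge overlapping or adjacent ranges into (start, end, width) tuples."""
    # Convert to inclusive ends and sort from left to right.
    intervals = sorted((s, s + w - 1) for s, w in zip(starts, widths))
    merged = []
    for start, end in intervals:
        # Merge when the next range overlaps or directly touches the current one.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, e - s + 1) for s, e in merged]


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
reduced_sketch = reduce_ranges(starts, widths)
print(reduced_sketch)
# [(-6, -4, 3), (-2, 4, 7), (6, 5, 0), (9, 14, 6)]
```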
Overlap operations¶
IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations.
subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])
overlap = subject.find_overlaps(query)
print(overlap)
BiocFrame with 3 rows and 2 columns
self_hits query_hits
<ndarray[int64]> <ndarray[int64]>
[0] 1 0
[1] 0 0
[2] 2 2
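To make the overlap criterion concrete, here is a brute-force sketch that compares every pair of ranges. It is deliberately not a nested containment list and makes no claim about how the library searches; `find_overlaps_naive` is a hypothetical helper, and it assumes every range has a width of at least 1.

```python
def find_overlaps_naive(subject, query):
    """Return (subject_index, query_index) pairs that share at least one position.

    Each range is a (start, width) tuple with width >= 1; ends are inclusive.
    """
    hits = []
    for qi, (q_start, q_width) in enumerate(query):
        q_end = q_start + q_width - 1
        for si, (s_start, s_width) in enumerate(subject):
            s_end = s_start + s_width - 1
            # Two inclusive ranges overlap when each starts before the other ends.
            if s_start <= q_end and q_start <= s_end:
                hits.append((si, qi))
    return hits


subject_ranges = [(2, 1), (2, 2), (10, 3)]
query_ranges = [(1, 5), (4, 4), (9, 2)]
hits = find_overlaps_naive(subject_ranges, query_ranges)
print(sorted(hits))  # [(0, 0), (1, 0), (2, 2)]
```

The pairs are the same as in the output above, although the row order may differ.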
Finding neighboring ranges¶
The nearest method finds the closest range in either direction, while precede and follow restrict the search to a single direction.
query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])
nearest = subject.nearest(query, select="all")
print(nearest)
BiocFrame with 4 rows and 2 columns
query_hits self_hits
<ndarray[int64]> <ndarray[int64]>
[0] 0 0
[1] 1 0
[2] 1 1
[3] 2 2
These methods typically return a list of indices from subject for each interval in query.
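One way to reason about the nearest result above is to rank candidates by the gap between ranges, with overlapping ranges ranked ahead of any gap. The sketch below follows that assumption and reports every tied candidate; `nearest_all` is a hypothetical helper, not the library's algorithm, and it assumes widths of at least 1.

```python
def nearest_all(subject, query):
    """Return (query_index, subject_index) pairs for the closest subject ranges.

    Ranges are (start, width) tuples with width >= 1. Overlapping ranges rank
    ahead of any gap; ties are all reported.
    """
    hits = []
    for qi, (q_start, q_width) in enumerate(query):
        q_end = q_start + q_width - 1
        gaps = []
        for s_start, s_width in subject:
            s_end = s_start + s_width - 1
            if s_start <= q_end and q_start <= s_end:
                gaps.append(-1)  # overlapping
            elif s_start > q_end:
                gaps.append(s_start - q_end - 1)  # subject lies to the right
            else:
                gaps.append(q_start - s_end - 1)  # subject lies to the left
        if gaps:
            best = min(gaps)
            hits.extend((qi, si) for si, gap in enumerate(gaps) if gap == best)
    return hits


query_ranges = [(1, 2), (3, 5), (9, 2)]
subject_ranges = [(3, 1), (5, 2), (12, 1)]
nearest_hits = nearest_all(subject_ranges, query_ranges)
print(nearest_hits)  # [(0, 0), (1, 0), (1, 1), (2, 2)]
```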
coverage¶
The coverage method counts how many ranges cover each position.
cov = subject.coverage()
print(cov)
[0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1.]
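The counts above can be reproduced with a simple position-by-position tally. The sketch assumes positions are numbered from 1 up to the largest end and that anything before position 1 is ignored; `coverage_counts` is a hypothetical helper, and it returns integers rather than the floating-point array shown above.

```python
def coverage_counts(starts, widths):
    """Count how many ranges cover each position from 1 to the largest end."""
    ends = [s + w - 1 for s, w in zip(starts, widths)]
    last = max(ends, default=0)
    counts = [0] * max(last, 0)
    for start, end in zip(starts, ends):
        # Positions below 1 are skipped under this sketch's assumption.
        for pos in range(max(start, 1), end + 1):
            counts[pos - 1] += 1
    return counts


cov_sketch = coverage_counts([3, 5, 12], [1, 2, 1])
print(cov_sketch)  # [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
```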
Transforming ranges¶
shift moves each range by the given amount, changing the start and end positions while leaving the widths unchanged.
shifted = ir.shift(shift=10)
print(shifted)
IRanges object with 8 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] 8 12 5
[1] 16 15 0
[2] 19 24 6
[3] 6 6 1
[4] 11 14 4
[5] 10 12 3
[6] 4 5 2
[7] 20 22 3
Other range transformation methods include narrow, resize, flank, reflect and restrict. For example, narrow supports the adjustment of start, end and width values, which should be relative to each range.
# Every range must contain the window (relative positions 4 to 5); ir has narrower ranges, so illustrative wider ones are used.
wide = IRanges([1, 10, 20], [6, 8, 10])
narrowed = wide.narrow(start=4, width=2)
print(narrowed)
Calling ir.narrow(start=4, width=2) on the ranges constructed earlier raises an exception instead, because the window's relative end (5) exceeds the width of the second range (0):
Exception: solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (5) is > refwidth
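The relative-window arithmetic behind narrow can be sketched in plain Python. The version below assumes a 1-based relative start and a fixed width, and rejects any range too short to contain the window; `narrow_ranges` is a hypothetical helper and its error message is illustrative, not the library's.

```python
def narrow_ranges(starts, widths, rel_start, rel_width):
    """Narrow each range to a window given relative to its own start.

    Assumes rel_start >= 1 (1-based) and rel_width >= 0.
    Returns (new_starts, new_widths).
    """
    rel_end = rel_start + rel_width - 1
    new_starts = []
    for row, (start, width) in enumerate(zip(starts, widths), start=1):
        # The window must fit inside the original range.
        if rel_end > width:
            raise ValueError(
                f"row {row}: window end {rel_end} exceeds range width {width}"
            )
        new_starts.append(start + rel_start - 1)
    return new_starts, [rel_width] * len(new_starts)


print(narrow_ranges([1, 10, 20], [6, 8, 10], 4, 2))
# ([4, 13, 23], [2, 2, 2])
```

With the start and width values of ir, this sketch fails on the second row for the same reason as the exception message quoted above: a range of width 0 cannot contain a window ending at relative position 5.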
Disjoin intervals¶
As the name suggests, disjoin splits the ranges into a set of disjoint (non-overlapping) intervals.
disjoint = ir.disjoin()
print(disjoint)
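A common way to compute disjoint pieces is to cut at every range boundary and keep the pieces that some input range covers. The sketch below follows that approach under the assumption that zero-width ranges are ignored; `disjoin_ranges` is a hypothetical helper and may not match the library's output in every edge case.

```python
def disjoin_ranges(starts, widths):
    """Split ranges at every boundary; returns inclusive (start, end) pairs."""
    # Zero-width ranges are ignored in this sketch.
    spans = [(s, s + w - 1) for s, w in zip(starts, widths) if w > 0]
    # Every start, and every position just past an end, is a cut point.
    cuts = sorted({s for s, _ in spans} | {e + 1 for _, e in spans})
    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        # Keep the piece only if some input range covers it.
        if any(s <= left and right - 1 <= e for s, e in spans):
            pieces.append((left, right - 1))
    return pieces


# Ranges [1, 5] and [3, 8] overlap on [3, 5].
print(disjoin_ranges([1, 3], [5, 6]))  # [(1, 2), (3, 5), (6, 8)]
```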
reflect and flank¶
reflect reverses each range within a set of common reference bounds.
starts = [2, 5, 1]
widths = [2, 3, 3]
x = IRanges(starts, widths)
bounds = IRanges([0, 5, 3], [11, 2, 7])
res = x.reflect(bounds=bounds)
print(res)
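Assuming reflect follows the usual mirror-image definition, where the distance from a range's end to the end of its bound becomes the distance from the bound's start to the reflected start, the arithmetic can be sketched as follows. `reflect_ranges` is a hypothetical helper, and the formula is an assumption rather than something confirmed by the package.

```python
def reflect_ranges(starts, widths, bound_starts, bound_widths):
    """Mirror each range inside its bound; widths are unchanged.

    Returns (new_starts, new_widths).
    """
    new_starts = []
    for s, w, bs, bw in zip(starts, widths, bound_starts, bound_widths):
        end = s + w - 1
        bound_end = bs + bw - 1
        # Offset from the bound's end becomes the offset from the bound's start.
        new_starts.append(bs + (bound_end - end))
    return new_starts, list(widths)


print(reflect_ranges([2, 5, 1], [2, 3, 3], [0, 5, 3], [11, 2, 7]))
# ([7, 4, 9], [2, 3, 3])
```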
flank returns ranges of a specified width that flank, to the left (default) or right, each input range. One use case of this is forming promoter regions for a set of genes.
starts = [2, 5, 1]
widths = [2, 3, 3]
x = IRanges(starts, widths)
res = x.flank(2, start=False)
print(res)
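The flanking arithmetic can be sketched as follows, assuming a non-negative flank width: a left flank ends just before the range's start, and a right flank begins just after its end. `flank_ranges` is a hypothetical helper that mirrors only the two arguments used above.

```python
def flank_ranges(starts, widths, flank_width, start=True):
    """Return (new_starts, new_widths) for flanks of a fixed, non-negative width."""
    if flank_width < 0:
        raise ValueError("this sketch only handles non-negative flank widths")
    if start:
        # Left flank: the flank_width positions immediately before each start.
        new_starts = [s - flank_width for s in starts]
    else:
        # Right flank: begins one position after each inclusive end.
        new_starts = [s + w for s, w in zip(starts, widths)]
    return new_starts, [flank_width] * len(new_starts)


print(flank_ranges([2, 5, 1], [2, 3, 3], 2, start=False))
# ([4, 8, 4], [2, 2, 2])
```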
Set operations¶
IRanges supports most interval set operations. For example, to compute gaps:
gaps = ir.gaps()
print(gaps)
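Gaps are the uncovered stretches between ranges. A minimal sketch, assuming gaps are reported only between the smallest start and the largest end and that zero-width ranges are ignored, sweeps the sorted ranges once; `gaps_between` is a hypothetical helper.

```python
def gaps_between(starts, widths):
    """Return inclusive (start, end) pairs for positions not covered by any range."""
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    gaps = []
    covered_end = None  # right-most position covered so far
    for start, end in spans:
        if covered_end is not None and start > covered_end + 1:
            gaps.append((covered_end + 1, start - 1))
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
print(gaps_between(starts, widths))  # [(-3, -3), (5, 8)]
```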
Or perform interval set operations, e.g., union, intersection, disjoin:
x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
intersection = x.intersect(y)
print(intersection)
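Conceptually, an intersection can be computed by merging each set of ranges and then walking both merged lists in parallel. The sketch below illustrates that idea under the assumption that zero-width ranges are ignored; `merge_spans` and `intersect_ranges` are hypothetical helpers, not the library's implementation.

```python
def merge_spans(starts, widths):
    """Merge overlapping or adjacent ranges into sorted inclusive (start, end) pairs."""
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_ranges(x, y):
    """Intersect two (starts, widths) pairs; returns inclusive (start, end) pairs."""
    a, b = merge_spans(*x), merge_spans(*y)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever span finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x = ([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = ([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
print(intersect_ranges(x, y))  # [(-2, 2), (6, 8), (14, 17)]
```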