IRanges: Interval arithmetic

Python implementation of the IRanges Bioconductor package.

An IRanges holds a start position and a width, and is typically used to represent coordinates along a genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values, e.g., circular genomes. Ends are inclusive.

IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations.

These classes follow a functional paradigm for accessing or setting properties, with further details discussed in the functional paradigm section.

Installation

To get started, install the package from PyPI:

pip install iranges

The descriptions for some of these methods come from the Bioconductor documentation.

Construct IRanges

Build an IRanges from a sequence of start positions and a matching sequence of widths. The end of each range is derived from these two values and is inclusive.

from iranges import IRanges

starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
ir = IRanges(starts, widths)

print(ir)
IRanges object with 8 ranges and 0 metadata columns
               start              end            width
    <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0]               -2                2                5
[1]                6                5                0
[2]                9               14                6
[3]               -4               -4                1
[4]                1                4                4
[5]                0                2                3
[6]               -6               -5                2
[7]               10               12                3
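The end column above follows from the inclusive-end convention: end = start + width - 1, so a zero-width range ends one position before it starts. A minimal plain-Python check of that arithmetic, independent of the iranges package:

```python
# Recompute the inclusive end of each range from its start and width.
starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]

# With inclusive ends, a range of width w starting at s covers s .. s + w - 1.
ends = [s + w - 1 for s, w in zip(starts, widths)]
print(ends)  # [2, 5, 14, -4, 4, 2, -5, 12]
```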

Accessing properties

Properties can be accessed directly from the object:

print("Number of intervals:", len(ir))

print("start positions:", ir.get_start())
print("width of each interval:", ir.get_width())
print("end positions:", ir.get_end())
Number of intervals: 8
start positions: [-2  6  9 -4  1  0 -6 10]
width of each interval: [5 0 6 1 4 3 2 3]
end positions: [ 2  5 14 -4  4  2 -5 12]

Just like BiocFrame, these classes offer both functional-style and property-based getters and setters.

print("start positions:", ir.start)
print("width of each interval:", ir.width)
print("end positions:", ir.end)
start positions: [-2  6  9 -4  1  0 -6 10]
width of each interval: [5 0 6 1 4 3 2 3]
end positions: [ 2  5 14 -4  4  2 -5 12]

Reduced ranges (Normality)

The reduce method merges the intervals into an IRanges where the intervals are:

  • not empty

  • not overlapping

  • ordered from left to right

  • not adjacent (i.e., there must be a non-empty gap between two consecutive ranges).

reduced = ir.reduce()
print(reduced)
IRanges object with 4 ranges and 0 metadata columns
               start              end            width
    <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0]               -6               -4                3
[1]               -2                4                7
[2]                6                5                0
[3]                9               14                6
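To make the merging rule concrete, here is a minimal plain-Python sketch that sorts ranges by start and merges any range that overlaps or touches the previous one. It is an illustration only, not the package's implementation; the helper name reduce_ranges is hypothetical, and the sketch keeps the zero-width range so that it lines up with the output shown above.

```python
def reduce_ranges(ranges):
    """Merge overlapping or adjacent (start, end) pairs; ends are inclusive."""
    merged = []
    for start, end in sorted(ranges):
        # Merge when this range starts no later than one past the current end.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
pairs = [(s, s + w - 1) for s, w in zip(starts, widths)]

reduced_pairs = reduce_ranges(pairs)
print(reduced_pairs)  # [(-6, -4), (-2, 4), (6, 5), (9, 14)]
```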

Overlap operations

IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations.

subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])

overlap = subject.find_overlaps(query)
print(overlap)
BiocFrame with 3 rows and 2 columns
           self_hits       query_hits
    <ndarray[int64]> <ndarray[int64]>
[0]                1                0
[1]                0                0
[2]                2                2
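The hit pairs can be checked by hand with a brute-force comparison: two inclusive ranges overlap when each starts no later than the other ends. The sketch below is a quadratic-time illustration of that test in plain Python, not the nested containment list algorithm the package uses; the helper names are hypothetical. It finds the same three pairs as the table above, listed in a different order.

```python
def to_pairs(starts, widths):
    """Convert starts and widths to (start, end) pairs with inclusive ends."""
    return [(s, s + w - 1) for s, w in zip(starts, widths)]


def find_overlaps_naive(subject, query):
    """Return (subject_index, query_index) for every overlapping pair."""
    hits = []
    for qi, (q_start, q_end) in enumerate(query):
        for si, (s_start, s_end) in enumerate(subject):
            # Inclusive ranges overlap when each one starts before the other ends.
            if s_start <= q_end and q_start <= s_end:
                hits.append((si, qi))
    return hits


subject_pairs = to_pairs([2, 2, 10], [1, 2, 3])
query_pairs = to_pairs([1, 4, 9], [5, 4, 2])

hits = find_overlaps_naive(subject_pairs, query_pairs)
print(hits)  # [(0, 0), (1, 0), (2, 2)]
```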

Finding neighboring ranges

The nearest, precede and follow methods each find the nearest range along the specified direction.

query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])

nearest = subject.nearest(query, select="all")
print(nearest)
BiocFrame with 4 rows and 2 columns
          query_hits        self_hits
    <ndarray[int64]> <ndarray[int64]>
[0]                0                0
[1]                1                0
[2]                1                1
[3]                2                2

These methods typically return a list of indices from subject for each interval in query.
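The table above can be reproduced with a small plain-Python sketch. It assumes a simplified ranking in which overlapping ranges count as distance 0 and every other pair is ranked by the number of positions separating them; the package may break ties differently, and the helper names here are hypothetical.

```python
def to_pairs(starts, widths):
    """Convert starts and widths to (start, end) pairs with inclusive ends."""
    return [(s, s + w - 1) for s, w in zip(starts, widths)]


def distance(a, b):
    """0 if the ranges overlap, else 1 + the number of positions between them."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start <= b_end and b_start <= a_end:
        return 0
    gap = max(a_start, b_start) - min(a_end, b_end) - 1
    return gap + 1


def nearest_all(subject, query):
    """For each query range, list every subject index at the minimum distance."""
    hits = []
    for qi, q in enumerate(query):
        dists = [distance(q, s) for s in subject]
        best = min(dists)  # assumes subject is not empty
        hits.extend((qi, si) for si, d in enumerate(dists) if d == best)
    return hits


query_pairs = to_pairs([1, 3, 9], [2, 5, 2])
subject_pairs = to_pairs([3, 5, 12], [1, 2, 1])

nearest_hits = nearest_all(subject_pairs, query_pairs)
print(nearest_hits)  # [(0, 0), (1, 0), (1, 1), (2, 2)]
```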

Coverage

The coverage method counts, for each position, the number of ranges that cover it.

cov = subject.coverage()
print(cov)
[0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1.]
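As a rough illustration of what is being counted, the following plain-Python sketch tallies how many ranges cover each position from 1 up to the largest end. It assumes all starts are at least 1 and is not the package's implementation; coverage_naive is a hypothetical helper.

```python
def coverage_naive(pairs):
    """Count, for positions 1..max(end), how many inclusive ranges cover each one."""
    length = max(end for _, end in pairs)
    cov = [0] * length
    for start, end in pairs:
        for pos in range(start, end + 1):
            cov[pos - 1] += 1  # position 1 is stored at index 0
    return cov


# The subject ranges from the previous section as (start, end) pairs.
subject_pairs = [(3, 3), (5, 6), (12, 12)]

cov_counts = coverage_naive(subject_pairs)
print(cov_counts)  # [0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
```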

Transforming ranges

shift moves each range by the given amount: the shift is added to every start (and therefore every end), and the widths are unchanged.

shifted = ir.shift(shift=10)
print(shifted)
IRanges object with 8 ranges and 0 metadata columns
               start              end            width
    <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0]                8               12                5
[1]               16               15                0
[2]               19               24                6
[3]                6                6                1
[4]               11               14                4
[5]               10               12                3
[6]                4                5                2
[7]               20               22                3

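Other range transformation methods include narrow, resize, flank, reflect and restrict. For example, narrow supports the adjustment of start, end and width values, which should be relative to each range. The requested window must fit inside every range, so the call below raises an exception: row 2 has width 0, which cannot hold a window ending at relative position 5.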

narrowed = ir.narrow(start=4, width=2)
print(narrowed)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 narrowed = ir.narrow(start=4, width=2)
      2 print(narrowed)

File ~/work/IRanges/IRanges/.tox/docs/lib/python3.12/site-packages/iranges/IRanges.py:1064, in IRanges.narrow(self, start, width, end, in_place)
   1061     return output
   1063 sew = SEWWrangler(output._width, start, end, width, translate_negative=True, allow_nonnarrowing=False)
-> 1064 window_starts, window_widths = sew.solve()
   1066 output._start = output._start + window_starts - 1
   1067 output._width = window_widths

File ~/work/IRanges/IRanges/.tox/docs/lib/python3.12/site-packages/iranges/sew_handler.py:109, in SEWWrangler.solve(self)
    107     out_widths = self.width
    108     # Validate after computing
--> 109     self._validate_narrowing(out_starts, out_widths)
    111 else:
    112     # Only width specified
    113     out_widths = self.width

File ~/work/IRanges/IRanges/.tox/docs/lib/python3.12/site-packages/iranges/sew_handler.py:77, in SEWWrangler._validate_narrowing(self, starts, widths)
     75 if np.any(too_wide):
     76     idx = np.where(too_wide)[0][0]
---> 77     raise Exception(
     78         f"solving row {idx + 1}: 'allow.nonnarrowing' is FALSE and "
     79         f"the solved end ({int(ends[idx])}) is > refwidth"
     80     )

Exception: solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (5) is > refwidth
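The error comes from the relative window not fitting inside every range. The plain-Python sketch below mirrors the check under that reading: a window with relative start 4 and width 2 ends at relative position 5, which must not exceed the range's own width. The helper narrow_one is hypothetical and only approximates the package's behavior; the new start follows the start + window_start - 1 arithmetic visible in the traceback.

```python
def narrow_one(start, width, rel_start, rel_width):
    """Narrow one range to a window given relative to its own start (1-based)."""
    rel_end = rel_start + rel_width - 1
    if rel_end > width:
        raise ValueError(f"solved end ({rel_end}) exceeds range width ({width})")
    return start + rel_start - 1, rel_width


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]

# Try every row and record either the narrowed (start, width) or the error text.
results = []
for row, (s, w) in enumerate(zip(starts, widths), start=1):
    try:
        results.append((row, narrow_one(s, w, rel_start=4, rel_width=2)))
    except ValueError as err:
        results.append((row, str(err)))

for row, outcome in results:
    print(row, outcome)
```

Only rows 1 and 3 are wide enough for this window; row 2 is the first to fail, which matches the row reported in the exception message.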

Disjoin intervals

As the name suggests, disjoin computes disjoint intervals: the ranges are split at every start and end so that the resulting pieces do not overlap.

disjoint = ir.disjoin()
print(disjoint)
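A small plain-Python sketch of the idea, using made-up example ranges rather than the package's output: cut the covered region at every start and at every position just past an end, then keep the pieces that lie inside at least one input range. The helper disjoin_naive is hypothetical, and dropping zero-width ranges is an assumption of this sketch.

```python
def disjoin_naive(pairs):
    """Split inclusive (start, end) ranges into non-overlapping pieces."""
    pairs = [(s, e) for s, e in pairs if s <= e]  # assumption: ignore empty ranges
    # Every start, and every position just past an end, is a cut point.
    breaks = sorted({s for s, _ in pairs} | {e + 1 for _, e in pairs})
    pieces = []
    for left, right in zip(breaks, breaks[1:]):
        piece = (left, right - 1)
        # Keep the piece only if some input range contains it entirely.
        if any(s <= piece[0] and piece[1] <= e for s, e in pairs):
            pieces.append(piece)
    return pieces


example = [(1, 5), (3, 8), (12, 13)]
pieces = disjoin_naive(example)
print(pieces)  # [(1, 2), (3, 5), (6, 8), (12, 13)]
```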

reflect and flank

reflect reverses each range within a set of common reference bounds.

starts = [2, 5, 1]
widths = [2, 3, 3]
x = IRanges(starts, widths)
bounds = IRanges([0, 5, 3], [11, 2, 7])

res = x.reflect(bounds=bounds)
print(res)
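One way to picture the reflection, assuming it mirrors each range about the midpoint of its paired bound (as in the Bioconductor definition): the new start is bound start + bound end - range end, and the width is unchanged. The plain-Python sketch below applies that formula to the inputs above; it is an assumption-laden illustration with a hypothetical helper, and its result is not the package's printed output.

```python
def reflect_one(start, width, bound_start, bound_width):
    """Mirror one range inside its bound; returns (new_start, width)."""
    end = start + width - 1
    bound_end = bound_start + bound_width - 1
    # The distance from the bound's left edge becomes the distance from its right edge.
    new_start = bound_start + bound_end - end
    return new_start, width


x_ranges = list(zip([2, 5, 1], [2, 3, 3]))           # (start, width)
bound_ranges = list(zip([0, 5, 3], [11, 2, 7]))      # (start, width)

reflected = [
    reflect_one(s, w, bs, bw)
    for (s, w), (bs, bw) in zip(x_ranges, bound_ranges)
]
print(reflected)  # [(7, 2), (4, 3), (9, 3)]
```

For the first pair, the range starts 2 positions after the left edge of its bound (0 to 10), and its mirror image ends 2 positions before the right edge.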

flank returns ranges of a specified width that flank, to the left (default) or right, each input range. One use case of this is forming promoter regions for a set of genes.

starts = [2, 5, 1]
widths = [2, 3, 3]
x = IRanges(starts, widths)

res = x.flank(2, start=False)
print(res)
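As an illustration of the arithmetic, a left flank of width w ends just before the range's start, and a right flank begins just after its end. The plain-Python sketch below assumes a positive flank width and uses a hypothetical helper flank_one with an at_start flag standing in for the start argument above; it is not the package's implementation.

```python
def flank_one(start, width, flank_width, at_start=True):
    """Return (start, width) of the region flanking one range; ends are inclusive."""
    if at_start:
        # Left flank: the flank_width positions immediately before the start.
        return start - flank_width, flank_width
    end = start + width - 1
    # Right flank: the flank_width positions immediately after the end.
    return end + 1, flank_width


starts = [2, 5, 1]
widths = [2, 3, 3]

right_flanks = [flank_one(s, w, 2, at_start=False) for s, w in zip(starts, widths)]
left_flanks = [flank_one(s, w, 2) for s, w in zip(starts, widths)]
print(right_flanks)  # [(4, 2), (8, 2), (4, 2)]
print(left_flanks)   # [(0, 2), (3, 2), (-1, 2)]
```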

Set operations

IRanges supports most interval set operations. For example, to compute gaps:

gaps = ir.gaps()
print(gaps)
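Conceptually, the gaps are the stretches between consecutive merged ranges. The plain-Python sketch below walks the ranges in start order and records each uncovered stretch; it ignores zero-width ranges, which is an assumption of this sketch, and gaps_naive is a hypothetical helper rather than the package's method.

```python
def gaps_naive(pairs):
    """Return the uncovered (start, end) stretches between inclusive ranges."""
    covered = sorted((s, e) for s, e in pairs if s <= e)  # skip empty ranges
    found = []
    reach = None  # furthest position covered so far
    for start, end in covered:
        if reach is not None and start > reach + 1:
            found.append((reach + 1, start - 1))
        reach = end if reach is None else max(reach, end)
    return found


starts = [-2, 6, 9, -4, 1, 0, -6, 10]
widths = [5, 0, 6, 1, 4, 3, 2, 3]
pairs = [(s, s + w - 1) for s, w in zip(starts, widths)]

gap_pairs = gaps_naive(pairs)
print(gap_pairs)  # [(-3, -3), (5, 8)]
```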

Or perform interval set operations, e.g., union, intersection, disjoin:

x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])

intersection = x.intersect(y)
print(intersection)
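The intersection can be pictured as the positions covered by both sets. The plain-Python sketch below merges each set into sorted, non-overlapping ranges and then sweeps the two lists together; it is an illustration with hypothetical helper names, not the package's implementation, and the printed result is that of the sketch only.

```python
def merge(pairs):
    """Merge overlapping or adjacent inclusive ranges, ignoring empty ones."""
    merged = []
    for start, end in sorted((s, e) for s, e in pairs if s <= e):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_naive(a, b):
    """Return the ranges of positions covered by both a and b."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x_pairs = [(s, s + w - 1) for s, w in zip([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])]
y_pairs = [(s, s + w - 1) for s, w in zip([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])]

common = intersect_naive(x_pairs, y_pairs)
print(common)  # [(-2, 2), (6, 8), (14, 17)]
```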

Further reading