iranges package

Submodules

iranges.IRanges module

class iranges.IRanges.IRanges(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)[source]

Bases: object

A collection of integer ranges, equivalent to the IRanges class from the Bioconductor package of the same name.

This holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.

__copy__()[source]

Shallow copy of the object.

Return type:

IRanges

Returns:

Same type as the caller, a shallow copy of this object.

__deepcopy__(memo)[source]

Deep copy of the object.

Parameters:

memo – Passed to internal deepcopy() calls.

Return type:

IRanges

Returns:

Same type as the caller, a deep copy of this object.

__getitem__(subset)[source]

Subset the IRanges.

Parameters:

subset (Union[Sequence, int, str, bool, slice, range]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted, see normalize_subscript().

Return type:

IRanges

Returns:

A new IRanges object containing the ranges of interest.

__init__(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)[source]
Parameters:
  • start (Sequence[int]) – Sequence of integers containing the start position for each range. All values should fall within the range that can be represented by a 32-bit signed integer.

  • width (Sequence[int]) – Sequence of integers containing the width for each range. This should be of the same length as start. All values should be non-negative and fall within the range that can be represented by a 32-bit signed integer. Similarly, start + width should not exceed the range of a 32-bit signed integer.

  • names (Optional[Sequence[str]]) – Sequence of strings containing the name for each range. This should have length equal to start and should only contain strings. If no names are present, None may be supplied instead.

  • mcols (Optional[BiocFrame]) – A data frame containing additional metadata columns for each range. This should have number of rows equal to the length of start. If None, defaults to a zero-column data frame.

  • metadata (Optional[dict]) – Additional metadata. If None, defaults to an empty dictionary.

  • validate (bool) – Whether to validate the arguments, internal use only.
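The constraints on start and width listed above can be checked in a few lines of plain Python. This is only a sketch, not the library's validation code: check_ranges is a hypothetical helper, and reading "start + width should not exceed the range of a 32-bit signed integer" as an upper-bound check is an assumption.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def check_ranges(start, width):
    """Hypothetical helper: raise ValueError if the documented constraints are violated."""
    if len(start) != len(width):
        raise ValueError("'start' and 'width' must have the same length")
    for s, w in zip(start, width):
        if not INT32_MIN <= s <= INT32_MAX:
            raise ValueError(f"start {s} does not fit in a 32-bit signed integer")
        if not 0 <= w <= INT32_MAX:
            raise ValueError(f"width {w} must be non-negative and fit in 32 bits")
        if s + w > INT32_MAX:
            raise ValueError(f"start + width = {s + w} exceeds the 32-bit range")
    return True
```

Zero and negative starts pass this check, in line with the class description, while a negative width does not.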

__iter__()[source]

Iterator over intervals.

Return type:

IRangesIter

__len__()[source]
Return type:

int

Returns:

Length of this object.

__repr__()[source]

Return repr(self).

Return type:

str

__setitem__(args, value)[source]

Add or update positions (in-place operation).

Parameters:
  • subset – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be replaced, see normalize_subscript().

  • value (IRanges) – An IRanges object of length equal to the number of ranges to be replaced, as specified by subset.

Returns:

Specified ranges are replaced by value in the current object.

clip_intervals(shift=0, width=None, adjust_width_by_shift=False)[source]

Clip intervals. Starts are always clipped to positive interval ranges (1, Inf).

If width is specified, the intervals are clipped to (1, width).

Parameters:
  • shift (Union[int, List[int], ndarray]) – Shift all starts before clipping. Defaults to 0.

  • width (Union[int, List[int], ndarray, None]) – Clip width of each interval. Defaults to None.

  • adjust_width_by_shift (bool) – Whether to adjust the width based on shift. Defaults to False.

Return type:

IRanges

Returns:

An IRanges object with the clipped intervals.

count_overlaps(query, query_type='any', max_gap=-1, min_overlap=1, delete_index=True)[source]

Count number of overlaps with query IRanges object.

Parameters:
  • query (IRanges) – Query IRanges.

  • query_type (Literal['any', 'start', 'end', 'within']) –

    Overlap query type, must be one of

    • “any”: Any overlap is good

    • “start”: Overlap at the beginning of the intervals

    • “end”: Must overlap at the end of the intervals

    • “within”: Fully contain the query interval

    Defaults to “any”.

  • max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).

  • min_overlap (int) – Minimum overlap with query. Defaults to 1.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not an IRanges object.

Return type:

ndarray

Returns:

Numpy vector with the number of overlaps for each range in query.

coverage(shift=0, width=None, weight=1)[source]

Calculate coverage: for each position, count the number of intervals that cover it.

Parameters:
Raises:

TypeError – If ‘weight’ is not a number, or if ‘width’ is not of an expected type.

Return type:

ndarray

Returns:

A numpy array with the coverage vector.
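The idea of a coverage vector can be illustrated without the library. The sketch below is not the package's implementation: coverage_sketch is a hypothetical helper, it ignores the shift and width arguments, and it assumes 1-based starts with exclusive ends, so the first element of the result corresponds to position 1.

```python
def coverage_sketch(starts, widths, weight=1):
    """Hypothetical helper: per-position coverage for 1-based starts, exclusive ends."""
    if not starts:
        return []
    # Last position that any range can reach.
    size = max(s + w for s, w in zip(starts, widths)) - 1
    diff = [0] * (size + 2)
    for s, w in zip(starts, widths):
        if w == 0:
            continue  # empty ranges cover nothing
        diff[s] += weight      # coverage rises at the start position
        diff[s + w] -= weight  # and falls just past the last covered position
    cov, running = [], 0
    for pos in range(1, size + 1):
        running += diff[pos]
        cov.append(running)
    return cov
```

For starts [1, 3] and widths [3, 4], position 3 is the only one covered twice.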

disjoin(with_reverse_map=False)[source]

Calculate disjoint intervals.

Parameters:

with_reverse_map (bool) – Whether to return a map of indices back to the original object. Defaults to False.

Return type:

IRanges

Returns:

A new IRanges containing disjoint intervals.
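As an illustration of what "disjoint intervals" means here, the sketch below splits the input at every start and end position and keeps the pieces that some input range covers. It is a brute-force sketch under the assumption of exclusive ends, not the library's algorithm; disjoin_sketch is a hypothetical name.

```python
def disjoin_sketch(starts, widths):
    """Hypothetical helper: split ranges at every start/end so no two pieces overlap."""
    ranges = [(s, s + w) for s, w in zip(starts, widths) if w > 0]
    breaks = sorted({p for r in ranges for p in r})
    pieces = []
    for left, right in zip(breaks, breaks[1:]):
        # Keep a piece only if at least one input range covers it entirely.
        if any(s <= left and right <= e for s, e in ranges):
            pieces.append((left, right - left))
    return pieces
```

Each returned pair is (start, width); a range nested inside another yields three pieces.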

distance(query)[source]

Calculate the pair-wise distance with intervals in query.

Parameters:

query (IRanges) – Query IRanges.

Return type:

ndarray

Returns:

Numpy vector containing distances for each interval in query.

classmethod empty()[source]

Create a zero-length IRanges object.

Returns:

Same type as the caller, in this case an IRanges.

property end: ndarray

Get all end positions (read-only).

Returns:

NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.

find_overlaps(query, query_type='any', select='all', max_gap=-1, min_overlap=1, delete_index=True)[source]

Find overlaps with query IRanges object.

Parameters:
  • query (IRanges) – Query IRanges.

  • query_type (Literal['any', 'start', 'end', 'within']) –

    Overlap query type, must be one of

    • “any”: Any overlap is good

    • “start”: Overlap at the beginning of the intervals

    • “end”: Must overlap at the end of the intervals

    • “within”: Fully contain the query interval

    Defaults to “any”.

  • select (Literal['all', 'first', 'last', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in subject.

  • max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).

  • min_overlap (int) – Minimum overlap with query. Defaults to 1.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not an IRanges object.

Return type:

List[List[int]]

Returns:

A list with the same length as the number of intervals in query. Each element is a list of indices that overlap, or None if there are no overlaps.
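The shape of the result can be seen in a brute-force sketch for the “any” query type. This is not how the library searches (the delete_index parameter suggests it uses an NCLS index); find_overlaps_sketch is a hypothetical helper that takes (start, width) pairs, assumes exclusive ends, and returns an empty list rather than None when nothing overlaps.

```python
def find_overlaps_sketch(subject, query, min_overlap=1):
    """Hypothetical helper: brute-force "any" overlaps between (start, width) pairs."""
    hits = []
    for q_start, q_width in query:
        q_end = q_start + q_width
        matched = []
        for i, (s_start, s_width) in enumerate(subject):
            # Number of positions shared by the two ranges (negative if disjoint).
            shared = min(q_end, s_start + s_width) - max(q_start, s_start)
            if shared >= min_overlap:
                matched.append(i)
        hits.append(matched)
    return hits
```

Taking the length of each inner list gives the per-query counts that a count-style method would report.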

flank(width, start=True, both=False, in_place=False)[source]

Compute flanking ranges for each range. The logic is from the IRanges package.

If start is True for a given range, the flanking occurs at the start, otherwise the end. The widths of the flanks are given by the width parameter.

width can be negative, in which case the flanking region is reversed so that it represents a prefix or suffix of the range.

Usage:

ir.flank(3, True), where “x” indicates a range in ir and “-” indicates the resulting flanking region:

---xxxxxxx

If start were False, the result would instead be

xxxxxxx---

For a negative width, e.g. ir.flank(-3, False), where “*” indicates the overlap between “x” and the result:

xxxx***

If both is True, then, for all ranges in ir, the flanking regions are extended into (or out of, if width is negative) the range, so that the result straddles the given endpoint and has twice the width given by width.

This is illustrated below for ir.flank(3, both=True):

---***xxxx

Parameters:
  • width (int) – Width to flank by. May be negative.

  • start (bool) – Whether to only flank starts. Defaults to True.

  • both (bool) – Whether to flank both starts and ends. Defaults to False.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the flanked intervals. Otherwise, the current object is directly modified and a reference to it is returned.
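The flanking rules described above can be written out for a single range. The sketch below follows the logic of the Bioconductor IRanges flank function, translated to exclusive ends; it is an illustration rather than this package's code, and flank_sketch and its argument names are hypothetical.

```python
def flank_sketch(start, width, flank_width, at_start=True, both=False):
    """Hypothetical helper: return (start, width) of the flank of one range."""
    end = start + width  # exclusive end
    size = abs(flank_width)
    if both:
        # Straddle the chosen endpoint: `size` positions on each side.
        anchor = start if at_start else end
        return anchor - size, 2 * size
    if at_start:
        # Positive widths sit before the range, negative ones form a prefix.
        new_start = start - flank_width if flank_width >= 0 else start
    else:
        # Positive widths sit after the range, negative ones form a suffix.
        new_start = end if flank_width >= 0 else end + flank_width
    return new_start, size
```

For a range starting at 10 with width 7, flank_sketch(10, 7, 3) gives the three positions before the range, and flank_sketch(10, 7, -3, at_start=False) gives its last three positions.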

follow(query, select='all', delete_index=True)[source]

Search nearest positions only upstream that overlap with each range in query.

Parameters:
  • query (IRanges) – Query IRanges to find nearest positions.

  • select (Literal['all', 'last']) – Determine what hit to choose when there are multiple hits for an interval in query.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not of type IRanges.

Return type:

List[List[int]]

Returns:

A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.

classmethod from_pandas(input)[source]

Create an IRanges from a DataFrame object.

Parameters:

input (pandas.DataFrame) – Input data must contain columns ‘start’ and ‘width’.

Return type:

IRanges

Returns:

An IRanges object.

classmethod from_polars(input)[source]

Create an IRanges from a DataFrame object.

Parameters:

input (polars.DataFrame) – Input data must contain columns ‘start’ and ‘width’.

Return type:

IRanges

Returns:

An IRanges object.

gaps(start=None, end=None)[source]

Return an IRanges object representing the set of integers that remain after the intervals in this object are removed from the interval specified by the start and end arguments.

Parameters:
  • start (Optional[int]) – Restrict start position. Defaults to 1.

  • end (Optional[int]) – Restrict end position. Defaults to None.

Return type:

IRanges

Returns:

A new IRanges with the gap regions.
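A gap computation of this kind can be sketched as a single sweep over the sorted ranges. The code below is an assumption-laden illustration, not the library's method: gaps_sketch is a hypothetical helper, it takes (start, width) pairs, and it treats the restricting interval as half-open, [lower, upper), which may differ from how the library interprets start and end.

```python
def gaps_sketch(ranges, lower, upper):
    """Hypothetical helper: uncovered stretches of [lower, upper) as (start, width)."""
    if lower >= upper:
        return []
    gaps, cursor = [], lower
    for s, e in sorted((s, s + w) for s, w in ranges):
        if s > cursor:
            # Everything between the cursor and this range is uncovered.
            gaps.append((cursor, min(s, upper) - cursor))
        cursor = max(cursor, e)
        if cursor >= upper:
            break
    if cursor < upper:
        gaps.append((cursor, upper - cursor))
    return gaps
```

With ranges covering [1, 4) and [8, 10) and bounds 1 and 12, the uncovered stretches are [4, 8) and [10, 12).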

gaps_numpy(start=None, end=None)[source]

Return an IRanges object representing the set of integers that remain after the intervals in this object are removed from the interval specified by the start and end arguments.

This function uses a vectorized approach using numpy vectors. The normal gaps() method performs better in most cases.

Parameters:
  • start (Optional[int]) – Restrict start position. Defaults to 1.

  • end (Optional[int]) – Restrict end position. Defaults to None.

Return type:

IRanges

Returns:

A new IRanges with the gap regions.

get_end()[source]

Get all end positions.

Return type:

ndarray

Returns:

NumPy array of 32-bit signed integers containing the end position (not inclusive) for all ranges.

get_mcols()[source]

Get metadata about ranges.

Return type:

BiocFrame

Returns:

Data frame containing additional metadata columns for all ranges.

get_metadata()[source]

Get additional metadata.

Return type:

dict

Returns:

Dictionary containing additional metadata.

get_names()[source]

Get all names.

Return type:

Optional[Names]

Returns:

List containing the names for all ranges, or None if no names are present.

get_row(index_or_name)[source]

Access a row by index or row name.

Parameters:

index_or_name (Union[str, int]) –

Integer index of the row to access.

Alternatively, you may provide a string specifying the row name to access, only if names are available.

Raises:
  • ValueError – If index_or_name is not in row names, or if the integer index is greater than the number of rows.

  • TypeError – If index_or_name is neither a string nor an integer.

Returns:

A sliced IRanges object.

Return type:

IRanges

get_start()[source]

Get all start positions.

Return type:

ndarray

Returns:

NumPy array of 32-bit signed integers containing the start positions for all ranges.

get_width()[source]

Get width of each interval.

Return type:

ndarray

Returns:

NumPy array of 32-bit signed integers containing the widths for all ranges.

intersect(other)[source]

Find intersecting intervals with other.

Parameters:

other (IRanges) – An IRanges object.

Raises:

TypeError – If other is not IRanges.

Return type:

IRanges

Returns:

A new IRanges object with all intersecting intervals.
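Treating each object as a set of integer positions, the intersection can be sketched with a two-pointer sweep over merged, sorted intervals. This is an illustrative sketch, not the library's implementation; intersect_sketch is a hypothetical helper working on (start, width) pairs with exclusive ends.

```python
def intersect_sketch(a, b):
    """Hypothetical helper: positions covered by both inputs, as (start, width) pairs."""

    def merged(ranges):
        # Sort and fuse overlapping or touching ranges into [start, end) lists.
        out = []
        for s, e in sorted((s, s + w) for s, w in ranges if w > 0):
            if out and s <= out[-1][1]:
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        return out

    left, right = merged(a), merged(b)
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        lo = max(left[i][0], right[j][0])
        hi = min(left[i][1], right[j][1])
        if lo < hi:
            result.append((lo, hi - lo))
        # Advance whichever interval finishes first.
        if left[i][1] < right[j][1]:
            i += 1
        else:
            j += 1
    return result
```

The same merge-then-sweep pattern extends to union and set difference by changing which stretches are emitted.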

intersect_ncls(other, delete_index=True)[source]

Find intersecting intervals with other. Uses the NCLS index.

Parameters:

other (IRanges) – An IRanges object.

Raises:

TypeError – If other is not IRanges.

Return type:

IRanges

Returns:

A new IRanges object with all intersecting intervals.

property mcols: BiocFrame

Get metadata about ranges.

Returns:

Data frame containing additional metadata columns for all ranges.

property metadata: dict

Get additional metadata.

Returns:

Dictionary containing additional metadata.

property names: Names | None

Get all names.

Returns:

List containing the names for all ranges, or None if no names are available.

narrow(start=None, width=None, end=None, in_place=False)[source]

Narrow genomic positions by provided start, width and end parameters.

Important: These arguments are relative shifts in position for each range.

Parameters:
Raises:

ValueError – If width is provided without either start or end, or if all three of start, end and width are provided. Provide two of the three parameters, not all of them.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the narrow intervals. Otherwise, the current object is directly modified and a reference to it is returned.

nearest(query, select='all', delete_index=True)[source]

Search nearest positions both upstream and downstream that overlap with each range in query.

Parameters:
  • query (IRanges) – Query IRanges to find nearest positions.

  • select (Literal['all', 'arbitrary']) – Determine what hit to choose when there are multiple hits for an interval in query.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not of type IRanges.

Return type:

List[List[int]]

Returns:

A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.

order(decreasing=False)[source]

Get the order of indices for sorting.

Parameters:

decreasing (bool) – Whether to sort in descending order. Defaults to False.

Return type:

ndarray

Returns:

NumPy vector containing index positions in the sorted order.

overlap_indices(start=None, end=None)[source]

Find overlaps with the start and end positions.

Parameters:
  • start (Optional[int]) – Start position. Defaults to None.

  • end (Optional[int]) – End position. Defaults to None.

Return type:

ndarray

Returns:

Numpy vector containing indices that overlap with the given range.

precede(query, select='all', delete_index=True)[source]

Search nearest positions only downstream that overlap with each range in query.

Parameters:
  • query (IRanges) – Query IRanges to find nearest positions.

  • select (Literal['all', 'first']) – Determine what hit to choose when there are multiple hits for an interval in query.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not of type IRanges.

Return type:

List[List[int]]

Returns:

A list with the same length as the number of intervals in query. Each element may contain indices nearest to the interval, or None if there are no nearest intervals.

promoters(upstream=2000, downstream=200, in_place=False)[source]

Extend intervals to promoter regions.

Generates promoter ranges relative to the transcription start site (TSS), where TSS is start(x). The promoter range is expanded around the TSS according to the upstream and downstream arguments. Upstream represents the number of nucleotides in the 5’ direction and downstream the number in the 3’ direction. The full range is defined as, (start(x) - upstream) to (start(x) + downstream - 1).

Parameters:
  • upstream (int) – Number of positions to extend in the 5’ direction. Defaults to 2000.

  • downstream (int) – Number of positions to extend in the 3’ direction. Defaults to 200.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the promoter intervals. Otherwise, the current object is directly modified and a reference to it is returned.
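The promoter formula above translates directly into a start and a width. The sketch below works on a single start position and is only an illustration of that arithmetic; promoters_sketch is a hypothetical helper, not part of the package.

```python
def promoters_sketch(start, upstream=2000, downstream=200):
    """Hypothetical helper: (start, width) of the promoter around a TSS at `start`."""
    new_start = start - upstream
    # The range runs from start - upstream to start + downstream - 1 inclusive,
    # which is upstream + downstream positions in total.
    new_width = upstream + downstream
    return new_start, new_width
```

With the default arguments, a TSS at position 5000 yields a promoter starting at 3000 with width 2200.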

range()[source]

Compute the single range that spans all intervals.

Return type:

IRanges

Returns:

A new IRanges instance with a single range, running from the minimum of all start positions to the maximum of all end positions.
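In terms of plain (start, width) pairs, the spanning range is a minimum and a maximum. A minimal sketch, assuming exclusive ends and a non-empty input; range_sketch is a hypothetical name.

```python
def range_sketch(ranges):
    """Hypothetical helper: single (start, width) spanning all (start, width) inputs."""
    lowest = min(s for s, _ in ranges)
    highest = max(s + w for s, w in ranges)  # exclusive end
    return lowest, highest - lowest
```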

reduce(with_reverse_map=False, drop_empty_ranges=False, min_gap_width=1)[source]

Reduce orders the ranges, then merges overlapping or adjacent ranges.

Parameters:
  • with_reverse_map (bool) – Whether to return map of indices back to original object. Defaults to False.

  • drop_empty_ranges (bool) – Whether to drop empty ranges. Defaults to False.

  • min_gap_width (int) – Ranges separated by a gap of at least min_gap_width positions are not merged. Defaults to 1.

Return type:

IRanges

Returns:

A new IRanges object with reduced intervals.
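The sort-then-merge behaviour, including the role of min_gap_width, can be sketched as follows. This is an illustration on (start, width) pairs with exclusive ends, not the library's code; reduce_sketch is a hypothetical helper and it ignores with_reverse_map and drop_empty_ranges.

```python
def reduce_sketch(ranges, min_gap_width=1):
    """Hypothetical helper: merge (start, width) pairs whose gap is below min_gap_width."""
    merged = []
    for s, e in sorted((s, s + w) for s, w in ranges):
        # The gap is the number of positions between the previous end and this start.
        if merged and s - merged[-1][1] < min_gap_width:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [(s, e - s) for s, e in merged]
```

With the default of 1, adjacent ranges (gap 0) merge while ranges separated by one or more positions stay apart; raising min_gap_width also merges across small gaps.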

reflect(bounds, in_place=False)[source]

Reverse each range in this object relative to the corresponding range in bounds.

Reflection preserves the width of a range, but shifts it such that the distance from the left bound to the start of the range becomes the distance from the end of the range to the right bound. This is illustrated below, where x represents a range in this object and [ and ] indicate the bounds:

[..xxx.....] becomes [.....xxx..]

Parameters:
  • bounds (IRanges) – IRanges with the same length as the current object specifying the bounds.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the reflected intervals. Otherwise, the current object is directly modified and a reference to it is returned.
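Reflection reduces to swapping the left and right margins inside the bounds. The sketch below handles one range and one bound, assuming exclusive ends; reflect_sketch is a hypothetical helper used only to illustrate the arithmetic.

```python
def reflect_sketch(start, width, bound_start, bound_width):
    """Hypothetical helper: reflect one range inside its bounds, keeping its width."""
    right_margin = (bound_start + bound_width) - (start + width)
    # The old distance to the right bound becomes the new distance from the left bound.
    return bound_start + right_margin, width
```

For bounds of width 10 starting at 0 and a range of width 3 starting at 2, the reflected range starts at 5, matching the diagram above.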

resize(width, fix='start', in_place=False)[source]

Resize ranges to the specified width where either the start, end, or center is used as an anchor.

Parameters:
  • width (Union[int, List[int], ndarray]) – Width to resize to. Must be non-negative.

  • fix (Union[Literal['start', 'end', 'center'], List[Literal['start', 'end', 'center']]]) –

    Fix positions by “start”, “end”, or “center”.

    Alternatively, fix may be a list with the same size as this IRanges object, denoting what to use as an anchor for each interval.

    Defaults to “start”.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Raises:

ValueError – If parameter fix is neither start, end, nor center, or if width is negative.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the resized intervals. Otherwise, the current object is directly modified and a reference to it is returned.
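The three anchors can be compared on a single range. This is a sketch rather than the library's code: resize_sketch is a hypothetical helper, and the floor division used for the “center” anchor follows the Bioconductor convention, which is assumed rather than confirmed for this package.

```python
def resize_sketch(start, width, new_width, fix="start"):
    """Hypothetical helper: resize one range, returning its new (start, width)."""
    if new_width < 0:
        raise ValueError("'width' must be non-negative")
    if fix == "start":
        new_start = start
    elif fix == "end":
        # Keep the exclusive end (start + width) where it was.
        new_start = start + width - new_width
    elif fix == "center":
        # Assumed rounding: shift by half the width change, rounded down.
        new_start = start + (width - new_width) // 2
    else:
        raise ValueError("'fix' must be 'start', 'end' or 'center'")
    return new_start, new_width
```

Shrinking a range of width 7 at start 10 to width 3 keeps the start at 10, moves it to 14 when the end is fixed, and to 12 when the center is fixed.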

restrict(start=None, end=None, keep_all_ranges=False)[source]

Restrict ranges to the given start and end positions.

Parameters:
Return type:

IRanges

Returns:

A new IRanges with the restricted intervals.

set_mcols(mcols, in_place=False)[source]

Set new metadata about ranges.

Parameters:
  • mcols (Optional[BiocFrame]) – Data frame of additional columns, see the constructor for details.

  • in_place (bool) – Whether to modify the object in place.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the modified metadata columns. Otherwise, the current object is directly modified and a reference to it is returned.

set_metadata(metadata, in_place=False)[source]

Set or replace metadata.

Parameters:
  • metadata (Optional[dict]) – Additional metadata.

  • in_place (bool) – Whether to modify the object in place.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the modified metadata. Otherwise, the current object is directly modified and a reference to it is returned.

set_names(names, in_place=False)[source]
Parameters:
  • names (Optional[Sequence[str]]) – Sequence of names or None, see the constructor for details.

  • in_place (bool) – Whether to modify the object in place.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the modified names. Otherwise, the current object is directly modified and a reference to it is returned.

set_start(start, in_place=False)[source]

Modify start positions.

Parameters:
  • start (Sequence[int]) – Sequence of start positions, see the constructor for details.

  • in_place (bool) – Whether to modify the object in place.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the modified start positions. Otherwise, the current object is directly modified and a reference to it is returned.

set_width(width, in_place=False)[source]
Parameters:
  • width (Sequence[int]) – Sequence of widths, see the constructor for details.

  • in_place (bool) – Whether to modify the object in place.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the modified widths. Otherwise, the current object is directly modified and a reference to it is returned.

setdiff(other)[source]

Find set difference with other.

Parameters:

other (IRanges) – An IRanges object.

Raises:

TypeError – If other is not IRanges.

Return type:

IRanges

Returns:

A new IRanges object.

shift(shift, in_place=False)[source]

Shifts all the intervals by the amount specified by the shift argument.

Parameters:
  • shift (Union[int, List[int], ndarray]) – Amount to shift by.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the shifted intervals. Otherwise, the current object is directly modified and a reference to it is returned.

sort(decreasing=False, in_place=False)[source]

Sort the intervals.

Parameters:
  • decreasing (bool) – Whether to sort in descending order. Defaults to False.

  • in_place (bool) – Whether to modify the object in place. Defaults to False.

Return type:

IRanges

Returns:

If in_place = False, a new IRanges is returned with the sorted intervals. Otherwise, the current object is directly modified and a reference to it is returned.

property start: ndarray

Get all start positions.

Returns:

NumPy array of 32-bit signed integers containing the start positions for all ranges.

subset_by_overlaps(query, query_type='any', max_gap=-1, min_overlap=1, delete_index=True)[source]

Subset by overlapping intervals in query.

Parameters:
  • query (IRanges) – Query IRanges object.

  • query_type (Literal['any', 'start', 'end', 'within']) –

    Overlap query type, must be one of

    • “any”: Any overlap is good

    • “start”: Overlap at the beginning of the intervals

    • “end”: Must overlap at the end of the intervals

    • “within”: Fully contain the query interval

    Defaults to “any”.

  • max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).

  • min_overlap (int) – Minimum overlap with query. Defaults to 1.

  • delete_index (bool) – Delete the cached ncls index. Internal use only.

Raises:

TypeError – If query is not of type IRanges.

Return type:

IRanges

Returns:

A new IRanges object containing ranges that overlap with query.

to_pandas()[source]

Convert this IRanges object into a DataFrame.

Return type:

pandas.DataFrame

Returns:

A DataFrame object.

to_polars()[source]

Convert this IRanges object into a DataFrame.

Return type:

polars.DataFrame

Returns:

A DataFrame object.

union(other)[source]

Find union of intervals with other.

Parameters:

other (IRanges) – An IRanges object.

Raises:

TypeError – If other is not IRanges.

Return type:

IRanges

Returns:

A new IRanges object with all ranges.

property width: ndarray

Get width of each interval.

Returns:

NumPy array of 32-bit signed integers containing the widths for all ranges.

class iranges.IRanges.IRangesIter(obj)[source]

Bases: object

An iterator over an IRanges object.

Parameters:

obj (IRanges) – Object to iterate.

__init__(obj)[source]

Initialize the iterator.

Parameters:

obj (IRanges) – Source object to iterate.

__iter__()[source]
__next__()[source]

iranges.interval module

iranges.interval.calc_gap_and_overlap(first, second)[source]

Calculate gap and/or overlap between two intervals.

Parameters:
  • first (Tuple[int, int]) – Interval containing start and end positions. end is non-inclusive.

  • second (Tuple[int, int]) – Interval containing start and end positions. end is non-inclusive.

Return type:

Tuple[Optional[int], Optional[int]]
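The return type suggests that one of the two values is absent depending on whether the intervals overlap. The sketch below shows one plausible reading for (start, end) tuples with non-inclusive ends; it is an assumption about the semantics, not the function's actual code, and gap_and_overlap_sketch is a hypothetical name.

```python
def gap_and_overlap_sketch(first, second):
    """Hypothetical helper: return (gap, overlap) for two (start, end) intervals."""
    lo = max(first[0], second[0])
    hi = min(first[1], second[1])
    if lo < hi:
        # The intervals share hi - lo positions, so there is no gap to report.
        return None, hi - lo
    # The intervals are disjoint; a gap of 0 means they are adjacent.
    return lo - hi, None
```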

iranges.interval.create_np_interval_vector(intervals, with_reverse_map=False, force_size=None, dont_sum=False, value=1)[source]

Represent intervals and calculate coverage.

Parameters:
  • intervals (IRanges) – Input intervals.

  • with_reverse_map (bool) – Return map of indices? Defaults to False.

  • force_size (Optional[int]) – Force size of the array.

  • dont_sum (bool) – Do not sum. Defaults to False.

  • value (Union[int, float]) – Default value to increment. Defaults to 1.

Return type:

Tuple[ndarray, Optional[List]]

Returns:

A numpy array representing coverage from the intervals and optionally the index map.

Module contents