iranges package¶
Submodules¶
iranges.IRanges module¶
- class iranges.IRanges.IRanges(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)[source]¶
Bases:
object
A collection of integer ranges, equivalent to the IRanges class from the Bioconductor package of the same name. It enables efficient storage and manipulation of genomic intervals defined by start positions and widths.
Each range consists of a start position and a width. For genomic sequences, the start is typically 1-based, though other applications may use zero or negative values. The width represents the length of the interval. Ends are inclusive.
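As an illustration of the start/width convention described above, the following minimal sketch derives inclusive end positions in plain Python. The helper name `inclusive_ends` is hypothetical and not part of the package; it only assumes the stated rule that ends are inclusive, so that end = start + width - 1.

```python
def inclusive_ends(starts, widths):
    """Return the inclusive end of each range, assuming end = start + width - 1."""
    if len(starts) != len(widths):
        raise ValueError("starts and widths must have the same length")
    return [s + w - 1 for s, w in zip(starts, widths)]


# Starts may be 1-based, zero, or negative; widths are non-negative.
starts = [1, 5, -3]
widths = [4, 1, 6]
print(inclusive_ends(starts, widths))  # [4, 5, 2]
```

Under this convention a zero-width range has an end one position before its start.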
- __copy__()[source]¶
Shallow copy of the object.
- Return type:
- Returns:
Same type as the caller, a shallow copy of this object.
- __deepcopy__(memo)[source]¶
Deep copy of the object.
- Parameters:
memo – Passed to internal deepcopy() calls.
- Return type:
- Returns:
Same type as the caller, a deep copy of this object.
- __init__(start=[], width=[], names=None, mcols=None, metadata=None, validate=True)[source]¶
- Parameters:
start (Sequence[int]) – Sequence of integers containing the start position for each range. All values should fall within the range that can be represented by a 32-bit signed integer.
width (Sequence[int]) – Sequence of integers containing the width for each range. This should be of the same length as start. All values should be non-negative and fall within the range that can be represented by a 32-bit signed integer. Similarly, start + width should not exceed the range of a 32-bit signed integer.
names (Optional[Sequence[str]]) – Sequence of strings containing the name for each range. This should have length equal to start and should only contain strings. If no names are present, None may be supplied instead.
mcols (Optional[BiocFrame]) – A data frame containing additional metadata columns for each range. This should have number of rows equal to the length of start. If None, defaults to a zero-column data frame.
metadata (Optional[dict]) – Additional metadata. If None, defaults to an empty dictionary.
validate (bool) – Whether to validate the arguments, internal use only.
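The constraints on start and width listed above can be expressed as a small check. This is a sketch in plain Python, not the validation the package performs; `check_ranges` and the two constants are hypothetical names introduced for illustration.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def check_ranges(start, width):
    """Raise ValueError if start/width violate the documented constraints."""
    if len(start) != len(width):
        raise ValueError("'start' and 'width' must have the same length")
    for s, w in zip(start, width):
        if not INT32_MIN <= s <= INT32_MAX:
            raise ValueError(f"start {s} does not fit in a 32-bit signed integer")
        if not 0 <= w <= INT32_MAX:
            raise ValueError(f"width {w} must be non-negative and fit in 32 bits")
        if s + w > INT32_MAX:
            raise ValueError(f"start + width = {s + w} exceeds the 32-bit range")


check_ranges([1, 100], [10, 0])  # passes silently
```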
- __setitem__(args, value)[source]¶
Add or update positions (in-place operation).
- Parameters:
subset – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be replaced, see normalize_subscript().
value (IRanges) – An IRanges object of length equal to the number of ranges to be replaced, as specified by subset.
- Returns:
Specified ranges are replaced by value in the current object.
- combine(*other)[source]¶
Combine multiple range objects into one.
Wrapper around combine_sequences().
- Return type:
- Returns:
An IRanges containing all the combined ranges.
- count_overlaps(query, query_type='any', max_gap=-1, min_overlap=0, delete_index=True)[source]¶
Count number of overlaps for each range in query.
- Parameters:
query (IRanges) – Query IRanges.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the range
"end": Must overlap at the end of the range
"within": Fully contain the query interval
Defaults to "any".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 0.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Return type:
- Returns:
NumPy vector with the same length as the number of query ranges; each value is the number of overlaps in self for that query range.
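To make the meaning of the returned counts concrete, here is a naive quadratic sketch of overlap counting for the "any" query type. It is not the package's NCLS-based implementation; `count_overlaps_naive` is a hypothetical helper that assumes inclusive ends and ignores max_gap, min_overlap, and the other query types.

```python
def count_overlaps_naive(subject, query):
    """Count, for each (start, width) in query, the overlapping subject ranges."""
    counts = []
    for q_start, q_width in query:
        q_end = q_start + q_width - 1  # inclusive end
        hits = 0
        for s_start, s_width in subject:
            s_end = s_start + s_width - 1
            # Two closed intervals overlap when each starts before the other ends.
            if s_start <= q_end and q_start <= s_end:
                hits += 1
        counts.append(hits)
    return counts


subject = [(1, 5), (4, 4), (20, 3)]  # covers 1-5, 4-7, 20-22
query = [(5, 2), (10, 5), (21, 1)]   # covers 5-6, 10-14, 21-21
print(count_overlaps_naive(subject, query))  # [2, 0, 1]
```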
- coverage(shift=None, width=None, weight=None, circle_length=None, method='auto')[source]¶
Compute weighted coverage of ranges.
- Parameters:
shift (Optional[ndarray]) – Array of shift values. Defaults to None for no shift.
width (Optional[int]) – Maximum width to clip to. Defaults to None for no clipping.
weight (Optional[ndarray]) – Array of weights. Defaults to None for equal weights for all ranges (weight = 1).
circle_length (Optional[int]) – Length of circular sequence. Defaults to None for linear sequence.
method (Literal['auto', 'sort', 'hash', 'naive']) – Coverage computation method. Defaults to "auto".
- Return type:
- Returns:
NumPy array containing coverage values.
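Weighted coverage can be pictured as summing, at every position, the weights of the ranges that cover it. The sketch below does this the slow way in plain Python. It is an illustration only: `coverage_naive` is a hypothetical helper, it assumes positions are counted from 1 and that the output extends to the largest end, and it ignores shift, width, and circle_length.

```python
def coverage_naive(starts, widths, weights=None):
    """Sum the weights of all ranges covering each position 1..max(end)."""
    if weights is None:
        weights = [1] * len(starts)  # equal weight for every range
    ends = [s + w - 1 for s, w in zip(starts, widths)]
    length = max(max(ends, default=0), 0)
    cov = [0] * length
    for s, e, wt in zip(starts, ends, weights):
        # Positions below 1 fall outside the output and are skipped.
        for pos in range(max(s, 1), e + 1):
            cov[pos - 1] += wt
    return cov


# Ranges 1-3 and 3-6 overlap only at position 3.
print(coverage_naive([1, 3], [3, 4]))  # [1, 1, 2, 1, 1, 1]
```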
- disjoint_bins()[source]¶
Split ranges into a set of bins so that the ranges in each bin are disjoint.
- Return type:
- Returns:
A NumPy vector indicating the bin index for each range.
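One way to obtain such a binning is a greedy pass over the ranges in sorted order, placing each range in the first bin whose last range ends before it starts. The sketch below shows that idea in plain Python; it is an assumption about a reasonable approach, not a description of the package's C++ routine, and `disjoint_bins_greedy` is a hypothetical name.

```python
def disjoint_bins_greedy(starts, widths):
    """Assign each range a bin index so that ranges within a bin do not overlap."""
    order = sorted(range(len(starts)), key=lambda i: (starts[i], widths[i]))
    bin_ends = []  # inclusive end of the last range placed in each bin
    bins = [0] * len(starts)
    for i in order:
        end = starts[i] + widths[i] - 1
        for b, last_end in enumerate(bin_ends):
            if last_end < starts[i]:  # fits after the bin's current last range
                bin_ends[b] = end
                bins[i] = b
                break
        else:
            bin_ends.append(end)  # no existing bin fits; open a new one
            bins[i] = len(bin_ends) - 1
    return bins


# Ranges 1-5, 3-6, 10-11, 4-4
print(disjoint_bins_greedy([1, 3, 10, 4], [5, 4, 2, 1]))  # [0, 1, 0, 2]
```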
- classmethod empty()[source]¶
Create a zero-length IRanges object.
- Returns:
Same type as caller, in this case an IRanges.
- property end: ndarray¶
Get all end positions (read-only).
- Returns:
NumPy array of 32-bit signed integers containing the end position for all ranges.
- find_overlaps(query, query_type='any', select='all', max_gap=-1, min_overlap=0, delete_index=True)[source]¶
Find overlaps with query.
- Parameters:
query (IRanges) – Query IRanges.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the range
"end": Must overlap at the end of the range
"within": Fully contain the query interval
Defaults to "any".
select (Literal['all', 'first', 'last', 'arbitrary']) – Determine what hit to choose when there are multiple hits for a query range. Must be one of "all", "first", "last", "arbitrary". Defaults to "all".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 0.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Returns:
A BiocFrame with two columns:
query_hits: Indices into query ranges.
self_hits: Corresponding indices into self ranges that overlap the query.
Each row represents an overlapping query-subject pair.
- flank(width, start=True, both=False, in_place=False)[source]¶
Compute flanking ranges for each range. The logic is from the IRanges package.
If start is True for a given range, the flanking occurs at the start, otherwise the end. The widths of the flanks are given by the width parameter. width can be negative, in which case the flanking region is reversed so that it represents a prefix or suffix of the range.
Notes
For ir.flank(3, True), where "x" indicates a range in ir and "-" indicates the resulting flanking region:
---xxxxxxx
If start were False, the result becomes:
xxxxxxx---
For negative width, i.e. ir.flank(-3, False), where "*" indicates the overlap between "x" and the result:
xxxx***
If both is True, then, for all ranges in ir, the flanking regions are extended into (or out of, if width is negative) the range, so that the result straddles the given endpoint and has twice the width given by width.
Check out the documentation of the Bioconductor IRanges package for more details.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the flanked ranges. Otherwise, the current object is directly modified and a reference to it is returned.
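The flanking rules described in the notes can be written out as arithmetic on a single range. The sketch below follows the Bioconductor definition as understood here and assumes inclusive ends; `flank_one` and its parameter names are hypothetical and not part of the package.

```python
def flank_one(start, width, flank_width, at_start=True, both=False):
    """Return (start, width) of the flanking region for one range."""
    end = start + width - 1  # inclusive end of the input range
    size = abs(flank_width)
    if both:
        # Straddle the chosen endpoint with twice the requested width.
        new_start = start - size if at_start else end - size + 1
        return new_start, 2 * size
    if at_start:
        # Negative widths yield a prefix of the range instead of an outside flank.
        new_start = start if flank_width < 0 else start - flank_width
    else:
        # Negative widths yield a suffix of the range.
        new_start = end + flank_width + 1 if flank_width < 0 else end + 1
    return new_start, size


# Input range covers 10-16 (start=10, width=7).
print(flank_one(10, 7, 3, at_start=True))    # (7, 3): covers 7-9
print(flank_one(10, 7, 3, at_start=False))   # (17, 3): covers 17-19
print(flank_one(10, 7, -3, at_start=False))  # (14, 3): covers 14-16
print(flank_one(10, 7, 3, both=True))        # (7, 6): covers 7-12
```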
- follow(query, select='last')[source]¶
Find nearest positions that are downstream/follow each query range.
- Parameters:
- Returns:
If select="last": A NumPy array of integers with length matching query, containing indices into self for the closest downstream position of each query range. Value may be None if there are no matches.
If select="all": A BiocFrame with two columns:
query_hits: Indices into query ranges.
self_hits: Corresponding indices into self ranges that are downstream.
Each row represents a query-self pair where self follows query.
- classmethod from_pandas(input)[source]¶
Create an IRanges object from a pandas DataFrame.
- Parameters:
input – Input data must contain columns 'start' and 'width'.
- Return type:
- Returns:
An IRanges object.
- classmethod from_polars(input)[source]¶
Create an IRanges object from a polars DataFrame.
- Parameters:
input – Input data must contain columns 'start' and 'width'.
- Return type:
- Returns:
An IRanges object.
- gaps(start=None, end=None)[source]¶
Compute gaps. Returns an IRanges object representing the positions that remain after the ranges are removed from the interval specified by the start and end arguments.
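The idea behind gaps can be sketched as a sweep over the sorted ranges that records every uncovered stretch. This is a plain-Python illustration under stated assumptions, not the package's implementation: `gaps_sketch` is a hypothetical helper, ends are inclusive, zero-width ranges are ignored, and when start is None the sweep begins at the first range.

```python
def gaps_sketch(starts, widths, start=None, end=None):
    """Return the uncovered stretches as a list of (start, width) tuples."""
    ranges = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    gaps = []
    cursor = start  # first position not yet known to be covered
    for s, e in ranges:
        if end is not None and s > end:
            break  # remaining ranges lie beyond the restricted interval
        if cursor is not None and s > cursor:
            gaps.append((cursor, s - cursor))
        cursor = e + 1 if cursor is None else max(cursor, e + 1)
    if end is not None and cursor is not None and cursor <= end:
        gaps.append((cursor, end - cursor + 1))  # trailing gap up to `end`
    return gaps


# Ranges 3-5 and 10-11
print(gaps_sketch([3, 10], [3, 2]))                   # [(6, 4)]
print(gaps_sketch([3, 10], [3, 2], start=1, end=15))  # [(1, 2), (6, 4), (12, 4)]
```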
- get_end()[source]¶
Get end positions (inclusive).
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the end position for all ranges.
- get_mcols()[source]¶
Get metadata about ranges.
- Return type:
- Returns:
Data frame containing additional metadata columns for all ranges.
- get_metadata()[source]¶
Get additional metadata.
- Return type:
- Returns:
Dictionary containing additional metadata.
- get_row(index_or_name)[source]¶
Access a row by index or row name.
- Parameters:
index_or_name (Union[str, int]) – Integer index of the row to access. Alternatively, you may provide a string specifying the row name to access, only if names are available.
- Raises:
ValueError – If index_or_name is not in row names. If the integer index is greater than the number of rows.
TypeError – If index_or_name is neither a string nor an integer.
- Returns:
A sliced IRanges object.
- Return type:
- get_start()[source]¶
Get start positions.
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- get_width()[source]¶
Get widths.
- Return type:
- Returns:
NumPy array of 32-bit signed integers containing the widths for all ranges.
- intersect_ncls(other, delete_index=True)[source]¶
Find intersecting ranges with other. Uses the NCLS index.
- is_disjoint()[source]¶
Check if the ranges are disjoint.
- Return type:
- Returns:
True if all ranges are non-overlapping, otherwise False.
- property mcols: BiocFrame¶
Get metadata.
- Returns:
Data frame containing additional metadata columns for all ranges.
- property metadata: dict¶
Get additional metadata.
- Returns:
Dictionary containing additional metadata.
- property names: Names | None¶
Get names.
- Returns:
List containing the names for all ranges, or None if no names are available.
- narrow(start=None, width=None, end=None, in_place=False)[source]¶
Narrow ranges.
Important: These arguments are interpreted relative to each range's own position, not as absolute coordinates.
- Parameters:
start (Union[int, List[int], ndarray, None]) – Relative start position. Defaults to None.
width (Union[int, List[int], ndarray, None]) – Width of each interval. Defaults to None.
end (Union[int, List[int], ndarray, None]) – Relative end position. Defaults to None.
in_place (bool) – Whether to modify the object in place. Defaults to False.
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the narrowed ranges. Otherwise, the current object is directly modified and a reference to it is returned.
- nearest(query, select='arbitrary', delete_index=True)[source]¶
Find nearest ranges in both directions.
- Parameters:
- Returns:
If select="arbitrary": A NumPy array of integers with length matching query, containing indices into self for the closest range for each query range. Value may be None if there are no matches.
If select="all": A BiocFrame with two columns:
query_hits: Indices into query ranges.
self_hits: Corresponding indices into self ranges that are nearest.
Each row represents a query-subject pair where subject is nearest to query.
- precede(query, select='first')[source]¶
Find nearest positions that are upstream/precede each query range.
- Parameters:
- Returns:
If select="first": A NumPy array of integers with length matching query, containing indices into self for the closest upstream position of each query range. Value may be None if there are no matches.
If select="all": A BiocFrame with two columns:
query_hits: Indices into query ranges.
self_hits: Corresponding indices into self ranges that are upstream.
Each row represents a query-self pair where self precedes query.
- promoters(upstream=2000, downstream=200, in_place=False)[source]¶
Get promoter regions (upstream and downstream of TSS sites).
Generates promoter ranges relative to the transcription start site (TSS), where TSS is start(x). The promoter range is expanded around the TSS according to the upstream and downstream arguments. Upstream represents the number of nucleotides in the 5’ direction and downstream the number in the 3’ direction. The full range is defined as, (start(x) - upstream) to (start(x) + downstream - 1).
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the promoter ranges. Otherwise, the current object is directly modified and a reference to it is returned.
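The promoter formula above, (start(x) - upstream) to (start(x) + downstream - 1), translates directly into start/width arithmetic. The helper `promoters_sketch` below is a hypothetical plain-Python illustration of that formula only, not the package's method.

```python
def promoters_sketch(starts, upstream=2000, downstream=200):
    """Return (starts, widths) of promoter ranges around each TSS."""
    new_starts = [s - upstream for s in starts]
    # An inclusive range from start - upstream to start + downstream - 1
    # spans upstream + downstream positions.
    new_widths = [upstream + downstream] * len(starts)
    return new_starts, new_widths


new_starts, new_widths = promoters_sketch([5000, 12000])
print(new_starts)  # [3000, 10000]
print(new_widths)  # [2200, 2200]
```

With the defaults, a TSS at 5000 gives a promoter covering 3000 to 5199.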
- range()[source]¶
Concatenate and compute the min and max across all ranges.
- Return type:
- Returns:
A new IRanges instance with a single range, spanning the minimum of all start positions to the maximum of all end positions.
- reduce(with_reverse_map=False, drop_empty_ranges=False, min_gap_width=1)[source]¶
Reduce orders the ranges, then merges overlapping or adjacent ranges.
- Parameters:
- Return type:
- Returns:
A new IRanges object with reduced ranges.
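The order-then-merge behaviour of reduce can be sketched in a few lines. This plain-Python version is an illustration rather than the package's C++ routine: `reduce_sketch` is a hypothetical helper that assumes inclusive ends, merges ranges that overlap or are directly adjacent, and does not handle the reverse map, empty-range, or gap-width options.

```python
def reduce_sketch(starts, widths):
    """Merge overlapping or adjacent ranges; return (starts, widths)."""
    ranges = sorted((s, s + w - 1) for s, w in zip(starts, widths))
    merged = []
    for s, e in ranges:
        # Adjacent means the next range starts right after the previous end.
        if merged and s <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    out_starts = [m[0] for m in merged]
    out_widths = [m[1] - m[0] + 1 for m in merged]
    return out_starts, out_widths


# Ranges 1-3, 3-6, 7-8 merge into 1-8; 20-24 stays separate.
print(reduce_sketch([1, 3, 7, 20], [3, 4, 2, 5]))  # ([1, 20], [8, 5])
```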
- reflect(bounds, in_place=False)[source]¶
Reverses each range in x relative to the corresponding range in bounds.
Reflection preserves the width of a range, but shifts it such that the distance from the left bound to the start of the range becomes the distance from the end of the range to the right bound. This is illustrated below, where x represents a range in x and [ and ] indicate the bounds:
[..xxx.....] becomes [.....xxx..]
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the reflected ranges. Otherwise, the current object is directly modified and a reference to it is returned.
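The reflection rule can be checked with a one-range calculation. The sketch below assumes inclusive ends and mirrors the range inside its bounds; `reflect_one` is a hypothetical helper used only for illustration.

```python
def reflect_one(start, width, bounds_start, bounds_width):
    """Mirror one range inside its bounds; the width is preserved."""
    end = start + width - 1
    bounds_end = bounds_start + bounds_width - 1
    # Distance from the range end to the right bound becomes the new
    # distance from the left bound to the range start.
    new_start = bounds_start + (bounds_end - end)
    return new_start, width


# Bounds cover 1-10 and the range covers 3-5: [..xxx.....]
print(reflect_one(3, 3, 1, 10))  # (6, 3): covers 6-8, i.e. [.....xxx..]
```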
- resize(width, fix='start', in_place=False)[source]¶
Resize ranges to the specified width where either the start, end, or center is used as an anchor.
- Parameters:
width (Union[int, List[int], ndarray]) – Width to resize to; must be non-negative.
fix (Union[Literal['start', 'end', 'center'], List[Literal['start', 'end', 'center']]]) – Fix positions by "start", "end", or "center". Alternatively, fix may be a list with the same size as this IRanges object, denoting what to use as an anchor for each interval. Defaults to "start".
in_place (bool) – Whether to modify the object in place. Defaults to False.
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the resized ranges. Otherwise, the current object is directly modified and a reference to it is returned.
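The three anchor choices differ only in how the new start is derived. The sketch below works on a single range in plain Python; `resize_one` is a hypothetical helper, and the floor rounding used for the "center" case is an assumption rather than something stated above.

```python
def resize_one(start, width, new_width, fix="start"):
    """Return (start, width) after resizing one range around an anchor."""
    if new_width < 0:
        raise ValueError("new_width must be non-negative")
    if fix == "start":
        new_start = start  # start stays put
    elif fix == "end":
        new_start = start + width - new_width  # inclusive end stays put
    elif fix == "center":
        # Assumed rounding: floor of half the change in width.
        new_start = start + (width - new_width) // 2
    else:
        raise ValueError("fix must be 'start', 'end', or 'center'")
    return new_start, new_width


# Input range covers 10-15 (start=10, width=6).
print(resize_one(10, 6, 2, fix="start"))   # (10, 2): covers 10-11
print(resize_one(10, 6, 2, fix="end"))     # (14, 2): covers 14-15
print(resize_one(10, 6, 2, fix="center"))  # (12, 2): covers 12-13
```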
- restrict(start=None, end=None, keep_all_ranges=False)[source]¶
Restrict ranges to the given start and end positions.
- Parameters:
- Return type:
- Returns:
A new IRanges with the restricted ranges.
- set_mcols(mcols, in_place=False)[source]¶
Set new metadata about ranges.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata columns. Otherwise, the current object is directly modified and a reference to it is returned.
- set_metadata(metadata, in_place=False)[source]¶
Set or replace metadata.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified metadata. Otherwise, the current object is directly modified and a reference to it is returned.
- set_names(names, in_place=False)[source]¶
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified names. Otherwise, the current object is directly modified and a reference to it is returned.
- set_start(start, in_place=False)[source]¶
Modify start positions.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified start positions. Otherwise, the current object is directly modified and a reference to it is returned.
- set_width(width, in_place=False)[source]¶
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the modified widths. Otherwise, the current object is directly modified and a reference to it is returned.
- shift(shift, in_place=False)[source]¶
Shift ranges by specified amount.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the shifted ranges. Otherwise, the current object is directly modified and a reference to it is returned.
- shift_and_clip_ranges(shift, width=None, circle_length=None)[source]¶
Shift and clip interval ranges.
- Parameters:
- Returns:
Tuple of:
Array of shifted/clipped start positions
Array of shifted/clipped widths
Coverage length
Boolean indicating if ranges are in tiling configuration
- sort(decreasing=False, in_place=False)[source]¶
Sort the ranges.
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the sorted ranges. Otherwise, the current object is directly modified and a reference to it is returned.
- property start: ndarray¶
Get start positions.
- Returns:
NumPy array of 32-bit signed integers containing the start positions for all ranges.
- subset_by_overlaps(query, query_type='any', select='all', max_gap=-1, min_overlap=0, delete_index=True)[source]¶
Subset to overlapping ranges with query.
- Parameters:
query (IRanges) – Query IRanges object.
query_type (Literal['any', 'start', 'end', 'within']) – Overlap query type, must be one of:
"any": Any overlap is good
"start": Overlap at the beginning of the range
"end": Must overlap at the end of the range
"within": Fully contain the query interval
Defaults to "any".
select (Literal['all', 'first', 'last', 'arbitrary']) – Determine what hit to choose when there are multiple hits for a query range. Must be one of "all", "first", "last", "arbitrary". Defaults to "all".
max_gap (int) – Maximum gap allowed in the overlap. Defaults to -1 (no gap allowed).
min_overlap (int) – Minimum overlap with query. Defaults to 0.
delete_index (bool) – Delete the cached ncls index. Internal use only.
- Return type:
- Returns:
A new IRanges object containing ranges that overlap with query.
- terminators(upstream=2000, downstream=200, in_place=False)[source]¶
Get terminator regions (upstream and downstream of TES).
- Parameters:
- Return type:
- Returns:
If in_place = False, a new IRanges is returned with the terminator ranges. Otherwise, the current object is directly modified and a reference to it is returned.
- threebands(start=None, end=None, width=None)[source]¶
Split ranges into three parts: left, middle, and right.
- Parameters:
- Returns:
Dictionary with:
'left': IRanges for left bands
'middle': IRanges for middle bands
'right': IRanges for right bands
iranges.lib_iranges module¶
IRanges C++ implementations
- iranges.lib_iranges.coverage(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32], shift: numpy.ndarray[numpy.int32], width: object, weight: numpy.ndarray[numpy.float64], circle_len: object, method: str = 'auto') numpy.ndarray[numpy.float64] ¶
Compute weighted coverage of ranges
- iranges.lib_iranges.disjoint_bins(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32]) numpy.ndarray[numpy.int32] ¶
Assign ranges to disjoint bins
- iranges.lib_iranges.gaps_ranges(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32], restrict_start: object = None, restrict_end: object = None) tuple[numpy.ndarray[numpy.int32], numpy.ndarray[numpy.int32]] ¶
Find gaps between ranges
- iranges.lib_iranges.get_order(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32]) list[int] ¶
Get the order of genomic ranges
- iranges.lib_iranges.reduce_ranges(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32], drop_empty_ranges: bool = False, min_gapwidth: int = 0, with_revmap: bool = False, with_inframe_start: bool = False) dict ¶
Reduce ranges by merging overlapping or adjacent ranges
- iranges.lib_iranges.shift_and_clip_ranges(starts: numpy.ndarray[numpy.int32], widths: numpy.ndarray[numpy.int32], shift: numpy.ndarray[numpy.int32], width: object, circle_len: object) tuple[numpy.ndarray[numpy.int32], numpy.ndarray[numpy.int32], int, bool] ¶
Shift and clip ranges
iranges.sew_handler module¶
- class iranges.sew_handler.SEWWrangler(ref_widths, start=None, end=None, width=None, translate_negative=True, allow_nonnarrowing=False)[source]¶
Bases:
object
Handler to resolve start/end/width parameters.
- __init__(ref_widths, start=None, end=None, width=None, translate_negative=True, allow_nonnarrowing=False)[source]¶
Initialize SEW parameters.
iranges.utils module¶
- iranges.utils.calc_gap_and_overlap(start1, width1, start2, width2)[source]¶
Calculate gap, overlap and relative position between two intervals.
- iranges.utils.clip_ranges(starts, widths, min_val=None, max_val=None)[source]¶
Clip ranges to specified bounds.
- Parameters:
- Return type:
- Returns:
Tuple of clipped (starts, widths) ranges.
- iranges.utils.compute_up_down(starts, widths, upstream, downstream, site)[source]¶
Helper for promoters/terminators.
- iranges.utils.handle_negative_coords(coords, ref_len)[source]¶
Convert negative coordinates to positive using reference length.
- Parameters:
coords (MaskedArray) – Coordinate array (can have negative values).
ref_len (ndarray) – Reference lengths for conversion.
- Return type:
- Returns:
Array with negative coordinates converted to positive.