biocutils package¶
Submodules¶
biocutils.BooleanList module¶
- class biocutils.BooleanList.BooleanList(data=None, names=None, _validate=True)[source]¶
Bases: NamedList
List of booleans. This mimics a regular Python list except that anything added to it will be coerced into a boolean. None values are also acceptable and are treated as missing booleans. The list may also be named (see NamedList), which provides some dictionary-like functionality.
- safe_append(value, in_place=False)[source]¶
Calls NamedList.safe_append() after coercing value to a boolean.
- safe_extend(other, in_place=True)[source]¶
Calls NamedList.safe_extend() after coercing elements of other to booleans.
- safe_insert(index, value, in_place=False)[source]¶
Calls NamedList.safe_insert() after coercing value to a boolean.
- set_slice(index, value, in_place=False)[source]¶
Calls NamedList.set_slice() after coercing value to booleans.
- set_value(index, value, in_place=False)[source]¶
Calls NamedList.set_value() after coercing value to a boolean.
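The coercion rule described for BooleanList can be illustrated without the library. The following is a minimal sketch, assuming that coercion follows Python's built-in bool() and that None is passed through untouched as a missing value; coerce_to_bool and coerce_all are hypothetical helper names, not part of biocutils.

```python
from typing import Any, Iterable, List, Optional


def coerce_to_bool(value: Any) -> Optional[bool]:
    """Hypothetical helper: None stays None (missing); anything else goes through bool()."""
    return None if value is None else bool(value)


def coerce_all(values: Iterable[Any]) -> List[Optional[bool]]:
    """Apply the same rule to every element, as an extend-style operation would."""
    return [coerce_to_bool(v) for v in values]


coerced = coerce_all([1, 0, "yes", "", None])
print(coerced)
```

The same pattern would apply to the float, integer and string variants, with float(), int() or str() in place of bool().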
biocutils.Factor module¶
- class biocutils.Factor.Factor(codes, levels, ordered=False, names=None, _validate=True)[source]¶
Bases: object
Factor class, equivalent to R’s factor.
This is a vector of integer codes, each of which is an index into a list of unique strings. The aim is to encode a list of strings as integers for easier numerical analysis.
- __eq__(other)[source]¶
- Parameters:
other (Factor) – Another Factor.
- Returns:
Whether the current object is equal to other, i.e., same codes, levels, names and ordered status.
- __getitem__(index)[source]¶
If index is a scalar, this is an alias for get_value().
If index is a sequence, this is an alias for get_slice().
- __hash__ = None¶
- __iter__()[source]¶
- Returns:
An iterator over the factor. This will iterate over the codes and report the corresponding level (or None).
- __setitem__(index, value)[source]¶
If index is a scalar, this is an alias for set_value().
If index is a sequence, this is an alias for set_slice().
- property codes: ndarray¶
Alias for get_codes().
- drop_unused_levels(in_place=False)[source]¶
Drop unused levels.
- Parameters:
in_place (bool) – Whether to perform this modification in-place.
- Returns:
If in_place = False, returns same type as caller (a new Factor object) where all unused levels have been removed.
If in_place = True, unused levels are removed from the current object; a reference to the current object is returned.
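Dropping unused levels means both shrinking the level list and renumbering the codes so they still point at the same strings. The sketch below works on plain lists rather than a Factor and assumes surviving levels keep their original relative order; drop_unused is a hypothetical name used only for illustration.

```python
from typing import Dict, List, Tuple


def drop_unused(codes: List[int], levels: List[str]) -> Tuple[List[int], List[str]]:
    """Return (new_codes, new_levels) with levels that no code refers to removed."""
    used = {c for c in codes if c >= 0}  # -1 marks a missing value
    new_index: Dict[int, int] = {}
    new_levels: List[str] = []
    for old_code, level in enumerate(levels):
        if old_code in used:
            new_index[old_code] = len(new_levels)
            new_levels.append(level)
    # Renumber codes against the shortened level list; keep missing values as -1.
    new_codes = [new_index[c] if c >= 0 else -1 for c in codes]
    return new_codes, new_levels


new_codes, new_levels = drop_unused([0, 2, -1, 2], ["a", "b", "c"])
print(new_codes, new_levels)
```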
- static from_sequence(x, levels=None, sort_levels=True, ordered=False, names=None, **kwargs)[source]¶
Convert a sequence of hashable values into a factor.
- Parameters:
x (Sequence[str]) – A sequence of strings. Any value may be None to indicate missingness.
levels (Optional[Sequence[str]]) – Sequence of reference levels, against which the entries in x are compared. If None, this defaults to all unique values of x.
sort_levels (bool) – Whether to sort the automatically-determined levels. If False, the levels are kept in order of their appearance in x. Not used if levels is explicitly supplied.
ordered (bool) – Whether the levels should be assumed to be ordered. Note that this refers to their importance and has nothing to do with their sorting order or with the setting of sort_levels.
names (Optional[Sequence[str]]) – List of names. This should have the same length as x. Alternatively None, if the factor has no names.
kwargs – Further arguments to pass to factorize().
- Returns:
A Factor object.
- get_codes()[source]¶
- Returns:
Array of integer codes, used as indices into the levels from get_levels(). Missing values are marked with -1.
This should be treated as a read-only reference. To modify the codes, use set_codes() instead.
- get_levels()[source]¶
- Returns:
List of strings containing the factor levels.
This should be treated as a read-only reference. To modify the levels, use replace_levels() instead.
- get_names()[source]¶
- Returns:
Names for the factor elements.
This should be treated as a read-only reference. To modify the names, use set_names() instead.
- get_slice(index)[source]¶
- Parameters:
index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to obtain, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.
- Returns:
A Factor is returned containing the specified subset.
- get_value(index)[source]¶
- Parameters:
index (Union[str, int]) – Integer index of the element to obtain. Alternatively, a string containing the name of the element, using the first occurrence if duplicate names are present.
- Returns:
The factor level for the code at the specified position, or None if the entry is missing.
- property levels: StringList¶
Alias for get_levels().
- property names: Names¶
Alias for get_names().
- property ordered: bool¶
Alias for get_ordered().
- remap_levels(levels, in_place=False)[source]¶
Remap codes to a replacement list of levels. Each entry of the remapped Factor will refer to the same string across the old and new levels, provided that string is present in both sets of levels. (To change the levels without altering the codes of the Factor, use replace_levels() instead.)
- Parameters:
levels (Union[str, Sequence[str]]) – A sequence of replacement levels. These should be unique strings with no missing values.
Alternatively a single string containing an existing level in this object. The new levels are defined as a permutation of the existing levels where the provided string is now the first level. The order of all other levels is preserved.
in_place (bool) – Whether to perform this modification in-place.
- Returns:
If in_place = False, returns same type as caller (a new Factor object) where the levels have been replaced. This will automatically update the codes so that they still refer to the same string in the new levels. If a code refers to a level that is not present in the new levels, it is set to a missing value.
If in_place = True, the levels are replaced in the current object, and a reference to the current object is returned.
- replace_levels(levels, in_place=False)[source]¶
Replace the existing levels with a new list. The codes of the returned Factor are unchanged by this method and will index into the replacement levels, so each element of the Factor may refer to a different string after the levels are replaced. (To change the levels while ensuring that each element of the Factor refers to the same string, use remap_levels() instead.)
- Returns:
If in_place = False, returns same type as caller (a new Factor object) where the levels have been replaced. Codes are unchanged and may refer to different strings.
If in_place = True, the levels are replaced in the current object, and a reference to the current object is returned.
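The difference between remapping and replacing levels is easiest to see on plain lists of codes and levels. The following is an illustrative sketch of the two behaviors as described above, not the library's implementation; remap_sketch, replace_sketch and decode are hypothetical names.

```python
from typing import List, Optional, Tuple


def decode(codes: List[int], levels: List[str]) -> List[Optional[str]]:
    """Turn codes back into strings; -1 denotes a missing value."""
    return [levels[c] if c >= 0 else None for c in codes]


def replace_sketch(codes: List[int], new_levels: List[str]) -> Tuple[List[int], List[str]]:
    """Swap the level list only; codes are untouched and may now mean other strings."""
    return list(codes), list(new_levels)


def remap_sketch(
    codes: List[int], old_levels: List[str], new_levels: List[str]
) -> Tuple[List[int], List[str]]:
    """Rewrite codes so every element keeps its string; absent strings become missing."""
    position = {level: i for i, level in enumerate(new_levels)}
    new_codes = [
        -1 if c < 0 else position.get(old_levels[c], -1)
        for c in codes
    ]
    return new_codes, list(new_levels)


old_levels = ["a", "b", "c"]
codes = [0, 1, 2, 1]

remapped_codes, remapped_levels = remap_sketch(codes, old_levels, ["c", "a"])
replaced_codes, replaced_levels = replace_sketch(codes, ["x", "y", "z"])

print(decode(remapped_codes, remapped_levels))
print(decode(replaced_codes, replaced_levels))
```

After remapping, elements that referred to "b" become missing because "b" is not among the new levels; after replacing, every element simply takes on whatever string now sits at its old code.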
- set_codes(codes, in_place=False)[source]¶
- Returns:
A modified Factor object with the new codes, either as a new object or as a reference to the current object.
- set_levels(levels, remap=True, in_place=False)[source]¶
Alias for remap_levels() if remap = True, otherwise an alias for replace_levels(). The first alias is deprecated and remap_levels() should be used directly if that is the intent.
- set_slice(index, value, in_place=False)[source]¶
Replace items in the Factor list. The index elements in the current object are replaced with the corresponding values in value. This is performed by finding the level for each entry of the replacement value, matching it to a level in the current object, and replacing the entry of codes with the code of the matched level. If there is no matching level, a missing value is inserted.
- Parameters:
index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to replace, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.
value (Factor) – A Factor of the same length containing the replacement values.
in_place (bool) – Whether the replacement should be performed in place.
- Returns:
A Factor object with values at index replaced by value. This is either a new object or a reference to the current object, depending on in_place.
- set_value(index, value, in_place=False)[source]¶
- Parameters:
index (Union[str, int]) – Integer index of the element to replace. Alternatively, a string containing the name of the element, using the first occurrence if duplicate names are present.
value (Optional[str]) – Replacement value. This should be a string corresponding to a factor level, or None if missing.
in_place (bool) – Whether to perform the modification in place.
- Returns:
A Factor object with the modified entry at index. This is either a new object or a reference to the current object.
- to_pandas()[source]¶
Coerce to a pandas Categorical object.
- Returns:
A Categorical object.
- Return type:
Categorical
biocutils.FloatList module¶
- class biocutils.FloatList.FloatList(data=None, names=None, _validate=True)[source]¶
Bases: NamedList
List of floats. This mimics a regular Python list except that anything added to it will be coerced into a float. None values are also acceptable and are treated as missing floats. The list may also be named (see NamedList), which provides some dictionary-like functionality.
- __annotations__ = {}¶
- safe_append(value, in_place=False)[source]¶
Calls NamedList.safe_append() after coercing value to a float.
- safe_extend(other, in_place=True)[source]¶
Calls NamedList.safe_extend() after coercing elements of other to floats.
- safe_insert(index, value, in_place=False)[source]¶
Calls NamedList.safe_insert() after coercing value to a float.
- set_slice(index, value, in_place=False)[source]¶
Calls NamedList.set_slice() after coercing value to floats.
- set_value(index, value, in_place=False)[source]¶
Calls NamedList.set_value() after coercing value to a float.
biocutils.IntegerList module¶
- class biocutils.IntegerList.IntegerList(data=None, names=None, _validate=True)[source]¶
Bases: NamedList
List of integers. This mimics a regular Python list except that anything added to it will be coerced into an integer. None values are also acceptable and are treated as missing integers. The list may also be named (see NamedList), which provides some dictionary-like functionality.
- __annotations__ = {}¶
- safe_append(value, in_place=False)[source]¶
Calls NamedList.safe_append() after coercing value to an integer.
- safe_extend(other, in_place=True)[source]¶
Calls NamedList.safe_extend() after coercing elements of other to integers.
- safe_insert(index, value, in_place=False)[source]¶
Calls NamedList.safe_insert() after coercing value to an integer.
- set_slice(index, value, in_place=False)[source]¶
Calls NamedList.set_slice() after coercing value to integers.
- set_value(index, value, in_place=False)[source]¶
Calls NamedList.set_value() after coercing value to an integer.
biocutils.NamedList module¶
- class biocutils.NamedList.NamedList(data=None, names=None, _validate=True)[source]¶
Bases: object
A list-like object that may have names for each element, equivalent to R’s named list. This combines list and dictionary functionality, e.g., it can be indexed by position or slices (list) but also by name (dictionary).
- __add__(other)[source]¶
Alias for safe_extend().
- __annotations__ = {}¶
- __deepcopy__(memo=None, _nil=[])[source]¶
- Parameters:
memo – See deepcopy() for details.
_nil – See deepcopy() for details.
- Returns:
A deep copy of a NamedList with the same contents.
- __getitem__(index)[source]¶
If index is a scalar, this is an alias for get_value().
If index is a sequence, this is an alias for get_slice().
- __hash__ = None¶
- __iadd__(other)[source]¶
Alias for extend(), returning a reference to the current object after the in-place modification.
- __setitem__(index, value)[source]¶
If index is a scalar, this is an alias for set_value() with in_place = True.
If index is a sequence, this is an alias for set_slice() with in_place = True.
- append(value)[source]¶
Alias for safe_append() with in_place = True.
- as_list()[source]¶
- Returns:
The underlying list of elements.
The returned object should be treated as a read-only reference.
- extend(other)[source]¶
Alias for safe_extend() with in_place = True.
- get_names()[source]¶
- Returns:
Names for the list elements.
The returned object should be treated as a read-only reference. To modify the names, use set_names() instead.
- get_slice(index)[source]¶
- Parameters:
index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to obtain, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.
- Returns:
A NamedList is returned containing the specified subset.
- insert(index, value)[source]¶
Alias for safe_insert() with in_place = True.
- property names: Names¶
Alias for get_names().
- safe_append(value, in_place=False)[source]¶
- Returns:
A NamedList where value is added to the end. If in_place = False, this is a new object, otherwise it is a reference to the current object. If names are present in the current object, the newly added element has its name set to an empty string.
- safe_extend(other, in_place=False)[source]¶
- Returns:
A NamedList where items in other are added to the end. If in_place = False, this is a new object, otherwise a reference to the current object is returned.
- safe_insert(index, value, in_place=False)[source]¶
- Parameters:
index (Union[int, str]) – An integer index containing a position to insert at. Alternatively, the name of the value to insert at (the first occurrence of each name is used).
value (Any) – A value to be inserted into the current object.
in_place (bool) – Whether to modify the current object in place.
- Returns:
A NamedList where value is inserted at index. This is a new object if in_place = False, otherwise it is a reference to the current object. If names are present in the current object, the newly inserted element’s name is set to an empty string.
- set_names(names, in_place=False)[source]¶
- Returns:
A modified NamedList with the new names. If in_place = False, this is a new NamedList, otherwise it is a reference to the current NamedList.
- set_slice(index, value, in_place=False)[source]¶
- Parameters:
index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to replace, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.
value (Sequence) – If index is a sequence, a sequence of the same length containing values to be set at the positions in index.
If index is a scalar, any object to be used as the replacement value for the position at index.
in_place (bool) – Whether to perform the replacement in place.
- Returns:
A NamedList where the entries at index are replaced with the contents of value. If in_place = False, this is a new object, otherwise it is a reference to the current object.
Unlike set_value(), this will not add new elements if index contains names that do not already exist in the object; a missing name error is raised instead.
- set_value(index, value, in_place=False)[source]¶
- Parameters:
index (Union[str, int]) – Integer index of the element to replace. Alternatively, a string containing the name of the element; the first occurrence of the name is used if duplicates are present.
value (Any) – Replacement value of the list element.
in_place (bool) – Whether to perform the replacement in place.
- Returns:
A NamedList is returned after the value at the specified position (or with the specified name) is replaced. If in_place = False, this is a new object, otherwise it is a reference to the current object.
If index is a name that does not already exist in the current object, value is added to the end of the list, and the index is added as a new name.
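The name-handling rule for set_value() (replace the first match, or append under a new name) can be sketched on a pair of plain lists. This is an illustration of the documented behavior with in_place = False, not the library's code; set_value_sketch is a hypothetical name.

```python
from typing import Any, List, Tuple, Union


def set_value_sketch(
    values: List[Any], names: List[str], index: Union[int, str], value: Any
) -> Tuple[List[Any], List[str]]:
    """Return new (values, names) lists; the inputs are left unmodified."""
    values, names = list(values), list(names)
    if isinstance(index, str):
        if index in names:
            # list.index() finds the first occurrence of a duplicated name.
            values[names.index(index)] = value
        else:
            # Unknown name: append the value and register the name.
            values.append(value)
            names.append(index)
    else:
        values[index] = value
    return values, names


original_values = [1, 2, 3]
original_names = ["a", "b", "a"]

replaced = set_value_sketch(original_values, original_names, "a", 10)
appended = set_value_sketch(original_values, original_names, "c", 9)
print(replaced)
print(appended)
```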
biocutils.Names module¶
- class biocutils.Names.Names(names=None, _validate=True)[source]¶
Bases: object
List of strings containing names. Typically used to decorate sequences, such that callers can get or set elements by name instead of position.
- __add__(other)[source]¶
- Parameters:
other (list) – List of names.
- Returns:
A new Names containing the combined contents of the current object and other.
- __deepcopy__(memo=None, _nil=[])[source]¶
- Parameters:
memo – See deepcopy() for details.
_nil – See deepcopy() for details.
- Returns:
A deep copy of this Names object with the same contents.
- __getitem__(index)[source]¶
If index is a scalar, this is an alias for get_value.
If index is a sequence, this is an alias for get_slice.
- __hash__ = None¶
- __iadd__(other)[source]¶
- Parameters:
other (list) – List of names.
- Returns:
The current object is modified by adding other to its names.
- __iter__()[source]¶
- Return type:
list_iterator
- Returns:
An iterator on the underlying list of names.
- __setitem__(index, value)[source]¶
If index is a scalar, this is an alias for set_value with in_place = True.
If index is a sequence, this is an alias for set_slice with in_place = True.
- append(value)[source]¶
Alias for safe_append with in_place = True.
- extend(value)[source]¶
Alias for safe_extend with in_place = True.
- get_slice(index)[source]¶
- Parameters:
index (Union[slice, range, Sequence, int, bool, NormalizedSubscript]) – Positions of interest, see the allowed indices in normalize_subscript() for more details. Scalars are treated as length-1 sequences.
- Returns:
A Names object containing the names at the specified positions.
- insert(index, value)[source]¶
Alias for safe_insert with in_place = True.
- set_slice(index, value, in_place=False)[source]¶
- Returns:
A modified Names object with the replacement name, either as a new object or as a reference to the current object.
biocutils.StringList module¶
- class biocutils.StringList.StringList(data=None, names=None, _validate=True)[source]¶
Bases: NamedList
List of strings. This mimics a regular Python list except that anything added to it will be coerced into a string. None values are also acceptable and are treated as missing strings. The list may also be named (see NamedList), which provides some dictionary-like functionality.
- __annotations__ = {}¶
- safe_append(value, in_place=False)[source]¶
Calls NamedList.safe_append() after coercing value to a string.
- safe_extend(other, in_place=True)[source]¶
Calls NamedList.safe_extend() after coercing elements of other to strings.
- safe_insert(index, value, in_place=False)[source]¶
Calls NamedList.safe_insert() after coercing value to a string.
- set_slice(index, value, in_place=False)[source]¶
Calls NamedList.set_slice() after coercing value to strings.
- set_value(index, value, in_place=False)[source]¶
Calls NamedList.set_value() after coercing value to a string.
biocutils.assign module¶
- biocutils.assign.assign(x, indices, replacement)[source]¶
Generic assign that checks if the objects are n-dimensional for n > 1 (i.e., they have a shape property of length greater than 1); if so, it calls assign_rows() to assign them along the first dimension, otherwise it assumes that they are vector-like and calls assign_sequence() instead.
biocutils.assign_rows module¶
- biocutils.assign_rows.assign_rows(x, indices, replacement)[source]¶
Assign replacement values to a copy of x at the rows specified by indices. This defaults to creating a deep copy of x and then assigning replacement to the first dimension of the copy.
- Parameters:
x (Any) – Any high-dimensional object.
indices (Sequence[int]) – Sequence of non-negative integers specifying rows of x.
replacement (Any) – Replacement values to be assigned to x. This should have the same number of rows as the length of indices. Typically replacement will have the same dimensionality as x.
- Returns:
A copy of x with the rows replaced at indices.
biocutils.assign_sequence module¶
- biocutils.assign_sequence.assign_sequence(x, indices, replacement)[source]¶
Assign replacement values to a copy of x at the specified indices. This defaults to creating a deep copy of x and then iterating through indices to assign the values of replacement.
- Returns:
A copy of x with the replacement values.
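The default strategy described for assign_sequence() (deep copy, then assign element by element) can be sketched in a few lines of standard-library Python. assign_sequence_sketch is a hypothetical stand-in for illustration and assumes x supports item assignment.

```python
import copy
from typing import Any, Sequence


def assign_sequence_sketch(x: Any, indices: Sequence[int], replacement: Sequence[Any]) -> Any:
    """Copy x, then write replacement[i] at position indices[i] of the copy."""
    output = copy.deepcopy(x)
    for i, position in enumerate(indices):
        output[position] = replacement[i]
    return output


original = [10, 20, 30, 40]
updated = assign_sequence_sketch(original, [1, 3], ["b", "d"])
print(updated)
print(original)
```

Because the copy is made first, the caller's object is never modified, which matches the "copy of x" wording in the Returns field.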
biocutils.combine module¶
- biocutils.combine.combine(*x)[source]¶
Generic combine that checks if the objects are n-dimensional for n > 1 (i.e., they have a shape property of length greater than 1); if so, it calls combine_rows() to combine them by the first dimension, otherwise it assumes that they are vector-like and calls combine_sequences() instead.
- Parameters:
x (Any) – Objects to combine.
- Returns:
A combined object, typically the same type as the first element in x.
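The shape-based dispatch can be sketched with a toy two-dimensional container. Everything here is hypothetical (the Grid class, looks_high_dimensional, combine_sketch), and the choice to dispatch to the row-wise path when any argument is high-dimensional is an assumption, since the text does not say whether any or all arguments must qualify.

```python
from typing import Any, List, Tuple


class Grid:
    """Hypothetical 2-dimensional container exposing a shape property."""

    def __init__(self, rows: List[List[Any]]):
        self.rows = [list(r) for r in rows]

    @property
    def shape(self) -> Tuple[int, int]:
        return (len(self.rows), len(self.rows[0]) if self.rows else 0)


def looks_high_dimensional(x: Any) -> bool:
    """True if x has a shape of length greater than 1."""
    return hasattr(x, "shape") and len(x.shape) > 1


def combine_sketch(*x: Any) -> Any:
    if any(looks_high_dimensional(y) for y in x):  # assumption: 'any', not 'all'
        # Row-wise path: stack along the first dimension.
        stacked: List[List[Any]] = []
        for y in x:
            stacked.extend(y.rows)
        return Grid(stacked)
    # Vector-like path: concatenate as plain lists.
    combined: List[Any] = []
    for y in x:
        combined.extend(y)
    return combined


flat = combine_sketch([1, 2], (3,))
tall = combine_sketch(Grid([[1, 2]]), Grid([[3, 4], [5, 6]]))
print(flat, tall.shape)
```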
biocutils.combine_columns module¶
- biocutils.combine_columns.combine_columns(*x)[source]¶
Combine n-dimensional objects along the second dimension.
If all elements are ndarray, we combine them using numpy’s concatenate().
If all elements are either spmatrix or sparray, these objects are combined using scipy’s hstack.
If all elements are DataFrame objects, they are combined using concat() along the second axis.
- Parameters:
x (Any) – n-dimensional objects to combine. All elements of x are expected to be the same class.
- Returns:
Combined object, typically the same type as the first entry of x.
biocutils.combine_rows module¶
- biocutils.combine_rows.combine_rows(*x)[source]¶
Combine n-dimensional objects along their first dimension.
If all elements are ndarray, we combine them using numpy’s concatenate().
If all elements are either spmatrix or sparray, these objects are combined using scipy’s vstack.
If all elements are DataFrame objects, they are combined using concat() along the first axis.
- Parameters:
x (Any) – One or more n-dimensional objects to combine. All elements of x are expected to be the same class.
- Returns:
Combined object, typically the same type as the first entry of x.
biocutils.combine_sequences module¶
- biocutils.combine_sequences.combine_sequences(*x)[source]¶
Combine vector-like objects (1-dimensional arrays).
If all elements are ndarray, we combine them using numpy’s concatenate().
If all elements are Series objects, they are combined using concat().
For all other scenarios, all elements are coerced to a list and combined.
- Parameters:
x (Any) – Vector-like objects to combine. All elements of x are expected to be the same class or at least compatible with each other.
- Returns:
A combined object, ideally of the same type as the first element in x.
biocutils.convert_to_dense module¶
biocutils.extract_column_names module¶
biocutils.extract_row_names module¶
biocutils.factorize module¶
- biocutils.factorize.factorize(x, levels=None, sort_levels=False, dtype=None, fail_missing=None)[source]¶
Convert a sequence of hashable values into a factor.
- Parameters:
x (Sequence) – A sequence of hashable values. Any value may be None to indicate missingness.
levels (Optional[Sequence]) – Sequence of reference levels, against which the entries in x are compared. If None, this defaults to all unique values of x.
sort_levels (bool) – Whether to sort the automatically-determined levels. If False, the levels are kept in order of their appearance in x. Not used if levels is explicitly supplied.
dtype (Optional[dtype]) – NumPy type of the array of indices, see match() for details.
fail_missing (Optional[bool]) – Whether to raise an error upon encountering missing levels in x, see match() for details.
- Returns:
Tuple where the first element is a list of unique levels and the second element is a NumPy array containing integer codes, i.e., indices into the first list. Indexing the first list by the second array will recover x, with the exception of any None or masked values in x that will instead be represented by -1 in the second array.
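The levels-and-codes encoding can be sketched in plain Python. This is not the library's implementation: factorize_sketch is a hypothetical name, it returns a list of codes rather than a NumPy array, it treats only None as missing, and it leaves out the dtype and fail_missing options.

```python
from typing import Any, Hashable, List, Optional, Sequence, Tuple


def factorize_sketch(
    x: Sequence[Optional[Hashable]],
    levels: Optional[Sequence[Hashable]] = None,
    sort_levels: bool = False,
) -> Tuple[List[Any], List[int]]:
    if levels is None:
        # dict preserves insertion order, giving levels in order of appearance.
        seen = {}
        for value in x:
            if value is not None:
                seen.setdefault(value, None)
        levels = list(seen)
        if sort_levels:
            levels.sort()
    position = {}
    for i, level in enumerate(levels):
        position.setdefault(level, i)
    # None and values absent from the levels are both encoded as -1.
    codes = [-1 if value is None else position.get(value, -1) for value in x]
    return list(levels), codes


data = ["b", "a", None, "b", "c"]
by_appearance = factorize_sketch(data)
sorted_levels = factorize_sketch(data, sort_levels=True)
fixed_levels = factorize_sketch(data, levels=["a", "b"])
print(by_appearance)
print(sorted_levels)
print(fixed_levels)
```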
biocutils.get_height module¶
biocutils.intersect module¶
- biocutils.intersect.intersect(*x, duplicate_method='first')[source]¶
Identify the intersection of values in multiple sequences, while preserving the order of values in the first sequence.
- Parameters:
x (Sequence) – Zero, one or more sequences of interest containing hashable values. We ignore missing values as defined by is_missing_scalar().
duplicate_method (Literal['first', 'last']) – Whether to keep the first or last occurrence of duplicated values when preserving order in the first sequence.
- Returns:
Intersection of values across all x.
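An order-preserving intersection can be sketched as follows. This is a guess at the documented behavior rather than the library's code: intersect_sketch is a hypothetical name, only None is treated as missing (the library defers to is_missing_scalar()), and the handling of duplicate_method = 'last' is an interpretation of the parameter description.

```python
from typing import Any, List, Sequence


def intersect_sketch(*x: Sequence[Any], duplicate_method: str = "first") -> List[Any]:
    if len(x) == 0:
        return []
    first = [v for v in x[0] if v is not None]
    others = [set(seq) for seq in x[1:]]
    if duplicate_method == "last":
        # Scan from the end so the last occurrence of a duplicate wins.
        first = first[::-1]
    seen = set()
    kept: List[Any] = []
    for value in first:
        if value in seen:
            continue
        seen.add(value)
        if all(value in other for other in others):
            kept.append(value)
    if duplicate_method == "last":
        kept.reverse()
    return kept


keep_first = intersect_sketch(["a", "c", "a"], ["a", "c"])
keep_last = intersect_sketch(["a", "c", "a"], ["a", "c"], duplicate_method="last")
filtered = intersect_sketch(["a", "b", None, "c"], ["c", "a", "z"], ["a", "c"])
print(keep_first, keep_last, filtered)
```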
biocutils.is_high_dimensional module¶
biocutils.is_list_of_type module¶
- biocutils.is_list_of_type.is_list_of_type(x, target_type, ignore_none=False)[source]¶
Checks if x is a list or tuple, and whether all elements of it are of the target type.
- Returns:
True if x is a list or tuple and all elements are of the target type (or None, if ignore_none = True). Otherwise, False.
biocutils.is_missing_scalar module¶
biocutils.map_to_index module¶
- biocutils.map_to_index.map_to_index(x, duplicate_method='first')[source]¶
Create a dictionary to map values of a sequence to positional indices.
- Parameters:
x (Sequence) – Sequence of hashable values. We ignore missing values defined by is_missing_scalar().
duplicate_method (Literal['first', 'last']) – Whether to consider the first or last occurrence of a duplicated value in x.
- Returns:
Dictionary that maps values of x to their position inside x.
biocutils.match module¶
- biocutils.match.match(x, targets, duplicate_method='first', dtype=None, fail_missing=None)[source]¶
Find a matching value of each element of x in targets.
- Parameters:
x (Sequence) – Sequence of values to match.
targets (Union[dict, Sequence]) – Sequence of targets to be matched against. Alternatively, a dictionary generated by passing a sequence of targets to map_to_index().
duplicate_method (Literal['first', 'last']) – How to handle duplicate entries in targets. Matches can be reported to the first or last occurrence of duplicates.
dtype (Optional[ndarray]) – NumPy type of the output array. This should be an integer type; if missing values are expected, the type should be a signed integer. If None, a suitable signed type is automatically determined.
fail_missing (Optional[bool]) – Whether to raise an error if x cannot be found in targets. If None, this defaults to True if dtype is an unsigned type, otherwise it defaults to False.
- Returns:
Array of length equal to x, containing the integer position of each entry of x inside targets; or -1, if the entry of x is None or cannot be found in targets.
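The relationship between map_to_index() and match() can be sketched with a dictionary lookup. The functions below are hypothetical stand-ins written for illustration: they return a plain list instead of a NumPy array, treat only None as missing, and omit the dtype and fail_missing options.

```python
from typing import Any, Dict, List, Sequence, Union


def map_to_index_sketch(x: Sequence[Any], duplicate_method: str = "first") -> Dict[Any, int]:
    """Map each non-missing value to the position of its first or last occurrence."""
    mapping: Dict[Any, int] = {}
    for i, value in enumerate(x):
        if value is None:
            continue
        if duplicate_method == "first":
            mapping.setdefault(value, i)
        else:
            mapping[value] = i
    return mapping


def match_sketch(
    x: Sequence[Any],
    targets: Union[Dict[Any, int], Sequence[Any]],
    duplicate_method: str = "first",
) -> List[int]:
    """Position of each entry of x inside targets, or -1 if missing or not found."""
    if not isinstance(targets, dict):
        targets = map_to_index_sketch(targets, duplicate_method)
    return [-1 if value is None else targets.get(value, -1) for value in x]


targets = ["a", "b", "c", "a"]
first_hits = match_sketch(["c", "a", None, "z"], targets)
last_hits = match_sketch(["c", "a", None, "z"], targets, duplicate_method="last")
print(first_hits, last_hits)
```

Building the dictionary once and passing it as targets avoids recomputing the mapping when the same targets are matched against repeatedly.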
biocutils.normalize_subscript module¶
- class biocutils.normalize_subscript.NormalizedSubscript(subscript)[source]¶
Bases: object
Subscript normalized by normalize_subscript(). This is used to indicate that no further normalization is required, such that normalize_subscript() is just a no-op.
- biocutils.normalize_subscript.normalize_subscript(sub, length, names=None, non_negative_only=True)[source]¶
Normalize a subscript for __getitem__ or friends into a sequence of integer indices, for consistent downstream use.
- Parameters:
sub (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – The subscript. This can be any of the following:
A slice.
A range containing indices to elements. Negative values are allowed. An error is raised if the indices are out of range.
A single integer specifying the index of an element. A negative value is allowed. An error is raised if the index is out of range.
A single string that can be found in names, which is converted to the index of the first occurrence of that string in names. An error is raised if the string cannot be found.
A single boolean, which is converted into a list containing the first element if true, and an empty list if false.
A sequence of strings, integers and/or booleans. Strings are converted to indices based on first occurrence in names, as described above. Integers should be indices to an element. Each truthy boolean is converted to an index equal to its position in sub, and each falsy boolean is ignored.
A NormalizedSubscript, in which case the subscript property is directly returned.
length (int) – Length of the object.
names (Optional[Sequence[str]]) – List of names for each entry in the object. If not None, this should have length equal to length. Some optimizations are possible if this is a Names object.
non_negative_only (bool) – Whether negative indices must be converted into non-negative equivalents. Setting this to False may improve efficiency.
- Returns:
A tuple containing (i) a sequence of integer indices in [0, length) specifying the subscript elements, and (ii) a boolean indicating whether sub was a scalar.
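A subset of these rules can be sketched in plain Python. normalize_subscript_sketch is a hypothetical name; it covers slices, single integers, single strings and sequences (including range), always converts negative indices, and deliberately leaves out scalar booleans and NormalizedSubscript because the scalar flag for those cases is not stated above.

```python
from typing import Any, List, Optional, Sequence, Tuple


def normalize_subscript_sketch(
    sub: Any, length: int, names: Optional[Sequence[str]] = None
) -> Tuple[List[int], bool]:
    def resolve_int(i: int) -> int:
        if i < -length or i >= length:
            raise IndexError(f"subscript {i} is out of range")
        return i + length if i < 0 else i

    def resolve_name(name: str) -> int:
        if names is None or name not in names:
            raise KeyError(f"cannot find name {name!r}")
        return list(names).index(name)  # first occurrence

    if isinstance(sub, slice):
        return list(range(*sub.indices(length))), False
    if isinstance(sub, str):
        return [resolve_name(sub)], True
    if isinstance(sub, bool):  # checked before int, since bool subclasses int
        raise NotImplementedError("scalar booleans are left out of this sketch")
    if isinstance(sub, int):
        return [resolve_int(sub)], True

    indices: List[int] = []
    for position, entry in enumerate(sub):
        if isinstance(entry, bool):
            if entry:
                indices.append(position)  # truthy booleans select their own position
        elif isinstance(entry, str):
            indices.append(resolve_name(entry))
        else:
            indices.append(resolve_int(entry))
    return indices, False


print(normalize_subscript_sketch(-1, 5))
print(normalize_subscript_sketch(slice(1, 4), 5))
print(normalize_subscript_sketch(["a", -1], 3, names=["a", "b", "c"]))
```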
biocutils.package_utils module¶
biocutils.print_truncated module¶
- biocutils.print_truncated.print_truncated(x, truncated_to=3, full_threshold=10)[source]¶
Pretty-print an object, replacing the middle elements of lists/dictionaries with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.
- Parameters:
- Return type:
- Returns:
String containing the pretty-printed contents.
- biocutils.print_truncated.print_truncated_dict(x, truncated_to=3, full_threshold=10, transform=None, sep=', ', include_brackets=True)[source]¶
Pretty-print a dictionary, replacing the middle elements with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.
- Parameters:
x (Dict) – Dictionary to be printed.
truncated_to (int) – Number of elements to truncate to, at the start and end of the dictionary. This should be less than half of full_threshold.
full_threshold (int) – Threshold on the number of elements, below which the dictionary is shown in its entirety.
transform (Optional[Callable]) – Optional transformation to apply to the values of x after truncation but before printing. Defaults to print_truncated() if not supplied.
sep (str) – Separator between elements in the printed dictionary.
include_brackets (bool) – Whether to include the start/end brackets.
- Returns:
String containing the pretty-printed truncated dict.
- biocutils.print_truncated.print_truncated_list(x, truncated_to=3, full_threshold=10, transform=None, sep=', ', include_brackets=True)[source]¶
Pretty-print a list, replacing the middle elements with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.
- Parameters:
x (List) – List to be printed.
truncated_to (int) – Number of elements to truncate to, at the start and end of the list. This should be less than half of full_threshold.
full_threshold (int) – Threshold on the number of elements, below which the list is shown in its entirety.
transform (Optional[Callable]) – Optional transformation to apply to the elements of x after truncation but before printing. Defaults to print_truncated() if not supplied.
sep (str) – Separator between elements in the printed list.
include_brackets (bool) – Whether to include the start/end brackets.
- Returns:
String containing the pretty-printed truncated list.
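The truncation logic can be sketched as follows. print_truncated_list_sketch is a hypothetical name, and several details are assumptions rather than documented behavior: lists longer than full_threshold are truncated (the exact boundary is not stated), elements are rendered with repr(), and the ellipsis is written as "...".

```python
from typing import Any, List


def print_truncated_list_sketch(
    x: List[Any],
    truncated_to: int = 3,
    full_threshold: int = 10,
    sep: str = ", ",
    include_brackets: bool = True,
) -> str:
    if len(x) > full_threshold:  # assumption: strict comparison
        head = [repr(v) for v in x[:truncated_to]]
        tail = [repr(v) for v in x[len(x) - truncated_to:]]
        parts = head + ["..."] + tail
    else:
        parts = [repr(v) for v in x]
    body = sep.join(parts)
    return "[" + body + "]" if include_brackets else body


long_preview = print_truncated_list_sketch(list(range(20)))
short_preview = print_truncated_list_sketch([1, 2, 3])
print(long_preview)
print(short_preview)
```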
biocutils.print_wrapped_table module¶
- biocutils.print_wrapped_table.create_floating_names(names, indices)[source]¶
Create the floating names to use in print_wrapped_table(). If no names are present, positional indices are used instead.
- biocutils.print_wrapped_table.print_type(x)[source]¶
Print the type of an object, with some special behavior for certain classes (e.g., to add the data type of NumPy arrays). This is intended for display at the top of the columns of print_wrapped_table().
- Parameters:
x – Some object.
- Return type:
- Returns:
String containing the class of the object.
- biocutils.print_wrapped_table.print_wrapped_table(columns, floating_names=None, sep=' ', window=None)[source]¶
Pretty-print a table with aligned and wrapped columns. All column contents are padded so that they are right-justified. Wrapping is performed whenever a new column would exceed the window width, in which case the entire column (and all subsequent columns) are printed below the previous columns.
- Parameters:
columns (List[Sequence[str]]) – List of lists of strings, where each inner list is the same length and contains the visible contents of a column. Strings are typically generated by calling repr() on data column values.
Callers are responsible for inserting ellipses, adding column type information (e.g., with print_type()) or truncating long strings (e.g., with truncate_strings()).
floating_names (Optional[Sequence[str]]) – List of strings to be added to the left of the table. This is printed repeatedly for each set of wrapped columns.
See also create_floating_names().
sep (str) – Separator between columns.
window (Optional[int]) – Size of the terminal window, in characters. We attempt to determine this automatically; otherwise it is set to 150.
- Return type:
- Returns:
String containing the pretty-printed table.
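As a rough illustration of the padding and wrapping behavior described above, the following standalone sketch right-justifies each column and starts a new block of lines whenever adding a column would exceed the window. It is not the biocutils implementation: the name `sketch_wrapped_table` is hypothetical, `window` is passed explicitly instead of being detected from the terminal, and the way blocks are joined (no blank line between them) is an assumption.

```python
def sketch_wrapped_table(columns, floating_names=None, sep=" ", window=150):
    """Hypothetical sketch of right-justified, wrapped column printing."""

    def pad(col):
        # Right-justify every string in a column to the column's widest entry.
        width = max((len(s) for s in col), default=0)
        return [s.rjust(width) for s in col]

    left = pad(list(floating_names)) if floating_names is not None else None
    blocks = []
    lines = None  # lines of the block currently being built
    for col in columns:
        padded = pad(list(col))
        if lines is not None:
            extended = [l + sep + p for l, p in zip(lines, padded)]
            if max((len(l) for l in extended), default=0) <= window:
                lines = extended
                continue
            # The column does not fit: close this block and wrap.
            blocks.append(lines)
        # Start a new block, repeating the floating names on the left.
        if left is None:
            lines = padded
        else:
            lines = [n + sep + p for n, p in zip(left, padded)]
    if lines is not None:
        blocks.append(lines)
    return "\n".join("\n".join(block) for block in blocks)


cols = [["a", "1", "22"], ["bb", "3", "4"], ["ccc", "5", "6"]]
print(sketch_wrapped_table(cols, floating_names=["", "r1", "r2"], window=8))
```

In this example the third column does not fit within eight characters, so it is printed in a second block with the floating names repeated.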
- biocutils.print_wrapped_table.truncate_strings(values, width=40)[source]¶
Truncate long strings for printing in print_wrapped_table().
biocutils.relaxed_combine_columns module¶
- biocutils.relaxed_combine_columns.relaxed_combine_columns(*x)[source]¶
Combine n-dimensional objects along the second dimension.
- Parameters:
x (Any) – n-dimensional objects to combine. All elements of x are expected to be the same class.
- Returns:
Combined object, typically the same type as the first entry of x.
biocutils.relaxed_combine_rows module¶
- biocutils.relaxed_combine_rows.relaxed_combine_rows(*x)[source]¶
Combine n-dimensional objects along their first dimension.
- Parameters:
x (Any) – One or more n-dimensional objects to combine. All elements of x are expected to be the same class.
- Returns:
Combined object, typically the same type as the first entry of x.
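To make the difference between combining along the first dimension (rows) and the second dimension (columns) concrete, here is a plain-Python illustration that uses nested lists as stand-ins for 2-dimensional objects. It does not call biocutils and says nothing about what "relaxed" means for the real functions; the helper names `sketch_combine_rows` and `sketch_combine_columns` are hypothetical.

```python
def sketch_combine_rows(*x):
    """Stack nested-list 'matrices' along the first dimension."""
    out = []
    for m in x:
        out.extend(list(row) for row in m)
    return out


def sketch_combine_columns(*x):
    """Join nested-list 'matrices' along the second dimension."""
    if not x:
        return []
    nrows = len(x[0])
    if any(len(m) != nrows for m in x):
        raise ValueError("all inputs must have the same number of rows")
    # For each row index, concatenate that row from every input.
    return [[v for m in x for v in m[i]] for i in range(nrows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(sketch_combine_rows(a, b))     # four rows of two values
print(sketch_combine_columns(a, b))  # two rows of four values
```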
biocutils.reverse_index module¶
biocutils.show_as_cell module¶
- biocutils.show_as_cell.show_as_cell(x, indices)[source]¶
Show the contents of x as a cell of a table, typically for use in the __str__ method of a class that contains x.
- Parameters:
- Return type:
- Returns:
List of strings of length equal to indices, containing a string summary of each of the specified elements of x.
biocutils.subset module¶
- biocutils.subset.subset(x, indices)[source]¶
Generic subset that checks whether the object is n-dimensional for n > 1 (i.e., it has a shape property of length greater than 1). If so, it calls subset_rows() to subset the object along the first dimension; otherwise, it assumes that the object is vector-like and calls subset_sequence() instead.
- Parameters:
x (Any) – Object to be subsetted.
- Returns:
The subsetted object, typically the same type as x.
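The dispatch rule described above (inspect `shape`, then subset by rows or as a sequence) can be sketched as follows. This is an illustration only: `TinyTable` and `sketch_subset` are hypothetical names, and the two branches are simple stand-ins for `subset_rows()` and `subset_sequence()` rather than the library's own logic.

```python
class TinyTable:
    """Hypothetical 2-dimensional container exposing a `shape` property."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def shape(self):
        ncols = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), ncols)


def sketch_subset(x, indices):
    shape = getattr(x, "shape", None)
    if shape is not None and len(shape) > 1:
        # Stand-in for subset_rows(): keep the requested rows.
        return TinyTable([x.rows[i] for i in indices])
    # Stand-in for subset_sequence(): treat x as vector-like.
    return [x[i] for i in indices]


print(sketch_subset(["a", "b", "c"], [2, 0]))
print(sketch_subset(TinyTable([[1, 2], [3, 4], [5, 6]]), [1]).rows)
```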
biocutils.subset_rows module¶
biocutils.subset_sequence module¶
biocutils.union module¶
- biocutils.union.union(*x, duplicate_method='first')[source]¶
Identify the union of values in multiple sequences, while preserving the order of the first (or last) occurrence of each value.
- Parameters:
x (Sequence) – Zero, one or more sequences of interest containing hashable values. We ignore missing values as defined by is_missing_scalar().
duplicate_method (Literal['first', 'last']) – Whether to take the first or last occurrence of each value in the ordering of the output. If first, the first occurrence in the earliest sequence of x is reported; if last, the last occurrence in the latest sequence of x is reported.
- Return type:
- Returns:
Union of values across all x.
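The effect of `duplicate_method` on the output order is easiest to see in a small example. The sketch below is a standalone illustration, not the biocutils code: `sketch_union` is a hypothetical name, and only `None` is treated as missing, whereas the real function defers to `is_missing_scalar()`.

```python
def sketch_union(*x, duplicate_method="first"):
    """Hypothetical order-preserving union; only None counts as missing here."""
    seen = set()
    output = []

    def visit(value):
        if value is not None and value not in seen:
            seen.add(value)
            output.append(value)

    if duplicate_method == "first":
        for seq in x:
            for value in seq:
                visit(value)
    else:
        # Walk everything backwards so the last occurrence is seen first,
        # then restore the forward order.
        for seq in reversed(x):
            for value in reversed(seq):
                visit(value)
        output.reverse()
    return output


print(sketch_union([1, 2, 3], [3, 4, 1]))
print(sketch_union([1, 2, 3], [3, 4, 1], duplicate_method="last"))
```

With "last", the value 1 moves to the end because its final occurrence is the last element of the second sequence.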