biocutils package

Submodules

biocutils.BooleanList module

class biocutils.BooleanList.BooleanList(data=None, names=None, _validate=True)[source]

Bases: NamedList

List of booleans. This mimics a regular Python list except that anything added to it will be coerced into a boolean. None values are also acceptable and are treated as missing booleans. The list may also be named (see NamedList), which provides some dictionary-like functionality.
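The coercion described above can be illustrated with a small standard-library sketch. It assumes that values are coerced with Python's built-in bool() and that None is kept as a missing entry; the helper names coerce_to_bool and coerce_all are hypothetical and not part of biocutils, whose own coercion rules may differ in detail.

```python
from typing import Any, Iterable, List, Optional


def coerce_to_bool(value: Any) -> Optional[bool]:
    """Hypothetical helper: keep None as missing, coerce anything else with bool()."""
    if value is None:
        return None
    return bool(value)


def coerce_all(values: Iterable[Any]) -> List[Optional[bool]]:
    """Apply the same coercion to every element, as a list extension would."""
    return [coerce_to_bool(v) for v in values]


coerced = coerce_all([1, 0, "yes", "", None])
print(coerced)
```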

safe_append(value, in_place=False)[source]

Calls safe_append() after coercing value to a boolean.

Return type:

BooleanList

safe_extend(other, in_place=True)[source]

Calls safe_extend() after coercing elements of other to booleans.

Return type:

BooleanList

safe_insert(index, value, in_place=False)[source]

Calls safe_insert() after coercing value to a boolean.

Return type:

BooleanList

set_slice(index, value, in_place=False)[source]

Calls set_slice() after coercing value to booleans.

Return type:

BooleanList

set_value(index, value, in_place=False)[source]

Calls set_value() after coercing value to a boolean.

Return type:

BooleanList

biocutils.Factor module

class biocutils.Factor.Factor(codes, levels, ordered=False, names=None, _validate=True)[source]

Bases: object

Factor class, equivalent to R’s factor.

This is a vector of integer codes, each of which is an index into a list of unique strings. The aim is to encode a list of strings as integers for easier numerical analysis.
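As a rough sketch of this encoding, the plain-Python example below stores codes and levels in ordinary lists and decodes them back into strings. It does not use the Factor class itself; the decode helper and the MISSING constant are hypothetical names, and the use of -1 for missing entries follows the description given under get_codes().

```python
from typing import List, Optional

MISSING = -1  # code used for missing entries, as described under get_codes()


def decode(codes: List[int], levels: List[str]) -> List[Optional[str]]:
    """Look up each code in the levels; missing codes become None."""
    return [None if c == MISSING else levels[c] for c in codes]


levels = ["control", "treated"]
codes = [0, 1, 1, MISSING, 0]
decoded = decode(codes, levels)
print(decoded)
```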

__copy__()[source]
Return type:

Factor

Returns:

A shallow copy of the Factor object.

__deepcopy__(memo)[source]
Return type:

Factor

Returns:

A deep copy of the Factor object.

__eq__(other)[source]
Parameters:

other (Factor) – Another Factor.

Returns:

Whether the current object is equal to other, i.e., same codes, levels, names and ordered status.

__getitem__(index)[source]

If index is a scalar, this is an alias for get_value().

If index is a sequence, this is an alias for get_slice().

Return type:

Union[str, Factor]

__hash__ = None
__iter__()[source]
Return type:

FactorIterator

Returns:

An iterator over the factor. This will iterate over the codes and report the corresponding level (or None).

__len__()[source]
Return type:

int

Returns:

Length of the factor in terms of the number of codes.

__repr__()[source]
Return type:

str

Returns:

A stringified representation of this object.

__setitem__(index, value)[source]

If index is a scalar, this is an alias for set_value().

If index is a sequence, this is an alias for set_slice().

property codes: ndarray

Alias for get_codes().

drop_unused_levels(in_place=False)[source]

Drop unused levels.

Parameters:

in_place (bool) – Whether to perform this modification in-place.

Return type:

Factor

Returns:

If in_place = False, returns same type as caller (a new Factor object) where all unused levels have been removed.

If in_place = True, unused levels are removed from the current object; a reference to the current object is returned.
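One way to picture this operation is the sketch below, which works on plain lists of codes and levels rather than on a Factor. It assumes that the surviving levels keep their relative order and that -1 marks a missing code; drop_unused is a hypothetical helper and not the library's implementation.

```python
from typing import List, Tuple


def drop_unused(codes: List[int], levels: List[str]) -> Tuple[List[int], List[str]]:
    """Return new codes and levels with unreferenced levels removed."""
    # Old codes that are still referenced, in their original level order.
    used = sorted({c for c in codes if c >= 0})
    # Map each surviving old code to its position in the shortened levels.
    new_code = {old: new for new, old in enumerate(used)}
    new_levels = [levels[old] for old in used]
    # Missing codes (-1) stay missing.
    new_codes = [new_code[c] if c >= 0 else -1 for c in codes]
    return new_codes, new_levels


new_codes, new_levels = drop_unused([0, 3, -1, 3], ["a", "b", "c", "d"])
print(new_codes, new_levels)
```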

static from_sequence(x, levels=None, sort_levels=True, ordered=False, names=None, **kwargs)[source]

Convert a sequence of hashable values into a factor.

Parameters:
  • x (Sequence[str]) – A sequence of strings. Any value may be None to indicate missingness.

  • levels (Optional[Sequence[str]]) – Sequence of reference levels, against which the entries in x are compared. If None, this defaults to all unique values of x.

  • sort_levels (bool) – Whether to sort the automatically-determined levels. If False, the levels are kept in order of their appearance in x. Not used if levels is explicitly supplied.

  • ordered (bool) – Whether the levels should be assumed to be ordered. Note that this refers to their importance and has nothing to do with their sorting order or with the setting of sort_levels.

  • names (Optional[Sequence[str]]) – List of names. This should have same length as x. Alternatively None, if the factor has no names.

  • kwargs – Further arguments to pass to factorize().

Return type:

Factor

Returns:

A Factor object.

get_codes()[source]
Return type:

ndarray

Returns:

Array of integer codes, used as indices into the levels from get_levels(). Missing values are marked with -1.

This should be treated as a read-only reference. To modify the codes, use set_codes() instead.

get_levels()[source]
Return type:

StringList

Returns:

List of strings containing the factor levels.

This should be treated as a read-only reference. To modify the levels, use replace_levels() instead.

get_names()[source]
Return type:

Names

Returns:

Names for the factor elements.

This should be treated as a read-only reference. To modify the names, use set_names() instead.

get_ordered()[source]
Return type:

bool

Returns:

True if the levels are ordered, otherwise False.

get_slice(index)[source]
Parameters:

index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to obtain, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.

Return type:

Factor

Returns:

A Factor is returned containing the specified subset.

get_value(index)[source]
Parameters:

index (Union[str, int]) – Integer index of the element to obtain. Alternatively, a string containing the name of the element, using the first occurrence if duplicate names are present.

Return type:

Optional[str]

Returns:

The factor level for the code at the specified position, or None if the entry is missing.

property levels: StringList

Alias for get_levels().

property names: Names

Alias for get_names().

property ordered: bool

Alias for get_ordered().

remap_levels(levels, in_place=False)[source]

Remap codes to a replacement list of levels. Each entry of the remapped Factor will refer to the same string across the old and new levels, provided that string is present in both sets of levels. (To change the levels without altering the codes of the Factor, use replace_levels() instead.)

Parameters:
  • levels (Union[str, Sequence[str]]) –

    A sequence of replacement levels. These should be unique strings with no missing values.

    Alternatively a single string containing an existing level in this object. The new levels are defined as a permutation of the existing levels where the provided string is now the first level. The order of all other levels is preserved.

  • in_place (bool) – Whether to perform this modification in-place.

Return type:

Factor

Returns:

If in_place = False, returns same type as caller (a new Factor object) where the levels have been replaced. This will automatically update the codes so that they still refer to the same string in the new levels. If a code refers to a level that is not present in the new levels, it is set to a missing value.

If in_place = True, the levels are replaced in the current object, and a reference to the current object is returned.
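The remapping rule can be sketched on plain lists as follows. This is an illustration under stated assumptions, not the library's code: remap is a hypothetical helper, -1 stands for a missing code, and asking for a single string that is not an existing level is assumed to be an error.

```python
from typing import List, Sequence, Tuple, Union


def remap(
    codes: List[int],
    old_levels: List[str],
    new_levels: Union[str, Sequence[str]],
) -> Tuple[List[int], List[str]]:
    """Recompute codes so that each entry refers to the same string as before."""
    if isinstance(new_levels, str):
        # Single string: move that existing level to the front, keep the rest in order.
        if new_levels not in old_levels:
            raise ValueError("string must be an existing level")
        first = new_levels
        new_levels = [first] + [lev for lev in old_levels if lev != first]
    else:
        new_levels = list(new_levels)
    position = {lev: i for i, lev in enumerate(new_levels)}
    # Entries whose old level is absent from the new levels become missing (-1).
    new_codes = [position.get(old_levels[c], -1) if c >= 0 else -1 for c in codes]
    return new_codes, new_levels


subset_codes, subset_levels = remap([0, 1, 2, -1], ["a", "b", "c"], ["c", "a"])
front_codes, front_levels = remap([0, 1, 2, -1], ["a", "b", "c"], "b")
print(subset_codes, subset_levels)
print(front_codes, front_levels)
```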

replace_levels(levels, in_place=False)[source]

Replace the existing levels with a new list. The codes of the returned Factor are unchanged by this method and will index into the replacement levels, so each element of the Factor may refer to a different string after the levels are replaced. (To change the levels while ensuring that each element of the Factor refers to the same string, use remap_levels() instead.)

Parameters:
  • levels (Sequence[str]) – A sequence of replacement levels. These should be unique strings with no missing values. The length of this sequence should be no less than the current number of levels.

  • in_place (bool) – Whether to perform this modification in-place.

Return type:

Factor

Returns:

If in_place = False, returns same type as caller (a new Factor object) where the levels have been replaced. Codes are unchanged and may refer to different strings.

If in_place = True, the levels are replaced in the current object, and a reference to the current object is returned.
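The effect of leaving the codes untouched can be seen in the following sketch on plain lists. The helper names replace_levels_sketch and decode are hypothetical, -1 is used for missing codes, and the checks mirror the requirements stated for the levels argument (unique strings, at least as many as before).

```python
from typing import List, Optional, Sequence, Tuple


def replace_levels_sketch(
    codes: List[int], old_levels: List[str], new_levels: Sequence[str]
) -> Tuple[List[int], List[str]]:
    """Swap in new levels while keeping the codes exactly as they are."""
    if len(new_levels) < len(old_levels):
        raise ValueError("replacement must not be shorter than the current levels")
    if len(set(new_levels)) != len(new_levels):
        raise ValueError("replacement levels must be unique")
    return list(codes), list(new_levels)


def decode(codes: List[int], levels: List[str]) -> List[Optional[str]]:
    """Translate codes back to strings; -1 becomes None."""
    return [levels[c] if c >= 0 else None for c in codes]


codes = [0, 1, 1, -1]
before = decode(codes, ["a", "b"])
same_codes, replaced = replace_levels_sketch(codes, ["a", "b"], ["x", "y", "z"])
after = decode(same_codes, replaced)
print(before, after)
```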

set_codes(codes, in_place=False)[source]
Parameters:
  • codes (Sequence[int]) – Integer codes referencing the factor levels. This should have the same length as the current object.

  • in_place (bool) – Whether to modify this object in-place.

Return type:

Factor

Returns:

A modified Factor object with the new codes, either as a new object or as a reference to the current object.

set_levels(levels, remap=True, in_place=False)[source]

Alias for remap_levels() if remap = True, otherwise an alias for replace_levels(). The first alias is deprecated and remap_levels() should be used directly if that is the intent.

Return type:

Factor

set_names(names, in_place=False)[source]
Parameters:
  • names (Optional[Names]) – List of names, of the same length as this list.

  • in_place (bool) – Whether to perform this modification in-place.

Return type:

Factor

Returns:

A modified Factor with the new names, either as a new object or as a reference to the current object.

set_ordered(ordered, in_place=False)[source]
Parameters:
  • ordered (bool) – Whether to treat the levels as being ordered.

  • in_place (bool) – Whether to modify this object in-place.

Return type:

Factor

Returns:

A modified Factor object with the new ordered status, either as a new object or as a reference to the current object.

set_slice(index, value, in_place=False)[source]

Replace items in the Factor list. The index elements in the current object are replaced with the corresponding values in value. This is performed by finding the level for each entry of the replacement value, matching it to a level in the current object, and replacing the entry of codes with the code of the matched level. If there is no matching level, a missing value is inserted.

Parameters:
  • index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to replace, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.

  • value (Factor) – A Factor of the same length containing the replacement values.

  • in_place (bool) – Whether the replacement should be performed in place.

Returns:

A Factor object with values at index replaced by value. This is either a new object or a reference to the current object, depending on in_place.
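The level-matching step described above can be sketched as follows, using plain lists for the codes and levels of both the current object and the replacement. The function set_slice_codes is a hypothetical helper, index is restricted to a list of integer positions for simplicity, and -1 denotes a missing value.

```python
from typing import List


def set_slice_codes(
    codes: List[int],
    levels: List[str],
    index: List[int],
    value_codes: List[int],
    value_levels: List[str],
) -> List[int]:
    """Return new codes after replacing the entries at `index`."""
    position = {lev: i for i, lev in enumerate(levels)}
    out = list(codes)  # work on a copy, as with in_place = False
    for i, vc in zip(index, value_codes):
        if vc < 0:
            out[i] = -1  # missing in the replacement stays missing
        else:
            # Find the replacement's level among the current levels;
            # an unmatched level is recorded as missing.
            out[i] = position.get(value_levels[vc], -1)
    return out


updated = set_slice_codes(
    codes=[0, 1, 2],
    levels=["a", "b", "c"],
    index=[0, 2],
    value_codes=[1, 0],
    value_levels=["c", "z"],
)
print(updated)
```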

set_value(index, value, in_place=False)[source]
Parameters:
  • index (Union[str, int]) – Integer index of the element to replace. Alternatively, a string containing the name of the element, using the first occurrence if duplicate names are present.

  • value (Optional[str]) – Replacement value. This should be a string corresponding to a factor level, or None if missing.

  • in_place (bool) – Whether to perform the modification in place.

Return type:

Factor

Returns:

A Factor object with the modified entry at index. This is either a new object or a reference to the current object.

to_pandas()[source]

Coerce to Categorical object.

Returns:

A Categorical object.

Return type:

Categorical

class biocutils.Factor.FactorIterator(parent)[source]

Bases: object

Iterator for a Factor object.

__iter__()[source]
Return type:

FactorIterator

Returns:

The iterator.

__next__()[source]
Return type:

Optional[str]

Returns:

Level corresponding to the code at the current position, or None for missing codes.

biocutils.FloatList module

class biocutils.FloatList.FloatList(data=None, names=None, _validate=True)[source]

Bases: NamedList

List of floats. This mimics a regular Python list except that anything added to it will be coerced into a float. None values are also acceptable and are treated as missing floats. The list may also be named (see NamedList), which provides some dictionary-like functionality.

safe_append(value, in_place=False)[source]

Calls safe_append() after coercing value to a float.

Return type:

FloatList

safe_extend(other, in_place=True)[source]

Calls safe_extend() after coercing elements of other to floats.

Return type:

FloatList

safe_insert(index, value, in_place=False)[source]

Calls safe_insert() after coercing value to a float.

Return type:

FloatList

set_slice(index, value, in_place=False)[source]

Calls set_slice() after coercing value to floats.

Return type:

FloatList

set_value(index, value, in_place=False)[source]

Calls set_value() after coercing value to a float.

Return type:

FloatList

biocutils.IntegerList module

class biocutils.IntegerList.IntegerList(data=None, names=None, _validate=True)[source]

Bases: NamedList

List of integers. This mimics a regular Python list except that anything added to it will be coerced into an integer. None values are also acceptable and are treated as missing integers. The list may also be named (see NamedList), which provides some dictionary-like functionality.

safe_append(value, in_place=False)[source]

Calls safe_append() after coercing value to an integer.

Return type:

IntegerList

safe_extend(other, in_place=True)[source]

Calls safe_extend() after coercing elements of other to integers.

Return type:

IntegerList

safe_insert(index, value, in_place=False)[source]

Calls safe_insert() after coercing value to an integer.

Return type:

IntegerList

set_slice(index, value, in_place=False)[source]

Calls set_slice() after coercing value to integers.

Return type:

IntegerList

set_value(index, value, in_place=False)[source]

Calls set_value() after coercing value to an integer.

Return type:

IntegerList

biocutils.NamedList module

class biocutils.NamedList.NamedList(data=None, names=None, _validate=True)[source]

Bases: object

A list-like object that may have a name for each element, equivalent to R’s named list. This combines list and dictionary functionality, e.g., it can be indexed by position or slices (list) but also by name (dictionary).

__add__(other)[source]

Alias for safe_extend().

Return type:

NamedList

__copy__()[source]

Alias for copy().

Return type:

NamedList

__deepcopy__(memo=None, _nil=[])[source]
Return type:

NamedList

Returns:

A deep copy of a NamedList with the same contents.

__eq__(other)[source]
Parameters:

other (NamedList) – Another NamedList.

Return type:

bool

Returns:

Whether the current object is equal to other, i.e., same data and names.

__getitem__(index)[source]

If index is a scalar, this is an alias for get_value().

If index is a sequence, this is an alias for get_slice().

Return type:

Union[NamedList, Any]

__hash__ = None
__iadd__(other)[source]

Alias for extend(), returning a reference to the current object after the in-place modification.

__iter__()[source]
Return type:

list_iterator

Returns:

An iterator on the underlying list of data.

__len__()[source]
Return type:

int

Returns:

Length of the list.

__repr__()[source]
Return type:

str

Returns:

Representation of the current list.

__setitem__(index, value)[source]

If index is a scalar, this is an alias for set_value() with in_place = True.

If index is a sequence, this is an alias for set_slice() with in_place = True.

append(value)[source]

Alias for safe_append() with in_place = True.

as_dict()[source]
Return type:

Dict[str, Any]

Returns:

A dictionary where the keys are the names and the values are the list elements. Only the first occurrence of each name is returned.

Values of the dictionary should be treated as read-only references.
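The "first occurrence" rule can be sketched with plain lists of data and names. The helper as_dict_sketch is hypothetical and only models the documented behaviour for duplicated names.

```python
from typing import Any, Dict, List


def as_dict_sketch(data: List[Any], names: List[str]) -> Dict[str, Any]:
    """Build a dictionary that keeps only the first element seen for each name."""
    out: Dict[str, Any] = {}
    for name, value in zip(names, data):
        if name not in out:  # later duplicates are skipped
            out[name] = value
    return out


as_dict_result = as_dict_sketch([1, 2, 3], ["a", "b", "a"])
print(as_dict_result)
```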

as_list()[source]
Return type:

list

Returns:

The underlying list of elements.

The returned object should be treated as a read-only reference.

copy()[source]
Return type:

NamedList

Returns:

A shallow copy of a NamedList with the same contents. This will copy the underlying list (and names, if any exist) so that any in-place operations like append(), etc., on the new object will not change the original object.

extend(other)[source]

Alias for safe_extend() with in_place = True.

static from_dict(x)[source]
Parameters:

x (dict) – Dictionary where keys are strings (or can be coerced to them).

Return type:

NamedList

Returns:

A NamedList instance where the list elements are the values of x and the names are the stringified keys.

static from_list(x)[source]
Parameters:

x (list) – List of data elements.

Return type:

NamedList

Returns:

A NamedList instance with the contents of x and no names.

get_names()[source]
Return type:

Names

Returns:

Names for the list elements.

The returned object should be treated as a read-only reference. To modify the names, use set_names() instead.

get_slice(index)[source]
Parameters:

index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to obtain, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.

Return type:

NamedList

Returns:

A NamedList is returned containing the specified subset.

get_value(index)[source]
Parameters:

index (Union[str, int]) – Integer index of the element to obtain. Alternatively, a string containing the name of the element, using the first occurrence if duplicate names are present.

Return type:

Any

Returns:

The value at the specified position (or with the specified name).

insert(index, value)[source]

Alias for safe_insert() with in_place = True.

property names: Names

Alias for get_names().

safe_append(value, in_place=False)[source]
Parameters:
  • value (Any) – Any value.

  • in_place (bool) – Whether to perform the modification in place.

Return type:

NamedList

Returns:

A NamedList where value is added to the end. If in_place = False, this is a new object, otherwise it is a reference to the current object. If names are present in the current object, the newly added element has its name set to an empty string.

safe_extend(other, in_place=False)[source]
Parameters:
  • other (Iterable) – Some iterable object. If this is a NamedList, its names are used to extend the names of the current object; otherwise the extended names are set to empty strings.

  • in_place (bool) – Whether to perform the modification in place.

Return type:

NamedList

Returns:

A NamedList where items in other are added to the end. If in_place = False, this is a new object, otherwise a reference to the current object is returned.

safe_insert(index, value, in_place=False)[source]
Parameters:
  • index (Union[int, str]) – An integer index containing a position to insert at. Alternatively, the name of the value to insert at (the first occurrence of each name is used).

  • value (Any) – A value to be inserted into the current object.

  • in_place (bool) – Whether to modify the current object in place.

Return type:

NamedList

Returns:

A NamedList where value is inserted at index. This is a new object if in_place = False, otherwise it is a reference to the current object. If names are present in the current object, the newly inserted element’s name is set to an empty string.

set_names(names, in_place=False)[source]
Parameters:
  • names (Optional[Names]) – List of names, of the same length as this list.

  • in_place (bool) – Whether to perform this modification in-place.

Return type:

NamedList

Returns:

A modified NamedList with the new names. If in_place = False, this is a new NamedList, otherwise it is a reference to the current NamedList.

set_slice(index, value, in_place=False)[source]
Parameters:
  • index (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) – Subset of elements to replace, see normalize_subscript() for details. Strings are matched to names in the current object, using the first occurrence if duplicate names are present. Scalars are treated as length-1 sequences.

  • value (Sequence) –

    If index is a sequence, a sequence of the same length containing values to be set at the positions in index.

    If index is a scalar, any object to be used as the replacement value for the position at index.

  • in_place (bool) – Whether to perform the replacement in place.

Return type:

NamedList

Returns:

A NamedList where the entries at index are replaced with the contents of value. If in_place = False, this is a new object, otherwise it is a reference to the current object.

Unlike set_value(), this will not add new elements if index contains names that do not already exist in the object; a missing name error is raised instead.

set_value(index, value, in_place=False)[source]
Parameters:
  • index (Union[str, int]) – Integer index of the element to replace. Alternatively, a string containing the name of the element; we consider the first occurrence of the name if duplicates are present.

  • value (Any) – Replacement value of the list element.

  • in_place (bool) – Whether to perform the replacement in place.

Return type:

NamedList

Returns:

A NamedList is returned after the value at the specified position (or with the specified name) is replaced. If in_place = False, this is a new object, otherwise it is a reference to the current object.

If index is a name that does not already exist in the current object, value is added to the end of the list, and the index is added as a new name.
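The difference between replacing an existing entry and appending a new named entry can be sketched as below. This models the in_place = False case on plain lists and assumes that names are already present; set_value_sketch is a hypothetical helper, not the library's implementation.

```python
from typing import Any, List, Tuple, Union


def set_value_sketch(
    data: List[Any], names: List[str], index: Union[int, str], value: Any
) -> Tuple[List[Any], List[str]]:
    """Return copies of data and names after setting one element."""
    data, names = list(data), list(names)  # copies, so the inputs are untouched
    if isinstance(index, str):
        if index in names:
            # list.index() finds the first occurrence of a duplicated name.
            data[names.index(index)] = value
        else:
            # Unknown name: append the value and register the new name.
            data.append(value)
            names.append(index)
    else:
        data[index] = value
    return data, names


replaced = set_value_sketch([1, 2], ["a", "b"], "b", 20)
appended = set_value_sketch([1, 2], ["a", "b"], "c", 3)
print(replaced, appended)
```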

biocutils.Names module

class biocutils.Names.Names(names=None, _validate=True)[source]

Bases: object

List of strings containing names. Typically used to decorate sequences, such that callers can get or set elements by name instead of position.

__add__(other)[source]
Parameters:

other (list) – List of names.

Returns:

A new Names containing the combined contents of the current object and other.

__copy__()[source]

Alias for copy.

Return type:

Names

__deepcopy__(memo=None, _nil=[])[source]
Return type:

Names

Returns:

A deep copy of this Names object with the same contents.

__eq__(other)[source]
Parameters:

other (Names) – Another Names object.

Return type:

bool

Returns:

Whether the current object is the same as other.

__getitem__(index)[source]

If index is a scalar, this is an alias for get_value.

If index is a sequence, this is an alias for get_slice.

Return type:

Union[str, Names]

__hash__ = None
__iadd__(other)[source]
Parameters:

other (list) – List of names.

Returns:

The current object is modified by adding other to its names.

__iter__()[source]
Return type:

list_iterator

Returns:

An iterator on the underlying list of names.

__len__()[source]
Return type:

int

Returns:

Length of the list.

__repr__()[source]
Return type:

str

Returns:

A stringified representation of this object.

__setitem__(index, value)[source]

If index is a scalar, this is an alias for set_value with in_place = True.

If index is a sequence, this is an alias for set_slice with in_place = True.

append(value)[source]

Alias for safe_append with in_place = True.

as_list()[source]
Return type:

List[str]

Returns:

List of strings containing the names.

This should be treated as a read-only reference. Modifications should be performed by creating a new Names object instead.

copy()[source]
Return type:

Names

Returns:

A shallow copy of the current object. This will copy the underlying list so that any in-place operations like append, etc., on the new object will not change the original object.

extend(value)[source]

Alias for safe_extend with in_place = True.

get_slice(index)[source]
Parameters:

index (Union[slice, range, Sequence, int, bool, NormalizedSubscript]) – Positions of interest, see the allowed indices in normalize_subscript() for more details. Scalars are treated as length-1 sequences.

Return type:

Names

Returns:

A Names object containing the names at the specified positions.

get_value(index)[source]
Parameters:

index (int) – Position of interest.

Return type:

str

Returns:

The name at the specified position.

insert(index, value)[source]

Alias for safe_insert with in_place = True.

map(name)[source]
Parameters:

name (str) – Name of interest.

Return type:

int

Returns:

Index containing the position of the first occurrence of name; or -1, if name is not present in this object.
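The lookup rule can be sketched with a plain list of names. The function map_name is a hypothetical stand-in that mirrors the documented return values: the position of the first match, or -1 when the name is absent.

```python
from typing import List


def map_name(names: List[str], name: str) -> int:
    """Return the position of the first occurrence of `name`, or -1 if absent."""
    for i, candidate in enumerate(names):
        if candidate == name:
            return i
    return -1


first_x = map_name(["x", "y", "x"], "x")
absent = map_name(["x", "y", "x"], "z")
print(first_x, absent)
```

Returning -1 rather than raising an error means callers need to check the result before using it as an index, since -1 is also a valid Python position for the last element.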

safe_append(value, in_place=False)[source]
Parameters:
  • value (str) – Name to be added.

  • in_place (bool) – Whether to perform this appending in-place.

Return type:

Names

Returns:

A Names object is returned with the added name. This may be a new object or a reference to the current object.

safe_extend(value, in_place=False)[source]
Parameters:
  • value (Sequence[str]) – Names to be added.

  • in_place (bool) – Whether to perform this extension in-place.

Return type:

Names

Returns:

A Names object is returned with the extension. This may be a new object or a reference to the current object.

safe_insert(index, value, in_place=False)[source]
Parameters:
  • index (int) – Position on the object to insert at.

  • value (str) – Name to be added.

  • in_place (bool) – Whether to perform this insertion in-place.

Return type:

Names

Returns:

A Names object is returned with the inserted name. This may be a new object or a reference to the current object.

set_slice(index, value, in_place=False)[source]
Return type:

Names

Returns:

A modified Names object with the replacement names, either as a new object or as a reference to the current object.

set_value(index, value, in_place=False)[source]
Parameters:
  • index (int) – Position of interest.

  • value (str) – Replacement name.

  • in_place (bool) – Whether to perform the modification in-place.

Return type:

Names

Returns:

A modified Names object with the replacement name, either as a new object or as a reference to the current object.

biocutils.StringList module

class biocutils.StringList.StringList(data=None, names=None, _validate=True)[source]

Bases: NamedList

List of strings. This mimics a regular Python list except that anything added to it will be coerced into a string. None values are also acceptable and are treated as missing strings. The list may also be named (see NamedList), which provides some dictionary-like functionality.

safe_append(value, in_place=False)[source]

Calls safe_append() after coercing value to a string.

Return type:

StringList

safe_extend(other, in_place=True)[source]

Calls safe_extend() after coercing elements of other to strings.

Return type:

StringList

safe_insert(index, value, in_place=False)[source]

Calls safe_insert() after coercing value to a string.

Return type:

StringList

set_slice(index, value, in_place=False)[source]

Calls set_slice() after coercing value to strings.

Return type:

StringList

set_value(index, value, in_place=False)[source]

Calls set_value() after coercing value to a string.

Return type:

StringList

biocutils.assign module

biocutils.assign.assign(x, indices, replacement)[source]

Generic assign that checks whether the objects are n-dimensional for n > 1 (i.e., they have a shape property of length greater than 1). If so, it calls assign_rows() to assign them along the first dimension; otherwise, it assumes that they are vector-like and calls assign_sequence() instead.

Parameters:

x (Any) – Object to be assigned.

Return type:

Any

Returns:

The object after assignment, typically the same type as x.
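The dimensionality check that drives this dispatch can be sketched as follows. The helpers is_high_dimensional and choose_assigner, and the Grid and Vector classes, are hypothetical; the sketch only reports which function name would be chosen and does not call into biocutils.

```python
from typing import Any


def is_high_dimensional(x: Any) -> bool:
    """True if x exposes a shape of length greater than 1."""
    shape = getattr(x, "shape", None)
    return shape is not None and len(shape) > 1


def choose_assigner(x: Any) -> str:
    """Name of the function the generic would dispatch to in this sketch."""
    return "assign_rows" if is_high_dimensional(x) else "assign_sequence"


class Grid:
    """Hypothetical matrix-like object with a two-dimensional shape."""

    def __init__(self, nrow: int, ncol: int) -> None:
        self.shape = (nrow, ncol)


class Vector:
    """Hypothetical array-like object with a one-dimensional shape."""

    shape = (5,)


print(choose_assigner(Grid(2, 3)), choose_assigner(Vector()), choose_assigner([1, 2]))
```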

biocutils.assign_rows module

biocutils.assign_rows.assign_rows(x, indices, replacement)[source]

Assign replacement values to a copy of x at the rows specified by indices. This defaults to creating a deep copy of x and then assigning replacement to the first dimension of the copy.

Parameters:
  • x (Any) – Any high-dimensional object.

  • indices (Sequence[int]) – Sequence of non-negative integers specifying rows of x.

  • replacement (Any) – Replacement values to be assigned to x. This should have the same number of rows as the length of indices. Typically replacement will have the same dimensionality as x.

Return type:

Any

Returns:

A copy of x with the rows replaced at indices.

biocutils.assign_sequence module

biocutils.assign_sequence.assign_sequence(x, indices, replacement)[source]

Assign replacement values to a copy of x at the specified indices. This defaults to creating a deep copy of x and then iterating through indices to assign the values of replacement.

Parameters:
  • x (Any) – Any sequence-like object that can be assigned.

  • indices (Sequence[int]) – Sequence of non-negative integers specifying positions on x.

  • replacement (Any) – Replacement values to be assigned to x. This should have the same length as indices.

Return type:

Any

Returns:

A copy of x with the replacement values.
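The default behaviour described above (deep copy, then element-wise assignment) can be sketched with the standard library. The function assign_sequence_sketch is a hypothetical stand-in; the length check is an added assumption based on the requirement that replacement has the same length as indices.

```python
import copy
from typing import Any, Sequence


def assign_sequence_sketch(x: Any, indices: Sequence[int], replacement: Sequence[Any]) -> Any:
    """Return a deep copy of x with replacement values written at the given positions."""
    if len(indices) != len(replacement):
        raise ValueError("indices and replacement must have the same length")
    out = copy.deepcopy(x)  # the input object is never modified
    for i, value in zip(indices, replacement):
        out[i] = value
    return out


original = [10, 20, 30, 40]
updated = assign_sequence_sketch(original, [1, 3], ["b", "d"])
print(original, updated)
```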

biocutils.combine module

biocutils.combine.combine(*x)[source]

Generic combine that checks whether the objects are n-dimensional for n > 1 (i.e., they have a shape property of length greater than 1). If so, it calls combine_rows() to combine them by the first dimension; otherwise, it assumes that they are vector-like and calls combine_sequences() instead.

Parameters:

x (Any) – Objects to combine.

Returns:

A combined object, typically the same type as the first element in x.

biocutils.combine_columns module

biocutils.combine_columns.combine_columns(*x)[source]

Combine n-dimensional objects along the second dimension.

If all elements are ndarray, we combine them using numpy’s concatenate().

If all elements are either spmatrix or sparray, these objects are combined using scipy’s hstack.

If all elements are DataFrame objects, they are combined using concat() along the second axis.

Parameters:

x (Any) – n-dimensional objects to combine. All elements of x are expected to be the same class.

Returns:

Combined object, typically the same type as the first entry of x.

biocutils.combine_rows module

biocutils.combine_rows.combine_rows(*x)[source]

Combine n-dimensional objects along their first dimension.

If all elements are ndarray, we combine them using numpy’s concatenate().

If all elements are either spmatrix or sparray, these objects are combined using scipy’s vstack.

If all elements are DataFrame objects, they are combined using concat() along the first axis.

Parameters:

x (Any) – One or more n-dimensional objects to combine. All elements of x are expected to be the same class.

Returns:

Combined object, typically the same type as the first entry of x.

biocutils.combine_sequences module

biocutils.combine_sequences.combine_sequences(*x)[source]

Combine vector-like objects (1-dimensional arrays).

If all elements are ndarray, we combine them using numpy’s concatenate().

If all elements are Series objects, they are combined using concat().

For all other scenarios, all elements are coerced to a list and combined.

Parameters:

x (Any) – Vector-like objects to combine. All elements of x are expected to be the same class or at least compatible with each other.

Returns:

A combined object, ideally of the same type as the first element in x.
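The fallback case, where every element is coerced to a list and combined, can be sketched with the standard library alone. The function combine_as_lists is hypothetical and covers only this fallback; the ndarray and Series branches are not modelled here.

```python
from itertools import chain
from typing import Any, Iterable, List


def combine_as_lists(*x: Iterable[Any]) -> List[Any]:
    """Coerce each input to a list and concatenate them in order."""
    return list(chain.from_iterable(list(element) for element in x))


combined = combine_as_lists([1, 2], (3,), range(4, 6))
print(combined)
```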

biocutils.convert_to_dense module

biocutils.convert_to_dense.convert_to_dense(x)[source]

Convert something to a NumPy dense array of the same shape. This is typically used as a fallback for the various combining methods when there are lots of different array types that numpy.concatenate doesn’t understand.

Parameters:

x (Any) – Some array-like object to be stored as a NumPy array.

Return type:

ndarray

Returns:

A NumPy array.

biocutils.extract_column_names module

biocutils.extract_column_names.extract_column_names(x)[source]

Access column names from 2-dimensional representations.

Parameters:

x (Any) – Any object.

Return type:

ndarray

Returns:

Array of strings containing column names.

biocutils.extract_row_names module

biocutils.extract_row_names.extract_row_names(x)[source]

Access row names from 2-dimensional representations.

Parameters:

x (Any) – Any object.

Return type:

ndarray

Returns:

Array of strings containing row names.

biocutils.factorize module

biocutils.factorize.factorize(x, levels=None, sort_levels=False, dtype=None, fail_missing=None)[source]

Convert a sequence of hashable values into a factor.

Parameters:
  • x (Sequence) – A sequence of hashable values. Any value may be None to indicate missingness.

  • levels (Optional[Sequence]) – Sequence of reference levels, against which the entries in x are compared. If None, this defaults to all unique values of x.

  • sort_levels (bool) – Whether to sort the automatically-determined levels. If False, the levels are kept in order of their appearance in x. Not used if levels is explicitly supplied.

  • dtype (Optional[dtype]) – NumPy type of the array of indices, see match() for details.

  • fail_missing (Optional[bool]) – Whether to raise an error upon encountering missing levels in x, see match() for details.

Return type:

Tuple[list, ndarray]

Returns:

Tuple where the first element is a list of unique levels and the second element is a NumPy array containing integer codes, i.e., indices into the first list. Indexing the first list by the second array will recover x, except that any None or masked values in x are instead represented by -1 in the second array.
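The relationship between levels and codes can be illustrated with a small pure-Python sketch. It is not the library's implementation: sketch_factorize is a hypothetical name, it returns a plain list of codes rather than a NumPy array, and it only treats None as missing.

```python
def sketch_factorize(x, levels=None, sort_levels=False):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if levels is None:
        # Unique non-missing values, kept in order of first appearance.
        levels = list(dict.fromkeys(v for v in x if v is not None))
        if sort_levels:
            levels = sorted(levels)
    else:
        levels = list(levels)

    # Map each level to the position of its first occurrence.
    position = {}
    for i, level in enumerate(levels):
        position.setdefault(level, i)

    # Missing or unknown values are encoded as -1.
    codes = [-1 if v is None else position.get(v, -1) for v in x]
    return levels, codes


values = ["b", "a", None, "b", "c"]
levels, codes = sketch_factorize(values)
recovered = [levels[c] if c >= 0 else None for c in codes]
print(levels, codes, recovered)
```

Indexing the levels by the codes gives back the input, with None wherever the code is -1.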

biocutils.get_height module

biocutils.get_height.get_height(x)[source]

Get the “height” of an object, i.e., as if it were a column of a data frame or a similar container. This defaults to len for vector-like objects, or the first dimension for high-dimensional objects with a shape.

Parameters:

x (Any) – Some kind of object.

Return type:

int

Returns:

The height of the object.
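A minimal sketch of the rule described above, assuming that "high-dimensional" means a shape attribute with more than one entry. Both FakeMatrix and sketch_get_height are hypothetical names used for illustration; this is not the library's code.

```python
class FakeMatrix:
    """Hypothetical stand-in for any object that exposes a ``shape`` tuple."""

    def __init__(self, nrow, ncol):
        self.shape = (nrow, ncol)


def sketch_get_height(x):
    """Illustrative stand-in for the documented default (not the library code)."""
    shape = getattr(x, "shape", None)
    if shape is not None and len(shape) > 1:
        # High-dimensional object: the height is the first dimension.
        return shape[0]
    # Vector-like object: fall back to len().
    return len(x)


list_height = sketch_get_height([1, 2, 3])
matrix_height = sketch_get_height(FakeMatrix(4, 2))
print(list_height, matrix_height)
```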

biocutils.intersect module

biocutils.intersect.intersect(*x, duplicate_method='first')[source]

Identify the intersection of values in multiple sequences, while preserving the order of values in the first sequence.

Parameters:
  • x (Sequence) – Zero, one or more sequences of interest containing hashable values. We ignore missing values as defined by is_missing_scalar().

  • duplicate_method (Literal['first', 'last']) – Whether to keep the first or last occurrence of duplicated values when preserving order in the first sequence.

Return type:

list

Returns:

Intersection of values across all x.
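The effect of duplicate_method on the ordering can be seen in a pure-Python sketch. This is an illustration under stated assumptions, not the library's source: sketch_intersect is a hypothetical name, only None is treated as missing, and an empty list is assumed for zero inputs.

```python
def sketch_intersect(*x, duplicate_method="first"):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if len(x) == 0:
        return []  # assumption: no inputs gives an empty result

    first = [v for v in x[0] if v is not None]
    common = set(first)
    for other in x[1:]:
        common &= set(other)

    if duplicate_method == "last":
        # De-duplicate from the right so that the last occurrence is kept.
        ordered = list(dict.fromkeys(reversed(first)))[::-1]
    else:
        ordered = list(dict.fromkeys(first))
    return [v for v in ordered if v in common]


keep_first = sketch_intersect(["a", "c", "a", None], ["a", "c", "d"])
keep_last = sketch_intersect(["a", "c", "a", None], ["a", "c", "d"], duplicate_method="last")
print(keep_first, keep_last)
```

Both calls report the same values; only the position of the duplicated "a" in the output differs.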

biocutils.is_high_dimensional module

biocutils.is_high_dimensional.is_high_dimensional(x)[source]

Whether an object is high-dimensional, i.e., has a shape attribute that is of length greater than 1.

Parameters:

x – Some kind of object.

Returns:

Whether x is high-dimensional.

biocutils.is_list_of_type module

biocutils.is_list_of_type.is_list_of_type(x, target_type, ignore_none=False)[source]

Checks if x is a list or tuple, and whether all of its elements are of the target type.

Parameters:
  • x (Union[list, tuple]) – A list or tuple of values.

  • target_type (Callable) – Type to check for, e.g. str, int.

  • ignore_none (bool) – Whether to ignore Nones when comparing to target_type.

Return type:

bool

Returns:

True if x is a list or tuple and all elements are of the target type (or None, if ignore_none = True). Otherwise, False.
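The check, including the effect of ignore_none, can be sketched as follows. sketch_is_list_of_type is a hypothetical name and the body is an illustration of the documented contract rather than the library's code.

```python
def sketch_is_list_of_type(x, target_type, ignore_none=False):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if not isinstance(x, (list, tuple)):
        return False
    # Each element must match target_type, or be None when ignore_none is set.
    return all(
        (ignore_none and v is None) or isinstance(v, target_type) for v in x
    )


strict = sketch_is_list_of_type(["a", None], str)
relaxed = sketch_is_list_of_type(["a", None], str, ignore_none=True)
print(strict, relaxed)
```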

biocutils.is_missing_scalar module

biocutils.is_missing_scalar.is_missing_scalar(x)[source]
Parameters:

x – Any scalar value.

Return type:

bool

Returns:

Whether x is None or a NumPy masked constant.

biocutils.map_to_index module

biocutils.map_to_index.map_to_index(x, duplicate_method='first')[source]

Create a dictionary to map values of a sequence to positional indices.

Parameters:
  • x (Sequence) – Sequence of hashable values. We ignore missing values defined by is_missing_scalar().

  • duplicate_method (Literal['first', 'last']) – Whether to consider the first or last occurrence of a duplicated value in x.

Returns:

Dictionary that maps values of x to their position inside x.

Return type:

dict
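A minimal sketch of how such a mapping might be built, showing the difference between the two duplicate_method settings. sketch_map_to_index is a hypothetical name, and only None is treated as missing here.

```python
def sketch_map_to_index(x, duplicate_method="first"):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    mapping = {}
    for i, value in enumerate(x):
        if value is None:
            continue  # missing values are ignored
        # "last" always overwrites; "first" only records unseen values.
        if duplicate_method == "last" or value not in mapping:
            mapping[value] = i
    return mapping


first_map = sketch_map_to_index(["a", "b", "a", None])
last_map = sketch_map_to_index(["a", "b", "a", None], duplicate_method="last")
print(first_map, last_map)
```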

biocutils.match module

biocutils.match.match(x, targets, duplicate_method='first', dtype=None, fail_missing=None)[source]

Find a matching value of each element of x in targets.

Parameters:
  • x (Sequence) – Sequence of values to match.

  • targets (Union[dict, Sequence]) – Sequence of targets to be matched against. Alternatively, a dictionary generated by passing a sequence of targets to map_to_index().

  • duplicate_method (Literal['first', 'last']) – How to handle duplicate entries in targets. Matches can be reported to the first or last occurrence of duplicates.

  • dtype (Optional[ndarray]) – NumPy type of the output array. This should be an integer type; if missing values are expected, the type should be a signed integer. If None, a suitable signed type is automatically determined.

  • fail_missing (Optional[bool]) – Whether to raise an error if x cannot be found in targets. If None, this defaults to True if dtype is an unsigned type, otherwise it defaults to False.

Return type:

ndarray

Returns:

Array of length equal to that of x, containing the integer position of each entry of x inside targets, or -1 if the entry of x is None or cannot be found in targets.
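The matching rule can be sketched in plain Python. This is an illustration, not the library's source: sketch_match is a hypothetical name, the result is a list rather than a NumPy array, dtype is omitted, and raising only for non-None values under fail_missing is an assumption.

```python
def sketch_match(x, targets, duplicate_method="first", fail_missing=False):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if not isinstance(targets, dict):
        # Build the value-to-position lookup from a plain sequence of targets.
        lookup = {}
        for i, target in enumerate(targets):
            if target is None:
                continue
            if duplicate_method == "last" or target not in lookup:
                lookup[target] = i
        targets = lookup

    positions = []
    for value in x:
        found = -1 if value is None else targets.get(value, -1)
        # Assumption: only non-missing values that cannot be found raise an error.
        if found < 0 and value is not None and fail_missing:
            raise ValueError(f"cannot find {value!r} in targets")
        positions.append(found)
    return positions


to_first = sketch_match(["c", "a", "z", None], ["a", "b", "c", "a"])
to_last = sketch_match(["c", "a", "z", None], ["a", "b", "c", "a"], duplicate_method="last")
print(to_first, to_last)
```

Passing a pre-built dictionary as targets skips the lookup construction, which is the reason the documented function accepts the output of map_to_index().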

biocutils.normalize_subscript module

class biocutils.normalize_subscript.NormalizedSubscript(subscript)[source]

Bases: object

Subscript normalized by normalize_subscript(). This is used to indicate that no further normalization is required, such that normalize_subscript() is just a no-op.

__getitem__(index)[source]
Parameters:

index (Any) – Any argument accepted by the __getitem__ method of the subscript.

Return type:

Any

Returns:

The same return value as the __getitem__ method of the subscript. This should be an integer if index is an integer.

__len__()[source]
Return type:

int

Returns:

Length of the subscript.

property subscript: Sequence[int]

Returns: The subscript, as a sequence of integer positions.

biocutils.normalize_subscript.normalize_subscript(sub, length, names=None, non_negative_only=True)[source]

Normalize a subscript for __getitem__ or friends into a sequence of integer indices, for consistent downstream use.

Parameters:
  • sub (Union[slice, range, Sequence, int, str, bool, NormalizedSubscript]) –

    The subscript. This can be any of the following:

    • A slice.

    • A range containing indices to elements. Negative values are allowed. An error is raised if the indices are out of range.

    • A single integer specifying the index of an element. A negative value is allowed. An error is raised if the index is out of range.

    • A single string that can be found in names, which is converted to the index of the first occurrence of that string in names. An error is raised if the string cannot be found.

    • A single boolean, which is converted into a list containing the first element if true, and an empty list if false.

    • A sequence of strings, integers and/or booleans. Strings are converted to indices based on first occurrence in names, as described above. Integers should be indices to an element. Each truthy boolean is converted to an index equal to its position in sub, and each falsy boolean is ignored.

    • A NormalizedSubscript, in which case the subscript property is directly returned.

  • length (int) – Length of the object.

  • names (Optional[Sequence[str]]) – List of names for each entry in the object. If not None, this should have length equal to length. Some optimizations are possible if this is a Names object.

  • non_negative_only (bool) – Whether negative indices must be converted into non-negative equivalents. Setting this to False may improve efficiency.

Return type:

Tuple

Returns:

A tuple containing (i) a sequence of integer indices in [0, length) specifying the subscript elements, and (ii) a boolean indicating whether sub was a scalar.
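The rules listed above can be illustrated with a simplified pure-Python sketch. It is not the library's implementation: sketch_normalize_subscript is a hypothetical name, NormalizedSubscript inputs are not handled, negative indices are always converted (as if non_negative_only were True), and treating a lone boolean as non-scalar is an assumption.

```python
def sketch_normalize_subscript(sub, length, names=None):
    """Illustrative stand-in for the documented rules (not the library code)."""

    def resolve(value, position):
        # bool is a subclass of int, so it has to be tested first.
        if isinstance(value, bool):
            return [position] if value else []
        if isinstance(value, str):
            if names is None or value not in names:
                raise IndexError(f"cannot find name {value!r}")
            return [list(names).index(value)]  # first occurrence
        if not -length <= value < length:
            raise IndexError(f"index {value} out of range")
        return [value + length if value < 0 else value]

    if isinstance(sub, slice):
        return list(range(*sub.indices(length))), False
    if isinstance(sub, bool):
        # Assumption: a lone boolean behaves like a length-1 mask, not a scalar.
        return resolve(sub, 0), False
    if isinstance(sub, (int, str)):
        return resolve(sub, 0), True

    # Ranges and general sequences are resolved element by element.
    indices = []
    for position, value in enumerate(sub):
        indices.extend(resolve(value, position))
    return indices, False


scalar_result = sketch_normalize_subscript(-1, 5)
mixed_result = sketch_normalize_subscript(["b", 0, True], 3, names=["a", "b", "c"])
print(scalar_result, mixed_result)
```

The second element of the returned tuple lets a caller decide whether to return a single element or a container, mirroring how __getitem__ usually distinguishes x[0] from x[[0]].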

biocutils.package_utils module

biocutils.package_utils.is_package_installed(package_name)[source]

Check if the package is installed.

Parameters:

package_name (str) – Package name.

Returns:

True if package is installed, otherwise False.

Return type:

bool

biocutils.print_truncated module

biocutils.print_truncated.print_truncated(x, truncated_to=3, full_threshold=10)[source]

Pretty-print an object, replacing the middle elements of lists/dictionaries with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.

Parameters:
  • x – Object to be printed.

  • truncated_to (int) – Number of elements to truncate to, at the start and end of the list or dictionary. This should be less than half of full_threshold.

  • full_threshold (int) – Threshold on the number of elements, below which the list or dictionary is shown in its entirety.

Return type:

str

Returns:

String containing the pretty-printed contents.

biocutils.print_truncated.print_truncated_dict(x, truncated_to=3, full_threshold=10, transform=None, sep=', ', include_brackets=True)[source]

Pretty-print a dictionary, replacing the middle elements with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.

Parameters:
  • x (Dict) – Dictionary to be printed.

  • truncated_to (int) – Number of elements to truncate to, at the start and end of the sequence. This should be less than half of full_threshold.

  • full_threshold (int) – Threshold on the number of elements, below which the dictionary is shown in its entirety.

  • transform (Optional[Callable]) – Optional transformation to apply to the values of x after truncation but before printing. Defaults to print_truncated() if not supplied.

  • sep (str) – Separator between elements in the printed dictionary.

  • include_brackets (bool) – Whether to include the start/end brackets.

Return type:

str

Returns:

String containing the pretty-printed truncated dict.

biocutils.print_truncated.print_truncated_list(x, truncated_to=3, full_threshold=10, transform=None, sep=', ', include_brackets=True)[source]

Pretty-print a list, replacing the middle elements with an ellipsis if there are too many. This provides a useful preview of an object without spewing out all of its contents on the screen.

Parameters:
  • x (List) – List to be printed.

  • truncated_to (int) – Number of elements to truncate to, at the start and end of the list. This should be less than half of full_threshold.

  • full_threshold (int) – Threshold on the number of elements, below which the list is shown in its entirety.

  • transform (Optional[Callable]) – Optional transformation to apply to the elements of x after truncation but before printing. Defaults to print_truncated() if not supplied.

  • sep (str) – Separator between elements in the printed list.

  • include_brackets (bool) – Whether to include the start/end brackets.

Return type:

str

Returns:

String containing the pretty-printed truncated list.
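How truncated_to and full_threshold interact can be sketched as follows. This is an illustration only: sketch_truncated_list is a hypothetical name, the "..." placeholder and the use of repr() for each element are assumptions, and the exact boundary behaviour at full_threshold may differ in the library.

```python
def sketch_truncated_list(x, truncated_to=3, full_threshold=10,
                          sep=", ", include_brackets=True):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    shown = [repr(v) for v in x]
    # Assumption: truncation starts once the length exceeds full_threshold.
    if len(x) > full_threshold and len(x) > 2 * truncated_to:
        head = shown[:truncated_to]
        tail = shown[len(shown) - truncated_to:]
        shown = head + ["..."] + tail
    body = sep.join(shown)
    return "[" + body + "]" if include_brackets else body


long_preview = sketch_truncated_list(list(range(20)))
short_preview = sketch_truncated_list(list(range(5)))
print(long_preview)
print(short_preview)
```

Keeping truncated_to below half of full_threshold, as the parameter description recommends, guarantees that the head and tail never overlap.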

biocutils.print_wrapped_table module

biocutils.print_wrapped_table.create_floating_names(names, indices)[source]

Create the floating names to use in print_wrapped_table(). If no names are present, positional indices are used instead.

Parameters:
  • names (Optional[List[str]]) – List of row names, or None if no row names are available.

  • indices (Sequence[int]) – Integer indices for which to obtain the names.

Return type:

List[str]

Returns:

List of strings containing floating names.

biocutils.print_wrapped_table.print_type(x)[source]

Print the type of an object, with some special behavior for certain classes (e.g., to add the data type of NumPy arrays). This is intended for display at the top of the columns of print_wrapped_table().

Parameters:

x – Some object.

Return type:

str

Returns:

String containing the class of the object.

biocutils.print_wrapped_table.print_wrapped_table(columns, floating_names=None, sep=' ', window=None)[source]

Pretty-print a table with aligned and wrapped columns. All column contents are padded so that they are right-justified. Wrapping is performed whenever a new column would exceed the window width, in which case the entire column (and all subsequent columns) are printed below the previous columns.

Parameters:
  • columns (List[Sequence[str]]) –

    List of lists of strings, where each inner list is the same length and contains the visible contents of a column. Strings are typically generated by calling repr() on data column values.

    Callers are responsible for inserting ellipses, adding column type information (e.g., with print_type()) or truncating long strings (e.g., with truncate_strings()).

  • floating_names (Optional[Sequence[str]]) –

    List of strings to be added to the left of the table. This is printed repeatedly for each set of wrapped columns.

    See also create_floating_names().

  • sep (str) – Separator between columns.

  • window (Optional[int]) – Size of the terminal window, in characters. We attempt to determine this automatically, otherwise it is set to 150.

Return type:

str

Returns:

String containing the pretty-printed table.

biocutils.print_wrapped_table.truncate_strings(values, width=40)[source]

Truncate long strings for printing in print_wrapped_table().

Parameters:
  • values (List[str]) – List of strings to be printed.

  • width (int) – Width beyond which to truncate the string.

Return type:

List[str]

Returns:

List containing truncated strings.

biocutils.relaxed_combine_columns module

biocutils.relaxed_combine_columns.relaxed_combine_columns(*x)[source]

Combine n-dimensional objects along the second dimension.

Parameters:

x (Any) – n-dimensional objects to combine. All elements of x are expected to be the same class.

Returns:

Combined object, typically the same type as the first entry of x.

biocutils.relaxed_combine_rows module

biocutils.relaxed_combine_rows.relaxed_combine_rows(*x)[source]

Combine n-dimensional objects along their first dimension.

Parameters:

x (Any) – One or more n-dimensional objects to combine. All elements of x are expected to be the same class.

Returns:

Combined object, typically the same type as the first entry of x.

biocutils.reverse_index module

biocutils.reverse_index.build_reverse_index(obj)[source]

Build a reverse index by name, for fast lookup operations.

Only contains the first occurrence of a term.

Parameters:

obj (Sequence[str]) – List of names.

Returns:

A map of keys and their index positions.

biocutils.show_as_cell module

biocutils.show_as_cell.show_as_cell(x, indices)[source]

Show the contents of x as a cell of a table, typically for use in the __str__ method of a class that contains x.

Parameters:
  • x (Any) – Any object. By default, we assume that it can be treated as a sequence, with a valid __getitem__ method for an index.

  • indices (Sequence[int]) – List of indices to be extracted.

Return type:

List[str]

Returns:

List of strings of length equal to indices, containing a string summary of each of the specified elements of x.

biocutils.subset module

biocutils.subset.subset(x, indices)[source]

Generic subset that checks whether an object is n-dimensional for n > 1 (i.e., it has a shape property of length greater than 1). If so, it calls subset_rows() to subset along the first dimension; otherwise, it assumes that the object is vector-like and calls subset_sequence() instead.

Parameters:

x (Any) – Object to be subsetted.

Returns:

The subsetted object, typically the same type as x.

biocutils.subset_rows module

biocutils.subset_rows.subset_rows(x, indices)[source]

Subset x by indices on the first dimension. The default method attempts to use x’s __getitem__ method.

Parameters:
  • x (Any) – Any high-dimensional object.

  • indices (Sequence[int]) – Sequence of non-negative integers specifying the rows of interest.

Return type:

Any

Returns:

The result of slicing x by indices. The exact type depends on what x’s __getitem__ method returns.

biocutils.subset_sequence module

biocutils.subset_sequence.subset_sequence(x, indices)[source]

Subset x by indices to obtain a new object. The default method attempts to use x’s __getitem__ method.

Parameters:
  • x (Any) – Any object that supports __getitem__ with an integer sequence.

  • indices (Sequence[int]) – Sequence of non-negative integers specifying the elements of interest. All indices should be less than len(x).

Return type:

Any

Returns:

The result of slicing x by indices. The exact type depends on what x’s __getitem__ method returns.

biocutils.union module

biocutils.union.union(*x, duplicate_method='first')[source]

Identify the union of values in multiple sequences, while preserving the order of the first (or last) occurrence of each value.

Parameters:
  • x (Sequence) – Zero, one or more sequences of interest containing hashable values. We ignore missing values as defined by is_missing_scalar().

  • duplicate_method (Literal['first', 'last']) – Whether to take the first or last occurrence of each value in the ordering of the output. If first, the first occurrence in the earliest sequence of x is reported; if last, the last occurrence in the latest sequence of x is reported.

Return type:

list

Returns:

Union of values across all x.
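The ordering rule for duplicate_method can be sketched in a few lines of plain Python. sketch_union is a hypothetical name, only None is treated as missing, and this is an illustration rather than the library's code.

```python
def sketch_union(*x, duplicate_method="first"):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    # Flatten all sequences in order, dropping missing values.
    flat = [v for seq in x for v in seq if v is not None]
    if duplicate_method == "last":
        # De-duplicate from the right so that the last occurrence sets the order.
        return list(dict.fromkeys(reversed(flat)))[::-1]
    return list(dict.fromkeys(flat))


by_first = sketch_union(["a", "b"], ["b", "c", "a"])
by_last = sketch_union(["a", "b"], ["b", "c", "a"], duplicate_method="last")
print(by_first, by_last)
```

Both results contain the same values; the choice only changes where each duplicated value lands in the output.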

biocutils.which module

biocutils.which.which(x, dtype=None)[source]

Report the indices of all elements of x that are truthy.

Parameters:
  • x (Sequence) – Sequence of values to be interpreted as booleans.

  • dtype (Optional[ndarray]) – NumPy type of the output array. This should be an integer type. If None, a suitable signed type is automatically determined.

Return type:

ndarray

Returns:

Array of length no greater than that of x, containing the indices of all truthy entries. Indices are guaranteed to be unique and sorted.
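As a rough illustration of the documented result, the same indices can be computed with a list comprehension. sketch_which is a hypothetical name, and the sketch returns a plain list rather than a NumPy array of the requested dtype.

```python
def sketch_which(x):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    # enumerate() walks x in order, so the indices come out unique and sorted.
    return [i for i, value in enumerate(x) if value]


truthy_positions = sketch_which([True, 0, "yes", None, 1])
print(truthy_positions)
```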

Module contents