biocframe package¶
Subpackages¶
Submodules¶
biocframe.BiocFrame module¶
- class biocframe.BiocFrame.BiocFrame(data=None, number_of_rows=None, row_names=None, column_names=None, column_data=None, metadata=None, _validate=True)[source]¶
Bases: BiocObject
BiocFrame is an alternative to DataFrame, with support for nested and flexible column types. Inspired by the DFrame class from Bioconductor’s S4Vectors package. Any object may be used as a column, provided it has:
  - Some concept of “height”, as defined by get_height() from BiocUtils. This defaults to the length as defined by __len__.
  - The ability to be sliced by integer indices, as implemented by subset() from BiocUtils. This defaults to calling __getitem__.
  - The ability to be combined with other objects, as implemented in combine() from BiocUtils.
  - The ability to perform an assignment, as implemented in assign() from BiocUtils.
This allows BiocFrame to accept arbitrarily complex classes (such as nested BiocFrame instances) as columns.
- __array_ufunc__(func, method, *inputs, **kwargs)[source]¶
Interface for NumPy array methods.
Note: This is a very primitive implementation and needs tests to support different types.
- Return type:
- Returns:
An object with the same type as the caller.
- __delitem__(name)[source]¶
Alias for remove_column with in_place = True. As this mutates the original object, a warning is raised.
- Return type:
- __getitem__(args)[source]¶
Wrapper around get_column and get_slice to obtain a slice of a BiocFrame or any of its columns.
- Parameters:
  - args (Union[int, str, Sequence[Union[str, int]], Tuple[Union[int, str, Sequence[Union[str, int]], slice], ...]]) – A sequence or a scalar integer or string, specifying the columns to retain based on their names or indices.
    Alternatively, a tuple of length 1. The first entry specifies the rows to retain based on their names or indices.
    Alternatively, a tuple of length 2. The first entry specifies the rows to retain, while the second entry specifies the columns to retain, based on their names or indices.
- Return type:
- Returns:
If args is a scalar, the specified column is returned. This is achieved internally by calling get_column.
If args is a sequence, a new BiocFrame is returned containing only the specified columns. This is achieved by calling get_slice with no row slicing.
If args is a tuple of length 1, a new BiocFrame is returned containing the specified rows. This is achieved by calling get_slice with no column slicing.
If args is a tuple of length 2, a new BiocFrame is returned containing the specified rows and columns. This is achieved by calling get_slice with the specified arguments.
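The dispatch rules above can be summarised in a small plain-Python sketch. It does not import biocframe: describe_getitem is a hypothetical helper that only reports which documented method would handle a given args value, and rejecting tuples of other lengths is an assumption.

```python
from typing import Any


def describe_getitem(args: Any) -> str:
    """Report which documented method handles a given subscript.

    Hypothetical helper for illustration only; it mirrors the rules
    described above and is not part of biocframe.
    """
    if isinstance(args, (int, str)):
        # Scalar: a single column is returned.
        return "get_column(args)"
    if isinstance(args, tuple):
        if len(args) == 1:
            # Rows only, all columns kept.
            return "get_slice(rows=args[0], columns=all)"
        if len(args) == 2:
            # Rows and columns.
            return "get_slice(rows=args[0], columns=args[1])"
        # Assumption: other tuple lengths are rejected.
        raise ValueError("tuple subscripts must have length 1 or 2")
    # Any other sequence selects columns, with no row slicing.
    return "get_slice(rows=all, columns=args)"


print(describe_getitem("score"))                   # scalar column name
print(describe_getitem(["score", "gene"]))         # sequence of columns
print(describe_getitem(([0, 2],)))                 # rows only
print(describe_getitem((slice(0, 2), ["score"])))  # rows and columns
```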
- __hash__ = None¶
- __init__(data=None, number_of_rows=None, row_names=None, column_names=None, column_data=None, metadata=None, _validate=True)[source]¶
Initialize a BiocFrame object from columns.
- Parameters:
  - data (Union[Dict[str, Any], NamedList, Sequence[Any], None]) – Dictionary of column names as keys and their values. All columns must have the same length. Defaults to an empty dictionary.
    Alternatively, a Mapping object may be provided, for example a NamedList that can be coerced into a dictionary.
    Alternatively, a sequence of columns may be provided. In this case, column_names must be provided and must have the same length as the sequence.
  - number_of_rows (Optional[int]) – Number of rows. If not specified, inferred from data. This needs to be provided if data is empty and row_names are not present.
  - row_names (Union[Sequence[str], Names, None]) – Row names. This should not contain missing strings.
  - column_names (Union[Sequence[str], Names, None]) – Column names. If not provided, inferred from the data. This may be in a different order than the keys of data. This should not contain missing strings.
  - column_data (Optional[BiocFrame]) – Metadata about columns. Must have the same number of rows as the length of column_names. Defaults to None.
  - metadata (Union[Dict[str, Any], NamedList, None]) – Additional metadata. Defaults to an empty dictionary.
  - _validate (bool) – Internal use only.
- __setitem__(args, value)[source]¶
Wrapper around set_column and set_slice to modify a slice of a BiocFrame or any of its columns. As this modifies the original object in place, a warning is raised.
If args is a string, it is assumed to be a column name and value is expected to be the column contents; these are passed on to set_column with in_place = True.
If args is a tuple, it is assumed to contain row and column indices. value is expected to be a BiocFrame containing replacement values. These are passed to set_slice with in_place = True.
- Return type:
- property colnames: Names¶
Alias for
get_column_names, provided for back-compatibility only.
- column(column)[source]¶
Alias for get_column(), provided for back-compatibility only.
- Return type:
- property column_data: BiocFrame | None¶
Alias for
get_column_data.
- property column_names: Names¶
Alias for
get_column_names.
- property columns: Names¶
Alias for
get_column_names, provided for compatibility with pandas.
- combine(*other)[source]¶
Wrapper around relaxed_combine_rows(), provided for back-compatibility only.
- Return type:
- copy()[source]¶
Alias for __copy__().
- Return type:
- property empty: bool¶
Check if the object is empty.
- Returns:
True if the object has no rows, False otherwise.
- flatten(as_type='dict', delim='.')[source]¶
Flatten a nested BiocFrame object.
- Parameters:
- Return type:
- Returns:
An object with the type specified by the as_type argument. If as_type is dict, an additional column “rownames” is added if the object contains row names.
- classmethod from_polars(input)[source]¶
Create a BiocFrame from a polars DataFrame object.
- Parameters:
  - input (DataFrame) – Input data.
- Return type:
- Returns:
A
BiocFrameobject.
- get_column(column)[source]¶
Get the contents of the specified column.
- Parameters:
  - column – Name of the column, which must exist in get_column_names.
    Alternatively, the integer index of the column of interest.
- Return type:
- Returns:
The contents of the specified column.
- get_column_data(with_names=True)[source]¶
Get column data.
- Parameters:
  - with_names (bool) – Whether to set the column names of this BiocFrame as the row names of the column data BiocFrame.
- Return type:
- Returns:
The annotations for each column. This may be None if no annotation is present, or a BiocFrame where each row corresponds to a column and contains that column’s metadata.
- get_row(row)[source]¶
Get a specified row.
- Parameters:
  - row – Integer index of the row to access.
    If row names are available (see get_row_names), a string may be supplied instead. The first occurrence of the string in the row names is used.
- Return type:
- Returns:
A dictionary where the keys are column names and the values are the contents of the columns at the specified
row.
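As a rough illustration of the return value described above, the following plain-Python sketch models a frame as a dictionary of equal-length lists. get_row_sketch is a hypothetical helper, not biocframe code; it only shows the shape of the result and the "first occurrence" rule for string subscripts.

```python
from typing import Any, Dict, List, Optional, Union


def get_row_sketch(
    columns: Dict[str, List[Any]],
    row: Union[int, str],
    row_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Plain-Python model of the documented behaviour; not biocframe code."""
    if isinstance(row, str):
        if row_names is None:
            raise ValueError("string subscripts require row names")
        # list.index returns the first occurrence, matching the description above.
        row = row_names.index(row)
    # One entry per column, holding that column's value at the chosen row.
    return {name: values[row] for name, values in columns.items()}


columns = {"gene": ["BRCA1", "TP53", "MYC"], "score": [0.1, 0.7, 0.4]}
row_names = ["r1", "r2", "r1"]  # duplicated name on purpose

print(get_row_sketch(columns, 1))                # {'gene': 'TP53', 'score': 0.7}
print(get_row_sketch(columns, "r1", row_names))  # resolves to the first "r1"
```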
- get_slice(rows, columns)[source]¶
Slice BiocFrame along the rows and/or columns, based on their indices or names.
- Parameters:
  - rows (Union[str, int, bool, Sequence[Union[str, int, bool]], slice]) – Rows to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
    Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used for extraction.
  - columns (Union[str, int, bool, Sequence[Union[str, int, bool]], slice]) – Columns to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
- Return type:
- Returns:
A BiocFrame with the specified rows and columns.
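The subscript rules for rows and columns can be pictured with the sketch below. It is a hypothetical stand-in written in plain Python, not the normalize_subscript() implementation from BiocUtils; treating a full-length boolean sequence as a mask is an assumption made for the example.

```python
from typing import List, Optional, Sequence, Union

Subscript = Union[str, int, bool, Sequence[Union[str, int, bool]], slice]


def to_indices(sub: Subscript, length: int, names: Optional[List[str]] = None) -> List[int]:
    """Turn a subscript into integer positions (illustrative only)."""
    if isinstance(sub, slice):
        return list(range(*sub.indices(length)))
    if isinstance(sub, (str, int)):  # bool is a subclass of int
        sub = [sub]  # scalars are treated as length-1 sequences
    if len(sub) == length and all(isinstance(s, bool) for s in sub):
        # Assumption: a full-length boolean sequence acts as a mask.
        return [i for i, keep in enumerate(sub) if keep]
    positions = []
    for s in sub:
        if isinstance(s, str):
            if names is None:
                raise ValueError("string subscripts require names")
            positions.append(names.index(s))  # first occurrence wins
        else:
            positions.append(int(s))
    return positions


row_names = ["a", "b", "a", "c"]
print(to_indices(2, 4))                           # [2]
print(to_indices(slice(1, 3), 4))                 # [1, 2]
print(to_indices(["c", "a"], 4, row_names))       # [3, 0]
print(to_indices([True, False, False, True], 4))  # [0, 3]
```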
- property index: Names | None¶
Alias for get_row_names, provided for compatibility with pandas.
- is_empty()[source]¶
Check if the object is empty.
- Return type:
- Returns:
True if the object has no rows, False otherwise.
- merge(*other, by=None, join='left', rename_duplicate_columns=False)[source]¶
Wrapper around merge().
- Return type:
- relaxed_combine_columns(*other)[source]¶
Wrapper around relaxed_combine_columns().
- Return type:
- relaxed_combine_rows(*other)[source]¶
Wrapper around relaxed_combine_rows().
- Return type:
- remove_column(column, in_place=False)[source]¶
Remove a column. This is a convenience wrapper around remove_columns.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- remove_columns(columns, in_place=False)[source]¶
Remove any number of existing columns.
- Parameters:
- Returns:
A modified BiocFrame object.
- Return type:
- Raises:
TypeError – If columns contains mixed types.
- remove_row(row, in_place=False)[source]¶
Remove a row. This is a convenience wrapper around remove_rows.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- remove_rows(rows, in_place=False)[source]¶
Remove any number of existing rows.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- Raises:
TypeError – If rows contain mixed types.
- property row_names: Names | None¶
Alias for
get_row_names.
- property rownames: Names | None¶
Alias for
get_row_names, provided for back-compatibility.
- set_column(column, value, in_place=False)[source]¶
Modify an existing column or add a new column. This is a convenience wrapper around set_columns.
- Parameters:
  - column (Union[int, str]) – Name of an existing or new column. Alternatively, an index specifying the position of an existing column.
  - value (Any) – Value of the new column. This should have the same height as the number of rows in the current object.
  - in_place (bool) – Whether to modify the object in place.
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_column_data(column_data, in_place=False)[source]¶
Set new column data.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_column_names(names, in_place=False)[source]¶
Set new column names.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_columns(columns, in_place=False)[source]¶
Modify existing columns or add new columns.
- Parameters:
  - columns (Dict[Union[str, int], Any]) – Contents of the columns to set. Keys may be strings containing new or existing column names, or integers containing the position of the column. Values should be the contents of each column.
  - in_place (bool) – Whether to modify the object in place. Defaults to False.
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
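The in_place convention shared by the setter methods can be illustrated with a toy class. TinyFrame below is a hypothetical stand-in, not biocframe code; it only models the documented contract that the return value is either a modified copy or the original object itself, and that integer keys address an existing column by position.

```python
from typing import Any, Dict, List, Union


class TinyFrame:
    """Hypothetical stand-in used only to illustrate the in_place convention."""

    def __init__(self, data: Dict[str, List[Any]]):
        self.data = dict(data)  # shallow copy of the column mapping

    def set_columns(
        self, columns: Dict[Union[str, int], Any], in_place: bool = False
    ) -> "TinyFrame":
        # Work on the object itself, or on a fresh copy when in_place is False.
        target = self if in_place else TinyFrame(self.data)
        names = list(target.data)
        for key, value in columns.items():
            if isinstance(key, int):
                key = names[key]  # integer keys refer to an existing column position
            target.data[key] = value  # string keys replace or append a column
        return target


original = TinyFrame({"a": [1, 2], "b": [3, 4]})

updated = original.set_columns({"c": [5, 6], 0: [9, 9]})
print(updated is original)  # False: a modified copy is returned
print(original.data)        # the original is unchanged

same = original.set_columns({"c": [5, 6]}, in_place=True)
print(same is original)     # True: the original was modified and returned
```

Chaining calls with in_place left at its default keeps the original object intact, at the cost of creating a new object for each step.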
- set_row_names(names, in_place=False)[source]¶
Set new row names.
- Parameters:
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_slice(rows, columns, value, in_place=True)[source]¶
Replace a slice of the BiocFrame given the rows and columns of the slice.
- Parameters:
  - rows (Union[int, str, bool, Sequence[Union[str, int, bool]], slice]) – Rows to be replaced. This may be any sequence of strings, integers, or booleans (or mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
    Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used for extraction.
  - columns (Union[int, str, bool, Sequence[Union[str, int, bool]], slice]) – Columns to be replaced. This may be any sequence of strings, integers, or booleans (or mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
  - value (BiocFrame) – A BiocFrame containing replacement values. Each row corresponds to a row in rows, while each column corresponds to a column in columns. Note that the replacement is based on position, so row and column names in value are ignored.
  - in_place (bool) – Whether to modify the BiocFrame object in place.
- Return type:
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
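The positional nature of the replacement is easy to overlook, so here is a minimal plain-Python sketch of it. Frames are modelled as dictionaries of lists, and set_slice_sketch is a hypothetical helper rather than the library's implementation; it takes already-normalized integer rows and column names.

```python
from typing import Any, Dict, List


def set_slice_sketch(
    frame: Dict[str, List[Any]],
    rows: List[int],
    columns: List[str],
    value: Dict[str, List[Any]],
) -> Dict[str, List[Any]]:
    """Return a copy of frame with the selected cells replaced by position."""
    out = {name: list(col) for name, col in frame.items()}
    replacement = list(value.values())  # positional: names in value are ignored
    for j, name in enumerate(columns):
        for i, r in enumerate(rows):
            out[name][r] = replacement[j][i]
    return out


frame = {"a": [1, 2, 3], "b": [4, 5, 6]}
value = {"some_other_name": [10, 30]}  # the column name here plays no role

result = set_slice_sketch(frame, rows=[0, 2], columns=["b"], value=value)
print(result)  # {'a': [1, 2, 3], 'b': [10, 5, 30]}
```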
- property shape: Tuple[int, int]¶
Returns: Tuple containing the number of rows and columns in this
BiocFrame.
- slice(rows, columns)[source]¶
Alias for __getitem__, for back-compatibility.
- Return type:
- split(column_name, only_indices=False)[source]¶
Split the object by a column.
- Parameters:
- Return type:
- Returns:
A dictionary of BiocFrame objects, where each key is a group and each value is the corresponding sliced frame.
If only_indices is True, the values instead contain the row indices that map to each group.
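The two return shapes described above can be sketched in plain Python, again modelling a frame as a dictionary of lists. split_sketch is a hypothetical helper that assumes the grouping column holds hashable values; it is not the library's code.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def split_sketch(
    columns: Dict[str, List[Any]], column_name: str, only_indices: bool = False
) -> Dict[Hashable, Any]:
    """Group rows by the values of one column (illustrative only)."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, value in enumerate(columns[column_name]):
        groups[value].append(i)
    if only_indices:
        # Row indices that map to each group.
        return dict(groups)
    # Otherwise, slice every column down to the rows of each group.
    return {
        group: {name: [values[i] for i in idx] for name, values in columns.items()}
        for group, idx in groups.items()
    }


columns = {"batch": ["x", "y", "x"], "score": [1, 2, 3]}
print(split_sketch(columns, "batch", only_indices=True))  # {'x': [0, 2], 'y': [1]}
print(split_sketch(columns, "batch"))
```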
- to_NamedList()[source]¶
Convert the BiocFrame to a NamedList.
- Return type:
- Returns:
A NamedList containing the columns.
- to_dict()[source]¶
Alias for
get_data().
- class biocframe.BiocFrame.BiocFrameIter(obj)[source]¶
Bases: object
An iterator to a BiocFrame object.
- __init__(obj)[source]¶
Initialize the iterator.
- Parameters:
  - obj (BiocFrame) – Source object to iterate.
- biocframe.BiocFrame.merge(x, by=None, join='left', rename_duplicate_columns=False)[source]¶
Merge multiple BiocFrame objects together by common columns or row names, yielding a combined object with a union of columns across all objects.
- Parameters:
  - x (Sequence[BiocFrame]) – Sequence of BiocFrame objects. Each object may have any number and identity of rows and columns.
  - by (Union[None, str, int, Sequence[Union[None, str, int]]]) – If string, the name of the column containing the keys. Each entry of x is assumed to have this column.
    If integer, the index of the column containing the keys. The same index is used for each entry of x.
    If None, keys are assumed to be present in the row names.
    Alternatively, a sequence of strings, integers or None, specifying the location of the keys in each entry of x.
  - join (Literal['inner', 'left', 'right', 'outer']) – Strategy for the merge. For left and right joins, we consider the keys for the first and last object in x, respectively.
  - rename_duplicate_columns (bool) – Whether duplicated non-key columns across x should be automatically renamed in the merged object. If False, an error is raised instead.
- Return type:
- Returns:
A BiocFrame containing the merged contents.
If by = None, the keys are stored in the row names.
If by is a string, keys are stored in the column of the same name.
If by is a sequence, keys are stored in the row names if by[0] = None, otherwise they are stored in the column named by[0].
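To make the join strategies concrete, the sketch below computes which keys would survive each one. merged_keys is a hypothetical helper in plain Python; it assumes keys are unique within each object, and the ordering it produces for inner and outer joins is an assumption rather than documented behaviour.

```python
from typing import Dict, List, Sequence


def merged_keys(keys_per_frame: Sequence[Sequence[str]], join: str = "left") -> List[str]:
    """Return the keys kept by each join strategy (illustrative only)."""
    first, last = keys_per_frame[0], keys_per_frame[-1]
    if join == "left":
        return list(first)  # keys of the first object
    if join == "right":
        return list(last)   # keys of the last object
    if join == "inner":
        # Keys present in every object.
        return [k for k in first if all(k in keys for keys in keys_per_frame)]
    if join == "outer":
        # Keys present in any object; a dict preserves first-seen order.
        seen: Dict[str, None] = {}
        for keys in keys_per_frame:
            for k in keys:
                seen.setdefault(k, None)
        return list(seen)
    raise ValueError(f"unknown join strategy: {join!r}")


keys = [["a", "b", "c"], ["b", "c", "d"], ["c", "e"]]
for strategy in ("left", "right", "inner", "outer"):
    print(strategy, merged_keys(keys, strategy))
```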
- biocframe.BiocFrame.relaxed_combine_columns(*x)[source]¶
Wrapper around merge() that performs a left join on the row names.
- Return type:
- biocframe.BiocFrame.relaxed_combine_rows(*x)[source]¶
A relaxed version of the combine_rows() method for BiocFrame objects. Whereas combine_rows expects that all objects have the same columns, relaxed_combine_rows allows for different columns. Absent columns in any object are filled in with appropriate placeholder values before combining.
- Parameters:
  - x (BiocFrame) – One or more BiocFrame objects, possibly with differences in the number and identity of their columns.
- Return type:
- Returns:
A BiocFrame that combines all x along their rows and contains the union of all columns. Columns absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.
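The fill-in behaviour described above can be modelled in a few lines of plain Python. relaxed_combine_rows_sketch is a hypothetical helper operating on dictionaries of lists, not the library's implementation; it uses None as the only placeholder, whereas the real function may also use masked NumPy values.

```python
from typing import Any, Dict, List


def relaxed_combine_rows_sketch(*frames: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Stack frames by row, padding absent columns with None (illustrative only)."""
    # Union of column names, in order of first appearance.
    all_names: List[str] = []
    for frame in frames:
        for name in frame:
            if name not in all_names:
                all_names.append(name)

    combined: Dict[str, List[Any]] = {name: [] for name in all_names}
    for frame in frames:
        height = len(next(iter(frame.values()))) if frame else 0
        for name in all_names:
            # Absent columns contribute one placeholder per row of this frame.
            combined[name].extend(frame.get(name, [None] * height))
    return combined


first = {"x": [1, 2], "y": ["a", "b"]}
second = {"x": [3], "z": [True]}

print(relaxed_combine_rows_sketch(first, second))
# {'x': [1, 2, 3], 'y': ['a', 'b', None], 'z': [None, None, True]}
```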