biocframe package
Submodules
biocframe.BiocFrame module
- class biocframe.BiocFrame.BiocFrame(data=None, number_of_rows=None, row_names=None, column_names=None, column_data=None, metadata=None, validate=True)
Bases: object
BiocFrame is an alternative to the pandas DataFrame, with support for nested and flexible column types. It is inspired by the DFrame class from Bioconductor’s S4Vectors package. Any object may be used as a column, provided it has:
  - Some concept of “height”, as defined by get_height() from BiocUtils. This defaults to the length as defined by __len__.
  - The ability to be sliced by integer indices, as implemented by subset() from BiocUtils. This defaults to calling __getitem__.
  - The ability to be combined with other objects, as implemented in combine() from BiocUtils.
  - The ability to perform an assignment, as implemented in assign() from BiocUtils.
This allows BiocFrame to accept arbitrarily complex classes (such as nested BiocFrame instances) as columns.
- __array_ufunc__(func, method, *inputs, **kwargs)
Interface for NumPy array methods.
Note: This is a very primitive implementation and needs tests to support different types.
- Return type:
BiocFrame
- Returns:
An object with the same type as the caller.
- __delitem__(name)
Alias for remove_column with in_place = True. As this mutates the original object, a warning is raised.
- __getitem__(args)
Wrapper around get_column and get_slice to obtain a slice of a BiocFrame or any of its columns.
- Parameters:
args (Union[int, str, Sequence, tuple]) – A sequence, or a scalar integer or string, specifying the columns to retain based on their names or indices.
Alternatively, a tuple of length 1. The first entry specifies the rows to retain based on their names or indices.
Alternatively, a tuple of length 2. The first entry specifies the rows to retain, while the second entry specifies the columns to retain, based on their names or indices.
- Returns:
If args is a scalar, the specified column is returned. This is achieved internally by calling get_column.
If args is a sequence, a new BiocFrame is returned containing only the specified columns. This is achieved by calling get_slice with no row slicing.
If args is a tuple of length 1, a new BiocFrame is returned containing the specified rows. This is achieved by calling get_slice with no column slicing.
If args is a tuple of length 2, a new BiocFrame is returned containing the specified rows and columns. This is achieved by calling get_slice with the specified arguments.
- __init__(data=None, number_of_rows=None, row_names=None, column_names=None, column_data=None, metadata=None, validate=True)
Initialize a BiocFrame object from columns.
- Parameters:
data (Optional[Dict[str, Any]]) – Dictionary of column names as keys and their values. All columns must have the same length. Defaults to an empty dictionary.
number_of_rows (Optional[int]) – Number of rows. If not specified, inferred from data. This needs to be provided if data is empty and row_names are not present.
row_names (Optional[List]) – Row names. This should not contain missing strings.
column_names (Optional[List[str]]) – Column names. If not provided, inferred from data. This may be in a different order than the keys of data. This should not contain missing strings.
column_data (Optional[BiocFrame]) – Metadata about columns. Must have the same number of rows as the length of column_names. Defaults to None.
metadata (Optional[dict]) – Additional metadata. Defaults to an empty dictionary.
validate (bool) – Internal use only.
- __setitem__(args, value)
Wrapper around set_column and set_slice to modify a slice of a BiocFrame or any of its columns. As this modifies the original object in place, a warning is raised.
If args is a string, it is assumed to be a column name and value is expected to be the column contents; these are passed on to set_column with in_place = True.
If args is a tuple, it is assumed to contain row and column indices. value is expected to be a BiocFrame containing replacement values. These are passed to set_slice with in_place = True.
- property colnames: Names
Alias for get_column_names, provided for back-compatibility only.
- column(column)
Alias for get_column(), provided for back-compatibility only.
- property column_data: None | BiocFrame
Alias for get_column_data.
- property column_names: Names
Alias for get_column_names.
- property columns: Names
Alias for get_column_names, provided for compatibility with pandas.
- combine(*other)
Wrapper around relaxed_combine_rows(), provided for back-compatibility only.
- copy()
Alias for __copy__().
- flatten(as_type='dict', delim='.')
Flatten a nested BiocFrame object.
- Parameters:
as_type – Type of the returned object. Defaults to 'dict'.
delim – Delimiter used to join the names of nested columns. Defaults to '.'.
- Returns:
An object with the type specified by the as_type argument. If as_type is dict, an additional column “rownames” is added if the object contains row names.
- classmethod from_pandas(input)
Create a BiocFrame from a pandas DataFrame object.
- Parameters:
input (pandas.DataFrame) – Input data.
- Return type:
BiocFrame
- Returns:
A BiocFrame object.
- classmethod from_polars(input)
Create a BiocFrame from a polars DataFrame object.
- Parameters:
input (polars.DataFrame) – Input data.
- Return type:
BiocFrame
- Returns:
A BiocFrame object.
- get_column(column)
Retrieve the contents of a single column.
- Parameters:
column – Name of the column, which must exist in get_column_names. Alternatively, the integer index of the column of interest.
- Returns:
The contents of the specified column.
- get_column_data(with_names=True)
- Parameters:
with_names (bool) – Whether to set the column names of this BiocFrame as the row names of the column data BiocFrame.
- Return type:
None | BiocFrame
- Returns:
The annotations for each column. This may be None if no annotation is present, or is a BiocFrame where each row corresponds to a column and contains that column’s metadata.
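A sketch of attaching and reading column annotations, assuming biocframe is installed; the "description" annotation is a made-up example. The annotation frame has one row per column, and with the default with_names=True its row names are taken from the parent's column names.

```python
from biocframe import BiocFrame

bf = BiocFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# One annotation row per column of bf, in the same order.
annotated = bf.set_column_data(BiocFrame({"description": ["counts", "labels"]}))

cd = annotated.get_column_data()  # row names become ["a", "b"]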
- get_row(row)
- Parameters:
row – Integer index of the row to access. If row names are available (see get_row_names), a string may be supplied instead. The first occurrence of the string in the row names is used.
- Return type:
dict
- Returns:
A dictionary where the keys are column names and the values are the contents of the columns at the specified row.
- get_slice(rows, columns)
Slice a BiocFrame along the rows and/or columns, based on their indices or names.
- Parameters:
rows (Union[str, int, bool, Sequence]) – Rows to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences. Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used for extraction.
columns (Union[str, int, bool, Sequence]) – Columns to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
- Return type:
BiocFrame
- Returns:
A BiocFrame with the specified rows and columns.
- property index: Names | None
Alias for get_row_names, provided for compatibility with pandas.
- property metadata: dict
Alias for get_metadata.
- remove_column(column, in_place=False)
Remove a column. This is a convenience wrapper around remove_columns.
- Parameters:
column – Name or index of the column to remove.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- remove_columns(columns, in_place=False)
Remove any number of existing columns.
- Parameters:
columns – Names or indices of the columns to remove.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- property row_names: Names | None
Alias for get_row_names.
- property rownames: Names | None
Alias for get_row_names, provided for back-compatibility.
- set_column(column, value, in_place=False)
Modify an existing column or add a new column. This is a convenience wrapper around set_columns.
- Parameters:
column (Union[int, str]) – Name of an existing or new column. Alternatively, an index specifying the position of an existing column.
value (Any) – Value of the new column. This should have the same height as the number of rows in the current object.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
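- set_column_data(column_data, in_place=False)
Set the annotations for each column.
- Parameters:
column_data – New column annotations: None, or a BiocFrame with one row per column of this object.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.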
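- set_column_names(names, in_place=False)
Set the column names.
- Parameters:
names – New column names, one for each column. These should not contain missing strings.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.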
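- set_columns(columns, in_place=False)
Modify existing columns or add new columns.
- Parameters:
columns – Mapping of column names to their new contents. Each value should have the same height as the number of rows in the current object.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.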
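- set_row_names(names, in_place=False)
Set the row names.
- Parameters:
names – New row names, one for each row. These should not contain missing strings.
in_place (bool) – Whether to modify the object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.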
- set_slice(rows, columns, value, in_place=True)
Replace a slice of the BiocFrame given the rows and columns of the slice.
- Parameters:
rows (Union[int, str, bool, Sequence]) – Rows to be replaced. This may be any sequence of strings, integers, or booleans (or a mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences. Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used for the replacement.
columns (Union[int, str, bool, Sequence]) – Columns to be replaced. This may be any sequence of strings, integers, or booleans (or a mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
value (BiocFrame) – A BiocFrame containing replacement values. Each row corresponds to a row in rows, while each column corresponds to a column in columns. Note that the replacement is based on position, so row and column names in value are ignored.
in_place (bool) – Whether to modify the BiocFrame object in place.
- Return type:
BiocFrame
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- property shape: Tuple[int, int]
Returns: Tuple containing the number of rows and columns in this BiocFrame.
- slice(rows, columns)
Alias for __getitem__, provided for back-compatibility.
- Return type:
BiocFrame
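- split(column_name, only_indices=False)
Split the object by the values of a column.
- Parameters:
column_name (str) – Name of the column whose values define the groups.
only_indices (bool) – Whether to return only the row indices for each group, instead of the sliced BiocFrame objects.
- Return type:
dict
- Returns:
A dictionary of BiocFrame objects, where each key is a group and each value is the slice of the frame belonging to that group. If only_indices is True, the values instead contain the row indices that map to each group.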
- class biocframe.BiocFrame.BiocFrameIter(obj)
Bases: object
An iterator over a BiocFrame object.
- Parameters:
obj (BiocFrame) – Source object to iterate.
- biocframe.BiocFrame.merge(x, by=None, join='left', rename_duplicate_columns=False)
Merge multiple BiocFrame objects together by common columns or row names, yielding a combined object with a union of columns across all objects.
- Parameters:
x (Sequence[BiocFrame]) – Sequence of BiocFrame objects. Each object may have any number and identity of rows and columns.
by (Union[None, str, Sequence]) – If a string, the name of the column containing the keys. Each entry of x is assumed to have this column. If an integer, the index of the column containing the keys. The same index is used for each entry of x. If None, keys are assumed to be present in the row names. Alternatively, a sequence of strings, integers or None, specifying the location of the keys in each entry of x.
join (Literal['inner', 'left', 'right', 'outer']) – Strategy for the merge. For left and right joins, we consider the keys for the first and last object in x, respectively.
rename_duplicate_columns (bool) – Whether duplicated non-key columns across x should be automatically renamed in the merged object. If False, an error is raised instead.
- Return type:
BiocFrame
- Returns:
A BiocFrame containing the merged contents.
If by is None, the keys are stored in the row names.
If by is a string, keys are stored in the column of the same name.
If by is a sequence, keys are stored in the row names if by[0] is None, otherwise they are stored in the column named by[0].
- biocframe.BiocFrame.relaxed_combine_columns(*x)
Wrapper around merge() that performs a left join on the row names.
- Return type:
BiocFrame
- biocframe.BiocFrame.relaxed_combine_rows(*x)
A relaxed version of the combine_rows() method for BiocFrame objects. Whereas combine_rows expects that all objects have the same columns, relaxed_combine_rows allows for different columns. Absent columns in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x (BiocFrame) – One or more BiocFrame objects, possibly with differences in the number and identity of their columns.
- Return type:
BiocFrame
- Returns:
A BiocFrame that combines all x along their rows and contains the union of all columns. Columns absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.