biocframe package
Subpackages
Submodules
biocframe.BiocFrame module
- class biocframe.BiocFrame.BiocFrame(data: Dict[str, Any] | None = None, number_of_rows: int | None = None, row_names: List | None = None, column_names: List[str] | None = None, column_data: BiocFrame | None = None, metadata: dict | None = None, validate: bool = True)
Bases: object
BiocFrame is an alternative to DataFrame, with support for nested and flexible column types. It is inspired by the DFrame class from Bioconductor’s S4Vectors package. Any object may be used as a column, provided it has:
- Some concept of “height”, as defined by get_height() from BiocUtils. This defaults to the length as defined by __len__.
- The ability to be sliced by integer indices, as implemented by subset() from BiocUtils. This defaults to calling __getitem__.
- The ability to be combined with other objects, as implemented in combine() from BiocUtils.
- The ability to perform an assignment, as implemented in assign() from BiocUtils.
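As a rough illustration of the first two requirements, the plain-Python sketch below defines a hypothetical column type that relies only on the documented defaults: __len__ for height and __getitem__ for slicing by integer indices. The class name RunLengthColumn and the helpers height and take are assumptions made for this example; they stand in for get_height() and subset() and are not the BiocUtils implementations.

```python
from typing import Any, List, Sequence, Tuple


class RunLengthColumn:
    """Hypothetical column storing values as (value, run_length) pairs."""

    def __init__(self, runs: Sequence[Tuple[Any, int]]):
        self._runs = list(runs)

    def __len__(self) -> int:
        # "Height": the total number of elements across all runs.
        return sum(n for _, n in self._runs)

    def _expand(self) -> List[Any]:
        # Materialize the runs as a flat list of values.
        return [value for value, n in self._runs for _ in range(n)]

    def __getitem__(self, indices: Sequence[int]) -> List[Any]:
        # Slicing by a sequence of integer indices.
        expanded = self._expand()
        return [expanded[i] for i in indices]


def height(x: Any) -> int:
    # Stand-in for the default height behaviour: fall back to __len__.
    return len(x)


def take(x: Any, indices: Sequence[int]) -> Any:
    # Stand-in for the default subsetting behaviour: fall back to __getitem__.
    return x[indices]


col = RunLengthColumn([("a", 2), ("b", 3)])
print(height(col))           # 5
print(take(col, [0, 2, 4]))  # ['a', 'b', 'b']
```

Combining and assignment would need analogous support before such a class could serve as a full column type.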
This allows BiocFrame to accept arbitrarily complex classes (such as nested BiocFrame instances) as columns.
- __array_ufunc__(func, method, *inputs, **kwargs) -> BiocFrame
Interface for NumPy array methods.
Note: This is a very primitive implementation and needs tests to support different types.
- Returns:
An object with the same type as the caller.
- __delitem__(name: str)
Alias for remove_column with in_place = True. As this mutates the original object, a warning is raised.
- __getitem__(args: int | str | Sequence | tuple) -> BiocFrame | Any
Wrapper around get_column and get_slice to obtain a slice of a BiocFrame or any of its columns.
- Parameters:
args –
A sequence or a scalar integer or string, specifying the columns to retain based on their names or indices.
Alternatively a tuple of length 1. The first entry specifies the rows to retain based on their names or indices.
Alternatively a tuple of length 2. The first entry specifies the rows to retain, while the second entry specifies the columns to retain, based on their names or indices.
- Returns:
If args is a scalar, the specified column is returned. This is achieved internally by calling get_column.
If args is a sequence, a new BiocFrame is returned containing only the specified columns. This is achieved by calling get_slice with no row slicing.
If args is a tuple of length 1, a new BiocFrame is returned containing the specified rows. This is achieved by calling get_slice with no column slicing.
If args is a tuple of length 2, a new BiocFrame is returned containing the specified rows and columns. This is achieved by calling get_slice with the specified arguments.
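The dispatch rules above can be summarized in a short plain-Python sketch. The helper describe_getitem is hypothetical and does not call into biocframe; it only reports which documented method each kind of argument would be routed to, using None to mean "no slicing on that axis".

```python
from typing import Any, Tuple


def describe_getitem(args: Any) -> Tuple[str, Any, Any]:
    """Return (method, rows, columns) implied by the documented rules.

    Hypothetical helper for illustration only.
    """
    if isinstance(args, tuple):
        if len(args) == 1:
            # Rows only, no column slicing.
            return ("get_slice", args[0], None)
        if len(args) == 2:
            # Rows and columns.
            return ("get_slice", args[0], args[1])
        raise ValueError("tuple must have length 1 or 2")
    if isinstance(args, (int, str)):
        # A scalar selects a single column and returns its contents.
        return ("get_column", None, args)
    # Any other sequence selects several columns, with no row slicing.
    return ("get_slice", None, args)


print(describe_getitem("a"))             # ('get_column', None, 'a')
print(describe_getitem(["a", "b"]))      # ('get_slice', None, ['a', 'b'])
print(describe_getitem(([0, 1],)))       # ('get_slice', [0, 1], None)
print(describe_getitem(([0, 1], "a")))   # ('get_slice', [0, 1], 'a')
```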
- __init__(data: Dict[str, Any] | None = None, number_of_rows: int | None = None, row_names: List | None = None, column_names: List[str] | None = None, column_data: BiocFrame | None = None, metadata: dict | None = None, validate: bool = True) -> None
Initialize a BiocFrame object from columns.
- Parameters:
data – Dictionary of column names as keys and their values. All columns must have the same length. Defaults to an empty dictionary.
number_of_rows – Number of rows. If not specified, inferred from data. This needs to be provided if data is empty and row_names are not present.
row_names – Row names. This should not contain missing strings.
column_names – Column names. If not provided, inferred from data. This may be in a different order than the keys of data. This should not contain missing strings.
column_data – Metadata about columns. Must have the same number of rows as the length of column_names. Defaults to None.
metadata – Additional metadata. Defaults to an empty dictionary.
validate – Internal use only.
- __iter__() -> BiocFrameIter
Iterator over rows.
- __setitem__(args: int | str | Sequence | tuple, value: BiocFrame)
Wrapper around set_column and set_slice to modify a slice of a BiocFrame or any of its columns. As this modifies the original object in place, a warning is raised.
If args is a string, it is assumed to be a column name and value is expected to be the column contents; these are passed on to set_column with in_place = True.
If args is a tuple, it is assumed to contain row and column indices. value is expected to be a BiocFrame containing replacement values. These are passed to set_slice with in_place = True.
- property colnames: Names
Alias for get_column_names, provided for back-compatibility only.
- column(column: str | int) -> Any
Alias for get_column(), provided for back-compatibility only.
- property column_data: None | BiocFrame
Alias for get_column_data.
- property column_names: Names
Alias for get_column_names.
- property columns: Names
Alias for get_column_names, provided for compatibility with pandas.
- combine(*other)
Wrapper around relaxed_combine_rows(), provided for back-compatibility only.
- copy()
Alias for __copy__().
- flatten(as_type: Literal['dict', 'biocframe'] = 'dict', delim: str = '.') -> BiocFrame
Flatten a nested BiocFrame object.
- Parameters:
- Returns:
An object with the type specified by the as_type argument. If as_type is dict, an additional column “rownames” is added if the object contains row names.
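To give a sense of what flattening a nested structure involves, here is a plain-Python sketch that models a nested frame as a dictionary of columns whose values may themselves be dictionaries. The function flatten_columns is hypothetical, and the naming scheme (parent and child column names joined by delim) is an assumption suggested by the delim parameter, not confirmed behaviour of flatten.

```python
from typing import Any, Dict


def flatten_columns(
    columns: Dict[str, Any], delim: str = ".", prefix: str = ""
) -> Dict[str, Any]:
    """Flatten nested dicts of columns into a single-level dict.

    Assumption: nested names are joined as "<parent><delim><child>".
    """
    flat: Dict[str, Any] = {}
    for name, value in columns.items():
        full_name = f"{prefix}{delim}{name}" if prefix else name
        if isinstance(value, dict):
            # A nested frame (modelled here as a dict): recurse into it.
            flat.update(flatten_columns(value, delim=delim, prefix=full_name))
        else:
            flat[full_name] = value
    return flat


nested = {
    "id": [1, 2],
    "location": {"chrom": ["chr1", "chr2"], "pos": [10, 20]},
}
flat = flatten_columns(nested)
print(list(flat))  # ['id', 'location.chrom', 'location.pos']
```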
- classmethod from_pandas(input: pandas.DataFrame) -> BiocFrame
Create a BiocFrame from a pandas DataFrame object.
- Parameters:
input – Input data.
- Returns:
A BiocFrame object.
- classmethod from_polars(input: polars.DataFrame) -> BiocFrame
Create a BiocFrame from a polars DataFrame object.
- Parameters:
input – Input data.
- Returns:
A BiocFrame object.
- get_column(column: str | int) -> Any
- Parameters:
column –
Name of the column, which must exist in get_column_names.
Alternatively, the integer index of the column of interest.
- Returns:
The contents of the specified column.
- get_column_data(with_names: bool = True) -> None | BiocFrame
- Parameters:
with_names – Whether to set the column names of this BiocFrame as the row names of the column data BiocFrame.
- Returns:
The annotations for each column. This may be None if no annotation is present, or a BiocFrame where each row corresponds to a column and contains that column’s metadata.
- get_row(row: str | int) -> dict
- Parameters:
row –
Integer index of the row to access.
If row names are available (see get_row_names), a string may be supplied instead. The first occurrence of the string in the row names is used.
- Returns:
A dictionary where the keys are column names and the values are the contents of the columns at the specified row.
- get_row_names() -> Names | None
- Returns:
List of row names, or None if no row names are available.
- get_slice(rows: str | int | bool | Sequence, columns: str | int | bool | Sequence) -> BiocFrame
Slice BiocFrame along the rows and/or columns, based on their indices or names.
- Parameters:
rows –
Rows to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used for extraction.
columns – Columns to be extracted. This may be an integer, boolean, string, or any sequence thereof, as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
- Returns:
A BiocFrame with the specified rows and columns.
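The "first occurrence" rule for string subscripts matters when row names are not unique. The plain-Python sketch below shows one way such a lookup could resolve names to positions; resolve_rows is a hypothetical helper, not normalize_subscript(), and it handles only integers and strings (boolean masks are left out).

```python
from typing import Dict, List, Sequence, Union


def resolve_rows(
    subscript: Sequence[Union[int, str]], row_names: List[str]
) -> List[int]:
    """Map a mix of integer positions and row names to integer positions."""
    # Record only the first position at which each name appears.
    first_index: Dict[str, int] = {}
    for i, name in enumerate(row_names):
        first_index.setdefault(name, i)

    resolved: List[int] = []
    for s in subscript:
        if isinstance(s, str):
            resolved.append(first_index[s])  # KeyError if the name is absent
        else:
            resolved.append(s)
    return resolved


# "a" and "b" each appear twice; only their first positions are used.
print(resolve_rows(["b", 0, "a"], ["a", "b", "a", "b"]))  # [1, 0, 0]
```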
- has_column(name: str) -> bool
- Parameters:
name – Name of the column.
- Returns:
Whether a column with the specified name exists in this object.
- property index: Names | None
Alias for get_row_names, provided for compatibility with pandas.
- property metadata: dict
Alias for get_metadata.
- remove_column(column: int | str, in_place: bool = False) -> BiocFrame
Remove a column. This is a convenience wrapper around remove_columns.
- Parameters:
column – Name or positional index of the column to remove.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
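The in_place convention described in the return value can be illustrated with a small stand-in class. TinyFrame is hypothetical and holds nothing but a dictionary of columns; it only demonstrates the difference between returning a modified copy and returning a reference to the mutated original.

```python
from typing import Any, Dict, List


class TinyFrame:
    """Hypothetical stand-in holding a dict of columns."""

    def __init__(self, data: Dict[str, Any]):
        # Shallow copy so that separate frames do not share one dict.
        self._data = dict(data)

    def names(self) -> List[str]:
        return list(self._data)

    def remove_column(self, column: str, in_place: bool = False) -> "TinyFrame":
        # Work on the original when in_place is True, otherwise on a copy.
        target = self if in_place else TinyFrame(self._data)
        del target._data[column]
        return target


tf = TinyFrame({"a": [1, 2], "b": [3, 4]})

copied = tf.remove_column("a")
print(copied is tf, tf.names(), copied.names())  # False ['a', 'b'] ['b']

same = tf.remove_column("a", in_place=True)
print(same is tf, tf.names())                    # True ['b']
```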
- remove_columns(columns: Sequence[int | str], in_place: bool = False) -> BiocFrame
Remove any number of existing columns.
- Parameters:
columns – Names or indices of the columns to remove.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- property row_names: Names | None
Alias for get_row_names.
- property rownames: Names | None
Alias for get_row_names, provided for back-compatibility.
- set_column(column: int | str, value: Any, in_place: bool = False) -> BiocFrame
Modify an existing column or add a new column. This is a convenience wrapper around set_columns.
- Parameters:
column – Name of an existing or new column. Alternatively, an index specifying the position of an existing column.
value – Value of the new column. This should have the same height as the number of rows in the current object.
in_place – Whether to modify the object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_column_data(column_data: None | BiocFrame, in_place: bool = False) -> BiocFrame
- Parameters:
column_data – New column data. This should either be a BiocFrame with the number of rows equal to the number of columns in the current object, or None to remove existing column data.
in_place – Whether to modify the BiocFrame object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_column_names(names: List[str], in_place: bool = False) -> BiocFrame
- Parameters:
names – List of unique strings, of length equal to the number of columns in this BiocFrame.
in_place – Whether to modify the BiocFrame object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_columns(columns: Dict[str, Any], in_place: bool = False) -> BiocFrame
Modify existing columns or add new columns.
- Parameters:
columns – Contents of the columns to set. Keys may be strings containing new or existing column names, or integers containing the position of the column. Values should be the contents of each column.
in_place – Whether to modify the object in place. Defaults to False.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_metadata(metadata: dict, in_place: bool = False) -> BiocFrame
- Parameters:
metadata – New metadata for this object.
in_place – Whether to modify the BiocFrame object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_row_names(names: List | None, in_place: bool = False) -> BiocFrame
- Parameters:
names – List of strings. This should have length equal to the number of rows in the current BiocFrame.
in_place – Whether to modify the BiocFrame object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- set_slice(rows: int | str | bool | Sequence, columns: int | str | bool | Sequence, value: BiocFrame, in_place: bool = True) -> BiocFrame
Replace a slice of the BiocFrame given the rows and columns of the slice.
- Parameters:
rows –
Rows to be replaced. This may be any sequence of strings, integers, or booleans (or mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
Strings may only be used if row names are available (see get_row_names). The first occurrence of each string in the row names is used.
columns – Columns to be replaced. This may be any sequence of strings, integers, or booleans (or mixture thereof), as supported by normalize_subscript(). Scalars are treated as length-1 sequences.
value – A BiocFrame containing replacement values. Each row corresponds to a row in rows, while each column corresponds to a column in columns. Note that the replacement is based on position, so row and column names in value are ignored.
in_place – Whether to modify the BiocFrame object in place.
- Returns:
A modified BiocFrame object, either as a copy of the original or as a reference to the (in-place-modified) original.
- property shape: Tuple[int, int]
Returns: Tuple containing the number of rows and columns in this BiocFrame.
- slice(rows: Sequence, columns: Sequence) -> BiocFrame
Alias for __getitem__, for back-compatibility.
- split(column_name: str, only_indices: bool = False) -> Dict[str, BiocFrame | List[int]]
Split the object by a column.
- Parameters:
column_name – Name of the column to split by.
only_indices – Whether to only return indices. Defaults to False.
- Returns:
A dictionary of BiocFrame objects, where each key is a group and each value is the corresponding sliced frame. If only_indices is True, the values instead contain the row indices that map to the same group.
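The grouping that underlies the only_indices = True result can be sketched in plain Python. The function split_indices is a hypothetical helper that operates on a single column given as a list; it is meant to show the shape of the returned mapping, not how split is implemented.

```python
from typing import Any, Dict, List, Sequence


def split_indices(column: Sequence[Any]) -> Dict[Any, List[int]]:
    """Group row indices by the value found in the splitting column."""
    groups: Dict[Any, List[int]] = {}
    for i, value in enumerate(column):
        # Rows sharing a value land in the same group, in row order.
        groups.setdefault(value, []).append(i)
    return groups


groups = split_indices(["x", "y", "x", "z", "y"])
print(groups)  # {'x': [0, 2], 'y': [1, 4], 'z': [3]}
```

Slicing the rows with each list of indices would then yield one frame per group.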
- class biocframe.BiocFrame.BiocFrameIter(obj: BiocFrame)
Bases: object
An iterator to a BiocFrame object.
- Parameters:
obj (BiocFrame) – Source object to iterate.
- biocframe.BiocFrame.merge(x: Sequence[BiocFrame], by: None | str | Sequence = None, join: Literal['inner', 'left', 'right', 'outer'] = 'left', rename_duplicate_columns: bool = False) -> BiocFrame
Merge multiple BiocFrame objects together by common columns or row names, yielding a combined object with a union of columns across all objects.
- Parameters:
x – Sequence of BiocFrame objects. Each object may have any number and identity of rows and columns.
by –
If string, the name of the column containing the keys. Each entry of x is assumed to have this column.
If integer, the index of the column containing the keys. The same index is used for each entry of x.
If None, keys are assumed to be present in the row names.
Alternatively a sequence of strings, integers or None, specifying the location of the keys in each entry of x.
join – Strategy for the merge. For left and right joins, we consider the keys for the first and last object in x, respectively.
rename_duplicate_columns – Whether duplicated non-key columns across x should be automatically renamed in the merged object. If False, an error is raised instead.
- Returns:
A BiocFrame containing the merged contents.
If by = None, the keys are stored in the row names.
If by is a string, keys are stored in the column of the same name.
If by is a sequence, keys are stored in the row names if by[0] = None, otherwise they are stored in the column named by[0].
- biocframe.BiocFrame.relaxed_combine_columns(*x: BiocFrame) -> BiocFrame
Wrapper around merge() that performs a left join on the row names.
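For readers less familiar with join terminology, the following plain-Python sketch shows what a left join on row names amounts to. Tables are modelled as dictionaries keyed by row name, and left_join_on_row_names is a hypothetical helper, not the merge() implementation. The sketch assumes that non-key column names are distinct across tables, and it uses None for missing values as a simplification.

```python
from typing import Any, Dict, List

Table = Dict[str, Dict[str, Any]]  # row name -> {column name: value}


def left_join_on_row_names(first: Table, *others: Table) -> Table:
    """Keep the rows of `first`, adding columns from `others` by row name."""
    # Collect the union of column names, in order of first appearance.
    columns: List[str] = []
    for table in (first, *others):
        for row in table.values():
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged: Table = {}
    for key, row in first.items():
        # Start with placeholders, then fill in whatever is available.
        out: Dict[str, Any] = {name: None for name in columns}
        out.update(row)
        for table in others:
            out.update(table.get(key, {}))
        merged[key] = out
    return merged


left = {"g1": {"score": 1}, "g2": {"score": 2}}
right = {"g2": {"label": "b"}, "g3": {"label": "c"}}
merged = left_join_on_row_names(left, right)
print(merged)
# {'g1': {'score': 1, 'label': None}, 'g2': {'score': 2, 'label': 'b'}}
```

Row "g3" is dropped because it is absent from the first table, which is what distinguishes a left join from an outer join.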
- biocframe.BiocFrame.relaxed_combine_rows(*x: BiocFrame) -> BiocFrame
A relaxed version of the combine_rows() method for BiocFrame objects. Whereas combine_rows expects that all objects have the same columns, relaxed_combine_rows allows for different columns. Absent columns in any object are filled in with appropriate placeholder values before combining.
- Parameters:
x – One or more BiocFrame objects, possibly with differences in the number and identity of their columns.
- Returns:
A BiocFrame that combines all x along their rows and contains the union of all columns. Columns absent in any x are filled in with placeholders consisting of Nones or masked NumPy values.
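The placeholder-filling behaviour can be pictured with a plain-Python sketch in which each frame is a dictionary of equal-length lists. The function relaxed_combine is hypothetical and always fills with None; it does not reproduce the masked NumPy placeholders mentioned above.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]  # column name -> column values


def relaxed_combine(*frames: Columns) -> Columns:
    """Stack frames by row, filling columns a frame lacks with None."""
    # Union of column names, in order of first appearance.
    all_names: List[str] = []
    for frame in frames:
        for name in frame:
            if name not in all_names:
                all_names.append(name)

    combined: Columns = {name: [] for name in all_names}
    for frame in frames:
        # Height of this frame; assumes all of its columns have equal length.
        n = len(next(iter(frame.values()))) if frame else 0
        for name in all_names:
            combined[name].extend(frame.get(name, [None] * n))
    return combined


a = {"x": [1, 2], "y": ["p", "q"]}
b = {"x": [3], "z": [True]}
result = relaxed_combine(a, b)
print(result)
# {'x': [1, 2, 3], 'y': ['p', 'q', None], 'z': [None, None, True]}
```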