delayedarray package

Submodules

delayedarray.BinaryIsometricOp module

class delayedarray.BinaryIsometricOp.BinaryIsometricOp(left, right, operation)[source]

Bases: DelayedOp

Binary isometric operation involving two n-dimensional seed arrays with the same dimension extents. This is based on Bioconductor’s DelayedArray::DelayedNaryIsoOp class.

The data type of the result is determined by NumPy casting given the data types of the left and right seeds. It is probably safest to cast at least one array to floating-point to avoid problems due to integer overflow.

This class is intended for developers to construct new DelayedArray instances. In general, end users should not be interacting with BinaryIsometricOp objects directly.

__init__(left, right, operation)[source]
Parameters:
  • left – Any object satisfying the seed contract, see DelayedArray() for details.

  • right – Any object of the same dimensions as left that satisfies the seed contract, see DelayedArray() for details.

  • operation (Literal['add', 'subtract', 'multiply', 'divide', 'remainder', 'floor_divide', 'power', 'equal', 'greater_equal', 'greater', 'less_equal', 'less', 'not_equal', 'logical_and', 'logical_or', 'logical_xor']) – String specifying the operation.

property dtype: dtype

Returns: NumPy type for the data after the operation. This may or may not be the same as the left or right objects, depending on how NumPy does the casting for the requested operation.

property left

Returns: The seed object on the left-hand-side of the operation.

property operation: Literal['add', 'subtract', 'multiply', 'divide', 'remainder', 'floor_divide', 'power', 'equal', 'greater_equal', 'greater', 'less_equal', 'less', 'not_equal', 'logical_and', 'logical_or', 'logical_xor']

Returns: Name of the operation.

property right

Returns: The seed object on the right-hand-side of the operation.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of this object. As the name of the class suggests, this is the same as the shapes of the left and right objects.
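The idea of a delayed binary operation can be sketched in pure Python. This is only an illustration under stated assumptions, not the package's implementation: ToyBinaryOp, its realize() method and the _OPS table are hypothetical names, the seeds are plain nested lists rather than objects satisfying the seed contract, and only a few of the supported operation names are mapped.

```python
import operator

# Hypothetical mapping from a few operation names to Python callables.
_OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "greater": operator.gt,
}


class ToyBinaryOp:
    """Toy stand-in for a delayed binary operation on two 2-D nested lists."""

    def __init__(self, left, right, operation):
        same_rows = len(left) == len(right)
        same_cols = all(len(a) == len(b) for a, b in zip(left, right))
        if not (same_rows and same_cols):
            raise ValueError("left and right must have the same dimension extents")
        self.left = left
        self.right = right
        self.operation = operation

    @property
    def shape(self):
        # Same as the shape of either operand.
        return (len(self.left), len(self.left[0]) if self.left else 0)

    def realize(self):
        # The operation is only evaluated here, not at construction time.
        func = _OPS[self.operation]
        return [
            [func(a, b) for a, b in zip(lrow, rrow)]
            for lrow, rrow in zip(self.left, self.right)
        ]


op = ToyBinaryOp([[1, 2], [3, 4]], [[10, 20], [30, 40]], "add")
result = op.realize()
```

Constructing the object stores the operands and the operation name; nothing is computed until realize() is called, which mirrors the delayed evaluation described above.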

delayedarray.BinaryIsometricOp.chunk_grid_BinaryIsometricOp(x)[source]

See chunk_grid().

delayedarray.BinaryIsometricOp.create_dask_array_BinaryIsometricOp(x)[source]

See create_dask_array().

delayedarray.BinaryIsometricOp.extract_dense_array_BinaryIsometricOp(x, subset)[source]

See extract_dense_array().

delayedarray.BinaryIsometricOp.extract_sparse_array_BinaryIsometricOp(x, subset)[source]

See extract_sparse_array().

delayedarray.BinaryIsometricOp.is_masked_BinaryIsometricOp(x)[source]

See is_masked().

delayedarray.BinaryIsometricOp.is_sparse_BinaryIsometricOp(x)[source]

See is_sparse().

delayedarray.Cast module

class delayedarray.Cast.Cast(seed, dtype)[source]

Bases: DelayedOp

Delayed cast to a different NumPy type. This is most useful for promoting integer matrices to floating point to avoid problems with integer overflow in arithmetic operations.

This class is intended for developers to construct new DelayedArray instances. End users should not be interacting with Cast objects directly.
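The overflow problem that motivates the cast can be illustrated without NumPy. The sketch below uses ctypes.c_int32 as a stand-in for a fixed-width 32-bit integer type; add_int32 is a hypothetical helper, and the wraparound shown is that of a C integer, which is assumed here to be representative of what happens inside an integer array.

```python
import ctypes


def add_int32(a, b):
    """Add two integers and wrap the result to 32 bits, like a fixed-width type."""
    return ctypes.c_int32(a + b).value


INT32_MAX = 2**31 - 1

# Integer arithmetic wraps around to a large negative value.
wrapped = add_int32(INT32_MAX, 1)

# Promoting to floating point first gives the expected result.
promoted = float(INT32_MAX) + 1.0
```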

__init__(seed, dtype)[source]
Parameters:
  • seed – Any object that satisfies the seed contract, see DelayedArray for details.

  • dtype (dtype) – The desired type.

property dtype: dtype

Returns: NumPy type for the contents after casting.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of this object. This is the same as the seed object.

delayedarray.Cast.chunk_grid_Cast(x)[source]

See chunk_grid().

delayedarray.Cast.create_dask_array_Cast(x)[source]

See create_dask_array().

delayedarray.Cast.extract_dense_array_Cast(x, subset=None)[source]

See extract_dense_array().

delayedarray.Cast.extract_sparse_array_Cast(x, subset=None)[source]

See extract_sparse_array().

delayedarray.Cast.is_masked_Cast(x)[source]

See is_masked().

delayedarray.Cast.is_sparse_Cast(x)[source]

See is_sparse().

delayedarray.Combine module

class delayedarray.Combine.Combine(seeds, along)[source]

Bases: DelayedOp

Delayed combine operation, based on Bioconductor’s DelayedArray::DelayedAbind class.

This will combine multiple arrays along a specified dimension, provided the extents of all other dimensions are the same.

This class is intended for developers to construct new DelayedArray instances. In general, end users should not be interacting with Combine objects directly.

__init__(seeds, along)[source]
Parameters:
  • seeds (list) – List of objects that satisfy the seed contract, see DelayedArray for details.

  • along (int) – Dimension along which the seeds are to be combined.

property along: int

Returns: Dimension along which the seeds are combined.

property dtype: dtype

Returns: NumPy type for the combined data. This may or may not be the same as those in seeds, depending on casting rules.

property seeds: list

Returns: List of seed objects to be combined.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the object after seeds were combined along the specified dimension.
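The shape rule for combining can be sketched as a small function. This is a hypothetical helper (combined_shape is not part of the package) that only works on shape tuples: it checks that all extents other than along agree and sums the extents on along.

```python
def combined_shape(shapes, along):
    """Shape obtained by combining arrays with the given shapes along one dimension."""
    first = shapes[0]
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all seeds must have the same number of dimensions")
        for dim, (extent, expected) in enumerate(zip(shape, first)):
            if dim != along and extent != expected:
                raise ValueError("extents must match on all dimensions except 'along'")
        total += shape[along]
    result = list(first)
    result[along] = total
    return tuple(result)


rows_combined = combined_shape([(10, 5), (20, 5)], along=0)
cols_combined = combined_shape([(10, 5), (10, 7)], along=1)
```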

delayedarray.Combine.chunk_grid_Combine(x)[source]

See chunk_grid().

delayedarray.Combine.create_dask_array_Combine(x)[source]

See create_dask_array().

delayedarray.Combine.extract_dense_array_Combine(x, subset)[source]

See extract_dense_array().

delayedarray.Combine.extract_sparse_array_Combine(x, subset)[source]

See extract_sparse_array().

delayedarray.Combine.is_masked_Combine(x)[source]

See is_masked().

delayedarray.Combine.is_sparse_Combine(x)[source]

See is_sparse().

delayedarray.DelayedArray module

class delayedarray.DelayedArray.DelayedArray(seed)[source]

Bases: object

Array containing delayed operations. This is equivalent to the class of the same name from the R/Bioconductor package of the same name. It allows users to efficiently operate on large matrices without actually evaluating the operation or creating new copies; instead, the operations will transparently return another DelayedArray instance containing the delayed operations, which can be realized by calling array() or related methods.

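Any object that satisfies the “seed contract” can be wrapped by a DelayedArray. Specifically, a seed should have:

  • The dtype and shape properties, like those in NumPy arrays.

  • A method for the extract_dense_array() generic.

If the seed contains sparse data, it should also implement:

  • A method for the is_sparse() generic.

  • A method for the extract_sparse_array() generic.

Optionally, a seed class may have: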

  • A method for the create_dask_array() generic, if the seed is not already compatible with the dask package.

  • A method for the wrap() generic, to create a DelayedArray subclass that is specific to this seed class.

property T: DelayedArray

Returns: A DelayedArray containing the delayed transpose.

__abs__()[source]

Take the absolute value of the contents of a DelayedArray.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed absolute value operation.

__add__(other)[source]

Add something to the right-hand-side of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed addition operation.

__and__(other)[source]

Element-wise AND with something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed AND operation.

__array__(dtype=None, copy=True)[source]

Convert a DelayedArray to a NumPy array, to be used by array().

Parameters:
  • dtype (Optional[dtype]) – The desired NumPy type of the output array. If None, the type of the seed is used.

  • copy (bool) – Currently ignored. The output is never a reference to the underlying seed, even if the seed is another NumPy array.

Return type:

ndarray

Returns:

NumPy array of type dtype with the same shape as this DelayedArray.

__array_function__(func, types, args, kwargs)[source]

Interface to NumPy’s high-level array functions. This is used to implement array operations like NumPy’s concatenate().

Check out NumPy’s __array_function__ documentation for more details.

Return type:

DelayedArray

Returns:

A DelayedArray instance containing the requested delayed operation.

__array_priority__ = 16
__array_ufunc__(ufunc, method, *inputs, **kwargs)[source]

Interface with NumPy array methods. This is used to implement mathematical operations like NumPy’s log(), or to override operations between NumPy class instances and DelayedArray objects where the former is on the left hand side.

Check out NumPy’s __array_ufunc__ documentation for more details.

Return type:

DelayedArray

Returns:

A DelayedArray instance containing the requested delayed operation.

__eq__(other)[source]

Check for equality between a DelayedArray and something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__floordiv__(other)[source]

Divide a DelayedArray by something and take the floor.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed floor division operation.

__ge__(other)[source]

Check whether a DelayedArray is greater than or equal to something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__getitem__(subset)[source]

Take a subset of this DelayedArray. This follows the same logic as NumPy slicing and will generate a Subset object when the subset operation preserves the dimensionality of the seed, e.g., when subset is defined using the ix_() function.

Parameters:

subset (Tuple[Union[slice, Sequence], ...]) – A tuple of length equal to the dimensionality of this DelayedArray. We attempt to support most types of NumPy slicing; however, only subsets that preserve dimensionality will generate a delayed subset operation.

Return type:

Union[DelayedArray, ndarray]

Returns:

If the dimensionality is preserved by subset, a DelayedArray containing a delayed subset operation is returned. Otherwise, an ndarray is returned containing the realized subset.

__gt__(other)[source]

Check whether a DelayedArray is greater than something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__hash__ = None
__init__(seed)[source]

Most users should use wrap() instead, as this can be specialized by developers to construct subclasses that are optimized for custom seed types.

Parameters:

seed – Any array-like object that satisfies the seed contract.

__le__(other)[source]

Check whether a DelayedArray is less than or equal to something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__lt__(other)[source]

Check whether a DelayedArray is less than something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__mod__(other)[source]

Take the remainder after dividing a DelayedArray by something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray object of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed modulo operation.

__mul__(other)[source]

Multiply a DelayedArray with something on the right hand side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed multiplication operation.

__ne__(other)[source]

Check for non-equality between a DelayedArray and something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__neg__()[source]

Negate the contents of a DelayedArray.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed negation.

__or__(other)[source]

Element-wise OR with something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed OR operation.

__pow__(other)[source]

Raise a DelayedArray to the power of something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed power operation.

__radd__(other)[source]

Add something to the left-hand-side of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed addition operation.

__rand__(other)[source]

Element-wise AND with the right-hand-side of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed AND operation.

__repr__()[source]

Pretty-print this DelayedArray. This uses array2string() and responds to all of its options.

Return type:

str

Returns:

String containing a prettified display of the array contents.

__req__(other)[source]

Check for equality between something and a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__rfloordiv__(other)[source]

Divide something by a DelayedArray and take the floor.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed floor division operation.

__rge__(other)[source]

Check whether something is greater than or equal to a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__rgt__(other)[source]

Check whether something is greater than a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__rle__(other)[source]

Check whether something is less than or equal to a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__rlt__(other)[source]

Check whether something is less than a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__rmod__(other)[source]

Take the remainder after dividing something by a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed modulo operation.

__rmul__(other)[source]

Multiply a DelayedArray with something on the left hand side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed multiplication operation.

__rne__(other)[source]

Check for non-equality between something and a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed check.

__ror__(other)[source]

Element-wise OR with the right-hand-side of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed OR operation.

__rpow__(other)[source]

Raise something to the power of the contents of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed power operation.

__rsub__(other)[source]

Subtract a DelayedArray from something else.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed subtraction operation.

__rtruediv__(other)[source]

Divide something by a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed division operation.

__sub__(other)[source]

Subtract something from the right-hand-side of a DelayedArray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

DelayedArray

Returns:

A DelayedArray containing the delayed subtraction operation.

__truediv__(other)[source]

Divide a DelayedArray by something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Returns:

A DelayedArray containing the delayed division operation.

all(axis=None, dtype=None, buffer_size=100000000.0)[source]

Test whether all array elements along a given axis evaluate to True.

Compute this test across the DelayedArray, possibly over a given axis or set of axes. If the seed has an all() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to test for all. Alternatively, a tuple (multiple axes) or None (all axes), see all() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the DelayedArray, see all() for details.

  • buffer_size (int) – Buffer size in bytes to use for block processing. Larger values generally improve speed at the cost of memory.

Return type:

ndarray

Returns:

A NumPy array containing the boolean values. If axis = None, this will be a NumPy scalar instead.

any(axis=None, dtype=None, buffer_size=100000000.0)[source]

Test whether any array element along a given axis evaluates to True.

Compute this test across the DelayedArray, possibly over a given axis or set of axes. If the seed has an any() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to test for any. Alternatively, a tuple (multiple axes) or None (all axes), see any() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the DelayedArray, see any() for details.

  • buffer_size (int) – Buffer size in bytes to use for block processing. Larger values generally improve speed at the cost of memory.

Return type:

ndarray

Returns:

A NumPy array containing the boolean values. If axis = None, this will be a NumPy scalar instead.

astype(dtype, **kwargs)[source]

See astype() for details.

All keyword arguments are currently ignored.

property dtype: dtype

Returns: NumPy type of the elements in the DelayedArray.

mean(axis=None, dtype=None, buffer_size=100000000.0)[source]

Take the mean of values across the DelayedArray, possibly over a given axis or set of axes. If the seed has a mean() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the mean. Alternatively, a tuple (multiple axes) or None (all axes), see mean() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the DelayedArray, see mean() for details.

  • buffer_size (int) – Buffer size in bytes to use for block processing. Larger values generally improve speed at the cost of memory.

Return type:

ndarray

Returns:

A NumPy array containing the means. If axis = None, this will be a NumPy scalar instead.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the DelayedArray.

sum(axis=None, dtype=None, buffer_size=100000000.0)[source]

Take the sum of values across the DelayedArray, possibly over a given axis or set of axes. If the seed has a sum() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the sum. Alternatively, a tuple (multiple axes) or None (all axes), see sum() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the DelayedArray, see sum() for details.

  • buffer_size (int) – Buffer size in bytes to use for block processing. Larger values generally improve speed at the cost of memory.

Return type:

ndarray

Returns:

A NumPy array containing the sums. If axis = None, this will be a NumPy scalar instead.
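The role of buffer_size in block processing can be sketched as follows. This is a simplified illustration and not the package's code: blockwise_column_sums and its itemsize argument are hypothetical, the data are nested lists of rows, and blocks are formed from whole rows only.

```python
def blockwise_column_sums(rows, itemsize=8, buffer_size=100_000_000):
    """Sum each column while touching at most roughly buffer_size bytes per block."""
    if not rows:
        return []
    ncols = len(rows[0])
    # Number of whole rows that fit into the buffer; always at least one row.
    rows_per_block = max(1, buffer_size // max(1, ncols * itemsize))
    totals = [0] * ncols
    for start in range(0, len(rows), rows_per_block):
        block = rows[start:start + rows_per_block]  # only this block is "loaded"
        for row in block:
            for j, value in enumerate(row):
                totals[j] += value
    return totals


data = [[1, 2], [3, 4], [5, 6]]
small_buffer = blockwise_column_sums(data, buffer_size=16)   # one row per block
large_buffer = blockwise_column_sums(data)                   # everything in one block
```

Both calls give the same sums; the buffer size only changes how many rows are processed per iteration, which is the speed versus memory trade-off mentioned above.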

var(axis=None, dtype=None, ddof=0, buffer_size=100000000.0)[source]

Take the variances of values across the DelayedArray, possibly over a given axis or set of axes. If the seed has a var() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the variance. Alternatively, a tuple (multiple axes) or None (all axes), see var() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the DelayedArray, see var() for details.

  • ddof (int) – Delta in the degrees of freedom to subtract from the denominator. Typically set to 1 to obtain the sample variance.

  • buffer_size (int) – Buffer size in bytes to use for block processing. Larger values generally improve speed at the cost of memory.

Return type:

ndarray

Returns:

A NumPy array containing the variances. If axis = None, this will be a NumPy scalar instead.
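The effect of ddof on the denominator can be shown with a short, self-contained sketch. The variance function below is a hypothetical helper operating on a flat list, not the method documented above.

```python
def variance(values, ddof=0):
    """Variance of a flat list, dividing by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - ddof)


population_variance = variance([1, 2, 3, 4])        # ddof=0, denominator n
sample_variance = variance([1, 2, 3, 4], ddof=1)    # denominator n - 1
```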

delayedarray.DelayedArray.chunk_grid_DelayedArray(x)[source]

See chunk_grid().

delayedarray.DelayedArray.create_dask_array_DelayedArray(x)[source]

See create_dask_array().

delayedarray.DelayedArray.extract_dense_array_DelayedArray(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.DelayedArray.extract_sparse_array_DelayedArray(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.DelayedArray.is_masked_DelayedArray(x)[source]

See is_masked().

delayedarray.DelayedArray.is_sparse_DelayedArray(x)[source]

See is_sparse().

delayedarray.DelayedOp module

class delayedarray.DelayedOp.DelayedOp[source]

Bases: object

Abstract delayed operation class. This is used to distinguish delayed operations from seed classes, e.g., for use in is_pristine().

__init__()[source]

delayedarray.Grid module

class delayedarray.Grid.AbstractGrid[source]

Bases: ABC

Abstract base class for array grids. Each grid subdivides an array to determine how it should be iterated over; this is useful for ensuring that iteration respects the physical layout of an array.

Subclasses should define the shape, boundaries and cost properties, as well as the subset, transpose and iterate methods; see the SimpleGrid and CompositeGrid subclasses for examples.

__abstractmethods__ = frozenset({'boundaries', 'cost', 'iterate', 'shape', 'subset', 'transpose'})
abstract property boundaries: Tuple[Sequence[int], ...]
abstract property cost: int
abstract iterate(dimensions, buffer_elements=1000000.0)[source]
Return type:

Generator[Tuple, None, None]

abstract property shape: Tuple[int, ...]
abstract subset(subset)[source]
Return type:

AbstractGrid

abstract transpose(perm)[source]
Return type:

AbstractGrid

class delayedarray.Grid.CompositeGrid(components, along, internals=None)[source]

Bases: AbstractGrid

A grid to subdivide an array, constructed by combining component grids along a specified dimension. This aims to mirror the same combining operation for the arrays associated with the component grids.

__abstractmethods__ = frozenset({})
__init__(components, along, internals=None)[source]
Parameters:
  • components (Tuple[AbstractGrid, ...]) – Component grids to be combined to form the composite grid. Each grid should have the same dimension extents, except for the along dimension.

  • along (int) – Dimension over which to combine entries of components.

  • internals (Optional[Dict]) – Internal use only.

property boundaries: Tuple[Sequence[int], ...]

Returns: Boundaries on each dimension of the grid. For the along dimension, this is a concatenation of the boundaries for the component grids. For all other dimensions, the boundaries are set to those of the most costly component grid.
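The concatenation of boundaries on the along dimension can be sketched as below. This assumes that each component's boundaries are expressed relative to its own array, so they must be shifted by the extents of the preceding components; concatenate_boundaries is a hypothetical helper and not part of the package.

```python
def concatenate_boundaries(component_boundaries):
    """Concatenate per-component boundaries, offsetting each by the preceding extents."""
    combined = []
    offset = 0
    for bounds in component_boundaries:
        combined.extend(offset + b for b in bounds)
        if bounds:
            # The last boundary of a component equals its extent on this dimension.
            offset += bounds[-1]
    return combined


combined = concatenate_boundaries([[5, 10], [4, 8, 12]])
```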

property cost: float

Returns: Cost of iteration over the underlying array. This is defined as the sum of the costs of the component arrays.

iterate(dimensions, buffer_elements=1000000.0)[source]

Iterate over an array grid. This assembles blocks of contiguous grid intervals to reduce the number of iterations (and associated overhead) at the cost of increased memory usage during data extraction. For any iteration over the along dimension (i.e., along is in dimensions), this function dispatches to the component grids; otherwise the iteration is performed based on the boundaries property.

Parameters:
  • dimensions (Tuple[int, ...]) – Dimensions over which to perform the iteration. Any dimensions not listed here are extracted in their entirety, i.e., each block consists of the full extent of unlisted dimensions.

  • buffer_elements (int) – Total number of elements in each block. Larger values increase the block size and reduce the number of iterations, at the cost of increased memory usage at each iteration.

Return type:

Generator[Tuple, None, None]

Returns:

A generator that returns a tuple of length equal to the number of dimensions. Each element contains the start and end of the block on its corresponding dimension.

property shape: Tuple[int, ...]

Returns: Shape of the grid, equivalent to the array’s shape.

subset(subset)[source]

Subset a grid to reflect the same operation on the associated array. This splits up the subset sequence for the along dimension and distributes it to each of the component grids.

Parameters:

subset (Tuple[Sequence[int], ...]) – Tuple of length equal to the number of grid dimensions. Each entry should be a (possibly unsorted) sequence of integers, specifying the subset to apply to each dimension of the grid.

Return type:

CompositeGrid

Returns:

A new CompositeGrid object.

transpose(perm)[source]

Transpose a grid to reflect the same operation on the associated array.

Parameters:

perm (Tuple[int, ...]) – Tuple of length equal to the dimensionality of the array, containing the permutation of dimensions.

Return type:

CompositeGrid

Returns:

A new CompositeGrid object.

class delayedarray.Grid.SimpleGrid(boundaries, cost_factor, internals=None)[source]

Bases: AbstractGrid

A simple grid to subdivide an array, involving arbitrary boundaries on each dimension. Each grid element is defined by boundaries on each dimension.

__abstractmethods__ = frozenset({})
__init__(boundaries, cost_factor, internals=None)[source]
Parameters:
  • boundaries (Tuple[Sequence[int], ...]) – Tuple of length equal to the number of dimensions. Each entry should be a strictly increasing sequence of integers specifying the position of the grid boundaries; the last element should be equal to the extent of the dimension for the array. A tuple entry may also be an empty list for a zero-extent dimension.

  • cost_factor (float) – Positive number representing the cost of iteration over each element of the grid’s array. The actual cost is defined by the product of the cost factor by the array size. This is used to choose between iteration schemes; as a reference, extraction from an in-memory NumPy array has a cost factor of 1.

  • internals (Optional[Dict]) – Internal use only.
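How the boundaries and cost factor relate to the grid's shape and cost can be sketched as follows. grid_summary is a hypothetical helper written for illustration; it follows the description above (the last boundary on each dimension is the extent, an empty sequence means a zero extent, and the cost is the cost factor times the array size).

```python
import math


def grid_summary(boundaries, cost_factor):
    """Return (shape, cost) implied by per-dimension boundaries and a cost factor."""
    shape = []
    for bounds in boundaries:
        if any(b <= a for a, b in zip(bounds, bounds[1:])):
            raise ValueError("boundaries must be strictly increasing")
        # The final boundary is the extent; an empty sequence means zero extent.
        shape.append(bounds[-1] if len(bounds) else 0)
    shape = tuple(shape)
    return shape, cost_factor * math.prod(shape)


shape, cost = grid_summary(([10, 20, 25], [4, 8]), cost_factor=1.0)
```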

property boundaries: Tuple[Sequence[int], ...]

Returns: Boundaries on each dimension of the grid.

property cost: float

Returns: Cost of iteration over the underlying array.

iterate(dimensions, buffer_elements=1000000.0)[source]

Iterate over an array grid. This assembles blocks of contiguous grid intervals to reduce the number of iterations (and associated overhead) at the cost of increased memory usage during data extraction.

Parameters:
  • dimensions (Tuple[int, ...]) – Dimensions over which to perform the iteration. Any dimensions not listed here are extracted in their entirety, i.e., each block consists of the full extent of unlisted dimensions.

  • buffer_elements (int) – Total number of elements in each block. Larger values increase the block size and reduce the number of iterations, at the cost of increased memory usage at each iteration.

Return type:

Generator[Tuple, None, None]

Returns:

A generator that returns a tuple of length equal to the number of dimensions. Each element contains the start and end of the block on its corresponding dimension.
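The assembly of contiguous grid intervals into blocks can be sketched for a two-dimensional array iterated over its first dimension. This is a simplified illustration, not the package's algorithm: iterate_rows is a hypothetical function, and it assumes that grid intervals are never split, so a single interval larger than the buffer is still returned whole.

```python
def iterate_rows(row_boundaries, ncols, buffer_elements):
    """Yield ((row_start, row_end), (0, ncols)) blocks built from whole grid intervals."""
    # Rows that fit in one block when the second dimension is taken in full.
    max_rows = max(1, buffer_elements // max(1, ncols))
    start = 0
    block_end = 0
    for boundary in row_boundaries:
        # Close the current block if adding this interval would exceed the buffer.
        if boundary - start > max_rows and block_end > start:
            yield ((start, block_end), (0, ncols))
            start = block_end
        block_end = boundary
    if block_end > start:
        yield ((start, block_end), (0, ncols))


blocks = list(iterate_rows([10, 20, 30, 35], ncols=5, buffer_elements=100))
one_block = list(iterate_rows([10, 20, 30, 35], ncols=5, buffer_elements=1_000_000))
```

With 5 columns and a buffer of 100 elements, at most 20 rows fit in a block, so the first two intervals are merged and the remaining two form a second block.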

property shape: Tuple[int, ...]

Returns: Shape of the grid, equivalent to the array’s shape.

subset(subset)[source]

Subset a grid to reflect the same operation on the associated array. For any given dimension, consecutive elements in the subset are only placed in the same grid interval in the subsetted grid if they belong to the same grid interval in the original grid.

Parameters:

subset (Tuple[Sequence[int], ...]) – Tuple of length equal to the number of grid dimensions. Each entry should be a (possibly unsorted) sequence of integers, specifying the subset to apply to each dimension of the grid.

Return type:

SimpleGrid

Returns:

A new SimpleGrid object.
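The interval rule for subset() can be illustrated on one dimension. The sketch below is not the library's code: subset_boundaries_1d is a hypothetical helper, and it assumes that the boundaries list the end position of each grid interval.

```python
from bisect import bisect_right
from typing import List, Sequence


def subset_boundaries_1d(boundaries: Sequence[int], subset: Sequence[int]) -> List[int]:
    """Compute boundaries for the subsetted dimension."""
    new_boundaries: List[int] = []
    previous_interval = None
    for position, element in enumerate(subset):
        # Index of the original grid interval that contains this element.
        interval = bisect_right(boundaries, element)
        # Start a new interval whenever consecutive elements come from
        # different intervals of the original grid.
        if previous_interval is not None and interval != previous_interval:
            new_boundaries.append(position)
        previous_interval = interval
    if subset:
        new_boundaries.append(len(subset))
    return new_boundaries


# Original intervals are [0, 3), [3, 6) and [6, 10).
new_boundaries = subset_boundaries_1d([3, 6, 10], [0, 1, 4, 5, 2, 7])
print(new_boundaries)  # [2, 4, 5, 6]
```

Elements 0 and 1 stay together, as do 4 and 5, but element 2 gets its own interval because it is not adjacent in the subset to the other elements from the first original interval.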

transpose(perm)[source]

Transpose a grid to reflect the same operation on the associated array.

Parameters:

perm (Tuple[int, ...]) – Tuple of length equal to the dimensionality of the array, containing the permutation of dimensions.

Return type:

SimpleGrid

Returns:

A new SimpleGrid object.

delayedarray.RegularTicks module

class delayedarray.RegularTicks.RegularTicks(spacing, final)[source]

Bases: Sequence

Regular ticks of equal spacing until a limit is reached, at which point the sequence terminates at that limit. This is intended for use as grid boundaries in SimpleGrid, where the last element of the boundary sequence needs to be equal to the grid extent. (We do not use range as it may omit the last element if the extent is not a multiple of the spacing.)
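The difference from range can be seen with a small stand-in. RegularTicksSketch below is a hypothetical class written for illustration under the behaviour described above; it is not the library's implementation.

```python
from collections.abc import Sequence


class RegularTicksSketch(Sequence):
    """Illustrative stand-in for RegularTicks; not the library's code."""

    def __init__(self, spacing: int, final: int):
        self._spacing = spacing
        self._final = final

    def __len__(self) -> int:
        # Ceiling division: one extra tick if final is not a multiple of spacing.
        return -(-self._final // self._spacing)

    def __getitem__(self, i: int) -> int:
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("tick index out of range")
        # Ticks sit at spacing, 2 * spacing, ... and are capped at final.
        return min(self._spacing * (i + 1), self._final)


ticks = list(RegularTicksSketch(spacing=10, final=25))
plain_range = list(range(10, 25 + 1, 10))
print(ticks)        # [10, 20, 25]
print(plain_range)  # [10, 20]
```

The range call stops at 20 and never reaches the extent of 25, whereas the tick sequence always ends at the final position.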

__abstractmethods__ = frozenset({})
__getitem__(i)[source]
Parameters:

i (int) – Index of the tick of interest.

Return type:

int

Returns:

Position of tick i.

__init__(spacing, final)[source]
Parameters:
  • spacing (int) – Positive integer specifying the spacing between ticks.

  • final (int) – Position of the final tick, should be non-negative.

__len__()[source]
Return type:

int

Returns:

Length of the tick sequence.

property final: int

Returns: Position of the final tick.

property spacing: int

Returns: The spacing between ticks.

delayedarray.Round module

class delayedarray.Round.Round(seed, decimals)[source]

Bases: DelayedOp

Delayed rounding from round(). This is very similar to UnaryIsometricOpSimple but accepts an argument for the number of decimal places.

This class is intended for developers to construct new DelayedArray instances. End users should not be interacting with Round objects directly.

__init__(seed, decimals)[source]
Parameters:
  • seed – Any object that satisfies the seed contract, see DelayedArray for details.

  • decimals (int) – Number of decimal places, possibly negative.
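As a quick illustration of what a negative number of decimal places means, the sketch below uses Python's built-in round() as a stand-in for the NumPy function. round_all is a hypothetical helper and does not involve any delayed operation.

```python
from typing import List, Sequence


def round_all(values: Sequence[float], decimals: int) -> List[float]:
    """Round every value to the requested number of decimal places."""
    return [round(v, decimals) for v in values]


values = [1234.5678, 86.25]
two_places = round_all(values, 2)  # keep two digits after the decimal point
hundreds = round_all(values, -2)   # negative: round to the nearest hundred
print(two_places)  # [1234.57, 86.25]
print(hundreds)    # [1200.0, 100.0]
```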

property decimals: int

Returns: Number of decimal places to round to.

property dtype: dtype

Returns: NumPy type for the Round, same as the seed array.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the Round object. This is the same as the seed array.

delayedarray.Round.chunk_grid_Round(x)[source]

See chunk_grid().

delayedarray.Round.create_dask_array_Round(x)[source]

See create_dask_array().

delayedarray.Round.extract_dense_array_Round(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.Round.extract_sparse_array_Round(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.Round.is_masked_Round(x)[source]

See is_masked().

delayedarray.Round.is_sparse_Round(x)[source]

See is_sparse().

delayedarray.SparseNdarray module

class delayedarray.SparseNdarray.SparseNdarray(shape, contents, dtype=None, index_dtype=None, is_masked=None, check=True)[source]

Bases: object

The SparseNdarray, as its name suggests, is a sparse n-dimensional array. It is inspired by the SVT_Array class from the DelayedArray R/Bioconductor package.

Internally, the SparseNdarray is represented as a nested list where each nesting level corresponds to a dimension in reverse order, i.e., the outer-most list corresponds to elements of the last dimension in shape. The list at each level has length equal to the extent of its dimension, where each entry contains another list representing the contents of the corresponding element of that dimension. This recursion continues until the second dimension (i.e., the penultimate nesting level), where each entry instead contains (index, value) tuples. In effect, this is a tree where the non-leaf nodes are lists and the leaf nodes are tuples.

Each (index, value) tuple represents a sparse vector for the corresponding element of the first dimension of the SparseNdarray. index should be an ndarray of integers where entries are strictly increasing and less than the extent of the first dimension. All index objects in the same SparseNdarray should have the same dtype (defined by the index_dtype property). value may be any numeric/boolean ndarray but the dtype should be consistent across all value objects in the SparseNdarray. If the array contains masked values, all value objects should be MaskedArray instances; otherwise they should be regular NumPy arrays.

Any entry of any (nested) list may also be None, indicating that the corresponding element of the dimension contains no non-zero values. In fact, the entire tree may be None, indicating that there are no non-zero values in the entire array.

For 1-dimensional arrays, the contents should be a single (index, value) tuple containing the sparse contents. This may also be None if there are no non-zero values in the array.
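The nested layout is easiest to see in the 2-dimensional case. The sketch below expands such a tree into a dense nested list. dense_from_contents_2d is a hypothetical helper, and plain Python lists stand in for the NumPy arrays that a real SparseNdarray would hold.

```python
from typing import List, Optional, Sequence, Tuple

SparseVector = Tuple[Sequence[int], Sequence[float]]


def dense_from_contents_2d(
    shape: Tuple[int, int],
    contents: Optional[List[Optional[SparseVector]]],
) -> List[List[float]]:
    """Expand a 2-dimensional contents tree into a dense nested list."""
    n_first, n_last = shape
    dense = [[0.0] * n_last for _ in range(n_first)]
    if contents is None:  # the whole tree may be None
        return dense
    # The outer list runs over the last dimension (columns, for a matrix).
    for last, vector in enumerate(contents):
        if vector is None:  # no non-zero values for this element
            continue
        index, value = vector
        # Each index refers to a position along the first dimension (rows).
        for first, v in zip(index, value):
            dense[first][last] = v
    return dense


# A 4 x 3 matrix: one sparse vector (or None) per column.
contents = [([0, 2], [1.5, 2.5]), None, ([3], [9.0])]
dense = dense_from_contents_2d((4, 3), contents)
for row in dense:
    print(row)
```

Here the first column holds non-zero values in rows 0 and 2, the second column is empty, and the third column holds a single value in row 3.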

property T: SparseNdarray

Returns: A SparseNdarray containing the transposed contents.

__abs__()[source]

Take the absolute value of the contents of a SparseNdarray.

Returns:

A SparseNdarray containing the absolute values.

__add__(other)[source]

Add something to the right-hand-side of a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the addition. This may or may not be sparse depending on other.
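One way to read "may or may not be sparse" is that sparsity can only survive if the operation maps zero entries back to zero. The check below sketches that idea; preserves_sparsity is a hypothetical helper, and whether the library applies exactly this rule is an assumption.

```python
import operator
from typing import Callable


def preserves_sparsity(op: Callable, other) -> bool:
    """Return True if applying op to a zero entry still gives zero."""
    return op(0, other) == 0


print(preserves_sparsity(operator.mul, 5))  # True: 0 * 5 is still 0
print(preserves_sparsity(operator.add, 5))  # False: 0 + 5 fills every entry
print(preserves_sparsity(operator.add, 0))  # True: adding zero changes nothing
```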

__and__(other)[source]

Element-wise AND with something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the AND operation. This may or may not be sparse depending on other.

__array__(dtype=None, copy=True)[source]

Convert a SparseNdarray to a NumPy array.

Parameters:
  • dtype (Optional[dtype]) – The desired NumPy type of the output array. If None, the dtype of this SparseNdarray is used.

  • copy (bool) – Currently ignored. The output is never a reference to the underlying seed, even if the seed is another NumPy array.

Return type:

ndarray

Returns:

Dense array of the same type as dtype and shape as shape.

__array_function__(func, types, args, kwargs)[source]

Interface to NumPy’s high-level array functions. This is used to implement array operations like NumPy’s concatenate().

Check out NumPy’s __array_function__ documentation for more details.

Return type:

SparseNdarray

Returns:

A SparseNdarray instance containing the requested operation.

__array_ufunc__(ufunc, method, *inputs, **kwargs)[source]

Interface with NumPy array methods. This is used to implement mathematical operations like NumPy’s log(), or to override operations between NumPy class instances and SparseNdarray objects where the former is on the left hand side.

Check out NumPy’s __array_ufunc__ documentation for more details.

Return type:

SparseNdarray

Returns:

A SparseNdarray instance containing the result of the requested operation.

__copy__()[source]
Return type:

SparseNdarray

Returns:

A deep copy of this object.

__eq__(other)[source]

Check for equality between a SparseNdarray and something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__floordiv__(other)[source]

Divide a SparseNdarray by something and take the floor.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the floor division. This may or may not be sparse depending on other.

__ge__(other)[source]

Check whether a SparseNdarray is greater than or equal to something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__getitem__(subset)[source]

Take a subset of this SparseNdarray. This follows the same logic as NumPy slicing and will return a SparseNdarray when the subset operation preserves the dimensionality of the array, e.g., when subset is defined using the ix_() function.

Parameters:

subset – A tuple of length equal to the dimensionality of this SparseNdarray. Any NumPy slicing is supported but only subsets that preserve dimensionality will return a SparseNdarray.

Raises:

ValueError – If subset contains more dimensions than the shape of the array.

Return type:

Union[SparseNdarray, ndarray]

Returns:

If the dimensionality is preserved by subset, a SparseNdarray containing the subset is returned. Otherwise, an ndarray is returned containing the realized subset.

__gt__(other)[source]

Check whether a SparseNdarray is greater than something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__hash__ = None
__init__(shape, contents, dtype=None, index_dtype=None, is_masked=None, check=True)[source]
Parameters:
  • shape (Tuple[int, ...]) – Tuple specifying the dimensions of the array.

  • contents

    For n-dimensional arrays where n > 1, a nested list representing a tree where each leaf node is a tuple containing a sparse vector (or None).

    For 1-dimensional arrays, a tuple containing a sparse vector.

    Alternatively None, if the array is empty.

  • dtype (Optional[dtype]) – NumPy type of the array values. If None, this is inferred from contents.

  • index_dtype (Optional[dtype]) – NumPy type of the array indices. If None, this is inferred from contents.

  • is_masked (Optional[bool]) – Whether contents contains masked values. If None, this is inferred from contents.

  • check (bool) – Whether to check the consistency of the contents during construction. This can be set to False for speed.
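The rules stated for each (index, value) tuple suggest the kind of validation a consistency check might perform. The sketch below is an assumption about what such a check could look like. check_sparse_vector is a hypothetical helper, not the library's routine, and the equal-length rule is inferred rather than stated.

```python
from typing import Sequence


def check_sparse_vector(index: Sequence[int], value: Sequence[float], extent: int) -> None:
    """Raise ValueError if an (index, value) pair breaks the stated rules."""
    if len(index) != len(value):
        raise ValueError("index and value should have the same length")
    previous = -1
    for i in index:
        if not 0 <= i < extent:
            raise ValueError("index out of range for the first dimension")
        if i <= previous:
            raise ValueError("index entries should be strictly increasing")
        previous = i


check_sparse_vector([0, 2, 5], [1.0, 2.0, 3.0], extent=6)  # passes silently

try:
    check_sparse_vector([2, 2], [1.0, 2.0], extent=6)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Skipping this kind of validation is what makes check=False faster, so it is only appropriate when the contents are already known to be consistent.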

__le__(other)[source]

Check whether a SparseNdarray is less than or equal to something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__lt__(other)[source]

Check whether a SparseNdarray is less than something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__mod__(other)[source]

Take the remainder after dividing a SparseNdarray by something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the modulo. This may or may not be sparse depending on other.

__mul__(other)[source]

Multiply a SparseNdarray with something on the right hand side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the multiplication. This may or may not be sparse depending on other.

__ne__(other)[source]

Check for non-equality between a SparseNdarray and something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__neg__()[source]

Negate the contents of a SparseNdarray.

Returns:

A SparseNdarray containing the negated values.

__or__(other)[source]

Element-wise OR with something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the OR operation. This may or may not be sparse depending on other.

__pow__(other)[source]

Raise a SparseNdarray to the power of something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the power operation. This may or may not be sparse depending on other.

__radd__(other)[source]

Add something to the left-hand-side of a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the addition. This may or may not be sparse depending on other.

__rand__(other)[source]

Element-wise AND between something and a SparseNdarray, where the SparseNdarray is on the right-hand-side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the AND operation. This may or may not be sparse depending on other.

__repr__()[source]

Pretty-print this SparseNdarray. This uses array2string() and responds to all of its options.

Return type:

str

Returns:

String containing a prettified display of the array contents.

__req__(other)[source]

Check for equality between something and a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__rfloordiv__(other)[source]

Divide something by a SparseNdarray and take the floor.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the floor division. This may or may not be sparse depending on other.

__rge__(other)[source]

Check whether something is greater than or equal to a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__rgt__(other)[source]

Check whether something is greater than a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__rle__(other)[source]

Check whether something is less than or equal to a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__rlt__(other)[source]

Check whether something is less than a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__rmod__(other)[source]

Take the remainder after dividing something by a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the modulo. This may or may not be sparse depending on other.

__rmul__(other)[source]

Multiply a SparseNdarray with something on the left hand side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the multiplication. This may or may not be sparse depending on other.

__rne__(other)[source]

Check for non-equality between something and a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the check. This may or may not be sparse depending on other.

__ror__(other)[source]

Element-wise OR between something and a SparseNdarray, where the SparseNdarray is on the right-hand-side.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or a DelayedArray of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the OR operation. This may or may not be sparse depending on other.

__rpow__(other)[source]

Raise something to the power of the contents of a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the power operation. This may or may not be sparse depending on other.

__rsub__(other)[source]

Subtract a SparseNdarray from something else.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the subtraction. This may or may not be sparse depending on other.

__rtruediv__(other)[source]

Divide something by a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the division. This may or may not be sparse depending on other.

__sub__(other)[source]

Subtract something from the right-hand-side of a SparseNdarray.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Return type:

Union[SparseNdarray, ndarray]

Returns:

Array containing the result of the subtraction. This may or may not be sparse depending on other.

__truediv__(other)[source]

Divide a SparseNdarray by something.

Parameters:

other – A numeric scalar; or a NumPy array with dimensions as described in UnaryIsometricOpWithArgs; or any seed object of the same dimensions as shape.

Returns:

Array containing the result of the division. This may or may not be sparse depending on other.

all(axis=None, dtype=None)[source]

Test whether all array elements along a given axis evaluate to True.

Compute this test across the SparseNdarray, possibly over a given axis or set of axes. If the seed has an all() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to test for all. Alternatively, a tuple (multiple axes) or None (no axes), see all() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the SparseNdarray, see all() for details.

Return type:

ndarray

Returns:

A NumPy array containing the test results. If axis = None, this will be a NumPy scalar instead.

any(axis=None, dtype=None)[source]

Test whether any array element along a given axis evaluates to True.

Compute this test across the SparseNdarray, possibly over a given axis or set of axes. If the seed has an any() method, that method is called directly with the supplied arguments.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to test for any. Alternatively, a tuple (multiple axes) or None (no axes), see any() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the SparseNdarray, see any() for details.

Return type:

ndarray

Returns:

A NumPy array containing the test results. If axis = None, this will be a NumPy scalar instead.

astype(dtype, **kwargs)[source]

See astype() for details.

All keyword arguments are currently ignored.

Return type:

SparseNdarray

property contents

Contents of the array. This is intended to be read-only and should only be modified if you really know what you’re doing.

Returns:

A nested list, for an n-dimensional array where n > 1.

A tuple containing a sparse vector (i.e., indices and values), for a 1-dimensional array.

Alternatively None, if the array contains no non-zero elements.

copy()[source]
Return type:

SparseNdarray

Returns:

A deep copy of this object.

property dtype: dtype

Returns: NumPy type of the values.

property index_dtype: dtype

Returns: NumPy type of the indices.

property is_masked: bool

Returns: Whether the values are masked.

mean(axis=None, dtype=None)[source]

Take the mean of values across the SparseNdarray, possibly over a given axis or set of axes.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the mean. Alternatively, a tuple (multiple axes) or None (no axes), see mean() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the SparseNdarray, see mean() for details.

Return type:

ndarray

Returns:

A NumPy array containing the mean values. If axis = None, this will be a NumPy scalar instead.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension.

sum(axis=None, dtype=None)[source]

Take the sum of values across the SparseNdarray, possibly over a given axis or set of axes.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the sum. Alternatively, a tuple (multiple axes) or None (no axes), see sum() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the SparseNdarray, see sum() for details.

Return type:

ndarray

Returns:

A NumPy array containing the summed values. If axis = None, this will be a NumPy scalar instead.

var(axis=None, dtype=None, ddof=0)[source]

Take the variances of values across the SparseNdarray, possibly over a given axis or set of axes.

Parameters:
  • axis (Union[int, Tuple[int, ...], None]) – A single integer specifying the axis over which to calculate the variance. Alternatively, a tuple (multiple axes) or None (no axes), see var() for details.

  • dtype (Optional[dtype]) – NumPy type for the output array. If None, this is automatically chosen based on the type of the SparseNdarray, see var() for details.

  • ddof (int) – Delta in the degrees of freedom to subtract from the denominator. Typically set to 1 to obtain the sample variance.

Return type:

ndarray

Returns:

A NumPy array containing the variances. If axis = None, this will be a NumPy scalar instead.
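The effect of ddof on the denominator can be checked with a plain-Python calculation. The variance function below is a hypothetical helper for illustration and does not touch any sparse array.

```python
from typing import Sequence


def variance(values: Sequence[float], ddof: int = 0) -> float:
    """Variance with (n - ddof) in the denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - ddof)


values = [1.0, 2.0, 3.0, 4.0]
population_variance = variance(values)       # divides by n = 4
sample_variance = variance(values, ddof=1)   # divides by n - 1 = 3
print(population_variance)  # 1.25
print(sample_variance)
```

With ddof=0 the sum of squared deviations (5.0 here) is divided by 4, and with ddof=1 it is divided by 3, giving the usual sample variance.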

delayedarray.Subset module

class delayedarray.Subset.Subset(seed, subset)[source]

Bases: DelayedOp

Delayed subset operation, based on Bioconductor’s DelayedArray::DelayedSubset class. This will slice the array along one or more dimensions, equivalent to the outer product of subset indices.

This class is intended for developers to construct new DelayedArray instances. In general, end users should not be interacting with Subset objects directly.

__init__(seed, subset)[source]
Parameters:
  • seed – Any object that satisfies the seed contract, see DelayedArray for details.

  • subset (Tuple[Sequence[int], ...]) – Tuple of length equal to the dimensionality of seed, containing the subsetted elements for each dimension. Each entry should be a vector of integer indices specifying the elements of the corresponding dimension to retain, where each integer is non-negative and less than the extent of the dimension. Unsorted and/or duplicate indices are allowed.
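The "outer product of subset indices" can be illustrated on a nested list. outer_subset_2d is a hypothetical helper that realizes the subset immediately, unlike the delayed operation represented by this class.

```python
from typing import List, Sequence


def outer_subset_2d(
    matrix: Sequence[Sequence[int]], rows: Sequence[int], cols: Sequence[int]
) -> List[List[int]]:
    """Keep every combination of the requested rows and columns."""
    return [[matrix[r][c] for c in cols] for r in rows]


matrix = [
    [0, 1, 2],
    [10, 11, 12],
    [20, 21, 22],
]
# Unsorted and duplicated row indices are allowed.
result = outer_subset_2d(matrix, rows=[2, 0, 0], cols=[1, 2])
print(result)  # [[21, 22], [1, 2], [1, 2]]
```

The result has one row per entry of rows and one column per entry of cols, so its shape is given by the lengths of the subset sequences.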

property dtype: dtype

Returns: NumPy type for the subsetted contents, same as seed.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the subsetted object.

property subset: Tuple[Sequence[int], ...]

Returns: Subset sequences to be applied to each dimension of the seed.

delayedarray.Subset.chunk_grid_Subset(x)[source]

See chunk_grid().

delayedarray.Subset.create_dask_array_Subset(x)[source]

See create_dask_array().

delayedarray.Subset.extract_dense_array_Subset(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.Subset.extract_sparse_array_Subset(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.Subset.is_masked_Subset(x)[source]

See is_masked().

delayedarray.Subset.is_sparse_Subset(x)[source]

See is_sparse().

delayedarray.Transpose module

class delayedarray.Transpose.Transpose(seed, perm)[source]

Bases: DelayedOp

Delayed transposition, based on Bioconductor’s DelayedArray::DelayedAperm class.

This will create a matrix transpose in the 2-dimensional case; for a high-dimensional array, it will permute the dimensions.

This class is intended for developers to construct new DelayedArray instances. In general, end users should not be interacting with Transpose objects directly.

__init__(seed, perm)[source]
Parameters:
  • seed – Any object that satisfies the seed contract, see DelayedArray for details.

  • perm (Optional[Tuple[int, ...]]) – Tuple of length equal to the dimensionality of seed, containing the permutation of dimensions. If None, the dimension ordering is assumed to be reversed.
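The effect of perm on the shape can be sketched as follows, assuming the same convention as NumPy's transpose(), where dimension i of the result is dimension perm[i] of the seed. permuted_shape is a hypothetical helper.

```python
from typing import Optional, Tuple


def permuted_shape(
    shape: Tuple[int, ...], perm: Optional[Tuple[int, ...]] = None
) -> Tuple[int, ...]:
    """Shape after permuting dimensions; None reverses the dimension order."""
    if perm is None:
        perm = tuple(reversed(range(len(shape))))
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm should be a permutation of the dimensions")
    return tuple(shape[p] for p in perm)


print(permuted_shape((2, 3, 4)))             # (4, 3, 2)
print(permuted_shape((2, 3, 4), (1, 2, 0)))  # (3, 4, 2)
print(permuted_shape((5, 7)))                # (7, 5), the matrix transpose
```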

property dtype: dtype

Returns: NumPy type for the transposed contents, same as seed.

property perm: Tuple[int, ...]

Returns: Permutation of dimensions in the transposition.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the transposed object.

delayedarray.Transpose.chunk_grid_Transpose(x)[source]

See chunk_grid().

delayedarray.Transpose.create_dask_array_Transpose(x)[source]

See create_dask_array().

delayedarray.Transpose.extract_dense_array_Transpose(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.Transpose.extract_sparse_array_Transpose(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.Transpose.is_masked_Transpose(x)[source]

See is_masked().

delayedarray.Transpose.is_sparse_Transpose(x)[source]

See is_sparse().

delayedarray.UnaryIsometricOpSimple module

class delayedarray.UnaryIsometricOpSimple.UnaryIsometricOpSimple(seed, operation)[source]

Bases: DelayedOp

Delayed unary isometric operation involving an n-dimensional seed array with no additional arguments, similar to Bioconductor’s DelayedArray::DelayedUnaryIsoOpStack class. This is used for simple mathematical operations like NumPy’s log().

This class is intended for developers to construct new DelayedArray instances. End-users should not be interacting with UnaryIsometricOpSimple objects directly.

__init__(seed, operation)[source]
Parameters:
  • seed – Any object that satisfies the seed contract, see DelayedArray for details.

  • operation (Literal['log', 'log1p', 'log2', 'log10', 'exp', 'expm1', 'sqrt', 'abs', 'sin', 'cos', 'tan', 'sinh', 'cosh', 'tanh', 'arcsin', 'arccos', 'arctan', 'arcsinh', 'arccosh', 'arctanh', 'ceil', 'floor', 'trunc', 'sign', 'logical_not']) – String specifying the unary operation.
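A simple unary operation amounts to looking up a function by name and applying it to every element. The sketch below does this eagerly on a nested list. SIMPLE_OPERATIONS and apply_simple are hypothetical names covering only a few of the listed operations, and the standard math module stands in for NumPy.

```python
import math
from typing import Callable, Dict, List, Sequence

# A small subset of the operation names, mapped to scalar functions.
SIMPLE_OPERATIONS: Dict[str, Callable] = {
    "log1p": math.log1p,
    "sqrt": math.sqrt,
    "abs": abs,
    "floor": math.floor,
    "ceil": math.ceil,
}


def apply_simple(seed: Sequence[Sequence[float]], operation: str) -> List[list]:
    """Apply the named operation to every element, keeping the shape."""
    func = SIMPLE_OPERATIONS[operation]
    return [[func(x) for x in row] for row in seed]


seed = [[0, 4], [9, 16]]
roots = apply_simple(seed, "sqrt")
print(roots)  # [[0.0, 2.0], [3.0, 4.0]]
print(apply_simple([[-1, 2]], "abs"))  # [[1, 2]]
```

The square roots come back as floats even though the input holds integers, while abs keeps integers as integers. This mirrors how the result type can differ from the seed type depending on the operation.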

property dtype: dtype

Returns: NumPy type for the contents of the object after the operation. This may or may not be the same as the seed array, depending on how NumPy does the casting for the requested operation.

property operation: Literal['log', 'log1p', 'log2', 'log10', 'exp', 'expm1', 'sqrt', 'abs', 'sin', 'cos', 'tan', 'sinh', 'cosh', 'tanh', 'arcsin', 'arccos', 'arctan', 'arcsinh', 'arccosh', 'arctanh', 'ceil', 'floor', 'trunc', 'sign', 'logical_not']

Returns: Name of the operation.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of the object after the operation. This should be the same as seed.

delayedarray.UnaryIsometricOpSimple.chunk_grid_UnaryIsometricOpSimple(x)[source]

See chunk_grid().

delayedarray.UnaryIsometricOpSimple.create_dask_array_UnaryIsometricOpSimple(x)[source]

See create_dask_array().

delayedarray.UnaryIsometricOpSimple.extract_dense_array_UnaryIsometricOpSimple(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.UnaryIsometricOpSimple.extract_sparse_array_UnaryIsometricOpSimple(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.UnaryIsometricOpSimple.is_masked_UnaryIsometricOpSimple(x)[source]

See is_masked().

delayedarray.UnaryIsometricOpSimple.is_sparse_UnaryIsometricOpSimple(x)[source]

See is_sparse().

delayedarray.UnaryIsometricOpWithArgs module

class delayedarray.UnaryIsometricOpWithArgs.UnaryIsometricOpWithArgs(seed, value, operation, right=True)[source]

Bases: DelayedOp

Unary isometric operation involving an n-dimensional seed array with a scalar or 1-dimensional vector, based on Bioconductor’s DelayedArray::DelayedUnaryIsoOpWithArgs class. Only one n-dimensional array is involved here, hence the “unary” in the name. (Hey, I don’t make the rules.)

The data type of the result is determined by NumPy casting given the seed and value data types. We suggest supplying a floating-point value to avoid unexpected results from integer truncation or overflow.

This class is intended for developers to construct new DelayedArray instances. In general, end-users should not be interacting with UnaryIsometricOpWithArgs objects directly.

__init__(seed, value, operation, right=True)[source]
Parameters:
  • seed – Any object satisfying the seed contract, see DelayedArray() for details.

  • value (Union[float, ndarray]) –

    A scalar or NumPy array with which to perform an operation on the seed.

    If scalar, the operation is applied element-wise to all entries of seed.

    If a 1-dimensional NumPy array, the operation is broadcast along the last dimension of seed.

    If an n-dimensional NumPy array, the number of dimensions should be equal to the dimensionality of seed. All dimensions should be of extent 1, except for exactly one dimension that should have extent equal to the corresponding dimension of seed. The operation is then broadcast along that dimension.

  • operation (Literal['add', 'subtract', 'multiply', 'divide', 'remainder', 'floor_divide', 'power', 'equal', 'greater_equal', 'greater', 'less_equal', 'less', 'not_equal', 'logical_and', 'logical_or', 'logical_xor']) – String specifying the operation.

  • right (bool) – Whether value is to the right of seed in the operation. If False, value is put to the left of seed. Ignored if operation is commutative.
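The three accepted forms of value follow ordinary NumPy broadcasting. The sketch below uses plain NumPy arrays rather than UnaryIsometricOpWithArgs itself, so it only illustrates the arithmetic that the delayed operation is expected to reproduce; the variable names are illustrative.

```python
import numpy as np

seed = np.arange(12, dtype=np.float64).reshape(3, 4)

# Scalar value: applied element-wise to every entry of the seed.
scalar_result = seed + 10.0

# 1-dimensional value: broadcast along the last dimension (extent 4 here).
row_vector = np.array([1.0, 2.0, 3.0, 4.0])
last_dim_result = seed * row_vector

# n-dimensional value: extent 1 everywhere except one dimension, whose extent
# matches the seed. Here the value varies along dimension 0.
column = np.array([[1.0], [2.0], [4.0]])  # shape (3, 1)
first_dim_result = seed - column  # value on the right: seed - value

# For a non-commutative operation the side matters. Putting the value on the
# left corresponds to what right=False describes: value - seed.
left_result = column - seed
```

For subtraction, swapping the side negates the result; for commutative operations such as add or multiply the side makes no difference.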

property along: int | None

Returns: The dimension of seed along which the array values are broadcast, if value is an array. Otherwise None, if value is a scalar.
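One way to derive such a dimension from the shapes alone is sketched below. The helper infer_along is hypothetical and not part of delayedarray; it simply encodes the rules stated for the value parameter, and its handling of an all-ones value shape is an assumption of this sketch.

```python
from typing import Optional, Sequence


def infer_along(seed_shape: Sequence[int], value_shape: Sequence[int]) -> Optional[int]:
    """Hypothetical helper: which seed dimension does the value vary along?"""
    if len(value_shape) == 0:
        return None  # scalar value, nothing to broadcast along
    if len(value_shape) == 1:
        return len(seed_shape) - 1  # 1-dimensional values use the last dimension
    if len(value_shape) != len(seed_shape):
        raise ValueError("value must have as many dimensions as the seed")

    # Dimensions where the value has an extent other than 1.
    varying = [d for d, extent in enumerate(value_shape) if extent != 1]
    if len(varying) > 1:
        raise ValueError("value may vary along at most one dimension")
    if varying:
        d = varying[0]
        if value_shape[d] != seed_shape[d]:
            raise ValueError("extent of value does not match the seed")
        return d

    # All extents are 1: in this sketch only, pick the first seed dimension
    # that also has extent 1.
    for d, extent in enumerate(seed_shape):
        if extent == 1:
            return d
    raise ValueError("no dimension of value matches the seed")
```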

property dtype: dtype

Returns: NumPy type for the data after the operation was applied. This may or may not be the same as the seed array, depending on how NumPy does the casting for the requested operation.
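The effect of NumPy casting on the result type can be checked directly on small arrays. This is a plain NumPy illustration, assuming the delayed operation reports the same type that NumPy produces for the equivalent eager operation.

```python
import numpy as np

counts = np.array([1, 2, 3], dtype=np.int32)

added = counts + 1.5    # floating-point value promotes the result
divided = counts / 2    # true division always yields floating-point
floored = counts // 2   # floor division of integers stays integer
compared = counts > 1   # comparisons yield booleans
```

Here added and divided are float64, floored keeps the int32 type of the seed, and compared is boolean.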

property operation: Literal['add', 'subtract', 'multiply', 'divide', 'remainder', 'floor_divide', 'power', 'equal', 'greater_equal', 'greater', 'less_equal', 'less', 'not_equal', 'logical_and', 'logical_or', 'logical_xor']

Returns: Name of the operation.

property right: bool

Returns: Whether value is on the right-hand side of the seed in the operation.

property seed

Returns: The seed object.

property shape: Tuple[int, ...]

Returns: Tuple of integers specifying the extent of each dimension of this object. This should be the same as the shape of seed.

property value: float | ndarray

Returns: The other operand used in the operation.

delayedarray.UnaryIsometricOpWithArgs.chunk_grid_UnaryIsometricOpWithArgs(x)[source]

See chunk_grid().

delayedarray.UnaryIsometricOpWithArgs.create_dask_array_UnaryIsometricOpWithArgs(x)[source]

See create_dask_array().

delayedarray.UnaryIsometricOpWithArgs.extract_dense_array_UnaryIsometricOpWithArgs(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.UnaryIsometricOpWithArgs.extract_sparse_array_UnaryIsometricOpWithArgs(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.UnaryIsometricOpWithArgs.is_masked_UnaryIsometricOpWithArgs(x)[source]

See is_masked().

delayedarray.UnaryIsometricOpWithArgs.is_sparse_UnaryIsometricOpWithArgs(x)[source]

See is_sparse().

delayedarray.apply_over_blocks module

delayedarray.apply_over_blocks.apply_over_blocks(x, fun, allow_sparse=False, grid=None, buffer_size=100000000.0)[source]

Iterate over an array by blocks. We apply a user-provided function and collect the results before proceeding to the next block.

Parameters:
  • x – An array-like object.

  • fun (Callable) – Function to apply to each block. This should accept two arguments; the first is a list containing the start/end of the current block on each dimension, and the second is the block contents. Each block is typically provided as a ndarray.

  • allow_sparse (bool) – Whether to allow extraction of sparse subarrays. If true and x contains a sparse array, the block contents are instead represented by a SparseNdarray.

  • grid (Optional[AbstractGrid]) – Grid to subdivide x for iteration. Specifically, iteration will attempt to extract blocks that are aligned with the grid boundaries, e.g., to optimize extraction of chunked data. Defaults to the output of chunk_grid() on x.

  • buffer_size (int) – Buffer size in bytes, to hold a single block per iteration. Larger values generally improve speed at the cost of memory.

Return type:

list

Returns:

List containing the output of fun on each block.
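The calling convention for fun can be illustrated with a simplified stand-in that walks a NumPy array in fixed-size rectangular blocks. The function iterate_blocks and its block_shape argument are hypothetical; the real function chooses blocks from the grid and buffer_size instead.

```python
import itertools

import numpy as np


def iterate_blocks(x, block_shape, fun):
    """Hypothetical sketch: visit x in rectangular blocks and collect fun's output."""
    starts_per_dim = [
        range(0, extent, step) for extent, step in zip(x.shape, block_shape)
    ]
    collected = []
    for starts in itertools.product(*starts_per_dim):
        # Start/end of the current block on each dimension, clipped to the array.
        position = [
            (start, min(start + step, extent))
            for start, step, extent in zip(starts, block_shape, x.shape)
        ]
        block = x[tuple(slice(start, end) for start, end in position)]
        collected.append(fun(position, block))
    return collected


x = np.arange(24).reshape(4, 6)
block_sums = iterate_blocks(
    x, (2, 4), lambda position, block: (position, int(block.sum()))
)
total = sum(block_sum for _, block_sum in block_sums)
```

Because every element falls in exactly one block, summing the per-block results recovers the sum of the whole array.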

delayedarray.apply_over_dimension module

delayedarray.apply_over_dimension.apply_over_dimension(x, dimension, fun, allow_sparse=False, grid=None, buffer_size=100000000.0)[source]

Iterate over an array on a certain dimension. At each iteration, the block of observations consists of the full extent of all dimensions other than the one being iterated over. We apply a user-provided function and collect the results before proceeding to the next block.

Parameters:
  • x – An array-like object.

  • dimension (int) – Dimension to iterate over.

  • fun (Callable) – Function to apply to each block. This should accept two arguments; the first is a tuple containing the start/end of the current block on the chosen dimension, and the second is the block contents. Each block is typically provided as a ndarray.

  • allow_sparse (bool) – Whether to allow extraction of sparse subarrays. If true and x contains a sparse array, the block contents are instead represented by a SparseNdarray.

  • grid (Optional[AbstractGrid]) – Grid to subdivide x for iteration. Specifically, iteration will attempt to extract blocks that are aligned with the grid boundaries, e.g., to optimize extraction of chunked data. Defaults to the output of chunk_grid() on x.

  • buffer_size (int) – Buffer size in bytes, to hold a single block per iteration. Larger values generally improve speed at the cost of memory.

Return type:

list

Returns:

List containing the output of fun on each block.
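A rough sketch of how a byte budget could translate into blocks along one dimension is given below. The function iterate_dimension is hypothetical and ignores grids and sparsity; it only shows the relationship between buffer_size, the block length, and the arguments passed to fun.

```python
import numpy as np


def iterate_dimension(x, dimension, fun, buffer_size=1e8):
    """Hypothetical sketch: iterate over x in blocks along one dimension."""
    # Bytes needed to hold the full extent of all other dimensions at one index.
    other_elements = 1
    for d, extent in enumerate(x.shape):
        if d != dimension:
            other_elements *= extent
    bytes_per_index = max(1, other_elements * x.dtype.itemsize)

    # Number of indices per block; use at least one so iteration makes progress.
    step = max(1, int(buffer_size // bytes_per_index))

    collected = []
    for start in range(0, x.shape[dimension], step):
        end = min(start + step, x.shape[dimension])
        index = [slice(None)] * x.ndim
        index[dimension] = slice(start, end)
        collected.append(fun((start, end), x[tuple(index)]))
    return collected


x = np.arange(40, dtype=np.float64).reshape(10, 4)

# Each row takes 4 * 8 = 32 bytes, so a 100-byte buffer holds 3 rows per block.
row_sums = iterate_dimension(
    x, 0, lambda position, block: block.sum(axis=1), buffer_size=100
)
all_row_sums = np.concatenate(row_sums)
```

With 10 rows and 3 rows per block, the sketch produces four blocks, the last of which holds a single row.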

delayedarray.chunk_grid module

delayedarray.chunk_grid.chunk_grid(x)[source]

Create a grid over the array, used to determine how a caller should iterate over that array. The intervals of the grid usually reflect a particular layout of the data on disk or in memory.

Parameters:

x (Any) – An array-like object.

Return type:

AbstractGrid

Returns:

An instance of an AbstractGrid.

delayedarray.chunk_grid.chunk_grid_SparseNdarray(x)[source]

See chunk_grid().

The cost factor for iteration is set to 1.5. This is slightly higher than that of dense NumPy arrays as the SparseNdarray is a bit more expensive for random access on the first dimension.

Return type:

SimpleGrid

delayedarray.chunk_grid.chunk_grid_coo_matrix(x)[source]

See chunk_grid().

The cost factor for iteration is set to 5, as any extraction from a COO matrix requires a full scan through all elements.

Return type:

SimpleGrid

delayedarray.chunk_grid.chunk_grid_csc_matrix(x)[source]

See chunk_grid().

The cost factor for iteration is set to 1.5. This is slightly higher than that of dense NumPy arrays as CSC matrices are a bit more expensive for random row access.

Return type:

SimpleGrid

delayedarray.chunk_grid.chunk_grid_csr_matrix(x)[source]

See chunk_grid().

The cost factor for iteration is set to 1.5. This is slightly higher than that of dense NumPy arrays as CSR matrices are a bit more expensive for random column access.

Return type:

SimpleGrid

delayedarray.chunk_grid.chunk_grid_ndarray(x)[source]

See chunk_grid().

The cost factor for iteration is set to 1, which is considered the lowest cost for data extraction given that everything is stored in memory.

Return type:

SimpleGrid

delayedarray.chunk_grid.chunk_shape_to_grid(chunks, shape, cost_factor)[source]

Convert a chunk shape to a SimpleGrid. This assumes that the underlying array is split up into regular intervals on each dimension; the first chunk should start from zero, and only the last chunk may be of a different size (bounded by the dimension extent).

Parameters:
  • chunks (Sequence[int]) – Chunk size for each dimension. These should be positive.

  • shape (Tuple[int, ...]) – Extent of each dimension of the array. These should be non-negative and of the same length as chunks.

  • cost_factor (int) – Cost factor for iterating over each element of the associated array. This is used to decide between iteration schemes and can be increased for more expensive types, e.g., file-backed arrays. As a reference, in-memory NumPy arrays are assigned a cost factor of 1.

Return type:

SimpleGrid

Returns:

A SimpleGrid object with the chunk shape as the boundaries.
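The regular-interval assumption can be made concrete by computing where each chunk ends on each dimension. The helper chunk_boundaries is hypothetical and returns plain lists; it does not construct a SimpleGrid and makes no claim about how SimpleGrid stores its boundaries.

```python
from typing import List, Sequence, Tuple


def chunk_boundaries(chunks: Sequence[int], shape: Tuple[int, ...]) -> List[List[int]]:
    """Hypothetical helper: end position of every chunk on every dimension."""
    if len(chunks) != len(shape):
        raise ValueError("chunks and shape must have the same length")
    boundaries = []
    for chunk, extent in zip(chunks, shape):
        if chunk <= 0:
            raise ValueError("chunk sizes must be positive")
        # Regular intervals starting from zero...
        ends = list(range(chunk, extent, chunk))
        # ...with the last chunk bounded by the dimension extent.
        if extent > 0:
            ends.append(extent)
        boundaries.append(ends)
    return boundaries


boundaries = chunk_boundaries((3, 4), (10, 8))
```

For a 10-by-8 array with 3-by-4 chunks, the first dimension ends at 3, 6, 9 and 10 (a short final chunk), and the second at 4 and 8.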

delayedarray.create_dask_array module

delayedarray.create_dask_array.create_dask_array(x)[source]

Create a dask array containing the delayed operations, assuming the dask package is installed.

Parameters:

x (Any) – Any array-like object.

Return type:

Array

Returns:

A dask array, possibly containing delayed operations.

delayedarray.extract_dense_array module

delayedarray.extract_dense_array.extract_dense_array(x, subset)[source]

Extract a subset of an array-like object into a dense NumPy array.

Parameters:
  • x (Any) – Any array-like object.

  • subset (Tuple[Sequence[int], ...]) – Tuple of length equal to the number of dimensions, each containing a sorted and unique sequence of integers specifying the elements of each dimension to extract.

Return type:

ndarray

Returns:

NumPy array for the specified subset. This may be a view so callers should create a copy if they intend to modify it.

If is_masked() is True for x, a NumPy MaskedArray is returned instead.
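For an in-memory NumPy array, the meaning of subset corresponds to an outer-product selection, which np.ix_ expresses directly. The sketch below uses only NumPy and is not the library's implementation; it also shows why copying is advisable before modifying a result that may be a view.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

# One sorted, unique sequence of indices per dimension.
subset = ([0, 2], [1, 3, 4])
block = x[np.ix_(*subset)]

# A basic slice of a NumPy array is a view that shares memory with the
# original, so copy before modifying.
view = x[0:2, 0:2]
safe = view.copy()
safe[0, 0] = -1
```

The selection picks rows 0 and 2 and columns 1, 3 and 4, and modifying the copy leaves x untouched.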

delayedarray.extract_dense_array.extract_dense_array_SparseNdarray(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_dense_array.extract_dense_array_coo_matrix(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_dense_array.extract_dense_array_csc_matrix(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_dense_array.extract_dense_array_csr_matrix(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_dense_array.extract_dense_array_ndarray(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_dense_array.extract_dense_array_sparse_array(x, subset)[source]

See extract_dense_array().

Return type:

ndarray

delayedarray.extract_sparse_array module

delayedarray.extract_sparse_array.extract_sparse_array(x, subset)[source]

Extract the contents of x (or a subset thereof) into a SparseNdarray. This should only be used for x where is_sparse() is True.

Parameters:
  • x (Any) – Any array-like object containing sparse data.

  • subset (Tuple[Sequence[int], ...]) – Tuple of length equal to the number of dimensions, each containing a sorted and unique sequence of integers specifying the elements of each dimension to extract.

Return type:

SparseNdarray

Returns:

SparseNdarray for the requested subset. This may be a view so callers should create a copy if they intend to modify it.

If is_masked() is True for x, the SparseNdarray will contain NumPy MaskedArray objects internally.

delayedarray.extract_sparse_array.extract_sparse_array_SparseNdarray(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.extract_sparse_array.extract_sparse_array_coo_matrix(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.extract_sparse_array.extract_sparse_array_csc_matrix(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.extract_sparse_array.extract_sparse_array_csr_matrix(x, subset)[source]

See extract_sparse_array().

Return type:

SparseNdarray

delayedarray.is_masked module

delayedarray.is_masked.is_masked(x)[source]

Determine whether an array-like object contains masked values, equivalent to those in NumPy’s MaskedArray class.

Parameters:

x (Any) – Any array-like object.

Return type:

bool

Returns:

Whether x contains masked values.
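For reference, a NumPy MaskedArray pairs data with a boolean mask, and masked entries are skipped by reductions. The function looks_masked below is a hypothetical type-based stand-in and not the library's is_masked(); the actual criterion used for each array type may differ.

```python
import numpy as np

plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.MaskedArray(plain, mask=[False, True, False])


def looks_masked(x) -> bool:
    """Hypothetical stand-in: treat any MaskedArray instance as masked."""
    return isinstance(x, np.ma.MaskedArray)


# The masked entry (2.0) is ignored, so the mean is taken over 1.0 and 3.0.
masked_mean = float(masked.mean())
```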

delayedarray.is_masked.is_masked_MaskedArray(x)[source]

See is_masked().

delayedarray.is_masked.is_masked_SparseNdarray(x)[source]

See is_masked().

delayedarray.is_masked.is_masked_coo_matrix(x)[source]

See is_masked().

delayedarray.is_masked.is_masked_csc_matrix(x)[source]

See is_masked().

delayedarray.is_masked.is_masked_csr_matrix(x)[source]

See is_masked().

delayedarray.is_masked.is_masked_ndarray(x)[source]

See is_masked().

delayedarray.is_pristine module

delayedarray.is_pristine.is_pristine(x)[source]

Determine whether an object is pristine, i.e., has no delayed operations.

Parameters:

x – Some array-like object.

Return type:

bool

Returns:

Whether x is pristine, i.e., contains no delayed operations.

delayedarray.is_sparse module

delayedarray.is_sparse.is_sparse(x)[source]

Determine whether an array-like object contains sparse data.

Parameters:

x (Any) – Any array-like object.

Return type:

bool

Returns:

Whether x contains sparse data. If no method is defined for x, False is returned by default.
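The default-to-False behaviour is what a generic function with a fallback method provides. The sketch below assumes a functools.singledispatch-style generic; has_sparse_data and ToySparseVector are hypothetical names used only for illustration.

```python
from functools import singledispatch


class ToySparseVector:
    """Hypothetical sparse container storing only non-zero entries."""

    def __init__(self, length, nonzero):
        self.length = length
        self.nonzero = dict(nonzero)  # index -> value


@singledispatch
def has_sparse_data(x) -> bool:
    # Fallback for any type without a registered method.
    return False


@has_sparse_data.register
def _(x: ToySparseVector) -> bool:
    # Method registered for the sparse type.
    return True
```

Types without a registered method, such as a plain list, fall through to the default and report False.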

delayedarray.is_sparse.is_sparse_SparseNdarray(x)[source]

See is_sparse().

delayedarray.is_sparse.is_sparse_coo_matrix(x)[source]

See is_sparse().

delayedarray.is_sparse.is_sparse_csc_matrix(x)[source]

See is_sparse().

delayedarray.is_sparse.is_sparse_csr_matrix(x)[source]

See is_sparse().

delayedarray.to_dense_array module

delayedarray.to_dense_array.to_dense_array(x)[source]

Extract x as a dense NumPy array. The default method simply calls extract_dense_array() with subset set to the full extent of all dimensions.

Parameters:

x (Any) – Any array-like object.

Return type:

ndarray

Returns:

NumPy array containing the full contents of x. This may be masked.
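A subset covering the full extent of all dimensions is just one complete index range per dimension. The helper full_extent_subset is hypothetical, and the extraction is shown on a plain NumPy array rather than through extract_dense_array().

```python
import numpy as np


def full_extent_subset(shape):
    """Hypothetical helper: one range per dimension covering every index."""
    return tuple(range(extent) for extent in shape)


x = np.arange(6).reshape(2, 3)
subset = full_extent_subset(x.shape)

# Selecting every index on every dimension reproduces the whole array.
dense = x[np.ix_(*subset)]
```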

delayedarray.to_scipy_sparse_matrix module

delayedarray.to_scipy_sparse_matrix.to_scipy_sparse_matrix(x, format='csc')[source]

Convert a 2-dimensional array into a SciPy sparse matrix.

Parameters:
  • x (Any) – Input matrix where is_sparse() returns True and is_masked() returns False.

  • format (Literal['coo', 'csr', 'csc']) – Type of SciPy matrix to create - coordinate (coo), compressed sparse row (csr) or compressed sparse column (csc).

Return type:

spmatrix

Returns:

A SciPy sparse matrix with the contents of x.

delayedarray.to_scipy_sparse_matrix.to_scipy_sparse_matrix_from_SparseNdarray(x, format='csc')[source]

See to_scipy_sparse_matrix().

Return type:

spmatrix

delayedarray.to_sparse_array module

delayedarray.to_sparse_array.to_sparse_array(x)[source]

Convert x to a SparseNdarray. This calls extract_sparse_array() with subset set to the full extent of all dimensions.

Parameters:

x (Any) – Any array-like object containing sparse data.

Return type:

SparseNdarray

Returns:

SparseNdarray with the full contents of x.

delayedarray.wrap module

delayedarray.wrap.wrap(x)[source]

Create a DelayedArray from an object satisfying the seed contract. Developers can implement methods for this generic to create DelayedArray subclasses based on the seed type.

Parameters:

x (Any) – Any object satisfying the seed contract, see documentation for DelayedArray for details.

Return type:

DelayedArray

Returns:

A DelayedArray or one of its subclasses.

delayedarray.wrap.wrap_DelayedArray(x)[source]

See wrap().

Module contents