Python bindings for tatami¶
Overview¶
The mattress package implements Python bindings to the tatami C++ library for matrix representations. Downstream packages can use mattress to develop C++ extensions that are interoperable with many different matrix classes, e.g., dense, sparse, delayed or file-backed. mattress is inspired by the beachmat Bioconductor package, which does the same thing for R packages.
Instructions¶
mattress is published to PyPI, so installation is simple:
pip install mattress
mattress is intended for Python package developers writing C++ extensions that operate on matrices.
Add
mattress.includes()
to theinclude_dirs=
of yourExtension()
definition insetup.py
. This will give you access to the various tatami headers to compile your C++ code.Add
#include "Mattress.h"
to your C++ source files. This defines aMattress
class where theptr
member is a pointer to a tatami matrix. Python-visible C++ functions should expect to take aMattress*
or equivalent address (e.g.,uintptr_t
), after which theptr
should be extracted for use in tatami-compatible functions.Call
mattress.tatamize()
on Python matrix objects within each of your functions that call tatami C++ code. This will wrap the Python matrix in a tatami-compatible C++ representation for use in the C++ code. The pointer to the C++ instance can be accessed through theptr
property of the returned object, which can then be passed to C++ code as anuintptr_t
to aMattress
instance.
So, for example, we can write ctypes bindings like:
#include "Mattress.h"
extern "C" {
int do_something_interesting(const void* mat) {
return reinterpret_cast<const Mattress*>(mat)->ptr->nrow();
}
}
Which we can subsequently call like:
import mattress
import ctypes as ct
lib = ct.CDLL("compiled.so")
lib.do_something_interesting.restype = ct.c_int
lib.do_something_interesting.argtypes = [ ct.c_void_p ]
def do_something_interesting(x):
mat = mattress.tatamize(x)
return do_something_interesting(x.ptr)
Of course, any FFI that accepts a pointer address can be used here.
Supported matrices¶
Dense numpy matrices of varying numeric type:
import numpy as np
from mattress import tatamize
x = np.random.rand(1000, 100)
tatamat = tatamize(x)
ix = (x * 100).astype(np.uint16)
tatamat2 = tatamize(ix)
Compressed sparse matrices from scipy with varying index/data types:
from scipy import sparse as sp
from mattress import tatamize
xc = sp.random(100, 20, format="csc")
tatamat = tatamize(xc)
xr = sp.random(100, 20, format="csc", dtype=np.uint8)
tatamat2 = tatamize(xr)
Delayed arrays from the delayedarray package:
from delayedarray import DelayedArray
from scipy import sparse as sp
from mattress import tatamize
import numpy
xd = DelayedArray(sp.random(100, 20, format="csc"))
xd = numpy.log1p(xd * 5)
tatada = tatamize(xd)
To be added:
File-backed matrices from the FileBackedArray package, including HDF5 and TileDB.
Arbitrary Python matrices?
Utility methods¶
The TatamiNumericPointer
instance returned by tatamize()
provides a few Python-visible methods for querying the C++ matrix.
tatamat.nrow() // number of rows
tatamat.column(1) // contents of column 1
tatamat.sparse() // whether the matrix is sparse.
It also has a few methods for computing common statistics:
tatamat.row_sums()
tatamat.column_variances(num_threads = 2)
grouping = [i%3 for i in range(tatamat.ncol())]
tatamat.row_medians_by_group(grouping)
tatamat.row_nan_counts()
tatamat.column_ranges()
These are mostly intended for non-intensive work or testing/debugging. It is expected that any serious computation should be performed by iterating over the matrix in C++.
Operating on an existing pointer¶
If we already have a TatamiNumericPointer
, we can easily apply additional operations by wrapping it in the relevant delayedarray layers and calling tatamize()
afterwards.
For example, if we want to add a scalar, we might do:
from delayedarray import DelayedArray
from mattress import tatamize
import numpy
x = numpy.random.rand(1000, 10)
tatamat = tatamize(x)
wrapped = DelayedArray(tatamat) + 1
tatamat2 = tatamize(wrapped)
This avoids relying on x
and is more efficient as it re-uses the TatamiNumericPointer
generated from x
.
Developer Notes¶
Build the shared object file:
python setup.py build_ext --inplace
For quick testing, we usually do:
pytest
For more complex testing, we do:
python setup.py build_ext --inplace && tox
To rebuild the ctypes bindings with cpptypes:
cpptypes src/mattress/lib --py src/mattress/_cpphelpers.py --cpp src/mattress/lib/bindings.cpp --dll _core