Usage Notes

While our package provides methods to generate TileDB files, it makes certain assumptions. We will continue to document some of the gotcha’s as we run into them. Please review the following for a smooth experience:

Experimental Data Consistency

All experimental data objects (either as AnnData or H5AD paths) are expected to be fairly consistent:

  • Matrix Location: If the matrix to use is “counts”, all objects must contain this matrix in the layers slot, not in X or under a different name.

  • Feature IDs/Gene Symbols: These should be consistent across objects, either as the index or as a column in the var dataframe.

Cell Metadata

If cell_metadata is not provided, the build process scans all files to count the number of cells and creates a simple range index.

Sample Information

  • Each file is considered a sample, hence a mapping between cells and samples is automatically created.

  • The sample information provided must match the number of input files.

Handling Metadata Columns with None/NaN Values

For metadata columns containing None, nan, or NaN values:

  • It’s best to specify float as the type of the column

  • Even if most values are integers, TileDB may behave unexpectedly with mixed types

Metadata contains unicode characters?

We’ve run into a few issues when metadata objects containing unicode characters are written into a TileDB frame. The best solution I can think of is to ignore them

print(u'aあä'.encode('ascii', 'ignore'))

  ## output
  b'a'

Compared to,

print(u'aあä'.encode('ascii'))

  ## output
  ---------------------------------------------------------------------------
  UnicodeEncodeError                        Traceback (most recent call last)
  Cell In[1], line 1
  ----> 1 print(u'aあä'.encode('ascii'))

  UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
  ---------------------------------------------------------------------------

Additionally since the build options helps specify column types, 'ascii' is preferred compared to str. We’ve run into issues writing large chunks of string columns to TileDB.

For further assistance or clarification, please refer to our documentation or raise an issue on GitHub.