# Tutorial: Structure and Client Interface

This tutorial explains the directory structure generated by `wobbegongify` and how to query the data efficiently.

### 1. The Wobbegong Format

Wobbegong flattens complex Bioconductor objects into simple binary files accompanied by a JSON summary.

- **`summary.json`**: Contains metadata (dimensions, types) and **byte offsets**. This is the map clients use to figure out _where_ data lives.
- **`content`**: A binary file containing compressed chunks of data.
- **`stats`**: (For matrices) A binary file containing pre-calculated statistics like row sums.

When you run `wobbegongify(obj, "dir")`, it creates a structured hierarchy:

```text
my_study/
├── summary.json            # Top-level metadata
├── assays/                 # Matrix data
│   ├── 0/
│   │   ├── summary.json
│   │   ├── content
│   │   └── stats
│   └── ...
└── reduced_dimensions/     # Reduced dims (stored as DataFrames)
    ├── 0/
    │   ├── summary.json
    │   └── content
    └── ...
```

### 2. Supported Objects

#### BiocFrame

Saved as a series of compressed columns.

```python
from biocframe import BiocFrame
from wobbegong import wobbegongify

# Each column is compressed and written separately
df = BiocFrame({"gene": ["A", "B"], "val": [1, 2]})
wobbegongify(df, "data/df")
```

#### Matrices (Dense & Sparse)

Matrices are saved **row-wise**. This is optimized for genomic viewers that need to show the expression of a specific gene across all cells.

- **Dense**: Rows are written sequentially.
- **Sparse**: Values and indices (delta-encoded) are written for each row.

#### SingleCellExperiment

Recursively converts all supported components:

- `assays` -> Matrices
- `row_data` / `col_data` -> BiocFrames
- `reduced_dims` -> BiocFrames (column-wise)
- `alternative_experiments` -> Nested SingleCellExperiments

### 3. Client Interface

The `wobbegong.load()` function acts as a factory, returning the appropriate reader object based on the contents of `summary.json`.

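
The factory behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the `object` field name, its values, the `MatrixReader` and `DataFrameReader` classes, and the `load_sketch` function are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


class MatrixReader:
    """Hypothetical stand-in for a matrix reader."""

    def __init__(self, path, summary):
        self.path, self.summary = path, summary


class DataFrameReader:
    """Hypothetical stand-in for a data frame reader."""

    def __init__(self, path, summary):
        self.path, self.summary = path, summary


# Assumed mapping from a type string in summary.json to a reader class.
READERS = {"matrix": MatrixReader, "data_frame": DataFrameReader}


def load_sketch(path):
    """Read summary.json under `path` and return the matching reader."""
    directory = Path(path)
    with open(directory / "summary.json") as handle:
        summary = json.load(handle)
    kind = summary.get("object")  # assumed field name, not confirmed
    if kind not in READERS:
        raise ValueError(f"unsupported object type: {kind!r}")
    return READERS[kind](directory, summary)


# Usage: write a toy summary.json and let the factory dispatch on it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "summary.json").write_text(json.dumps({"object": "matrix"}))
    reader = load_sketch(tmp)
    reader_name = type(reader).__name__
```

The point of the pattern is that callers only need a directory path; the summary decides which reader they get back.
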
**Accessing Matrices:**

```python
import wobbegong

mat = wobbegong.load("data/matrix")

# Get expression for the 5th gene (rows are 0-indexed)
row_vec = mat.get_row(4)

# Get pre-calculated statistics (read from `stats`, not recomputed)
total_counts = mat.get_statistic("row_sum")
```

**Accessing DataFrames:**

```python
import wobbegong

df = wobbegong.load("data/metadata")

# Get a specific column
cell_ids = df.get_column("cell_id")
```

Check out the [R package](https://github.com/kanaverse/wobbegong-R) for more details.
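
To make the role of the byte offsets in `summary.json` concrete, here is a minimal sketch of how a client could fetch one compressed chunk from `content` without reading the whole file. It assumes the summary lists one byte length per chunk and that chunks are zlib-compressed; both are assumptions for illustration. `chunk_ranges` and `read_chunk` are hypothetical helpers, not part of the wobbegong API.

```python
import io
import zlib
from itertools import accumulate


def chunk_ranges(byte_lengths):
    """Turn per-chunk byte lengths into (start, end) offsets."""
    ends = list(accumulate(byte_lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def read_chunk(handle, ranges, index):
    """Seek to one chunk, read only its bytes, and decompress it."""
    start, end = ranges[index]
    handle.seek(start)
    return zlib.decompress(handle.read(end - start))


# Build a toy `content` file: three independently compressed rows.
rows = [b"row zero", b"row one", b"row two"]
compressed = [zlib.compress(row) for row in rows]
content = io.BytesIO(b"".join(compressed))

# The per-chunk lengths play the role of the metadata in summary.json.
byte_lengths = [len(chunk) for chunk in compressed]
ranges = chunk_ranges(byte_lengths)

# Fetch only the second row.
second = read_chunk(content, ranges, 1)
```

Because each chunk is compressed on its own, the same start and end pair could also be sent as an HTTP `Range` request when the files are served statically.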