---
file_format: mystnb
kernelspec:
  name: python
---

# Multiple experiments

`MultiAssayExperiment` (MAE) simplifies the management of multiple experimental assays conducted on a shared set of specimens.

:::{note}
These classes follow a functional paradigm for accessing or setting properties, with further details discussed in [functional paradigm](https://biocpy.github.io/tutorial/chapters/philosophy.html#functional-discipline) section.
:::

## Construction

An MAE contains three main entities,

- **Primary information** (`column_data`): Bio-specimen/sample information. The `column_data` may provide information about patients, cell lines, or other biological units. Each row in this table represents an independent biological unit. It must contain an `index` that maps to the 'primary' in `sample_map`.

- **Experiments** (`experiments`): Genomic data from each experiment. either a `SingleCellExperiment`, `SummarizedExperiment`, `RangedSummarizedExperiment` or any class that extends a `SummarizedExperiment`.

- **Sample Map** (`sample_map`): Map biological units from `column_data` to the list of `experiments`. Must contain columns,
    - **assay** provides the names of the different experiments performed on the biological units. All experiment names from experiments must be present in this column.
    - **primary** contains the sample name. All names in this column must match with row labels from col_data.
    - **colname** is the mapping of samples/cells within each experiment back to its biosample information in col_data.

    Each sample in ``column_data`` may map to one or more columns per assay.

Let's start by first creating few experiments:

```{code-cell}

from random import random

import numpy as np
from biocframe import BiocFrame
from genomicranges import GenomicRanges
from iranges import IRanges

nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
gr = GenomicRanges(
    seqnames=[
            "chr1",
            "chr2",
            "chr2",
            "chr2",
            "chr1",
            "chr1",
            "chr3",
            "chr3",
            "chr3",
            "chr3",
        ] * 20,
    ranges=IRanges(range(100, 300), range(110, 310)),
    strand = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
    mcols=BiocFrame({
        "score": range(0, 200),
        "GC": [random() for _ in range(10)] * 20,
    })
)

col_data_sce = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
    row_names=[f"sce_{i}" for i in range(6)],
)

col_data_se = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
    row_names=[f"se_{i}" for i in range(6)],
)
```

More importantly, we need to provide `sample_map` information:

```{code-cell}
sample_map = BiocFrame({
    "assay": ["sce", "se"] * 6,
    "primary": ["sample1", "sample2"] * 6,
    "colname": ["sce_0", "se_0", "sce_1", "se_1", "sce_2", "se_2", "sce_3", "se_3", "sce_4", "se_4", "sce_5", "se_5"]
})

sample_data = BiocFrame({"samples": ["sample1", "sample2"]}, row_names= ["sample1", "sample2"])

print(sample_map)
```


Finally, we can create an `MultiAssayExperiment` object:

```{code-cell}
from multiassayexperiment import MultiAssayExperiment
from singlecellexperiment import SingleCellExperiment
from summarizedexperiment import SummarizedExperiment

tsce = SingleCellExperiment(
    assays={"counts": counts}, row_data=gr.to_pandas(), column_data=col_data_sce
)

tse2 = SummarizedExperiment(
    assays={"counts": counts.copy()},
    row_data=gr.to_pandas().copy(),
    column_data=col_data_se.copy(),
)

mae = MultiAssayExperiment(
    experiments={"sce": tsce, "se": tse2},
    column_data=sample_data,
    sample_map=sample_map,
    metadata={"could be": "anything"},
)

print(mae)
```

### No sample mapping?

If both `column_data` and `sample_map` are `None`, the constructor naively creates sample mapping, with each `experiment` considered to be a independent `sample`. We add a sample to `column_data` in this pattern - ``unknown_sample_{experiment_name}``.

All cells from the each experiment are considered to be from the same sample and is reflected in `sample_map`.

:::{important}
***This is not a recommended approach, but if you don’t have sample mapping, then it doesn’t matter***.
:::

```{code-cell}
mae = MultiAssayExperiment(
    experiments={"sce": tsce, "se": tse2},
    metadata={"could be": "anything"},
)

print(mae)
```

### Interop with `anndata` or `mudata`

We provide convenient methods to easily convert a `MuData` object into an `MultiAssayExperiment`.

Let's create a mudata object:

```{code-cell}

import numpy as np
from anndata import AnnData

np.random.seed(1)

n, d, k = 1000, 100, 10

z = np.random.normal(loc=np.arange(k), scale=np.arange(k) * 2, size=(n, k))
w = np.random.normal(size=(d, k))
y = np.dot(z, w.T)

adata = AnnData(y)
adata.obs_names = [f"obs_{i+1}" for i in range(n)]
adata.var_names = [f"var_{j+1}" for j in range(d)]

d2 = 50
w2 = np.random.normal(size=(d2, k))
y2 = np.dot(z, w2.T)

adata2 = AnnData(y2)
adata2.obs_names = [f"obs_{i+1}" for i in range(n)]
adata2.var_names = [f"var2_{j+1}" for j in range(d2)]

from mudata import MuData
mdata = MuData({"rna": adata, "spatial": adata2})

print(mdata)
```

Lets convert this object to an `MAE`:

```{code-cell}
from multiassayexperiment import MultiAssayExperiment

mae_obj = MultiAssayExperiment.from_mudata(input=mdata)
print(mae_obj)
```


## Getters/Setters

Getters are available to access various attributes using either the property notation or functional style.

```{code-cell}
# access assays
print("experiment names (as property): ", mae.experiment_names)
print("experiment names (functional style): ", mae.get_experiment_names())

# access sample data
print(mae.column_data)
```

Check out the [class documentation](https://biocpy.github.io/MultiAssayExperiment/api/multiassayexperiment.html#multiassayexperiment.MultiAssayExperiment.MultiAssayExperiment) for the full list of accessors and setters.


#### Row or column name accessors

A helper method is available to easily access row or column names across all experiments. This method returns a dictionary with experiment names as keys and the corresponding values, which can be either the row or column names depending on the function:

```{code-cell}
from rich import print as pprint
pprint("row names:", mae.get_row_names())
pprint("column names:", mae.get_column_names())
```

#### Access an experiment

One can access an experiment by name:

```{code-cell}
print(mae.experiment("se"))
```

Additionally you may access an experiment with the sample information included in the column data of the experiment:

:::{note}
This creates a copy of the experiment.
:::

```{code-cell}
expt_with_sample_info = mae.experiment("se", with_sample_data=True)
print(expt_with_sample_info)
```

:::{note}
For consistency with the R MAE's interface, we also provide `get_with_column_data` method, that performs the same operation.
:::

### Setters

::: {important}
All property-based setters are `in_place` operations, with further details discussed in [functional paradigm](../philosophy.qmd#functional-discipline) section.
:::

```{code-cell}
modified_column_data = mae.column_data.set_column("score", range(len(mae.column_data)))
modified_mae = mae.set_column_data(modified_column_data)
print(modified_mae)
```

Now, lets check the `column_data` on the original object.

```{code-cell}
print(mae.column_data)
```


## Subsetting

You can subset `MultiAssayExperiment` by using the subset (`[]`) operator. This operation accepts different slice input types, such as a boolean vector, a `slice` object, a list of indices, or names (if available) to subset.

`MultiAssayExperiment` allows subsetting by three dimensions: `rows`, `columns`, and `experiments`. ***`sample_map` is automatically filtered during this operation***.

### Subset by indices

```{code-cell}
subset_mae = mae[1:5, 0:4]
print(subset_mae)
```

### Subset by experiments dimension

The following creates a subset based on the experiments dimension:

```{code-cell}
subset_mae = mae[1:5, 0:1, ["se"]]
print(subset_mae)
```

:::{note}
If you're wondering about why the experiment "se" has 0 columns, it's important to note that our MAE implementation does not remove columns from an experiment solely because none of the columns map to the samples of interest. This approach aims to prevent unexpected outcomes in complex subset operations.
:::

## Helper functions

The `MultiAssayExperiment` class also provides a few methods for sample management.

### Complete cases

The `complete_cases` function is designed to identify samples that contain data across all experiments. It produces a boolean vector with the same length as the number of samples in `column_data`. Each element in the vector is `True` if the sample is present in all experiments, or `False` otherwise.

```{code-cell}
print(mae.complete_cases())
```

You can use this boolean vector to select samples with complete data across all assays or experiments.

```{code-cell}
subset_mae = mae[:, mae.complete_cases(),]
print(subset_mae)
```


### Replicates

This method identifies 'samples' with replicates within each experiment. The result is a dictionary where experiment names serve as keys, and the corresponding values indicate whether the sample is replicated within each experiment.


```{code-cell}
from rich import print as pprint # mainly for pretty printing
pprint(mae.replicated())
```


### Intersect rows

The `intersect_rows` finds common `row_names` across all experiments and returns a `MultiAssayExperiment` with those rows.

```{code-cell}
common_rows_mae = mae.intersect_rows()
print(common_rows_mae)
```

If you are only interested in finding common `row_names` across all experiments:

```{code-cell}
common_rows = mae.find_common_row_names()
print(common_rows)
```

### Empty MAE

While the necessity of an empty `MultiAssayExperiment` might not be apparent, for the sake of consistency with the rest of the tutorials:

```{code-cell}
mae = MultiAssayExperiment(experiments={})
print(mae)
```