Language-agnostic genomic data store
This section illustrates a workflow that stores genomic data in language-agnostic representations, so that datasets and analysis results can be accessed from multiple programming frameworks such as R and Python. The ArtifactDB framework provides this functionality.
To begin, we will download the “zilionis lung” dataset from the scRNAseq package. Subsequently, we will store this dataset in a language-agnostic format using the alabaster suite of R packages.
library(scRNAseq)
library(alabaster)
sce <- ZilionisLungData()
saveObject(sce, path=paste(getwd(), "zilinoislung", sep="/"))
Alternatively, you can save this dataset as an RDS object and access it from Python. Refer to the section on interoperability with R for more details.
We can now load this dataset in Python using the dolomite suite of Python packages. Both dolomite and alabaster are integral parts of the ArtifactDB ecosystem designed to read artifacts stored in language-agnostic formats.
from dolomite_base import read_object
data = read_object("./zilinoislung")
print(data)
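Because the saved artifact is an ordinary directory of files, it can also be inspected without dolomite or alabaster. The following is a minimal sketch, assuming the current on-disk layout in which each saved object directory contains a JSON file named OBJECT with a "type" field. The helper name describe_artifact is hypothetical and is not part of either package.

```python
import json
import os


def describe_artifact(path):
    """Return (object type, sorted relative file paths) for a saved artifact directory."""
    # Assumption: the directory holds a JSON file named OBJECT with a "type" key.
    with open(os.path.join(path, "OBJECT")) as handle:
        object_type = json.load(handle)["type"]

    # Collect every file under the directory, relative to its root.
    files = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            files.append(os.path.relpath(os.path.join(root, name), path))
    return object_type, sorted(files)
```

Calling describe_artifact("./zilinoislung") on the directory written above should report the recorded object type together with the files that make up the artifact, which is a quick way to confirm that the save step completed before loading it in another language.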
You can now convert this to an AnnData representation for downstream analysis.
adata = data.to_anndata()
Leveraging the generic read functions readObject (R) and read_object (Python), along with the save functions saveObject (R) and save_object (Python), you can seamlessly store most Bioconductor objects in language-agnostic formats.
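To give a feel for how a single generic read function can serve many object types, here is a toy sketch of type-based dispatch. It is not the dolomite or alabaster implementation: the READERS registry, the register_reader decorator, the read_any function, the "string_list" type, and the values.json file are all made up for illustration. The only assumption carried over from the real format is that the object type is recorded in a JSON file named OBJECT.

```python
import json
import os

# Hypothetical registry mapping a recorded object type to a reader function.
READERS = {}


def register_reader(object_type):
    """Decorator that registers a reader function for one object type."""
    def decorator(func):
        READERS[object_type] = func
        return func
    return decorator


@register_reader("string_list")  # made-up type, for illustration only
def read_string_list(path):
    # This toy type stores its contents in a values.json file.
    with open(os.path.join(path, "values.json")) as handle:
        return json.load(handle)


def read_any(path):
    """Look up the recorded type and hand the directory to the matching reader."""
    with open(os.path.join(path, "OBJECT")) as handle:
        object_type = json.load(handle)["type"]
    if object_type not in READERS:
        raise NotImplementedError(f"no reader registered for '{object_type}'")
    return READERS[object_type](path)
```

The design point is that the caller never names the object class: the type stored alongside the data decides which reader runs, so supporting a new kind of object means registering one more reader rather than changing the calling code.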
Further reading
- Check out the ArtifactDB framework for more information.