In this section, we will illustrate a workflow that utilizes either language-agnostic representations for storing genomic data or reading RDS files directly in Python, to facilitate seamless access to datasets and analysis results.
Load dataset
Note
Check out
the interop with R section for reading RDS files directly in Python or
To demonstrate this workflow, we will use the models from the CellTypist package to annotate cell types for this dataset. CellTypist operates on an AnnData representation.
from rds2py import read_rds, as_summarized_experimentimport numpy as npr_object = read_rds("../assets/data/zilinois-lung-subset.rds")sce = as_summarized_experiment(r_object)adata, _ = sce.to_anndata()adata.X = np.log1p(adata.layers["counts"])adata.var.index = adata.var["genes"].tolist()print(adata)
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/genomicranges/SeqInfo.py:348: UserWarning: 'seqnames' is deprecated, use 'get_seqnames' instead
warn("'seqnames' is deprecated, use 'get_seqnames' instead", UserWarning)
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 44
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/44]: Immune_All_Low.pkl
💾 Downloading model [2/44]: Immune_All_High.pkl
💾 Downloading model [3/44]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [4/44]: Adult_Human_PancreaticIslet.pkl
💾 Downloading model [5/44]: Adult_Human_Skin.pkl
💾 Downloading model [6/44]: Adult_Mouse_Gut.pkl
💾 Downloading model [7/44]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [8/44]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [9/44]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [10/44]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [11/44]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [12/44]: COVID19_Immune_Landscape.pkl
💾 Downloading model [13/44]: Cells_Fetal_Lung.pkl
💾 Downloading model [14/44]: Cells_Intestinal_Tract.pkl
💾 Downloading model [15/44]: Cells_Lung_Airway.pkl
💾 Downloading model [16/44]: Developing_Human_Brain.pkl
💾 Downloading model [17/44]: Developing_Human_Gonads.pkl
💾 Downloading model [18/44]: Developing_Human_Hippocampus.pkl
💾 Downloading model [19/44]: Developing_Human_Organs.pkl
💾 Downloading model [20/44]: Developing_Human_Thymus.pkl
💾 Downloading model [21/44]: Developing_Mouse_Brain.pkl
💾 Downloading model [22/44]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [23/44]: Fetal_Human_AdrenalGlands.pkl
💾 Downloading model [24/44]: Fetal_Human_Pancreas.pkl
💾 Downloading model [25/44]: Fetal_Human_Pituitary.pkl
💾 Downloading model [26/44]: Fetal_Human_Retina.pkl
💾 Downloading model [27/44]: Fetal_Human_Skin.pkl
💾 Downloading model [28/44]: Healthy_Adult_Heart.pkl
💾 Downloading model [29/44]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [30/44]: Healthy_Human_Liver.pkl
💾 Downloading model [31/44]: Healthy_Mouse_Liver.pkl
💾 Downloading model [32/44]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [33/44]: Human_Developmental_Retina.pkl
💾 Downloading model [34/44]: Human_Embryonic_YolkSac.pkl
💾 Downloading model [35/44]: Human_IPF_Lung.pkl
💾 Downloading model [36/44]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [37/44]: Human_Lung_Atlas.pkl
💾 Downloading model [38/44]: Human_PF_Lung.pkl
💾 Downloading model [39/44]: Lethal_COVID19_Lung.pkl
💾 Downloading model [40/44]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [41/44]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [42/44]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [43/44]: Nuclei_Lung_Airway.pkl
💾 Downloading model [44/44]: Pan_Fetal_Human.pkl
CellTypist model with 61 cell types and 5017 features
date: 2023-05-17 19:51:55.661237
details: integrated Human Lung Cell Atlas (HLCA) combining multiple datasets of the healthy respiratory system
source: https://doi.org/10.1101/2022.03.10.483747
version: v2
cell types: AT0, AT1, ..., pre-TB secretory
features: TSPAN6, FGR, ..., RP1-34B20.21
Now, let’s annotate our dataset.
predictions = celltypist.annotate(adata, model = model_name, majority_voting =True)print(predictions.predicted_labels)
⚠️ Warning: invalid expression matrix, expect ALL genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
🔬 Input data has 2000 cells and 41861 genes
🔗 Matching reference genes in the model
🧬 4829 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
We can now reverse the workflow and save this object into an Artifactdb format from Python. However, the object needs to be converted to a SingleCellExperiment class first. Read more about our experiment representations here.
from singlecellexperiment import SingleCellExperimentsce = SingleCellExperiment.from_anndata(adata)print(sce)
And that concludes the workflow. Leveraging the generic read functions readObject (R) and read_object (Python), along with the save functions saveObject (R) and save_object (Python), you can seamlessly store most Bioconductor objects in language-agnostic formats.