from rds2py import read_rds
r_object = read_rds("../assets/data/zeisel-brain-subset.rds")
from rich import print as pprint
# hiding the response
pprint(r_object)
Interop with RDS files
The rds2py package is a Python interface to the rds2cpp library and reads RDS files directly in Python. This removes the need for separate conversion tools or intermediate file formats when moving data between R and Python.
One notable feature is the use of memory views (excluding strings) to access the same memory from C++ in Python, facilitated through Cython. This approach is particularly advantageous for handling large datasets, as it avoids unnecessary duplication of data.
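To illustrate why views avoid duplicating data, here is a minimal sketch that uses only Python's standard library. It is not how rds2py is implemented (the package goes through Cython and C++); the `array` buffer below is just a stand-in for memory owned by a parser.

```python
from array import array

# A buffer of doubles standing in for memory owned by a parser (assumption for illustration).
buffer = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the same memory without copying it.
view = memoryview(buffer)

# Slicing a memoryview yields another view; list() makes an independent copy.
window = view[1:3]
copied = list(buffer)

# A write to the underlying buffer is visible through the view, but not through the copy.
buffer[1] = 20.0
shared_value = window[0]
copied_value = copied[1]

print(shared_value, copied_value)
```

The view reflects the updated element while the copy keeps the old value, which is the behavior that makes views attractive for large datasets.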
What sets rds2py apart from other similar parsers is its capability to read S4 classes. This unique feature allows the parsing of Bioconductor data types directly from R into Python.
Installation
To get started, install the package from PyPI:
pip install rds2py
Reading RDS objects
Reading an RDS file in Python involves a two-step process. First, we parse the serialized RDS into a readable Python object, typically a dictionary. This object contains both the data and relevant metadata about the structure and internal representation of the R object. Subsequently, we use one of the available functions to convert this object into a Python representation.
Step 0: Save an RDS file
Before we begin, let’s create a test dataset from R. In this example, we’ll download the “zeisel brain” dataset from the scRNAseq package. For tutorial purposes, we’ll filter the dataset to the first 1000 rows and save it as an RDS file.
library(scRNAseq)
sce <- ZeiselBrainData()
sub <- sce[1:1000,]
saveRDS(sub, "../assets/data/zeisel-brain-subset.rds")
Step 1: Parse the RDS file
Now, we can read the RDS file in Python using the read_rds function, which parses the file contents and returns a dictionary of the R object.
The output of the above code block is hidden to maintain the cleanliness and visual appeal of this document :)
Once we have a realized structure, we can convert this object into useful Python representations. It contains two keys:
- data: For atomic entities, contains the NumPy view of the memory space.
- attributes: Additional properties available for the object.
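To get a feel for this layout, the following sketch walks a nested dictionary that has data and attributes keys and prints a short outline of it. The `summarize` helper and the `mock_r_object` dictionary are hypothetical and exist only for illustration; a real parsed object may carry additional keys and holds NumPy views instead of plain lists.

```python
def summarize(obj, depth=0):
    """Return indented lines describing a dict with 'data' and 'attributes' keys."""
    pad = "  " * depth
    data = obj.get("data")
    size = f" of length {len(data)}" if hasattr(data, "__len__") else ""
    lines = [f"{pad}data: {type(data).__name__}{size}"]
    for name, value in obj.get("attributes", {}).items():
        lines.append(f"{pad}attribute {name!r}:")
        if isinstance(value, dict):
            # Attributes may themselves be parsed objects, so recurse into them.
            lines.extend(summarize(value, depth + 1))
        else:
            lines.append(f"{pad}  {type(value).__name__}")
    return lines


# Hypothetical stand-in for a parsed named vector (not real rds2py output).
mock_r_object = {
    "data": [1, 2, 3],
    "attributes": {
        "names": {"data": ["a", "b", "c"], "attributes": {}},
    },
}

summary = summarize(mock_r_object)
print("\n".join(summary))
```

An outline like this can be handy before deciding which conversion function to apply to the object.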
Step 2: Conversion to Python representations
The package provides functions to convert these R objects into useful Python representations.
from rds2py import as_summarized_experiment
# to convert an robject to SCE
sce = as_summarized_experiment(r_object)
print(sce)
class: SingleCellExperiment
dimensions: (1000, 3005)
assays(1): ['counts']
row_data columns(1): ['featureType']
row_names(1000): ['0', '1', '2', ..., '997', '998', '999']
column_data columns(10): ['tissue', 'group #', 'total mRNA mol', 'well', 'sex', 'age', 'diameter', 'cell_id', 'level1class', 'level2class']
column_names(3005): ['1772071015_C02', '1772071017_G12', '1772071017_A05', ..., '1772063068_D01', '1772066098_A12', '1772058148_F03']
main_experiment_name:
reduced_dims(0): []
alternative_experiments(2): ['ERCC', 'repeat']
row_pairs(0): []
column_pairs(0): []
metadata(0):
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/genomicranges/SeqInfo.py:348: UserWarning: 'seqnames' is deprecated, use 'get_seqnames' instead
warn("'seqnames' is deprecated, use 'get_seqnames' instead", UserWarning)
And that’s it! It’s as straightforward as that. The as_summarized_experiment function serves as an example of how to convert complex R structures into Python representations. Similarly, the package offers parsers for atomic vectors, lists, sparse/dense matrices, data frames, and most R data structures.
You can continue by converting this object into an AnnData representation for further analysis. For more details on SingleCellExperiment, refer to its documentation.
sce.to_anndata()
(AnnData object with n_obs × n_vars = 3005 × 1000
obs: 'tissue', 'group #', 'total mRNA mol', 'well', 'sex', 'age', 'diameter', 'cell_id', 'level1class', 'level2class'
var: 'featureType'
layers: 'counts',
None)
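Note that the two printed shapes differ: the SingleCellExperiment reports (1000, 3005), while the AnnData object reports 3005 × 1000. SingleCellExperiment stores features as rows and cells as columns, whereas AnnData stores observations (cells) as rows and variables (features) as columns. The toy sketch below shows the same transposition on a small nested list; the variable names are made up for illustration and the code does not use either library.

```python
# Toy count matrix in features x cells orientation (2 features, 3 cells).
features_by_cells = [
    [1, 0, 4],
    [0, 2, 5],
]

# Transpose to cells x features, the orientation AnnData uses.
cells_by_features = [list(column) for column in zip(*features_by_cells)]

sce_like_shape = (len(features_by_cells), len(features_by_cells[0]))
anndata_like_shape = (len(cells_by_features), len(cells_by_features[0]))

print(sce_like_shape, anndata_like_shape)
```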
Well, that’s all. Dive in, explore, and create more base representations to encapsulate complex R structures. If you wish to add more representations, we are more than happy to accept contributions.