Custom Serialization and Deserialization Guide¶
This guide shows you how to extend rds2py to support custom Python classes. By implementing custom readers and writers, you can serialize your custom Python representations directly into native R RDS/RData structures, and read them back seamlessly.
rds2py achieves this two-way extensibility using:
Python’s
functools.singledispatchmechanism for writing/serialization (save_rds).A global class mapping registry for reading/deserialization (
read_rds).
1. Custom Serialization (Python -> RDS)¶
To serialize a custom Python class, you register it with the save_rds generic dispatcher. Your custom function needs to take your object and convert it into a structured dictionary that matches R’s internal representation format.
The Structured RDS Representation Format¶
R objects are represented in Python as nested dictionaries containing the following keys:
"type": The R type descriptor (e.g.,"S4","vector","integer","double","string","logical", or"null")."class_name": The target R class name (e.g.,"MyCustomRClass")."package_name": (Optional, for S4 classes) The name of the R package where the class is defined."attributes": A dictionary representing R attributes or S4 slots. Each slot value must also be a structured representation dictionary."data": The flat list or array of values for vector/atomic types.
Example: Implementing a Custom Serializer¶
Let’s say we have a custom Python class named MyFeature:
class MyFeature:
def __init__(self, name: str, values: list):
self.name = name
self.values = values
To serialize MyFeature as a native R S4 class called "MyCustomRClass" from package "MyRPackage", we register it using @save_rds.register:
from typing import Optional
from rds2py import save_rds
@save_rds.register(MyFeature)
def _save_rds_myfeature(x: MyFeature, path: Optional[str] = None):
# Native C++ writer call
from rds2py.lib_rds_parser import write_rds as write_rds_native
# 1. Structure the Python object into the expected R dictionary format
converted = {
"type": "S4",
"class_name": "MyCustomRClass",
"package_name": "MyRPackage",
"attributes": {
# Recursively call save_rds to serialize internal elements
"featureName": save_rds(x.name),
"featureValues": save_rds(x.values),
},
}
# 2. If a save path is specified, write directly using the native writer
if path is not None:
write_rds_native(converted, path)
return converted
2. Custom Deserialization (RDS -> Python)¶
To read custom S4 objects back into Python classes via read_rds, you need to:
Write a deserialization function that constructs your Python class from the raw parsed dictionary.
Register your deserializer function in
rds2py’s global class mapping registry.
Example: Implementing the Reader¶
from rds2py.generics import _dispatcher
from rds2py.rdsutils import get_class
def read_my_custom_class(robject: dict, **kwargs) -> MyFeature:
# 1. Verify the incoming R class name
cls_name = get_class(robject)
if cls_name != "MyCustomRClass":
raise ValueError(f"Expected class 'MyCustomRClass', but received '{cls_name}'")
# 2. Extract and parse the slots recursively
# We call the internal _dispatcher helper to parse child structures
feature_name = _dispatcher(robject["attributes"]["featureName"], **kwargs)
feature_values = _dispatcher(robject["attributes"]["featureValues"], **kwargs)
# 3. Instantiate and return your custom Python class
return MyFeature(name=feature_name, values=list(feature_values))
Registering the Reader¶
Map your class name to the reader function in the global class registry (REGISTRY from rds2py.generics):
from rds2py.generics import REGISTRY
# Register our custom deserializer in the global map
REGISTRY["MyCustomRClass"] = read_my_custom_class
3. Full Roundtrip¶
Here is how the entire custom serialization and deserialization workflow works together:
import tempfile
import os
from rds2py import write_rds, read_rds
# 1. Create a custom instance
feature = MyFeature(name="expression_level", values=[10, 20, 30])
# 2. Serialize to a temporary RDS file
with tempfile.NamedTemporaryFile(suffix=".rds", delete=False) as tmp:
path = tmp.name
try:
# Write custom class to RDS format
write_rds(feature, path)
# Read the RDS file back into Python
recreated = read_rds(path)
# 3. Verify that the roundtrip correctly recreated the custom class
assert isinstance(recreated, MyFeature)
assert recreated.name == "expression_level"
assert recreated.values == [10, 20, 30]
print("Roundtrip validation successful!")
finally:
if os.path.exists(path):
os.unlink(path)