BiocPy: Facilitate Bioconductor Workflows in Python
Welcome to BiocPy

BiocPy is designed to bridge the mature Bioconductor ecosystem and the Python landscape. Bioconductor is an open-source software project that provides tools for the analysis and comprehension of genomic data. One of its main advantages is the availability of standard data representations and a large number of analysis tools tailored to genomic experiments. These shared data structures allow researchers to store, manipulate, and analyze data seamlessly across multiple packages and workflows in R.
Inspired by Bioconductor, BiocPy aims to facilitate Bioconductor workflows in Python. To achieve this goal, we developed several core data structures that align closely with their Bioconductor counterparts. Because these structures mirror the Bioconductor implementations, data can move easily between R and Python.
About this Book
This book is currently in active development and is organized into the following sections:
- Foundations: The core data structures that underpin the ecosystem.
  - GenomicRanges: Manipulation of genomic intervals.
  - Data Containers: Rich semantic data containers like SummarizedExperiment.
- BioC Hubs: Access to Bioconductor’s cloud resources.
  - ExperimentHub: Discover and download datasets.
  - AnnotationHub: Interact with gene models (TxDb) and organism databases (OrgDb).
- Interoperability: Tools to bridge R and Python.
  - R Interoperability: Read R data files (RDS) directly in Python.
  - ArtifactDB: Language-agnostic storage for genomic data in R and Python.
- Workflows: Real-world usage examples.
  - Single-Cell Analysis: Multi-modal single-cell analysis with scranpy.
  - Annotation: Automated cell type annotation with singler.
Further Reading
Many online resources offer detailed information on Bioconductor data structures, namely: