orgdb package

Submodules

orgdb.orgdb module

class orgdb.orgdb.OrgDb(dbpath)[source]

Bases: object

Interface for accessing OrgDb SQLite databases in Python.

__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
__init__(dbpath)[source]

Initialize the OrgDb object.

Parameters:

dbpath (str) – Path to the SQLite database file.

close()[source]

Close the database connection.
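The `__enter__`, `__exit__`, and `close` members suggest that an OrgDb object can be used in a `with` block so the connection is released automatically. The sketch below illustrates that pattern with a hypothetical stand-in class (`MiniDb`) built on the standard-library `sqlite3` module. It is not orgdb's implementation, and the assumption that `__exit__` simply calls `close()` is not confirmed by this reference.

```python
import sqlite3


class MiniDb:
    """Hypothetical stand-in for a connection-owning class (not orgdb code)."""

    def __init__(self, dbpath: str) -> None:
        self.conn = sqlite3.connect(dbpath)

    def close(self) -> None:
        self.conn.close()

    def __enter__(self) -> "MiniDb":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Assumed behaviour: always close; returning None lets any
        # exception raised inside the block propagate.
        self.close()


with MiniDb(":memory:") as db:
    answer = db.conn.execute("SELECT 1 + 1").fetchone()[0]

# Once the block has exited, the connection is closed and queries fail.
try:
    db.conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
```

If the real class follows this pattern, `with` is the safer choice over a manual `close()` call, because the connection is released even when the body raises.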

columns()[source]

List all available columns/keytypes.

Return type:

List[str]

genes()[source]

Retrieve gene locations as GenomicRanges.

Requires ‘chromosome_locations’ table in the DB.

Return type:

GenomicRanges

keys(keytype)[source]

Return keys for the given keytype.

Return type:

List[str]

keytypes()[source]

List all available keytypes (same as columns).

Return type:

List[str]

mapIds(keys, column, keytype, multiVals='first')[source]

Map keys to a specific column. A wrapper around select.

Parameters:
  • keys (Union[List[str], str]) – Keys to map.

  • column (str) – The column to map to.

  • keytype (str) – The ID type of the keys.

  • multiVals (str) – How to handle multiple values (‘first’, ‘list’, ‘filter’).

Return type:

Union[dict, list]
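The three multiVals options can be read as different ways of collapsing one-to-many matches. The sketch below is a standalone illustration using a hypothetical helper (`collapse`) and toy data. The meaning given to ‘filter’ (drop keys that have more than one match) is an assumption modeled on the R AnnotationDbi function of the same name and may not match this package exactly. The sketch always returns a dict, whereas the documented return type is Union[dict, list].

```python
from typing import Dict, Iterable, List, Tuple


def collapse(pairs: Iterable[Tuple[str, str]], multi_vals: str = "first") -> dict:
    """Collapse (key, value) rows according to a multiVals-style option."""
    grouped: Dict[str, List[str]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)

    if multi_vals == "first":
        # Keep only the first value seen for each key.
        return {k: v[0] for k, v in grouped.items()}
    if multi_vals == "list":
        # Keep every value, as a list per key.
        return grouped
    if multi_vals == "filter":
        # Assumed semantics: drop keys that matched more than one value.
        return {k: v[0] for k, v in grouped.items() if len(v) == 1}
    raise ValueError(f"unsupported multi_vals: {multi_vals!r}")


# Toy rows: "k2" maps to two values.
rows = [("k1", "a"), ("k2", "b"), ("k2", "c")]

first = collapse(rows, "first")
as_list = collapse(rows, "list")
filtered = collapse(rows, "filter")
```

With these toy rows, ‘first’ keeps one value per key, ‘list’ keeps both values for "k2", and ‘filter’ returns only "k1".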

property metadata: BiocFrame

Get the metadata table from the database.

select(keys, columns, keytype)[source]

Retrieve data from the database.

Parameters:
  • keys (Union[List[str], str]) – A single key or a list of keys to select.

  • columns (Union[List[str], str]) – A single column or a list of columns to retrieve.

  • keytype (str) – The type of the provided keys (must be one of columns()).

Return type:

BiocFrame
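Because the underlying store is SQLite, a lookup like select can be pictured as a parameterized `IN` query. The sketch below is an assumption about the general technique, not the package's actual query. The table name `toy_table`, the column names `ID_A` and `ID_B`, and the helper `select_rows` are all hypothetical, and the result is a plain list of dicts rather than a BiocFrame.

```python
import sqlite3
from typing import Dict, List, Union

TABLE = "toy_table"  # hypothetical table; real OrgDb schemas differ


def select_rows(
    conn: sqlite3.Connection,
    keys: Union[List[str], str],
    columns: Union[List[str], str],
    keytype: str,
) -> List[Dict[str, str]]:
    # Accept either a single string or a list, as the documented types allow.
    keys = [keys] if isinstance(keys, str) else list(keys)
    columns = [columns] if isinstance(columns, str) else list(columns)

    # Column names cannot be bound as parameters, so validate them
    # against the table's real columns before building the SQL text.
    valid = {row[1] for row in conn.execute(f"PRAGMA table_info({TABLE})")}
    wanted = [keytype] + [c for c in columns if c != keytype]
    unknown = [c for c in wanted if c not in valid]
    if unknown:
        raise ValueError(f"unknown column(s): {unknown}")

    # Key values are bound through placeholders, never interpolated.
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        f"SELECT {', '.join(wanted)} FROM {TABLE} "
        f"WHERE {keytype} IN ({placeholders}) ORDER BY rowid"
    )
    return [dict(zip(wanted, row)) for row in conn.execute(sql, keys)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_table (ID_A TEXT, ID_B TEXT)")
conn.executemany(
    "INSERT INTO toy_table VALUES (?, ?)",
    [("a1", "b1"), ("a2", "b2"), ("a3", "b3")],
)
rows = select_rows(conn, ["a1", "a3"], "ID_B", "ID_A")
```

Validating `keytype` against the known columns mirrors the documented requirement that keytype must be one of columns().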

property species: str

Get the organism/species name from metadata.

orgdb.orgdbregistry module

class orgdb.orgdbregistry.OrgDbRegistry(cache_dir=None, force=False)[source]

Bases: object

Registry for OrgDb resources, dynamically populated from AnnotationHub.

__init__(cache_dir=None, force=False)[source]

Initialize the OrgDb registry.

Parameters:
  • cache_dir (Union[str, Path, None]) – Directory for the BiocFileCache database and cached files. If None, defaults to “~/.cache/orgdb_bfc”.

  • force (bool) – If True, force re-download of the AnnotationHub metadata database.

download(orgdb_id, force=False)[source]

Download and cache the OrgDb file.

Parameters:
  • orgdb_id (str) – The OrgDb ID to fetch.

  • force (bool) – If True, forces re-download even if already cached.

Return type:

str

Returns:

Local filesystem path to the cached file.
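The force flag implies a fetch-once cache: the file is downloaded on first use and reused afterwards unless a re-download is forced. The sketch below shows that logic in isolation with a hypothetical helper (`cached_path`) and a fake fetch function. It does not use BiocFileCache or the network, and the real method may track entries differently.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_path(
    cache_dir: str, name: str, fetch: Callable[[], bytes], force: bool = False
) -> str:
    """Return the local path for `name`, fetching only when needed."""
    target = Path(cache_dir).expanduser() / name
    if force or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch())
    return str(target)


calls: List[int] = []


def fake_fetch() -> bytes:
    # Stands in for a network download; records each invocation.
    calls.append(1)
    return b"payload"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_path(tmp, "toy.sqlite", fake_fetch)
    second = cached_path(tmp, "toy.sqlite", fake_fetch)  # served from cache
    calls_after_two = len(calls)
    cached_path(tmp, "toy.sqlite", fake_fetch, force=True)  # fetched again
    calls_after_force = len(calls)
```

In this sketch the second call returns the same path without fetching, and only the forced call triggers a second fetch.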

get_record(orgdb_id)[source]

Get the metadata record for a given OrgDb ID.

Parameters:

orgdb_id (str) – The OrgDb ID to look up (e.g., ‘org.Hs.eg.db’).

Return type:

OrgDbRecord

Returns:

An OrgDbRecord object containing metadata.

Raises:

KeyError – If the ID is not found.

list_orgdb()[source]

List all available OrgDb IDs (e.g., ‘org.Hs.eg.db’).

Return type:

List[str]

Returns:

A sorted list of valid OrgDb ID strings.
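Since get_record raises KeyError for unknown IDs and list_orgdb returns a sorted list of valid IDs, callers can either check membership first or catch the exception. The sketch below shows both with a hypothetical in-memory stand-in (`ToyRegistry`). The ID ‘org.Aa.eg.db’ and the URLs are placeholders, and the real registry is populated from AnnotationHub rather than from a dict.

```python
from typing import Dict, List


class ToyRegistry:
    """Hypothetical dict-backed stand-in for a registry (not orgdb code)."""

    def __init__(self, records: Dict[str, dict]) -> None:
        self._records = dict(records)

    def list_ids(self) -> List[str]:
        # Sorted, matching the documented list_orgdb() contract.
        return sorted(self._records)

    def get_record(self, orgdb_id: str) -> dict:
        try:
            return self._records[orgdb_id]
        except KeyError:
            raise KeyError(f"unknown OrgDb ID: {orgdb_id!r}") from None


registry = ToyRegistry(
    {
        "org.Hs.eg.db": {"url": "https://example.org/hs.sqlite"},
        "org.Aa.eg.db": {"url": "https://example.org/aa.sqlite"},  # placeholder
    }
)

ids = registry.list_ids()

# Option 1: check membership before looking up.
known = "org.Hs.eg.db" in ids

# Option 2: attempt the lookup and handle the documented KeyError.
try:
    registry.get_record("org.Zz.eg.db")
    missing = False
except KeyError:
    missing = True
```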

load_db(orgdb_id, force=False)[source]

Load an OrgDb object for the given ID.

Parameters:
  • orgdb_id (str) – The ID of the OrgDb to load.

  • force (bool) – If True, forces re-download of the database file.

Return type:

OrgDb

Returns:

An initialized OrgDb object.

orgdb.record module

class orgdb.record.OrgDbRecord(orgdb_id, release_date, url, species=None, id_type=None, bioc_version=None)[source]

Bases: object

Container for a single OrgDb entry.

A frozen dataclass (frozen=True, eq=True, order=False): instances are immutable after construction and compare by field values. Fields, in positional order:

  • orgdb_id (str)

  • release_date (Optional[date])

  • url (str)

  • species (Optional[str]) – Defaults to None.

  • id_type (Optional[str]) – Defaults to None.

  • bioc_version (Optional[str]) – Defaults to None.

__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(orgdb_id, release_date, url, species=None, id_type=None, bioc_version=None)
__match_args__ = ('orgdb_id', 'release_date', 'url', 'species', 'id_type', 'bioc_version')
__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

bioc_version: Optional[str] = None
classmethod from_config_entry(orgdb_id, entry)[source]

Build a record from an ORGDB_CONFIG entry of the form:

    {
        "release_date": "YYYY-MM-DD",  # optional
        "url": "https://…"
    }

Return type:

OrgDbRecord

id_type: Optional[str] = None
orgdb_id: str
release_date: Optional[date]
species: Optional[str] = None
url: str
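The record is a frozen dataclass whose from_config_entry classmethod turns a config dict into an instance. The sketch below reproduces that shape with a hypothetical class (`ToyRecord`) that has the same fields. Parsing release_date with `date.fromisoformat` and leaving the remaining optional fields at None are assumptions, since this reference does not show the method body. The URL is a placeholder.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ToyRecord:
    """Hypothetical mirror of the documented fields (not orgdb code)."""

    orgdb_id: str
    release_date: Optional[date]
    url: str
    species: Optional[str] = None
    id_type: Optional[str] = None
    bioc_version: Optional[str] = None

    @classmethod
    def from_config_entry(cls, orgdb_id: str, entry: dict) -> "ToyRecord":
        # Assumed parsing: "release_date" is optional and ISO formatted.
        raw = entry.get("release_date")
        parsed = date.fromisoformat(raw) if raw else None
        return cls(orgdb_id=orgdb_id, release_date=parsed, url=entry["url"])


dated = ToyRecord.from_config_entry(
    "org.Hs.eg.db",
    {"release_date": "2024-01-31", "url": "https://example.org/toy.sqlite"},
)
undated = ToyRecord.from_config_entry(
    "org.Hs.eg.db", {"url": "https://example.org/toy.sqlite"}
)

# frozen=True means attribute assignment is rejected.
try:
    dated.url = "changed"
    mutable = True
except FrozenInstanceError:
    mutable = False
```

Because the dataclass is frozen and compares by value, two records built from the same entry are equal and can be used as dict keys or set members.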

Module contents

class orgdb.OrgDb(dbpath)[source]

Bases: object

Interface for accessing OrgDb SQLite databases in Python.

The members are the same as those documented under orgdb.orgdb.OrgDb above.

class orgdb.OrgDbRecord(orgdb_id, release_date, url, species=None, id_type=None, bioc_version=None)[source]

Bases: object

Container for a single OrgDb entry.

The members are the same as those documented under orgdb.record.OrgDbRecord above.

class orgdb.OrgDbRegistry(cache_dir=None, force=False)[source]

Bases: object

Registry for OrgDb resources, dynamically populated from AnnotationHub.

The members are the same as those documented under orgdb.orgdbregistry.OrgDbRegistry above.