txdb package

Submodules

txdb.record module

class txdb.record.TxDbRecord(txdb_id, release_date, url, organism=None, source=None, build=None, bioc_version=None)[source]

Bases: object

Container for a single TxDb entry.

__annotations__ = {'bioc_version': 'Optional[str]', 'build': 'Optional[str]', 'organism': 'Optional[str]', 'release_date': 'Optional[date]', 'source': 'Optional[str]', 'txdb_id': 'str', 'url': 'str'}
__dataclass_fields__ = {'bioc_version': Field(name='bioc_version',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'build': Field(name='build',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'organism': Field(name='organism',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'release_date': Field(name='release_date',type='Optional[date]',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'source': Field(name='source',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'txdb_id': Field(name='txdb_id',type='str',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'url': Field(name='url',type='str',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}
__dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=True,match_args=True,kw_only=False,slots=False,weakref_slot=False)
__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(txdb_id, release_date, url, organism=None, source=None, build=None, bioc_version=None)
__match_args__ = ('txdb_id', 'release_date', 'url', 'organism', 'source', 'build', 'bioc_version')
__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

bioc_version: Optional[str] = None
build: Optional[str] = None
classmethod from_config_entry(txdb_id, entry)[source]

Build a record from a TXDB_CONFIG entry:

{
    "release_date": "YYYY-MM-DD",  # optional
    "url": "https://…"
}

Return type:

TxDbRecord

organism: Optional[str] = None
release_date: Optional[date]
source: Optional[str] = None
txdb_id: str
url: str
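
The record is a frozen dataclass, so instances are immutable and hashable. The following is a minimal sketch of that pattern, not the package's implementation: RecordSketch is a hypothetical stand-in that mirrors the documented fields, and its from_config_entry assumes that "release_date" is an optional ISO date string and that "url" is required. The ID and URLs are placeholders.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RecordSketch:
    """Hypothetical stand-in mirroring the documented TxDbRecord fields."""

    txdb_id: str
    release_date: Optional[date]
    url: str
    organism: Optional[str] = None
    source: Optional[str] = None
    build: Optional[str] = None
    bioc_version: Optional[str] = None

    @classmethod
    def from_config_entry(cls, txdb_id: str, entry: dict) -> "RecordSketch":
        # Assumption: "release_date" is an optional ISO string, "url" is required.
        raw = entry.get("release_date")
        released = date.fromisoformat(raw) if raw else None
        return cls(txdb_id=txdb_id, release_date=released, url=entry["url"])


record = RecordSketch.from_config_entry(
    "example_id",  # placeholder ID
    {"release_date": "2024-01-15", "url": "https://example.org/example.sqlite"},
)

try:
    # frozen=True makes attribute assignment raise FrozenInstanceError
    record.url = "https://example.org/other.sqlite"
except FrozenInstanceError:
    frozen = True
else:
    frozen = False
```

Because frozen=True is combined with eq=True, two records with equal fields compare equal and share a hash, so they can be used as dictionary keys or set members.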

txdb.txdb module

class txdb.txdb.TxDb(dbpath)[source]

Bases: object

Interface for accessing TxDb SQLite databases in Python.

__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
__init__(dbpath)[source]

Initialize the TxDb object.

Parameters:

dbpath (str) – Path to the SQLite database file.

cds(filter=None)[source]

Retrieve coding sequences (CDS) as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing CDS regions.

cds_by_overlaps(query)[source]

Retrieve CDS regions that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping CDS regions.

close()[source]

Close the database connection.
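
Since the class defines __enter__, __exit__ and close(), it can presumably be used in a with statement so that the connection is released even when the body raises. The sketch below shows that pattern with a hypothetical stand-in class (DbHandleSketch) over an in-memory SQLite database; it assumes that __enter__ returns the object itself and that __exit__ delegates to close(), which the documentation above does not confirm.

```python
import sqlite3


class DbHandleSketch:
    """Hypothetical stand-in for a class that owns a SQLite connection."""

    def __init__(self, dbpath: str):
        self.conn = sqlite3.connect(dbpath)
        self.closed = False

    def close(self) -> None:
        # Safe to call more than once.
        if not self.closed:
            self.conn.close()
            self.closed = True

    def __enter__(self) -> "DbHandleSketch":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.close()  # runs whether or not the body raised


with DbHandleSketch(":memory:") as db:
    value = db.conn.execute("SELECT 1").fetchone()[0]
```

After the block exits, db.closed is True and further queries on db.conn would fail.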

exons(filter=None)[source]

Retrieve exons as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing exons.

exons_by_overlaps(query)[source]

Retrieve exons that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping exons.

genes(single_strand_only=True)[source]

Retrieve genes as a GenomicRanges object.

Aggregates transcripts by gene_id and calculates the genomic range (min start to max end) for each gene.

Parameters:

single_strand_only (bool) – If True, genes spanning multiple chromosomes or strands are dropped. Defaults to True.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing gene extents.
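
The aggregation described above can be sketched in plain Python. This is an illustration of the default behaviour only (single_strand_only=True), under the assumption that a gene is dropped whenever its transcripts appear on more than one (chromosome, strand) pair; the row layout and gene names are hypothetical.

```python
from collections import defaultdict

# Hypothetical transcript rows: (gene_id, chrom, strand, start, end)
transcripts = [
    ("geneA", "chr1", "+", 100, 500),
    ("geneA", "chr1", "+", 300, 900),
    ("geneB", "chr2", "-", 50, 200),
    ("geneC", "chr1", "+", 10, 40),
    ("geneC", "chr1", "-", 60, 90),  # same gene on the opposite strand
]


def gene_extents(rows):
    """Min start to max end per gene; genes on several chrom/strand pairs are dropped."""
    grouped = defaultdict(list)
    for gene_id, chrom, strand, start, end in rows:
        grouped[gene_id].append((chrom, strand, start, end))

    extents = {}
    for gene_id, members in grouped.items():
        locations = {(chrom, strand) for chrom, strand, _, _ in members}
        if len(locations) != 1:
            continue  # ambiguous placement: drop the gene
        chrom, strand = next(iter(locations))
        extents[gene_id] = (
            chrom,
            strand,
            min(start for _, _, start, _ in members),
            max(end for _, _, _, end in members),
        )
    return extents


extents = gene_extents(transcripts)
```

Here geneA spans 100 to 900 on chr1, geneB keeps its single transcript's range, and geneC is absent from the result.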

promoters(upstream=2000, downstream=200)[source]

Retrieve promoter regions for transcripts.

Parameters:
  • upstream (int) – Number of bases upstream of TSS. Defaults to 2000.

  • downstream (int) – Number of bases downstream of TSS. Defaults to 200.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing promoters.
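
As a rough illustration of how upstream and downstream define a promoter window, the sketch below follows the convention used by Bioconductor's promoters() for 1-based inclusive coordinates: the TSS is the transcript start on the "+" strand and the transcript end on the "-" strand. Whether this package uses exactly the same arithmetic is an assumption, and promoter_range is a hypothetical helper.

```python
def promoter_range(start, end, strand, upstream=2000, downstream=200):
    """Return (start, end) of the promoter window around the TSS."""
    if strand == "-":
        tss = end  # minus-strand transcripts begin at their highest coordinate
        return (tss - downstream + 1, tss + upstream)
    tss = start
    return (tss - upstream, tss + downstream - 1)


plus = promoter_range(5000, 9000, "+")
minus = promoter_range(5000, 9000, "-")
plus_width = plus[1] - plus[0] + 1
minus_width = minus[1] - minus[0] + 1
```

With the defaults, both windows are upstream + downstream = 2200 bases wide: (3000, 5199) on the "+" strand and (8801, 11000) on the "-" strand.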

property seqinfo: SeqInfo

Get the sequence information from the database.

Returns:

A SeqInfo object representing the chrominfo table.

transcript_lengths(with_cds_len=False, with_utr5_len=False, with_utr3_len=False)[source]

Calculate lengths of transcripts, and optionally CDS and UTRs.

Parameters:
  • with_cds_len (bool) – Include CDS length.

  • with_utr5_len (bool) – Include 5’ UTR length (Not yet implemented).

  • with_utr3_len (bool) – Include 3’ UTR length (Not yet implemented).

Return type:

BiocFrame

Returns:

BiocFrame with columns tx_id, tx_name, gene_id, nexon, tx_len.
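
The nexon and tx_len columns can be understood as a per-transcript exon count and a sum of exon widths. The sketch below computes both from hypothetical exon rows, assuming 1-based inclusive coordinates (width = end - start + 1); it returns a plain dictionary rather than a BiocFrame.

```python
from collections import defaultdict

# Hypothetical exon rows: (tx_id, exon_start, exon_end), 1-based inclusive
exons = [(1, 100, 199), (1, 300, 349), (2, 1000, 1099)]


def transcript_lengths(exon_rows):
    """Count exons and sum exon widths for each transcript."""
    nexon = defaultdict(int)
    tx_len = defaultdict(int)
    for tx_id, start, end in exon_rows:
        nexon[tx_id] += 1
        tx_len[tx_id] += end - start + 1
    return {tx: {"nexon": nexon[tx], "tx_len": tx_len[tx]} for tx in nexon}


lengths = transcript_lengths(exons)
```

Transcript 1 has two exons of 100 and 50 bases, giving tx_len 150; transcript 2 has a single 100-base exon.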

transcripts(filter=None)[source]

Retrieve transcripts as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters (e.g. {'tx_chrom': 'chr1'}).

Return type:

GenomicRanges

Returns:

GenomicRanges object containing transcripts.
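
One plausible way a filter dictionary such as {'tx_chrom': 'chr1'} can be applied to a SQLite-backed table is to turn each key into a parameterised equality clause. The sketch below is an assumption about the general technique, not the package's actual query code: the table layout is a reduced, hypothetical version of a transcript table, and select_transcripts and ALLOWED_COLUMNS are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcript (tx_name TEXT, tx_chrom TEXT, tx_strand TEXT)")
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?)",
    [("tx1", "chr1", "+"), ("tx2", "chr2", "-"), ("tx3", "chr1", "-")],
)

# Column names cannot be bound as parameters, so they are checked against a whitelist.
ALLOWED_COLUMNS = {"tx_name", "tx_chrom", "tx_strand"}


def select_transcripts(conn, filters=None):
    """Return transcript names matching every column/value pair in filters."""
    sql = "SELECT tx_name FROM transcript"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            if column not in ALLOWED_COLUMNS:
                raise ValueError(f"unknown filter column: {column}")
            clauses.append(f"{column} = ?")
            params.append(value)  # values are bound, never interpolated
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY tx_name"
    return [row[0] for row in conn.execute(sql, params)]


chr1_names = select_transcripts(conn, {"tx_chrom": "chr1"})
chr1_minus = select_transcripts(conn, {"tx_chrom": "chr1", "tx_strand": "-"})
all_names = select_transcripts(conn)
```

Binding values with ? placeholders keeps user-supplied filter values out of the SQL text, while the whitelist guards the column names.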

transcripts_by_overlaps(query)[source]

Retrieve transcripts that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping transcripts.
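
The overlap test behind the *_by_overlaps methods can be illustrated with closed intervals: two ranges on the same sequence overlap when each starts no later than the other ends. The sketch below uses plain tuples and ignores strand, which is an assumption; the actual methods operate on GenomicRanges objects and may treat strand differently.

```python
def overlaps(a, b):
    """True if (seqname, start, end) ranges a and b share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


# Hypothetical features and query, as (seqname, start, end) with inclusive ends
features = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 150, 250)]
query = [("chr1", 180, 520)]

hits = [f for f in features if any(overlaps(f, q) for q in query)]
```

The query touches the end of the first chr1 feature and the start of the second, so both are returned; the chr2 feature is on a different sequence and is excluded.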

txdb.txdbregistry module

class txdb.txdbregistry.TxDbRegistry(cache_dir=None, force=False)[source]

Bases: object

Registry for TxDb resources, populated from AnnotationHub.

__init__(cache_dir=None, force=False)[source]

Initialize the TxDb registry.

Parameters:
  • cache_dir (Union[str, Path, None]) – Directory for the BiocFileCache database and cached files. If None, defaults to “~/.cache/txdb_bfc”.

  • force (bool) – If True, force re-download of the AnnotationHub metadata database.

download(txdb_id, force=False)[source]

Download and cache the TxDb file.

Parameters:
  • txdb_id (str) – The TxDb ID to fetch.

  • force (bool) – If True, forces re-download even if already cached. Defaults to False.

Return type:

str

Returns:

Local filesystem path to the cached file.
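
The force flag follows a common cache-or-fetch pattern: reuse the local file when it exists, and fetch again only when it is missing or force is set. The sketch below illustrates that pattern with the standard library; cached_path and fake_fetch are hypothetical helpers, the file naming is an assumption, and the real method relies on BiocFileCache rather than this logic.

```python
import tempfile
from pathlib import Path


def cached_path(cache_dir, txdb_id, fetch, force=False):
    """Return the cached file for txdb_id, calling fetch only when needed."""
    target = Path(cache_dir) / f"{txdb_id}.sqlite"
    if force or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(txdb_id))
    return str(target)


calls = []


def fake_fetch(txdb_id):
    # Stand-in for a network download; records each call.
    calls.append(txdb_id)
    return b"placeholder bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_path(tmp, "example_id", fake_fetch)              # fetches
    second = cached_path(tmp, "example_id", fake_fetch)             # cache hit
    third = cached_path(tmp, "example_id", fake_fetch, force=True)  # fetches again
```

All three calls return the same path, but fake_fetch runs only for the first and the forced call.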

get_record(txdb_id)[source]

Get the metadata record for a given TxDb ID.

Parameters:

txdb_id (str) – The TxDb ID to look up.

Return type:

TxDbRecord

Returns:

A TxDbRecord object containing metadata.

Raises:

KeyError – If the ID is not found in the configuration.

list_txdb()[source]

List all available TxDb IDs.

Return type:

list[str]

Returns:

A list of valid TxDb ID strings.
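
Listing IDs and looking one up can be pictured as operations on a mapping from ID to configuration entry, with an unknown ID surfacing as a KeyError. The sketch below is illustrative only: CONFIG, list_ids and get_entry are hypothetical, the IDs and URLs are placeholders, and returning the IDs in sorted order is an assumption.

```python
# Hypothetical configuration; IDs and URLs are placeholders.
CONFIG = {
    "example_b": {"url": "https://example.org/b.sqlite", "release_date": "2024-01-15"},
    "example_a": {"url": "https://example.org/a.sqlite"},
}


def list_ids(config):
    """Return the known IDs (sorted here for a stable order)."""
    return sorted(config)


def get_entry(config, txdb_id):
    """Return the entry for txdb_id, raising KeyError for unknown IDs."""
    try:
        return config[txdb_id]
    except KeyError:
        raise KeyError(f"unknown TxDb ID: {txdb_id!r}") from None


ids = list_ids(CONFIG)
entry = get_entry(CONFIG, "example_a")

try:
    get_entry(CONFIG, "missing")
except KeyError:
    missing_raised = True
else:
    missing_raised = False
```

Checking an ID against the listing before a lookup avoids the exception when the ID comes from user input.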

load_db(txdb_id, force=False)[source]

Load a TxDb object for the given ID.

If the resource is already downloaded and valid, the cached copy is used without re-downloading, unless force=True.

Parameters:
  • txdb_id (str) – The ID of the TxDb to load.

  • force (bool) – If True, forces re-download of the database file.

Return type:

TxDb

Returns:

An initialized TxDb object connected to the cached database.

Module contents

class txdb.TxDb(dbpath)[source]

Bases: object

Interface for accessing TxDb SQLite databases in Python.

__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
__init__(dbpath)[source]

Initialize the TxDb object.

Parameters:

dbpath (str) – Path to the SQLite database file.

cds(filter=None)[source]

Retrieve coding sequences (CDS) as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing CDS regions.

cds_by_overlaps(query)[source]

Retrieve CDS regions that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping CDS regions.

close()[source]

Close the database connection.

exons(filter=None)[source]

Retrieve exons as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing exons.

exons_by_overlaps(query)[source]

Retrieve exons that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping exons.

genes(single_strand_only=True)[source]

Retrieve genes as a GenomicRanges object.

Aggregates transcripts by gene_id and calculates the genomic range (min start to max end) for each gene.

Parameters:

single_strand_only (bool) – If True, genes spanning multiple chromosomes or strands are dropped. Defaults to True.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing gene extents.

promoters(upstream=2000, downstream=200)[source]

Retrieve promoter regions for transcripts.

Parameters:
  • upstream (int) – Number of bases upstream of TSS. Defaults to 2000.

  • downstream (int) – Number of bases downstream of TSS. Defaults to 200.

Return type:

GenomicRanges

Returns:

GenomicRanges object containing promoters.

property seqinfo: SeqInfo

Get the sequence information from the database.

Returns:

A SeqInfo object representing the chrominfo table.

transcript_lengths(with_cds_len=False, with_utr5_len=False, with_utr3_len=False)[source]

Calculate lengths of transcripts, and optionally CDS and UTRs.

Parameters:
  • with_cds_len (bool) – Include CDS length.

  • with_utr5_len (bool) – Include 5’ UTR length (Not yet implemented).

  • with_utr3_len (bool) – Include 3’ UTR length (Not yet implemented).

Return type:

BiocFrame

Returns:

BiocFrame with columns tx_id, tx_name, gene_id, nexon, tx_len.

transcripts(filter=None)[source]

Retrieve transcripts as a GenomicRanges object.

Parameters:

filter (Optional[dict]) – Dictionary of filters (e.g. {'tx_chrom': 'chr1'}).

Return type:

GenomicRanges

Returns:

GenomicRanges object containing transcripts.

transcripts_by_overlaps(query)[source]

Retrieve transcripts that overlap with the query ranges.

Parameters:

query (GenomicRanges) – Query genomic ranges.

Return type:

GenomicRanges

Returns:

GenomicRanges object of overlapping transcripts.

class txdb.TxDbRecord(txdb_id, release_date, url, organism=None, source=None, build=None, bioc_version=None)[source]

Bases: object

Container for a single TxDb entry.

__annotations__ = {'bioc_version': 'Optional[str]', 'build': 'Optional[str]', 'organism': 'Optional[str]', 'release_date': 'Optional[date]', 'source': 'Optional[str]', 'txdb_id': 'str', 'url': 'str'}
__dataclass_fields__ = {'bioc_version': Field(name='bioc_version',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'build': Field(name='build',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'organism': Field(name='organism',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'release_date': Field(name='release_date',type='Optional[date]',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'source': Field(name='source',type='Optional[str]',default=None,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'txdb_id': Field(name='txdb_id',type='str',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'url': Field(name='url',type='str',default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}
__dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=True,match_args=True,kw_only=False,slots=False,weakref_slot=False)
__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(txdb_id, release_date, url, organism=None, source=None, build=None, bioc_version=None)
__match_args__ = ('txdb_id', 'release_date', 'url', 'organism', 'source', 'build', 'bioc_version')
__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

bioc_version: Optional[str] = None
build: Optional[str] = None
classmethod from_config_entry(txdb_id, entry)[source]

Build a record from a TXDB_CONFIG entry:

{
    "release_date": "YYYY-MM-DD",  # optional
    "url": "https://…"
}

Return type:

TxDbRecord

organism: Optional[str] = None
release_date: Optional[date]
source: Optional[str] = None
txdb_id: str
url: str

class txdb.TxDbRegistry(cache_dir=None, force=False)[source]

Bases: object

Registry for TxDb resources, populated from AnnotationHub.

__init__(cache_dir=None, force=False)[source]

Initialize the TxDb registry.

Parameters:
  • cache_dir (Union[str, Path, None]) – Directory for the BiocFileCache database and cached files. If None, defaults to “~/.cache/txdb_bfc”.

  • force (bool) – If True, force re-download of the AnnotationHub metadata database.

download(txdb_id, force=False)[source]

Download and cache the TxDb file.

Parameters:
  • txdb_id (str) – The TxDb ID to fetch.

  • force (bool) – If True, forces re-download even if already cached. Defaults to False.

Return type:

str

Returns:

Local filesystem path to the cached file.

get_record(txdb_id)[source]

Get the metadata record for a given TxDb ID.

Parameters:

txdb_id (str) – The TxDb ID to look up.

Return type:

TxDbRecord

Returns:

A TxDbRecord object containing metadata.

Raises:

KeyError – If the ID is not found in the configuration.

list_txdb()[source]

List all available TxDb IDs.

Return type:

list[str]

Returns:

A list of valid TxDb ID strings.

load_db(txdb_id, force=False)[source]

Load a TxDb object for the given ID.

If the resource is already downloaded and valid, the cached copy is used without re-downloading, unless force=True.

Parameters:
  • txdb_id (str) – The ID of the TxDb to load.

  • force (bool) – If True, forces re-download of the database file.

Return type:

TxDb

Returns:

An initialized TxDb object connected to the cached database.