pybiocfilecache package

Submodules

pybiocfilecache.cache module

class pybiocfilecache.cache.BiocFileCache(cache_dir=None, config=None)

Bases: object

Enhanced file caching module.

Features:

  • Resource validation and integrity checking

  • Cache size management

  • Cleanup of expired resources

__enter__()
Return type:

BiocFileCache

__exit__(exc_type, exc_val, exc_tb)
Return type:

None
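Because __enter__ returns the cache and __exit__ returns None, a BiocFileCache can be used in a with statement. The minimal sketch below illustrates that protocol on a hypothetical stand-in class, CacheLike; it assumes (this reference does not confirm it) that __exit__ delegates to close().

```python
class CacheLike:
    """Hypothetical stand-in for BiocFileCache's context-manager protocol."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # Assumption: close() is what releases the cache's resources.
        self.closed = True

    def __enter__(self) -> "CacheLike":
        # Return the cache itself, matching the documented return type.
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Assumption: __exit__ calls close(). Returning None means any
        # exception raised inside the with block still propagates.
        self.close()


with CacheLike() as cache:
    assert not cache.closed
assert cache.closed
```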

__init__(cache_dir=None, config=None)

Initialize cache with optional configuration.

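Parameters:
  • cache_dir – Optional directory to store cached files.

  • config – Optional CacheConfig instance with cache settings.
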
__len__()

add(rname, fpath, rtype='relative', action='copy', expires=None, download=True, ext=True)

Add a resource to the cache.

Parameters:
  • rname (str) – Name to identify the resource in cache.

  • fpath (Union[str, Path]) – Path to the source file.

  • rtype (Literal['local', 'web', 'relative']) – Type of resource. One of local, web, or relative. Defaults to relative.

  • action (Literal['copy', 'move', 'asis']) – How to handle the file (“copy”, “move”, or “asis”). Defaults to copy.

  • download (bool) – Whether to download the resource. Only used if ‘rtype’ is “web”. Defaults to True.

  • expires (Optional[datetime]) – Optional expiration datetime. If None, resource never expires.

  • ext (bool) – Whether to use filepath extension when storing in cache. Defaults to True.

Return type:

Resource

Returns:

The Resource object added to the cache.
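The three action values can be illustrated with the standard library. This is a minimal sketch, not the library's implementation: place_in_cache is a hypothetical helper, and it assumes that "asis" leaves the file where it is and records its original path instead of copying it. The copy_or_move utility accepts the same action values.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Literal


def place_in_cache(
    fpath: Path, target: Path, action: Literal["copy", "move", "asis"] = "copy"
) -> Path:
    """Hypothetical helper: return the path the cache would record for fpath."""
    if action == "copy":
        shutil.copy2(fpath, target)  # source file is left untouched
        return target
    if action == "move":
        shutil.move(str(fpath), str(target))  # source file no longer exists
        return target
    if action == "asis":
        # Assumption: nothing is copied; the cache points at the original file.
        return fpath
    raise ValueError(f"unknown action: {action!r}")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "data.csv")
    src.write_text("a,b\n1,2\n")
    copied = place_in_cache(src, Path(tmp, "cached.csv"), "copy")
    print(src.exists(), copied.read_text() == src.read_text())  # True True
```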

add_batch(resources)

Add multiple resources in a single transaction.

Parameters:

resources (List[Dict[str, Any]]) – List of resources to add.

Return type:

List[Resource]
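The all-or-nothing behavior of a single transaction can be sketched with the standard sqlite3 module. The table layout and add_batch_sketch are hypothetical simplifications (the UNIQUE constraint exists only to force a failure); the documented method works through SQLAlchemy sessions and returns Resource objects.

```python
import sqlite3
from typing import Any, Dict, List


def add_batch_sketch(conn: sqlite3.Connection, resources: List[Dict[str, Any]]) -> int:
    """Hypothetical: insert every row, or none of them if any insert fails."""
    with conn:  # commits on success, rolls back if an exception escapes
        for res in resources:
            conn.execute(
                "INSERT INTO resource (rname, fpath) VALUES (:rname, :fpath)", res
            )
    return len(resources)


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table; UNIQUE is only here to trigger a failure.
conn.execute("CREATE TABLE resource (rname TEXT UNIQUE, fpath TEXT)")
add_batch_sketch(conn, [{"rname": "a", "fpath": "/tmp/a"}])
try:
    # The duplicate "a" violates UNIQUE, so "b" is rolled back as well.
    add_batch_sketch(
        conn,
        [{"rname": "b", "fpath": "/tmp/b"}, {"rname": "a", "fpath": "/tmp/a2"}],
    )
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT COUNT(*) FROM resource").fetchone()[0])  # 1
```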

add_metadata(key, value)

Add a new metadata entry with the given key and value.

check_metadata_key(key)

Check if a key exists in the metadata table.

Parameters:

key (str) – Key to search.

Return type:

bool

Returns:

True if the key exists, else False.
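The metadata methods map naturally onto the documented metadata table, where key is a Text primary key and value is Text. The sketch below uses sqlite3 with hypothetical module-level functions that borrow the method names; it is not the library's code, and returning None from get_metadata for a missing key is an assumption.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Mirrors the documented "metadata" table: key is the primary key, value is text.
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY NOT NULL, value TEXT)")


def add_metadata(key: str, value: str) -> None:
    with conn:  # commit the insert
        conn.execute("INSERT INTO metadata (key, value) VALUES (?, ?)", (key, value))


def check_metadata_key(key: str) -> bool:
    row = conn.execute("SELECT 1 FROM metadata WHERE key = ?", (key,)).fetchone()
    return row is not None


def get_metadata(key: str) -> Optional[str]:
    # Assumption: a missing key yields None rather than raising.
    row = conn.execute("SELECT value FROM metadata WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def remove_metadata(key: str) -> None:
    with conn:  # commit the delete
        conn.execute("DELETE FROM metadata WHERE key = ?", (key,))


add_metadata("example_key", "example_value")
print(check_metadata_key("example_key"), get_metadata("example_key"))  # True example_value
remove_metadata("example_key")
print(check_metadata_key("example_key"))  # False
```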

cleanup()

Remove expired resources from the cache.

Return type:

int

Returns:

Number of resources removed.

Note

  • If cleanup_interval is None, this method will still run if called explicitly.

  • Only removes resources with non-None expiration dates.
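The selection rule in the note can be sketched as a pure function over a name-to-expiry mapping. cleanup_sketch is hypothetical, and treating an expiry exactly equal to the current time as not yet expired is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def cleanup_sketch(expiry: Dict[str, Optional[datetime]], now: datetime) -> int:
    """Hypothetical: drop entries whose expiry has passed; return how many."""
    # Entries with expires=None never expire, so they are never removed.
    # Assumption: an entry expiring exactly at `now` is kept.
    expired = [name for name, exp in expiry.items() if exp is not None and exp < now]
    for name in expired:
        del expiry[name]
    return len(expired)


now = datetime(2024, 1, 1)
entries = {
    "old": now - timedelta(days=1),
    "fresh": now + timedelta(days=1),
    "forever": None,
}
print(cleanup_sketch(entries, now), sorted(entries))  # 1 ['forever', 'fresh']
```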

close()

Close the cache and release the resources it holds.

Return type:

None

get(rname=None, rid=None)

Get a resource from the cache by name or by resource id.

Parameters:
  • rname (str) – Name to identify the resource in cache.

  • rid (str) – Resource id to search by.

Return type:

Optional[Resource]

get_metadata(key)

Get the value of a metadata key.

get_session()

Provide a database session with automatic cleanup.

Return type:

Iterator[Session]
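A return type of Iterator[Session] points to a generator-based context manager. The sketch below shows that pattern with contextlib.contextmanager, swapping in a standard-library sqlite3.Connection for the SQLAlchemy Session so it runs without extra packages. Committing on success and rolling back on error are assumptions about what the automatic cleanup covers beyond closing the connection.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def get_session(db_path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Hypothetical: yield a connection, commit on success, always close."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()  # assumption: successful blocks are committed
    except Exception:
        conn.rollback()  # assumption: failed blocks are rolled back
        raise
    finally:
        conn.close()  # the "automatic cleanup": the handle never leaks


with get_session() as session:
    session.execute("CREATE TABLE t (x INTEGER)")
    session.execute("INSERT INTO t VALUES (1)")
    print(session.execute("SELECT x FROM t").fetchone())  # (1,)
```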

get_stats()

Get statistics about the cache.

Return type:

Dict[str, Any]

list_resources(rtype=None, expired=None)

List resources in the cache with optional filtering.

Parameters:
  • rtype (Optional[str]) – Filter resources by type.

  • expired (Optional[bool]) –

    Filter by expiration status:

      • True: only expired resources.

      • False: only non-expired resources.

      • None: all resources.

    Note: Resources with no expiration are always considered non-expired.

Return type:

List[Resource]

Returns:

List of Resource objects matching the filters
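The three-way expired filter, including the rule that resources without an expiration count as non-expired, can be sketched as follows. filter_by_expired and the (name, expires) pairs are hypothetical stand-ins for the method and Resource objects; the strict "<" comparison at the boundary is an assumption.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Entry = Tuple[str, Optional[datetime]]  # hypothetical (rname, expires) pair


def filter_by_expired(
    entries: List[Entry], expired: Optional[bool], now: datetime
) -> List[str]:
    def is_expired(exp: Optional[datetime]) -> bool:
        # No expiration date means the entry is treated as non-expired.
        return exp is not None and exp < now

    # expired=None keeps everything; otherwise keep entries whose status matches.
    return [
        name for name, exp in entries if expired is None or is_expired(exp) == expired
    ]


now = datetime(2024, 6, 1)
entries = [("a", datetime(2024, 1, 1)), ("b", datetime(2025, 1, 1)), ("c", None)]
print(filter_by_expired(entries, True, now))   # ['a']
print(filter_by_expired(entries, False, now))  # ['b', 'c']
print(filter_by_expired(entries, None, now))   # ['a', 'b', 'c']
```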

purge(force=False)

Remove all resources from cache and reset database.

Parameters:

force (bool) – If True, skip validation and remove all files even if database operations fail.

Return type:

bool

Returns:

True if purge was successful, False otherwise.

Raises:

Exception – If purge fails and force=False.

remove(rname)

Remove a resource from cache by name.

Removes both the cached file and its database entry.

Parameters:

rname (str) – Name to identify the resource in cache.

Raises:

Exception – If resource removal fails.

Return type:

None

remove_metadata(key)

Remove a metadata key.

Return type:

None

search(query, field='rname', exact=False)

Search for resources by field value.

Parameters:
  • query (str) – Search string.

  • field (str) – Resource field to search (“rname”, “rtype”, etc.).

  • exact (bool) – Whether to require exact match.

Return type:

List[Resource]

Returns:

List of matching resources.
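A rough sketch of the exact flag, using plain dictionaries as stand-ins for Resource rows. Treating a non-exact search as a case-insensitive substring match is an assumption (it is how SQL LIKE behaves for ASCII text in SQLite, but the library may match differently), and search_sketch is hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]  # hypothetical stand-in for a Resource


def search_sketch(
    rows: List[Row], query: str, field: str = "rname", exact: bool = False
) -> List[Row]:
    if exact:
        # Exact mode: the field must equal the query string.
        return [row for row in rows if row.get(field) == query]
    # Assumption: non-exact search is a case-insensitive substring match.
    needle = query.lower()
    return [row for row in rows if needle in row.get(field, "").lower()]


rows = [
    {"rname": "genome_hg38", "rtype": "web"},
    {"rname": "Genome_mm10", "rtype": "local"},
    {"rname": "annotations", "rtype": "web"},
]
print([r["rname"] for r in search_sketch(rows, "genome")])
# ['genome_hg38', 'Genome_mm10']
print([r["rname"] for r in search_sketch(rows, "web", field="rtype", exact=True)])
# ['genome_hg38', 'annotations']
```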

update(rname, fpath, action='copy')

Update an existing resource.

Parameters:
  • rname (str) – Name to identify the resource in cache.

  • fpath (Union[str, Path]) – Path to the new source file.

  • action (Literal['copy', 'move', 'asis']) – Either copy, move or asis. Defaults to copy.

Return type:

Resource

Returns:

Updated Resource object.

validate_resource(resource)

Validate resource integrity.

Parameters:

resource (Resource) – Resource to validate.

Return type:

bool

Returns:

True if resource is valid, False otherwise.

verify_cache()

Verify integrity of all cached resources.

Return type:

Tuple[int, int]

Returns:

Tuple of (valid_count, invalid_count).
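Integrity checking can be pictured as recomputing each file's checksum and comparing it with the stored etag. This sketch assumes that is what validate_resource does, uses MD5 because it is the CacheConfig.hash_algorithm default, and counts a missing file as invalid; verify_sketch and file_md5 are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def file_md5(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def verify_sketch(records: Iterable[Tuple[Path, str]]) -> Tuple[int, int]:
    """Hypothetical: count (valid, invalid) by comparing stored etags to current hashes."""
    valid = invalid = 0
    for path, etag in records:
        if path.exists() and file_md5(path) == etag:
            valid += 1
        else:
            invalid += 1  # missing file or changed contents
    return valid, invalid


with tempfile.TemporaryDirectory() as tmp:
    good, bad = Path(tmp, "good.txt"), Path(tmp, "bad.txt")
    good.write_text("hello")
    bad.write_text("hello")
    records = [(good, file_md5(good)), (bad, file_md5(bad)), (Path(tmp, "gone.txt"), "0" * 32)]
    bad.write_text("tampered")  # contents no longer match the stored etag
    print(verify_sketch(records))  # (1, 2)
```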

pybiocfilecache.config module

class pybiocfilecache.config.CacheConfig(cache_dir, cleanup_interval=None, rname_pattern='^[a-zA-Z0-9_-]+$', hash_algorithm='md5')

Bases: object

Configuration for BiocFileCache.

cache_dir

Directory to store cached files.

cleanup_interval

How often to run expired resource cleanup. None disables automatic cleanup.

rname_pattern

Regex pattern for valid resource names.

hash_algorithm

Algorithm to use for file checksums.

__eq__(other)

Return self==value.

__hash__ = None
__init__(cache_dir, cleanup_interval=None, rname_pattern='^[a-zA-Z0-9_-]+$', hash_algorithm='md5')
__match_args__ = ('cache_dir', 'cleanup_interval', 'rname_pattern', 'hash_algorithm')
__repr__()

Return repr(self).

cache_dir: Path
cleanup_interval: Optional[timedelta] = None
hash_algorithm: str = 'md5'
rname_pattern: str = '^[a-zA-Z0-9_-]+$'
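Since CacheConfig is a plain dataclass, a configuration with weekly cleanup would be built with something like CacheConfig(cache_dir=Path("cache"), cleanup_interval=timedelta(days=7)). The sketch below mirrors the documented fields and defaults in a hypothetical ConfigSketch class so it runs without the package, and adds a hypothetical cleanup_due helper showing one reading of cleanup_interval in which None turns scheduled cleanup off.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical mirror of the documented CacheConfig fields and defaults."""

    cache_dir: Path
    cleanup_interval: Optional[timedelta] = None
    rname_pattern: str = r"^[a-zA-Z0-9_-]+$"
    hash_algorithm: str = "md5"

    def cleanup_due(self, last_cleanup: datetime, now: datetime) -> bool:
        # Hypothetical helper: None disables automatic (scheduled) cleanup.
        if self.cleanup_interval is None:
            return False
        return now - last_cleanup >= self.cleanup_interval


cfg = ConfigSketch(Path("/tmp/bfc"), cleanup_interval=timedelta(days=7))
last = datetime(2024, 1, 1)
print(cfg.cleanup_due(last, datetime(2024, 1, 5)))  # False
print(cfg.cleanup_due(last, datetime(2024, 1, 8)))  # True
```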

pybiocfilecache.const module

pybiocfilecache.models module

class pybiocfilecache.models.Metadata(**kwargs)

Bases: Base

Database metadata information.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

__mapper__ = <Mapper at 0x7f93e6c64550; Metadata>
__repr__()

Return repr(self).

Return type:

str

__table__ = Table('metadata', MetaData(), Column('key', Text(), table=<metadata>, primary_key=True, nullable=False), Column('value', Text(), table=<metadata>), schema=None)
__tablename__ = 'metadata'
key
value
class pybiocfilecache.models.Resource(**kwargs)

Bases: Base

Resource information stored in cache.

id

Auto-incrementing primary key.

rid

Unique resource identifier (UUID).

rname

User-provided resource name.

create_time

When the resource was first added.

access_time

Last time the resource was accessed.

rpath

Path to the resource in the cache.

rtype

Type of resource (local, web, relative).

fpath

Original file path.

last_modified_time

Last time the resource was modified.

etag

Checksum/hash of the resource.

expires

When the resource should be considered expired.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

__mapper__ = <Mapper at 0x7f93e5b81a50; Resource>
__repr__()

Return repr(self).

Return type:

str

__table__ = Table('resource', MetaData(), schema=None) with columns:

  • id – Integer(), primary_key=True, nullable=False

  • rid – Text()

  • rname – Text()

  • create_time – DateTime(), server_default=now()

  • access_time – DateTime(), server_default=now()

  • rpath – Text()

  • rtype – Text()

  • fpath – Text()

  • last_modified_time – DateTime(), onupdate=now()

  • etag – Text()

  • expires – DateTime()

__tablename__ = 'resource'
access_time
create_time
etag
expires
fpath
id
last_modified_time
rid
rname
rpath
rtype

pybiocfilecache.utils module

pybiocfilecache.utils.calculate_file_hash(path, algorithm='md5')

Calculate file checksum.

Return type:

str
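A common way to compute such a checksum is to stream the file through hashlib in fixed-size chunks, so large files are never loaded into memory at once. This sketch takes the algorithm name through hashlib.new, matching the algorithm parameter's 'md5' default; the chunk size and the helper name are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def calculate_file_hash_sketch(
    path: Path, algorithm: str = "md5", chunk_size: int = 65536
) -> str:
    """Hypothetical: hash a file chunk by chunk and return the hex digest."""
    hasher = hashlib.new(algorithm)  # e.g. "md5" or "sha256"
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp, "x.bin")
    p.write_bytes(b"abc")
    print(calculate_file_hash_sketch(p))  # 900150983cd24fb0d6963f7d28e17f72
```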

pybiocfilecache.utils.compress_file(source, target)

Compress file using zlib.

Return type:

None

pybiocfilecache.utils.copy_or_move(source, target, rname, action='copy', compress=False)

Copy or move a resource.

Return type:

None

pybiocfilecache.utils.create_tmp_dir()

Create a temporary directory.

Return type:

Path

pybiocfilecache.utils.decompress_file(source, target)

Decompress file using zlib.

Return type:

None
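compress_file and decompress_file are documented only as using zlib. A minimal whole-file round trip with the standard zlib module looks like this; the *_sketch functions are hypothetical, and reading each file fully into memory is a simplification (zlib.compressobj and zlib.decompressobj support streaming).

```python
import tempfile
import zlib
from pathlib import Path


def compress_file_sketch(source: Path, target: Path) -> None:
    # Whole-file zlib compression of the source bytes.
    target.write_bytes(zlib.compress(source.read_bytes()))


def decompress_file_sketch(source: Path, target: Path) -> None:
    # Inverse of compress_file_sketch.
    target.write_bytes(zlib.decompress(source.read_bytes()))


with tempfile.TemporaryDirectory() as tmp:
    raw, packed, restored = (Path(tmp, n) for n in ("raw.txt", "raw.z", "restored.txt"))
    raw.write_text("ACGT" * 1000)
    compress_file_sketch(raw, packed)
    decompress_file_sketch(packed, restored)
    # Round trip is lossless, and repetitive text shrinks.
    print(restored.read_bytes() == raw.read_bytes(), packed.stat().st_size < raw.stat().st_size)  # True True
```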

pybiocfilecache.utils.download_web_file(url, filename, download)

pybiocfilecache.utils.generate_id(size)

Generate unique identifier.

Return type:

str

pybiocfilecache.utils.generate_uuid()

Generate a unique identifier (UUID).

Return type:

str

pybiocfilecache.utils.get_file_size(path)

Get file size in bytes.

Return type:

int

pybiocfilecache.utils.validate_rname(rname, pattern)

Validate resource name format.

Return type:

bool
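With the default CacheConfig.rname_pattern, validation amounts to a regular-expression test. This sketch uses re.fullmatch, which is an assumption about the implementation, chosen because "$" alone also matches just before a trailing newline; validate_rname_sketch is hypothetical.

```python
import re

# Default value of CacheConfig.rname_pattern, as documented.
DEFAULT_PATTERN = r"^[a-zA-Z0-9_-]+$"


def validate_rname_sketch(rname: str, pattern: str = DEFAULT_PATTERN) -> bool:
    # fullmatch requires the entire name to match; with re.match, "$" would
    # also accept a name that ends in "\n".
    return re.fullmatch(pattern, rname) is not None


print(validate_rname_sketch("hg38_genome-v2"))  # True
print(validate_rname_sketch("my file.txt"))     # False
print(validate_rname_sketch("ok\n"))            # False
```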

Module contents