celldex package¶

Submodules¶

celldex.fetch_reference module¶

celldex.fetch_reference.fetch_metadata(name: str, version: str, path: str | None = None, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False)[source]¶

Fetch metadata for a reference from the gypsum backend.

See also

fetch_reference(), to fetch a reference.

Example:

meta = fetch_metadata("immgen", "2024-02-26")

Parameters:

name – Name of the reference dataset.
version – Version of the reference dataset.
path – Path to a subdataset, if name contains multiple datasets. Defaults to None.
package – Name of the package. Defaults to “celldex”.
cache_dir – Path to the cache directory.
overwrite – Whether to overwrite existing files. Defaults to False.

Returns:

Dictionary containing metadata for the specified dataset.
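
As a rough illustration of working with the returned dictionary, the sketch below checks whether a reference covers a given organism before the full dataset is downloaded. The helper name covers_organism is hypothetical, and the "taxonomy_id" key is an assumption based on the metadata shown in the save_reference() example; the keys actually present depend on the metadata schema.

```python
def covers_organism(meta: dict, taxonomy_id: str) -> bool:
    """Return True if the metadata lists the given NCBI taxonomy ID.

    Assumption: `meta` stores organisms under a "taxonomy_id" key holding a
    list of strings. A missing key is treated as "no organisms listed".
    """
    listed = [str(t) for t in meta.get("taxonomy_id", [])]
    return str(taxonomy_id) in listed


# Stand-in for a dictionary such as the one returned by fetch_metadata();
# the values here are illustrative only.
example_meta = {"title": "Example reference", "taxonomy_id": ["10090"]}
print(covers_organism(example_meta, "10090"))
```

With celldex installed, the same helper could be applied to the result of fetch_metadata("immgen", "2024-02-26").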

celldex.fetch_reference.fetch_reference(name: str, version: str, path: str | None = None, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, realize_assays: bool = False, **kwargs) → SummarizedExperiment[source]¶

Fetch a reference dataset from the gypsum backend.

See also

metadata index, on the expected schema for the metadata.

save_reference() and upload_directory(), to save and upload a reference.

list_references() and list_versions(), to get possible values for name and version.

fetch_metadata(), to fetch the metadata for the reference.

Example

ref = fetch_reference("immgen", "2024-02-26")

Parameters:

name – Name of the reference dataset.
version – Version of the reference dataset.
path – Path to a subdataset, if name contains multiple datasets. Defaults to None.
package – Name of the package. Defaults to “celldex”.
cache_dir – Path to cache directory.
overwrite – Whether to overwrite existing files. Defaults to False.
realize_assays – Whether to realize assays into memory. Defaults to False.
**kwargs – Further arguments to pass to read_object().

Returns:

The dataset as a SummarizedExperiment or one of its subclasses.
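
A common pattern is to look up the latest version of a reference and fetch it in one step. The following is a minimal sketch of that pattern, not part of celldex itself: the helper name fetch_latest_reference and its latest_version and fetch parameters are hypothetical, and exist only so the lookup functions can be swapped out (for example, in tests). By default it relies on fetch_latest_version() and fetch_reference() as documented on this page.

```python
from typing import Any, Callable, Optional


def fetch_latest_reference(
    name: str,
    latest_version: Optional[Callable[[str], str]] = None,
    fetch: Optional[Callable[..., Any]] = None,
    **kwargs,
):
    """Fetch the most recent version of a reference (hypothetical helper)."""
    # celldex is imported lazily so that this helper can be defined,
    # and exercised with stand-in callables, without celldex installed.
    if latest_version is None:
        from celldex.list_versions import fetch_latest_version as latest_version
    if fetch is None:
        from celldex.fetch_reference import fetch_reference as fetch

    version = latest_version(name)
    # Extra keyword arguments (e.g. realize_assays=True) are forwarded as-is.
    return fetch(name, version, **kwargs)


# Usage, assuming celldex is installed and the backend is reachable:
# ref = fetch_latest_reference("immgen", realize_assays=True)
```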

celldex.list_references module¶

celldex.list_references.list_references(cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, latest: bool = True) → DataFrame[source]¶

List all available reference datasets.

Example

refs = list_references()

Parameters:

cache_dir – Path to cache directory.
overwrite – Whether to overwrite the database in cache. Defaults to False.
latest – Whether to only fetch the latest version of each reference. Defaults to True.

Returns:

A DataFrame where each row corresponds to a reference dataset. Columns include the title and description of each reference, the number of rows and columns, the organisms and genome builds involved, whether the dataset has any pre-computed reduced dimensions, and so on. More details can be found in the Bioconductor metadata schema.

celldex.list_versions module¶

celldex.list_versions.fetch_latest_version(name: str) → str[source]¶

Fetch the latest version for a reference from the gypsum backend.

See also

fetch_reference(), to fetch a reference.

fetch_metadata(), to fetch the metadata for the reference.

Example:

version = fetch_latest_version("immgen")

Parameters:

name – Name of the reference.

Returns:

String specifying the latest version for the reference.

celldex.list_versions.list_versions(name: str) → List[str][source]¶

List all available versions for a reference.

Example

versions = list_versions("immgen")

Parameters:

name – Name of the reference dataset.

Returns:

A list of version names.
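
The list returned by list_versions() can be used to validate a user-supplied version before attempting a download. The sketch below is one way to do that. The helper resolve_version is hypothetical, and its fallback of taking the maximum string assumes that version names are ISO-style dates such as "2024-02-26", so that string order matches date order. When that assumption does not hold, fetch_latest_version() is the safer choice.

```python
from typing import List, Optional


def resolve_version(available: List[str], requested: Optional[str] = None) -> str:
    """Pick a version from `available`, validating `requested` if given."""
    if not available:
        raise ValueError("no versions available for this reference")
    if requested is None:
        # Assumption: versions are ISO dates, so the largest string is the newest.
        return max(available)
    if requested not in available:
        raise ValueError(
            f"unknown version {requested!r}; choose from {sorted(available)}"
        )
    return requested


# Usage, assuming celldex is installed:
# version = resolve_version(list_versions("immgen"), "2024-02-26")
```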

celldex.save_reference module¶

celldex.save_reference.save_reference(x: Any, labels: List[str], path: str, metadata: dict)[source]¶

celldex.save_reference.save_reference(x: SummarizedExperiment, path: str, metadata: dict)

Save a reference dataset to disk.

Parameters:

x –
An object containing reference data. May be a SummarizedExperiment containing an assay matrix called logcounts of log-normalized expression values.

Each row of column_data corresponds to a column of x and contains the label(s) for that column. Each column of column_data represents a different label type; typically, the column name has a label. prefix to distinguish between, e.g., label.fine, label.broad and so on.

At least one label column should be present.
path – Path to a new directory to save the dataset.
metadata –
Dictionary containing the metadata for this dataset. See the schema returned by fetch_metadata_schema().

Note that the applications.takane property will be automatically added by this function and does not have to be supplied.

See also

metadata index, on the expected schema for the metadata.

upload_reference(), to upload the saved contents.

Example

import shutil
import tempfile

import numpy as np
from biocframe import BiocFrame
from summarizedexperiment import SummarizedExperiment

import celldex

# Create a SummarizedExperiment object with a "logcounts" assay
mat = np.random.poisson(1, (100, 10))
row_names = [f"GENE_{i}" for i in range(mat.shape[0])]
col_names = list("ABCDEFGHIJ")
sce = SummarizedExperiment(
    assays={"logcounts": mat},
    row_data=BiocFrame(row_names=row_names),
    column_data=BiocFrame({
        "label.fine": col_names
    }),
)

# Provide metadata for search and findability
meta = {
    "title": "New reference dataset",
    "description": "This is a new reference dataset",
    "taxonomy_id": ["10090"],  # NCBI ID
    "genome": ["GRCm38"],  # genome build
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "maintainer_name": "Jayaram Kancherla",
    "maintainer_email": "jayaram.kancherla@gmail.com",
}

# save_reference() expects a path to a new directory, so create a
# temporary directory to reserve a unique name and then remove it
cache_dir = tempfile.mkdtemp()
shutil.rmtree(cache_dir)

# Save the reference
celldex.save_reference(sce, cache_dir, meta)
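
Before calling save_reference(), it can be convenient to run a quick local check that the metadata dictionary is complete. The sketch below does this with a hypothetical helper, missing_metadata_keys. The list of expected keys is an assumption copied from the example above, not the authoritative schema; the schema returned by fetch_metadata_schema() remains the reference for what is actually required.

```python
# Keys taken from the example metadata above. This is an assumption for
# illustration, not the authoritative list of required properties.
EXPECTED_KEYS = (
    "title",
    "description",
    "taxonomy_id",
    "genome",
    "sources",
    "maintainer_name",
    "maintainer_email",
)


def missing_metadata_keys(meta: dict) -> list:
    """Return the expected keys that are absent from `meta`, in order."""
    return [key for key in EXPECTED_KEYS if key not in meta]


draft = {"title": "New reference dataset", "genome": ["GRCm38"]}
print(missing_metadata_keys(draft))
```

An empty list means that every key in EXPECTED_KEYS is present; it says nothing about whether the values themselves satisfy the schema.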

celldex.save_reference.save_reference_se(x: SummarizedExperiment, path: str, metadata: dict)[source]¶

Save a SummarizedExperiment to disk.

celldex.search_references module¶

celldex.search_references.search_references(query: str | GypsumSearchClause, cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, latest: bool = True) → DataFrame[source]¶

Search for reference datasets of interest based on matching text in the associated metadata.

This is a wrapper around search_metadata_text().

The returned DataFrame contains the usual suspects like the title and description for each dataset, the number of rows and columns, the organisms and genome builds involved, whether the dataset has any pre-computed reduced dimensions, and so on.

More details can be found in the Bioconductor metadata index.

See also

list_references(), to list all available datasets.

search_metadata_text(), to search metadata.

Examples:

res = search_references("human")
res = search_references(define_text_query("Immun%", partial=True))
res = search_references(define_text_query("10090", field="taxonomy_id"))

Parameters:

query – The search query string or a GypsumSearchClause for more complex queries.
cache_dir – Path to cache directory.
overwrite – Whether to overwrite the existing cache. Defaults to False.
latest – Whether to fetch only the latest versions of datasets. Defaults to True.

Returns:

A DataFrame where each row corresponds to a dataset, containing various columns of metadata. Some columns may be lists to capture 1:many mappings.
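
For more complex searches, several clauses can be combined into a single query. The sketch below builds a query for mouse datasets whose metadata contains text starting with "Immun". It assumes that define_text_query() is importable from gypsum_client and that the resulting GypsumSearchClause objects can be combined with the & operator; both should be checked against the installed gypsum_client version. The function name mouse_immune_query and its optional argument are hypothetical.

```python
def mouse_immune_query(define_text_query=None):
    """Build a combined search clause (hypothetical helper).

    `define_text_query` can be passed in explicitly; by default it is
    imported from gypsum_client, which is assumed to be installed.
    """
    if define_text_query is None:
        from gypsum_client import define_text_query

    # Restrict to mouse (NCBI taxonomy ID 10090) ...
    by_organism = define_text_query("10090", field="taxonomy_id")
    # ... and require a partial text match anywhere in the metadata.
    by_text = define_text_query("Immun%", partial=True)

    # Assumption: search clauses support `&` for a logical AND.
    return by_organism & by_text


# Usage, assuming celldex and gypsum_client are installed:
# res = search_references(mouse_immune_query())
```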

celldex.upload_reference module¶

celldex.upload_reference.upload_reference(directory: str, name: str, version: str, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', deduplicate: bool = True, probation: bool = False, url: str = 'https://gypsum.artifactdb.com', token: str | None = None, concurrent: int = 1, abort_failed: bool = True)[source]¶

Upload the reference dataset to the gypsum bucket.

This is a wrapper around upload_directory() specific to the celldex package.

See also

upload_directory(), to upload a directory to the gypsum backend.

Parameters:

name – Reference dataset name.
version – Version name for the reference.
directory – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.
package – Name of the package. Defaults to “celldex”.
cache_dir –
Path to the cache for saving files, e.g., in save_version().

Used to convert symbolic links to upload links; see prepare_directory_upload().
deduplicate – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.
probation – Whether to perform a probational upload. Defaults to False.
url – URL of the gypsum REST API.
token – GitHub access token to authenticate to the gypsum REST API.
concurrent – Number of concurrent uploads. Defaults to 1.
abort_failed –
Whether to abort the upload on any failure.

Setting this to False can be helpful for diagnosing upload problems.

Returns:

True if successful, otherwise False.
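
The sketch below shows one way to wrap upload_reference() so that the token is read from the environment and the upload is always probational. The helper name upload_on_probation, the upload parameter, and the GITHUB_TOKEN environment variable name are assumptions made for illustration; only the directory, name, version, probation and token arguments come from the signature documented above.

```python
import os


def upload_on_probation(directory, name, version, upload=None, token_var="GITHUB_TOKEN"):
    """Run a probational upload using a token from the environment.

    `upload` defaults to celldex's upload_reference(); passing another
    callable is a hypothetical hook, useful for dry runs and tests.
    """
    token = os.environ.get(token_var)
    if not token:
        raise RuntimeError(f"environment variable {token_var} is not set")

    if upload is None:
        # Imported lazily so the helper can be defined without celldex installed.
        from celldex.upload_reference import upload_reference as upload

    return upload(directory, name, version, probation=True, token=token)


# Usage, assuming celldex is installed and GITHUB_TOKEN is set:
# ok = upload_on_probation("/path/to/saved/reference", "my-reference", "2024-02-26")
```

Reading the token from the environment keeps it out of scripts and notebooks, and a probational upload gives a chance to inspect the result before it is treated as final.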

celldex.utils module¶

celldex.utils.celldex_load_object(path: str, metadata: dict | None = None, celldex_realize_assays: bool = False, **kwargs)[source]¶

Load a SummarizedExperiment object from a file.

Parameters:

path – Path to the reference dataset.
metadata –
Metadata for the reference dataset.

Defaults to None.
celldex_realize_assays – Whether to realize assays into memory. Defaults to False.
**kwargs – Further arguments to pass to read_object().

Returns:

The object as a SummarizedExperiment or one of its derivatives.

celldex.utils.format_object_metadata(x) → dict[source]¶

Format object related metadata.

Create object-related metadata to validate against the default schema from fetch_metadata_schema(). This is intended for downstream package developers who are auto-generating metadata documents to be validated by validate_metadata().

Parameters:

x – A Python object, typically an instance of a BiocPy class.

Returns:

Dictionary containing metadata for the object.

celldex.utils.realize_array(x)[source]¶

Realize a ReloadedArray into a dense array or sparse matrix.

Parameters:

x – ReloadedArray object.

Returns:

Realized array or matrix.

Module contents¶