celldex package
Submodules
celldex.fetch_reference module
- celldex.fetch_reference.fetch_metadata(name: str, version: str, path: str | None = None, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False)
Fetch metadata for a reference from the gypsum backend.
See also
fetch_reference(), to fetch a reference.
Example:
meta = fetch_metadata("immgen", "2024-02-26")
- Parameters:
name – Name of the reference dataset.
version – Version of the reference dataset.
path – Path to a subdataset, if name contains multiple datasets. Defaults to None.
package – Name of the package. Defaults to “celldex”.
cache_dir – Path to the cache directory.
overwrite – Whether to overwrite existing files. Defaults to False.
- Returns:
Dictionary containing metadata for the specified dataset.
- celldex.fetch_reference.fetch_reference(name: str, version: str, path: str | None = None, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, realize_assays: bool = False, **kwargs) → SummarizedExperiment
Fetch a reference dataset from the gypsum backend.
See also
metadata index, on the expected schema for the metadata.
save_reference() and upload_directory(), to save and upload a reference.
list_references() and list_versions(), to get possible values for name and version.
fetch_metadata(), to fetch the metadata for the reference.
Example:
ref = fetch_reference("immgen", "2024-02-26")
- Parameters:
name – Name of the reference dataset.
version – Version of the reference dataset.
path – Path to a subdataset, if name contains multiple datasets. Defaults to None.
package – Name of the package. Defaults to “celldex”.
cache_dir – Path to cache directory.
overwrite – Whether to overwrite existing files. Defaults to False.
realize_assays – Whether to realize assays into memory. Defaults to False.
**kwargs – Further arguments to pass to read_object().
- Returns:
The dataset as a SummarizedExperiment or one of its subclasses.
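The cache_dir and overwrite parameters imply a cache-or-download pattern: files already in the cache are reused unless overwrite=True. The sketch below illustrates only that pattern and rests on assumptions. The cached_fetch helper, the cache_dir/package/name/version/filename layout, and the download callable are all hypothetical. None of them reflect gypsum's actual cache layout or client functions.

```python
import os
import tempfile
from typing import Callable


def cached_fetch(
    cache_dir: str,
    package: str,
    name: str,
    version: str,
    filename: str,
    download: Callable[[str], bytes],
    overwrite: bool = False,
) -> str:
    """Return a local path to `filename`, downloading it only when needed.

    Hypothetical layout: cache_dir/package/name/version/filename.
    """
    target = os.path.join(cache_dir, package, name, version, filename)
    # Reuse the cached copy unless the caller asks for a fresh download.
    if os.path.exists(target) and not overwrite:
        return target
    os.makedirs(os.path.dirname(target), exist_ok=True)
    payload = download(f"{package}/{name}/{version}/{filename}")
    # Write to a temporary file in the same directory, then atomically
    # rename it, so a failed write never leaves a partial file at `target`.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target))
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    os.replace(tmp_path, target)
    return target
```

The function writes to a temporary file and then renames it with os.replace(). If a download is interrupted, no truncated file is left behind for later calls to mistake for a valid cache entry.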
celldex.list_references module
- celldex.list_references.list_references(cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, latest: bool = True) → DataFrame
List all available reference datasets.
Example
refs = list_references()
- Parameters:
cache_dir – Path to cache directory.
overwrite – Whether to overwrite the database in cache. Defaults to False.
latest – Whether to only fetch the latest version of each reference. Defaults to True.
- Returns:
A DataFrame where each row corresponds to a reference dataset. Each row contains the title and description of the reference, the number of rows and columns, the organisms and genome builds involved, whether the dataset has any pre-computed reduced dimensions, and so on. More details can be found in the Bioconductor metadata schema.
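With latest=True, only the newest version of each reference is listed. Below is a minimal sketch of that filtering step. It assumes each record is a (name, version) pair and that versions are ISO dates such as 2024-02-26, which sort chronologically as plain strings. The backend tracks the latest version itself, so latest_versions is a hypothetical illustration, and the records below are made up.

```python
from typing import Dict, Iterable, List, Tuple


def latest_versions(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each reference name to its newest version.

    Assumes versions are ISO-8601 dates (YYYY-MM-DD), so plain string
    comparison orders them chronologically.
    """
    latest: Dict[str, str] = {}
    for name, version in records:
        if name not in latest or version > latest[name]:
            latest[name] = version
    return latest


# Made-up records for illustration only.
records: List[Tuple[str, str]] = [
    ("immgen", "2024-02-26"),
    ("immgen", "2023-01-10"),
    ("hpca", "2024-02-26"),
]
print(latest_versions(records))  # {'immgen': '2024-02-26', 'hpca': '2024-02-26'}
```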
celldex.list_versions module
- celldex.list_versions.fetch_latest_version(name: str) → str
Fetch the latest version for a reference from the gypsum backend.
See also
fetch_reference(), to fetch a reference.
fetch_metadata(), to fetch the metadata for the reference.
Example:
version = fetch_latest_version("immgen")
- Parameters:
name – Name of the reference.
- Returns:
String specifying the latest version for the reference.
celldex.save_reference module
- celldex.save_reference.save_reference(x: Any, labels: List[str], path: str, metadata: dict)
- celldex.save_reference.save_reference(x: SummarizedExperiment, path: str, metadata: dict)
Save a reference dataset to disk.
- Parameters:
x –
An object containing reference data. May be a SummarizedExperiment containing an assay matrix called logcounts of log-normalized expression values. Each row of column_data corresponds to a column of x and contains the label(s) for that column. Each column of labels represents a different label type; typically, the column name has a "label." prefix to distinguish between, e.g., label.fine, label.broad, and so on. At least one column should be present.
path – Path to a new directory to save the dataset.
metadata –
Dictionary containing the metadata for this dataset. See the schema returned by fetch_metadata_schema(). Note that the applications.takane property will be automatically added by this function and does not have to be supplied.
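The label conventions for x can be checked before saving. The sketch below models column_data as a plain dict mapping each column name to its per-sample values. It also assumes the "label." prefix convention is followed strictly, whereas the text above only says the prefix is typical. find_label_columns is a hypothetical helper, not part of celldex.

```python
from typing import Dict, List, Sequence


def find_label_columns(
    column_data: Dict[str, Sequence[str]], n_columns: int
) -> List[str]:
    """Return the label columns of `column_data`, validating their shape.

    Hypothetical helper. Assumes every label column starts with "label.".
    """
    labels = [key for key in column_data if key.startswith("label.")]
    if not labels:
        raise ValueError("at least one 'label.*' column is required")
    for key in labels:
        # Each row of column_data corresponds to one column (sample) of x.
        if len(column_data[key]) != n_columns:
            raise ValueError(
                f"{key!r} has {len(column_data[key])} entries, expected {n_columns}"
            )
    return labels


column_data = {
    "label.fine": list("ABCDEFGHIJ"),
    "label.broad": ["T"] * 10,
    "batch": ["b1"] * 10,  # not a label column, so it is ignored
}
print(find_label_columns(column_data, 10))  # ['label.fine', 'label.broad']
```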
See also
metadata index, on the expected schema for the metadata.
upload_reference(), to upload the saved contents.
Example:
import shutil
import tempfile

import numpy as np
from biocframe import BiocFrame
from summarizedexperiment import SummarizedExperiment

import celldex

# Create a SummarizedExperiment object
mat = np.random.poisson(1, (100, 10))
row_names = [f"GENE_{i}" for i in range(mat.shape[0])]
col_names = list("ABCDEFGHIJ")
sce = SummarizedExperiment(
    assays={"logcounts": mat},
    row_data=BiocFrame(row_names=row_names),
    column_data=BiocFrame({"label.fine": col_names}),
)

# Provide metadata for search and findability
meta = {
    "title": "New reference dataset",
    "description": "This is a new reference dataset",
    "taxonomy_id": ["10090"],  # NCBI ID
    "genome": ["GRCm38"],  # genome build
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "maintainer_name": "Jayaram kancherla",
    "maintainer_email": "jayaram.kancherla@gmail.com",
}

# save_reference() expects a new directory, so reserve a unique
# temporary path and remove the empty directory created there
save_dir = tempfile.mkdtemp()
shutil.rmtree(save_dir)

# Save the reference
celldex.save_reference(sce, save_dir, meta)
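Before calling save_reference(), a quick check that the metadata has the fields used in the example can catch omissions early. This sketch does not replace validation against the schema from fetch_metadata_schema(). EXAMPLE_KEYS is an assumption copied from the example's dictionary. The digits-only rule for taxonomy_id is also an assumption, based on the NCBI ID "10090" shown there.

```python
from typing import Any, Dict, List

# Assumption: keys copied from the save_reference() example, not the full schema.
EXAMPLE_KEYS = (
    "title",
    "description",
    "taxonomy_id",
    "genome",
    "sources",
    "maintainer_name",
    "maintainer_email",
)


def check_metadata(meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in `meta` (empty if none found)."""
    problems: List[str] = []
    for key in EXAMPLE_KEYS:
        if not meta.get(key):
            problems.append(f"missing or empty: {key}")
    # Assumption: taxonomy IDs are NCBI IDs written as digit strings, e.g. "10090".
    for tax in meta.get("taxonomy_id") or []:
        if not (isinstance(tax, str) and tax.isdigit()):
            problems.append(f"taxonomy_id entries should be digit strings: {tax!r}")
    return problems


meta = {
    "title": "New reference dataset",
    "description": "This is a new reference dataset",
    "taxonomy_id": ["10090"],
    "genome": ["GRCm38"],
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "maintainer_name": "Jayaram kancherla",
    "maintainer_email": "jayaram.kancherla@gmail.com",
}
print(check_metadata(meta))  # []
```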
- celldex.save_reference.save_reference_se(x: SummarizedExperiment, path: str, metadata: dict)
Save SummarizedExperiment to disk.
celldex.search_references module
- celldex.search_references.search_references(query: str | GypsumSearchClause, cache_dir: str = '/home/runner/gypsum/cache', overwrite: bool = False, latest: bool = True) → DataFrame
Search for reference datasets of interest based on matching text in the associated metadata.
This is a wrapper around search_metadata_text().
The returned DataFrame contains the usual suspects like the title and description for each dataset, the number of rows and columns, the organisms and genome builds involved, whether the dataset has any pre-computed reduced dimensions, and so on.
More details can be found in the Bioconductor metadata index.
See also
list_references(), to list all available datasets.
search_metadata_text(), to search metadata.
Examples:
res = search_references("human")
res = search_references(define_text_query("Immun%", partial=True))
res = search_references(define_text_query("10090", field="taxonomy_id"))
- Parameters:
query – The search query string or a GypsumSearchClause for more complex queries.
cache_dir – Path to cache directory.
overwrite – Whether to overwrite the existing cache. Defaults to False.
latest – Whether to fetch only the latest versions of datasets. Defaults to True.
- Returns:
A DataFrame where each row corresponds to a dataset, containing various columns of metadata. Some columns may be lists to capture 1:many mappings.
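In define_text_query("Immun%", partial=True), the % acts as an SQL LIKE-style wildcard. The sketch below demonstrates that matching behavior using Python's built-in sqlite3 module on a hypothetical tokens table. It illustrates only the wildcard semantics and is not how search_metadata_text() builds or queries its index.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def build_index(rows: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory table of (dataset, field, token) rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (dataset TEXT, field TEXT, token TEXT)")
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)
    return conn


def search_tokens(
    conn: sqlite3.Connection, pattern: str, field: Optional[str] = None
) -> List[str]:
    """Return datasets with a token matching `pattern` (% matches any characters).

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    sql = "SELECT DISTINCT dataset FROM tokens WHERE token LIKE ?"
    params = [pattern]
    if field is not None:
        sql += " AND field = ?"
        params.append(field)
    return sorted(row[0] for row in conn.execute(sql, params))


conn = build_index([
    ("ds1", "title", "immune"),
    ("ds1", "title", "atlas"),
    ("ds2", "description", "immunology"),
    ("ds3", "taxonomy_id", "10090"),
])
print(search_tokens(conn, "Immun%"))                        # ['ds1', 'ds2']
print(search_tokens(conn, "immun"))                         # [] (no wildcard, no partial match)
print(search_tokens(conn, "10090", field="taxonomy_id"))    # ['ds3']
```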
celldex.upload_reference module
- celldex.upload_reference.upload_reference(directory: str, name: str, version: str, package: str = 'celldex', cache_dir: str = '/home/runner/gypsum/cache', deduplicate: bool = True, probation: bool = False, url: str = 'https://gypsum.artifactdb.com', token: str | None = None, concurrent: int = 1, abort_failed: bool = True)
Upload the reference dataset to the gypsum bucket.
This is a wrapper around upload_directory() specific to the celldex package.
See also
upload_directory(), to upload a directory to the gypsum backend.
- Parameters:
name – Reference dataset name.
version – Version name for the reference.
directory – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.
cache_dir –
Path to the cache for saving files, e.g., in save_version(). Used to convert symbolic links to upload links; see prepare_directory_upload().
deduplicate – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.
probation – Whether to perform a probational upload. Defaults to False.
url – URL of the gypsum REST API.
token – GitHub access token to authenticate to the gypsum REST API.
concurrent – Number of concurrent uploads. Defaults to 1.
abort_failed –
Whether to abort the upload on any failure.
Setting this to False can be helpful for diagnosing upload problems.
- Returns:
True if successful, otherwise False.
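The concurrent and abort_failed parameters follow a common pattern for batch transfers: run several in parallel, then either stop at the first failure or attempt everything and report what failed. Below is a minimal sketch using concurrent.futures, assuming a hypothetical upload_one(path) callable. upload_all is not the gypsum client's implementation, and it returns a dict of failures rather than a boolean.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def upload_all(
    paths: Iterable[str],
    upload_one: Callable[[str], None],
    concurrent: int = 1,
    abort_failed: bool = True,
) -> Dict[str, str]:
    """Upload every path and return a mapping of failed path -> error message.

    With abort_failed=True, uploads that have not started yet are cancelled
    and the first error is re-raised. With abort_failed=False, every path is
    attempted and failures are collected, which helps when diagnosing problems.
    """
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
            except Exception as exc:
                if abort_failed:
                    # Cancel queued uploads; running ones finish on pool shutdown.
                    for other in futures:
                        other.cancel()
                    raise
                failures[path] = str(exc)
    return failures
```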
celldex.utils module
- celldex.utils.celldex_load_object(path: str, metadata: dict | None = None, celldex_realize_assays: bool = False, **kwargs)
Load a SummarizedExperiment object from a file.
- Parameters:
path – Path to the reference dataset.
metadata – Metadata for the reference dataset. Defaults to None.
celldex_realize_assays – Whether to realize assays into memory. Defaults to False.
**kwargs – Further arguments to pass to read_object().
- Returns:
A SummarizedExperiment derivative of the object.
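When celldex_realize_assays is False, assays may be returned as file-backed objects that are read lazily. Realizing them loads the values into memory up front. The hypothetical LazyAssay wrapper and load_assay helper below sketch that trade-off using only the standard library. They are not how BiocPy's delayed arrays are implemented.

```python
from typing import Callable, List, Optional, Union

Matrix = List[List[float]]


class LazyAssay:
    """Hypothetical wrapper that reads its matrix only on first access."""

    def __init__(self, loader: Callable[[], Matrix]):
        self._loader = loader
        self._data: Optional[Matrix] = None

    @property
    def loaded(self) -> bool:
        return self._data is not None

    def realize(self) -> Matrix:
        # Read from the backing store once, then serve the in-memory copy.
        if self._data is None:
            self._data = self._loader()
        return self._data


def load_assay(loader: Callable[[], Matrix], realize: bool = False) -> Union[Matrix, LazyAssay]:
    """Mimic a realize flag: an in-memory matrix if True, a lazy wrapper otherwise."""
    lazy = LazyAssay(loader)
    return lazy.realize() if realize else lazy
```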
- celldex.utils.format_object_metadata(x) → dict
Format object related metadata.
Create object-related metadata to validate against the default schema from fetch_metadata_schema(). This is intended for downstream package developers who are auto-generating metadata documents to be validated by validate_metadata().
- Parameters:
x – A Python object, typically an instance of a BiocPy class.
- Returns:
Dictionary containing metadata for the object.