def polish_dataset(
    x: Type[SummarizedExperiment],
    reformat_assay_by_density: float = 0.3,
    attempt_integer_conversion: bool = True,
    remove_altexp_coldata: bool = True,
    forbid_nested_altexp: bool = True,
) -> Type[SummarizedExperiment]:
    """Optimize dataset for saving.

    Prepare a
    :py:class:`~summarizedexperiment.SummarizedExperiment.SummarizedExperiment`
    or
    :py:class:`~singlecellexperiment.SingleCellExperiment.SingleCellExperiment`
    to be saved with :py:func:`scrnaseq.save_dataset.save_dataset`.

    This performs minor changes to improve storage efficiency, especially
    with matrices.

    Args:
        x:
            A
            :py:class:`~summarizedexperiment.SummarizedExperiment.SummarizedExperiment`
            or one of its derivatives.

        reformat_assay_by_density:
            Density threshold used to optimize assay formats, where the
            density is the proportion of non-zero values in an assay.
            Assays with densities at or above this number are converted to
            ordinary dense arrays (if they are not already), while those
            with lower densities are converted to sparse matrices.

            This can be disabled by setting it to `None`.

        attempt_integer_conversion:
            Whether to convert double-precision assays containing only
            integer values to an actual integer type. This can improve the
            efficiency of downstream applications by avoiding the need to
            operate in double precision.

        remove_altexp_coldata:
            Whether column data for alternative experiments should be
            removed. Defaults to `True` as the alternative experiment
            column data is usually redundant compared to the main
            experiment.

        forbid_nested_altexp:
            Whether nested alternative experiments (i.e., alternative
            experiments of alternative experiments) should be forbidden.

    Returns:
        A modified object with the same type as ``x``.
    """
    return _polish_dataset(
        x,
        reformat_assay_by_density,
        attempt_integer_conversion,
        remove_altexp_coldata,
        forbid_nested_altexp,
    )


def _polish_dataset(
    x: Type[SummarizedExperiment],
    reformat_assay_by_density: float,
    attempt_integer_conversion: bool,
    remove_altexp_coldata: bool,
    forbid_nested_altexp: bool,
    level: int = 0,
):
    new_assays = {}
    for asyname, asy in x.assays.items():
        # Choose sparse or dense storage from the fraction of non-zero entries.
        if reformat_assay_by_density is not None:
            # NOTE: ``asy != np.nan`` is True for every element (NaN compares
            # unequal to everything), so the second term is always 1 and the
            # density is simply the fraction of non-zero entries.
            density = min(np.mean(asy != 0), np.mean(asy != np.nan))
            from scipy import sparse as sp

            if density < reformat_assay_by_density:
                if not sp.issparse(asy):
                    asy = sp.csr_matrix(asy)
            else:
                if sp.issparse(asy):
                    asy = asy.toarray()

        # Cast floating-point assays to integers when every value is whole.
        if attempt_integer_conversion:
            if np.issubdtype(asy.dtype, np.floating):
                _cast = False
                from scipy import sparse as sp

                if sp.issparse(asy):
                    # Only the stored (non-zero) values need to be checked.
                    if not np.any(asy.data % 1 != 0):
                        _cast = True
                elif not np.any(asy % 1 != 0):
                    _cast = True

                if _cast is True:
                    asy = asy.astype(np.int_)

        new_assays[asyname] = asy

    x = x.set_assays(new_assays)

    if isinstance(x, SingleCellExperiment):
        if len(x.get_alternative_experiment_names()) > 0:
            # ``level > 0`` means ``x`` is itself an alternative experiment.
            if forbid_nested_altexp and level > 0:
                raise ValueError("Nested alternative experiments are forbidden.")

            new_alts = {}
            for altname, altexp in x.alternative_experiments.items():
                if remove_altexp_coldata:
                    altexp = altexp.set_column_data(None)

                # Polish each alternative experiment with the same settings.
                altexp = _polish_dataset(
                    altexp,
                    reformat_assay_by_density=reformat_assay_by_density,
                    attempt_integer_conversion=attempt_integer_conversion,
                    remove_altexp_coldata=remove_altexp_coldata,
                    forbid_nested_altexp=forbid_nested_altexp,
                    level=level + 1,
                )
                new_alts[altname] = altexp

            x = x.set_alternative_experiments(new_alts)

    return x
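
The two per-assay decisions made above (the storage format chosen from the density, and the integer cast) can be illustrated in isolation. The following is a minimal sketch using only NumPy on a plain dense array; the helper names `choose_storage` and `maybe_cast_to_int` are hypothetical and are not part of this module, and the sketch only reports the storage choice rather than performing the sparse conversion.

```python
import numpy as np


def choose_storage(assay: np.ndarray, threshold: float = 0.3) -> str:
    """Hypothetical helper: pick a storage format from the non-zero fraction."""
    density = float(np.mean(assay != 0))
    # Below the threshold a sparse matrix is preferred, otherwise a dense array.
    return "sparse" if density < threshold else "dense"


def maybe_cast_to_int(assay: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cast a float array to integers if all values are whole."""
    if np.issubdtype(assay.dtype, np.floating) and not np.any(assay % 1 != 0):
        return assay.astype(np.int_)
    # Non-float arrays, or arrays with fractional (or NaN) values, are returned as-is.
    return assay


# A mostly-empty count matrix stored in double precision.
counts = np.zeros((10, 10), dtype=np.float64)
counts[0, :2] = [3.0, 7.0]  # 2 non-zero entries out of 100

storage = choose_storage(counts)             # density 0.02 is below 0.3
converted = maybe_cast_to_int(counts)        # all values are whole numbers
unchanged = maybe_cast_to_int(counts + 0.5)  # fractional values block the cast
```

Here `storage` is `"sparse"`, `converted` has an integer dtype, and `unchanged` keeps its floating-point dtype because its values are not whole numbers.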