pilotpy.tl.wasserstein_distance

pilotpy.tl.wasserstein_distance(adata, emb_matrix='X_PCA', clusters_col='cell_types', sample_col='sampleID', status='status', metric='cosine', regulizer=0.2, normalization=True, regularized='unreg', reg=0.1, res=0.01, steper=0.01, data_type='scRNA', return_sil_ari=False, use_centroids=True)

Calculate the Wasserstein (W) distance among samples using PCA representation and clustering information.

Parameters

adata (AnnData): Loaded AnnData object containing the data.
emb_matrix (str, optional): Name of the PCA representation of the data, by default ‘X_PCA’.
clusters_col (str, optional): Column name in the observation level of ‘adata’ that represents cell types or clustering, by default ‘cell_types’.
sample_col (str, optional): Column name in the observation level of ‘adata’ that represents samples or patients, by default ‘sampleID’.
status (str, optional): Column name in the observation level of ‘adata’ that represents status or disease, e.g., control/case, by default ‘status’.
regulizer (float, optional): Hyper-parameter of a Dirichlet distribution for regularization, by default 0.2.
metric (str, optional): Metric for calculating the cost matrix, by default ‘cosine’.
regularized (str, optional): Whether to use regularized optimal transport, by default ‘unreg’ (unregularized).
reg (float, optional): Regularization parameter, used only when regularized optimal transport is selected, by default 0.1.
res (float, optional): Resolution for Leiden clustering to achieve the desired cluster count, by default 0.01.
steper (float, optional): Step size used when searching for the best Leiden resolution, by default 0.01.
data_type (str, optional): Type of your data, e.g., ‘scRNA’ or ‘pathomics’, by default ‘scRNA’.
use_centroids (bool, optional): Whether to use cluster centroids in the cost function, by default True.
return_sil_ari (bool, optional): Whether to return the ARI (Adjusted Rand Index) or Silhouette score for assessing W distance effects, by default False.
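
As a rough illustration of the two inputs these parameters describe, the sketch below computes smoothed cluster proportions per sample and a cosine cost matrix between cluster centroids. It is not pilotpy's implementation: the helper names, the toy data, and the reading of `regulizer` as an additive (symmetric Dirichlet) smoothing constant are all assumptions made for illustration.

```python
import numpy as np

def smoothed_proportions(labels, clusters, regulizer=0.2):
    """Cluster proportions for one sample with additive smoothing.

    Assumption: `regulizer` acts as a symmetric Dirichlet pseudo-count,
    so clusters absent from a sample still get a small positive weight.
    """
    counts = np.array([(labels == c).sum() for c in clusters], dtype=float)
    return (counts + regulizer) / (counts.sum() + regulizer * len(clusters))

def centroid_cosine_cost(emb, labels, clusters):
    """Pairwise cosine distance between cluster centroids in the embedding."""
    centroids = np.stack([emb[labels == c].mean(axis=0) for c in clusters])
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Hypothetical toy data: 60 cells, 5 PCA components, 3 clusters, 2 samples.
rng = np.random.default_rng(0)
emb = rng.normal(size=(60, 5))
labels = np.array(["A"] * 30 + ["B"] * 20 + ["C"] * 10)
samples = np.array(["s1"] * 15 + ["s2"] * 15 + ["s1"] * 5 + ["s2"] * 15 + ["s1"] * 10)
clusters = ["A", "B", "C"]

# One proportion vector per sample; sample "s2" contains no cells of cluster "C".
props = {s: smoothed_proportions(labels[samples == s], clusters) for s in ("s1", "s2")}
cost = centroid_cosine_cost(emb, labels, clusters)
```

With smoothing, every sample has a strictly positive weight on every cluster, which keeps the proportion vectors comparable even when a cluster is missing from a sample.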

Returns

None: Calculates and stores the W distance among samples in the adata object.
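
For intuition about the quantity being computed, here is a minimal NumPy sketch of an entropy-regularized (Sinkhorn) optimal transport cost between the cluster-proportion vectors of two samples. This is a stand-in technique chosen because it needs no extra dependencies; it is not pilotpy's code and does not show how the unregularized default is solved. The function name, the proportion vectors, and the cost matrix are hypothetical.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, reg=0.1, n_iter=1000):
    """Entropy-regularized OT between histograms a and b (Sinkhorn iterations).

    Returns the transport cost <plan, cost> and the transport plan.
    """
    kernel = np.exp(-cost / reg)           # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)             # rescale to match column marginals
        u = a / (kernel @ v)               # rescale to match row marginals
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan

# Hypothetical cluster proportions of two samples over three clusters.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])

# Hypothetical cost between clusters (e.g. distances between centroids).
cost = np.array([[0.0, 0.5, 1.0],
                 [0.5, 0.0, 0.5],
                 [1.0, 0.5, 0.0]])

d_ab, plan = sinkhorn_cost(a, b, cost, reg=0.1)
d_aa, _ = sinkhorn_cost(a, a, cost, reg=0.1)
```

Repeating this for every pair of samples would give a symmetric sample-by-sample distance matrix, which is the kind of object the function description refers to. Smaller values of `reg` bring the result closer to the unregularized distance at the price of slower convergence.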