pilotpy.tl.wasserstein_distance

pilotpy.tl.wasserstein_distance(adata, emb_matrix='X_PCA', clusters_col='cell_types', sample_col='sampleID', status='status', metric='cosine', regulizer=0.2, normalization=True, regularized='unreg', reg=0.1, res=0.01, steper=0.01, data_type='scRNA', return_sil_ari=False)

Calculate the Wasserstein (W) distance among samples using PCA representation and clustering information.

Parameters

adata : AnnData

Loaded AnnData object containing the data.

emb_matrix : str, optional

Name of the PCA representation of the data, by default ‘X_PCA’.

clusters_col : str, optional

Column name in the observation level of ‘adata’ that represents cell types or clustering, by default ‘cell_types’.

sample_col : str, optional

Column name in the observation level of ‘adata’ that represents samples or patients, by default ‘sampleID’.

status : str, optional

Column name in the observation level of ‘adata’ that represents status or disease, e.g., control/case, by default ‘status’.

regulizer : float, optional

Hyper-parameter of a Dirichlet distribution for regularization, by default 0.2.

metric : str, optional

Metric for calculating the cost matrix, by default ‘cosine’.

regularized : str, optional

Whether to use regularized optimal transport, by default ‘unreg’ (unregularized).

reg : float, optional

Regularization parameter, used only when regularized optimal transport is selected via ‘regularized’, by default 0.1.

res : float, optional

Resolution for Leiden clustering to achieve the desired cluster count, by default 0.01.

steper : float, optional

Step size used when searching for the best Leiden resolution, by default 0.01.

data_type : str, optional

Type of your data, e.g., ‘scRNA’ or ‘pathomics’, by default ‘scRNA’.

return_sil_ari : bool, optional

Whether to return ARI (Adjusted Rand Index) or Silhouette score for assessing W distance effects, by default False.
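
To make the ‘regulizer’ and ‘metric’ parameters more concrete, the following is a minimal pure-Python sketch, not PILOT's actual implementation. It assumes that ‘regulizer’ acts as a symmetric Dirichlet pseudo-count added to each sample's cluster counts, and that the cost between two clusters is the cosine distance between their centroids in the embedding. The helper names smoothed_proportions, cosine_distance, and cosine_cost_matrix are hypothetical, as is the toy data.

```python
import math
from collections import Counter


def smoothed_proportions(labels, clusters, alpha=0.2):
    """Cluster proportions of one sample with a pseudo-count alpha per cluster.

    Assumption: this mirrors additive (Dirichlet) smoothing; it is not taken
    from the PILOT source code.
    """
    counts = Counter(labels)
    total = len(labels) + alpha * len(clusters)
    return [(counts[c] + alpha) / total for c in clusters]


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cosine_cost_matrix(centroids):
    """Pairwise cosine distances between cluster centroids (list of vectors)."""
    return [[cosine_distance(u, v) for v in centroids] for u in centroids]


# Toy sample: three clusters, one of them absent from this sample.
clusters = ["T cell", "B cell", "Monocyte"]
labels = ["T cell", "T cell", "B cell", "T cell"]
props = smoothed_proportions(labels, clusters, alpha=0.2)

# Toy cluster centroids in a 2-dimensional embedding.
centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cost = cosine_cost_matrix(centroids)
```

Under this assumption, the smoothing keeps clusters that are absent from a sample at a small non-zero weight, so every sample is a full distribution over the same set of clusters and can be compared with any other sample under the same cost matrix.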

Returns

None

Calculates and stores the W distance among samples in the adata object.
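
A typical call might look like the sketch below. The keyword values are the defaults from the signature above. The wrapper run_wasserstein and the dictionary WASSERSTEIN_KWARGS are hypothetical names introduced for illustration, and the sketch assumes that ‘adata’ is already loaded and contains the PCA representation and the three observation-level columns.

```python
# Keyword arguments copied from the documented signature defaults.
WASSERSTEIN_KWARGS = dict(
    emb_matrix="X_PCA",
    clusters_col="cell_types",
    sample_col="sampleID",
    status="status",
    metric="cosine",
    regulizer=0.2,
    normalization=True,
    regularized="unreg",
    reg=0.1,
    res=0.01,
    steper=0.01,
    data_type="scRNA",
    return_sil_ari=False,
)


def run_wasserstein(adata, **overrides):
    """Call pilotpy.tl.wasserstein_distance on an already loaded AnnData object."""
    # Imported lazily so this snippet can be loaded without pilotpy installed.
    import pilotpy

    kwargs = {**WASSERSTEIN_KWARGS, **overrides}
    # Results are stored in `adata`; the documented return value is None.
    return pilotpy.tl.wasserstein_distance(adata, **kwargs)
```

For example, run_wasserstein(adata, metric="euclidean") would override only the cost metric, assuming that metric name is accepted, and leave every other argument at its documented default.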