### Trajectory Analysis of Kidney IgAN Data with PILOT
PILOT
Welcome to the PILOT Package Tutorial for pathomics Data!
You can find the pathomics data [here](https://github.com/CostaLab/PILOT/tree/main/Tutorial/Datasets).
```python
import pilotpy as pl
import scanpy as sc
```
#### Kidney_IgAN Tubuli
##### Reading Anndata
The Anndata (h5ad) file is in the Datasets folder.
```python
adata_T=sc.read_h5ad('Datasets/Kidney_IgAN_T.h5ad')
```
#### Loading the required information and computing the Wasserstein distance:
In order to work with PILOT, ensure that your Anndata object is loaded and contains the required information.
Use the following parameters to configure PILOT for your analysis (Setting Parameters):
- adata: Pass your loaded Anndata object to PILOT.
- emb_matrix: Provide the name of the variable in the obsm level that holds the dimension reduction ( PCA representation).
- clusters_col: Specify the name of the column in the observation level of your Anndata that corresponds to cell types or clusters.
- sample_col: Indicate the column name in the observation level of your Anndata that contains information about samples or patients.
- status: Provide the column name that represents the status or disease (e.g., "control" or "case").
```python
pl.tl.wasserstein_distance(
adata_T,
clusters_col = 'Cell_type',
sample_col = 'sampleID',
status = 'status',
data_type = 'Pathomics'
)
```
##### plotting the Cost matrix and the Wasserstein distance:
Here we show the heatmaps of the Cost matrix (cells) and Wasserstein distance (samples). At this point, you should have a 'Results_PILOT' folder inside the folder where you are running your analysis, which includes all the plots created by PILOT.
```python
pl.pl.heatmaps(adata_T)
```


##### Trajectory:
Here we show the Diffusion map of Wasserstein distance. In the upcoming trajectory analysis, the labels "<30" mean an eGFR level below 30 (considered as Low), "30-60" denotes a reduced eGFR range (marked as Reduced), and ">60" indicates a normal eGFR level
```python
pl.pl.trajectory(adata_T, colors = ['red','blue','orange'])
```

#### Kidney_IgAN Glomeruli
##### Reading Anndata
The Anndata (h5ad) file is in the Datasets folder.
```python
adata_G=sc.read_h5ad('Datasets/Kidney_IgAN_G.h5ad') #First read the object
```
```python
pl.tl.wasserstein_distance(
adata_G,
clusters_col = 'Cell_type',
sample_col = 'sampleID',
status = 'status',
data_type = 'Pathomics'
)
```
```python
pl.pl.heatmaps(adata_G)
```


```python
pl.pl.trajectory(adata_G, colors = ['red','blue','orange'])
```

#### Combination:
Here, we combine the distances of samples. We get the sum of distances of samples based on Tubuli and Glomeruli distances.
```python
adata_Com = adata_G
adata_Com.uns['EMD'] = adata_G.uns['EMD'] + adata_T.uns['EMD']
```
```python
pl.pl.trajectory(adata_Com, colors = ['red','blue','orange'])
```

#### Fit a principal graph:
The difussion map creates an embedding that potentially reveals a trajectory in the data. Next, PILOT explores EIPLGraph to find the structure of the trajectory. An important parameter is the source_node, which indicates the start of the trajectory. Here, we selected a normal sample (node with id = 2). This method returns ranked samples, which we define as a disease progression score (t = t1, ..., tn), where tl represents the ranking of the nth sample.
```python
pl.pl.fit_pricipla_graph(adata_Com, source_node = 2)
```

##### Cell-type Importance Glomeruli:
```python
pl.tl.cell_importance(adata_Com, xlim = 125)
```


#### Feature selection for Glomeruli based on Combination:
##### Saving morphological features and maps them with the obtained order by PILOT (for Glomeruli):
In this step, morphological features are extracted from all clusters and linked to pseudotime using the PILOT trajectory order.
- The "extract_cells_from_pathomics" function creates a 'cells' folder inside the 'Results_PILOT' folder to store these morphological features and corresponding pseudotime values as 'All.csv'.
```python
pl.tl.extract_cells_from_pathomics(adata_Com)
```
##### Getting the log scale of features
```python
data=pl.tl.norm_morphological_features(
column_names = ['glom_sizes', 'glom_distance_to_closest_glom','glom_diameters','glom_tuft_sizes',
'glom_bowman_sizes'],
name_cell = 'All'
)
```
##### Morphological features changes for Glomeruli:
Firstly, we should note that for pathomics data, instead of using cluster-specific changes for features/genes (please see MI tutorial analysis), we use the fit models to find the structural changes.
We apply the morphological_features_importance function to catch the features/structures that are changing over the trajectory (combination) for Glomeruli.
```python
pl.tl.morphological_features_importance(data, height = 10, x_lim = 160, width = 20)
```
Name of Cluster : [1mAll[0m
Sparsity:6.486269746268921e-05
For this cell_type, p-value of 14 genes are statistically significant.
Expression pattern count
0 linear down 6
1 linear up 4
2 quadratic down 2
3 quadratic up 2
data saved successfully

#### Feature selection for Tubuli based on Combination
We repeat the same process for Tubuli.
#### Saving morphological features and map them with the obtained order by PILOT (for Tubuli):
```python
adata_T.uns['orders'] = adata_Com.uns['orders']
pl.tl.extract_cells_from_pathomics(adata_T)
```
##### Getting the log scale of features
```python
data=pl.tl.norm_morphological_features(
column_names = ['tubule_diameters', 'tubule_sizes', 'tubule_distance_to_closest_instance'],
name_cell = 'All'
)
```
##### Morphological features changes for Tubuli
```python
pl.tl.morphological_features_importance(data, x_lim = 160, height = 5)
```
Name of Cluster : [1mAll[0m
Sparsity:0.0
For this cell_type, p-value of 3 genes are statistically significant.
Expression pattern count
0 linear down 2
1 linear up 1
data saved successfully
