You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue
Relevant background
Now that we have the AnnData conversion notebook, we can start implementing AnnData with some of the spatial analysis functions.
Design overview
We have two options:
Hard fork: Change all the functions to pass in AnnData directory and load the cell table information. The AnnData table conversion is a very quick process so forcing people to use it moving forwards seems reasonable.
Soft fork: Add an optional anndata_dir arg which will load in and save to the AnnData table if provided. When left as None, the functions will continue to use the cell_table.csv and process the same as before. This isn't too much double code because the Anndata reading and writing calls are very simple, but it'll add quite a few extra if statements.
Things which will be adapted for AnnData compatibility in this new PR:
Distance matrices calculation
Neighborhood matrices calculation (will save to both the existing AnnData table and as individual csv)
Cell Distances analysis
Since all the spatial notebooks (except cell distances) only take the neighborhood mats as inputs, these changes will not break any other notebooks if we choose to do a hard fork.
** Note: This will break the spatial enrichment code we have in ark, but I'm pretty confident we can just scrap the old stuff and build on squidpy's nhood_enrichment function.
Code mockup
I can write up more specific pseudocode when we decide between a hard or soft fork, but for now here's which functions will need to be altered regardless.
1. Distance matrices calculation
Currently we load in the segmentations masks to get the labels and centroids of the cells by running regionprops(). This doesn't make much sense in our pipeline anymore since this is already done in the segmentation notebook and the centroids are stored in the cell table; I think moving forward we can skip loading in the segmentation masks and re-running regionprops (it's very time consuming using the NAS and also redundant).
Also I will probably do some efficiency testing to find a replacement for cdist() that creates a sparse matrix rather than computing ever pairwise distance between cells.
Adjusted functions: calc_dist_matrix - adjust for AnnData
Soft fork example:
def calc_dist_matrix(cell_table, save_path, anndata_dir=None):
"""Generate matrix of distances between center of pairs of cells.
Saves each one individually to `save_path`.
Args:
anndata_dir (str):
path where the AnnData object is stored
"""
if anndata_dir:
fov_names = list_files(anndata_dir, substr=".zarr")
else:
# check that both label_dir and save_path exist
io_utils.validate_paths([label_dir, save_path])
fov_names = cell_table.fov.unique()
# iterate for each fov
with tqdm(total=len(fov_names), desc="Distance Matrix Generation", unit="FOVs") \
as dist_mat_progress:
for fov in fov_names:
if anndata_dir:
fov_adata = anndata.read_zarr(os.path.join(anndata_dir, f"{fov}.zarr"))
centroids = fov_adata.obsm["spatial"]
else:
fov_data = cell_table[cell_table["fov"] == fov]
fov_data = [fov_data.centroid-0, fov_data.centroid-1]
# generate the distance matrix, then assign centroid_labels as coords
dist_matrix = cdist(centroids, centroids).astype(np.float32)
dist_mat_xarr = xr.DataArray(dist_matrix, coords=[centroid_labels, centroid_labels])
# save distances to AnnData table
if anndata_dir:
fov_adata.obsp["distance"] = dist_mat_xarr
fov_adata.write_zarr(os.path.join(anndata_dir, f"{fov}.zarr"), chunks=(1000, 1000))
# save the distance matrix to save_path
dist_mat_xarr.to_netcdf(
os.path.join(save_path, fov_name + '_dist_mat.xr'),
format='NETCDF3_64BIT'
)
dist_mat_progress.update(1)
Hard fork example:
def calc_dist_matrix(anndata_dir):
"""Generate matrix of distances between center of pairs of cells.
Args:
anndata_dir (str):
path where the AnnData object is stored
"""
# load fov names
fov_names = list_files(anndata_dir, substr=".zarr")
# iterate for each fov
with tqdm(total=len(fov_names), desc="Distance Matrix Generation", unit="FOVs") \
as dist_mat_progress:
for fov in fov_names:
fov_adata = anndata.read_zarr(os.path.join(anndata_dir, f"{fov}.zarr"))
centroids = fov_adata.obsm["spatial"]
# generate the distance matrix, then assign centroid_labels as coords
dist_matrix = cdist(centroids, centroids).astype(np.float32)
dist_mat_xarr = xr.DataArray(dist_matrix, coords=[centroid_labels, centroid_labels])
# save distances to AnnData table
fov_adata.obsp["distance"] = dist_mat_xarr
fov_adata.write_zarr(os.path.join(anndata_dir, f"{fov}.zarr"), chunks=(1000, 1000))
dist_mat_progress.update(1)
3. Cell Distances analysis
Adjusted functions: generate_cell_distance_analysis
possibly calculate_mean_distance_to_all_cell_types andcalculate_mean_distance_to_cell_type as well
Required inputs
Cell table / AnnData table.
Output files
Since no one seems to really open or use the saved distance matrix xarrays anyways, I think we can just append them to the AnnData table and save some storage space.
The neighborhood mats and cell distance results will be saved to both the AnnData table and as individual csvs.
Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
A couple days
A week
Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.
Estimated date when a fully implemented version will be ready for review:
1/19
Estimated date when the finalized project will be merged in:
1/23
delayed with distance matrix calculation benchmarking, should be in by 1/27
The text was updated successfully, but these errors were encountered:
@srivarra@alex-l-kong@ngreenwald@jranek Lmk your thoughts. I'm leaning towards a hard switch since we just released v0.7.0 of ark and anyone super committed to staying away from AnnData can always use that version.
This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue
Relevant background
Now that we have the AnnData conversion notebook, we can start implementing AnnData with some of the spatial analysis functions.
Design overview
We have two options:
anndata_dir
arg which will load in and save to the AnnData table if provided. When left asNone
, the functions will continue to use the cell_table.csv and process the same as before. This isn't too much double code because the Anndata reading and writing calls are very simple, but it'll add quite a few extraif
statements.Things which will be adapted for AnnData compatibility in this new PR:
Since all the spatial notebooks (except cell distances) only take the neighborhood mats as inputs, these changes will not break any other notebooks if we choose to do a hard fork.
** Note: This will break the spatial enrichment code we have in ark, but I'm pretty confident we can just scrap the old stuff and build on squidpy's nhood_enrichment function.
Code mockup
I can write up more specific pseudocode when we decide between a hard or soft fork, but for now here's which functions will need to be altered regardless.
1. Distance matrices calculation
Currently we load in the segmentations masks to get the labels and centroids of the cells by running
regionprops()
. This doesn't make much sense in our pipeline anymore since this is already done in the segmentation notebook and the centroids are stored in the cell table; I think moving forward we can skip loading in the segmentation masks and re-running regionprops (it's very time consuming using the NAS and also redundant).Also I will probably do some efficiency testing to find a replacement for cdist() that creates a sparse matrix rather than computing ever pairwise distance between cells.
Adjusted functions:
calc_dist_matrix
- adjust for AnnDataSoft fork example:
Hard fork example:
2. Neighborhood matrices calculation
Adjusted functions:
create_neighborhood_matrix
3. Cell Distances analysis
Adjusted functions:
generate_cell_distance_analysis
possibly
calculate_mean_distance_to_all_cell_types
andcalculate_mean_distance_to_cell_type
as wellRequired inputs
Cell table / AnnData table.
Output files
Since no one seems to really open or use the saved distance matrix xarrays anyways, I think we can just append them to the AnnData table and save some storage space.
The neighborhood mats and cell distance results will be saved to both the AnnData table and as individual csvs.
Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
Estimated date when a fully implemented version will be ready for review:
1/19
Estimated date when the finalized project will be merged in:
1/23
The text was updated successfully, but these errors were encountered: