datalad datasets : unable to read volumes #65

aryamehta2006 · 2022-07-11T20:51:54Z

VisualQC fails to read MR volumes from a BIDS dataset downloaded via datalad -- see log below. cc @yarikoptic

(base) aryamehta@Aryas-Air ~ % visualqc_anatomical -b /Users/aryamehta/datasets/ds002785 -old

Anatomical MRI module
Time stamp : 2022-07-11 16:41:35

version info: visualqc 0.6.1
numpy 1.21.5 / scipy 1.7.3 / matplotlib 3.5.1
python 3.9.12 (main, Jun  1 2022, 06:34:44) 
[Clang 12.0.0 ]
platform macOS-12.4-arm64-arm-64bit
Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101


Input folder: /Users/aryamehta/datasets/ds002785
Output folder: /Users/aryamehta/datasets/ds002785/visualqc
outlier detection: disabled, as requested.
Restoring ratings from previous session(s), if they exist ..
To be reviewed : 216


Reviewing MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz
Traceback (most recent call last):
  File "/Users/aryamehta/opt/anaconda3/bin/visualqc_anatomical", line 8, in <module>
    sys.exit(main())
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/__t1_mri__.py", line 12, in main
    t1_mri.cli_run()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/t1_mri.py", line 872, in cli_run
    wf.run()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/workflows.py", line 87, in run
    self.loop_through_units()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/workflows.py", line 224, in loop_through_units
    skip_subject = self.load_unit(unit_id)
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/t1_mri.py", line 507, in load_unit
    self.current_img_raw = read_image(t1_mri_path, error_msg='T1 mri')
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/utils.py", line 37, in read_image
    raise IOError('Given path to {} does not exist!\n\t{}'
OSError: Given path to T1 mri does not exist!
	/Users/aryamehta/datasets/ds002785/.git/annex/objects/x8/Z5/MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz/MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz
(base) aryamehta@Aryas-Air ~ %

yarikoptic · 2022-07-11T21:13:17Z

did you datalad get the content of that dataset before running visualqc_anatomical?

raamana · 2022-07-11T22:36:35Z

That is how it was originally downloaded, but we copy-pasted it to another computer (outside datalad). That's probably the source of the error.

But we can see the MRI scans, so the scan data should be in there, no?

raamana · 2022-07-11T23:14:08Z

I get the same error in the computer where I did the datalad get btw:

(base) $ 19:04:49 Quark ds002785 >>  vqct1 -b $PWD -old

Anatomical MRI module
Time stamp : 2022-07-11 19:04:54

version info: visualqc 0.6.1
numpy 1.17.4 / scipy 1.1.0 / matplotlib 3.5.1
python 3.7.2 (default, Dec 29 2018, 00:00:04)
[Clang 4.0.1 (tags/RELEASE_401/final)]
platform Darwin-21.4.0-x86_64-i386-64bit
Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64


/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/bids/layout/models.py:152: FutureWarning: The 'extension' entity currently excludes the leading dot ('.'). As of version 0.14.0, it will include the leading dot. To suppress this warning and include the leading dot, use `bids.config.set_option('extension_initial_dot', True)`.
  FutureWarning)
Input folder: /Volumes/work/Pitt/datasets/ds002785
Output folder: /Volumes/work/Pitt/datasets/ds002785/visualqc
outlier detection: disabled, as requested.
Restoring ratings from previous session(s), if they exist ..
To be reviewed : 216


Reviewing MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz
Traceback (most recent call last):
  File "/Users/Reddy/anaconda3/envs/py36/bin/vqct1", line 8, in <module>
    sys.exit(main())
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/__t1_mri__.py", line 12, in main
    t1_mri.cli_run()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/t1_mri.py", line 872, in cli_run
    wf.run()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/workflows.py", line 87, in run
    self.loop_through_units()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/workflows.py", line 224, in loop_through_units
    skip_subject = self.load_unit(unit_id)
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/t1_mri.py", line 507, in load_unit
    self.current_img_raw = read_image(t1_mri_path, error_msg='T1 mri')
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/utils.py", line 38, in read_image
    ''.format(error_msg, img_spec))
OSError: Given path to T1 mri does not exist!
	/Volumes/work/Pitt/datasets/ds002785/.git/annex/objects/jz/W2/MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz/MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz
(base) $ 19:08:33 Quark ds002785 >>

what commands can I run to ensure it was gotten / installed properly? I tried metadata but it didn't work:

(base) $ 19:11:34 Quark ds002785 >>  datalad metadata -d $PWD
[WARNING] Found no aggregated metadata info file /Volumes/work/Pitt/datasets/ds002785/.datalad/metadata/aggregate_v1.json. You will likely need to either update the dataset from its original location or reaggregate metadata locally.
[WARNING] Dataset at . contains no aggregated metadata on this path [metadata(/Volumes/work/Pitt/datasets/ds002785)]
(base) $ 19:11:39 Quark ds002785 >>
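
One way to check whether the file content was actually fetched, without going through metadata: in a git-annex dataset (in its default symlink mode) a file whose content is absent is a dangling symlink into .git/annex/objects. The sketch below is a minimal check under that assumption; the function name `find_missing_annexed_files` and the default glob pattern are made up for illustration and are not part of datalad or visualqc.

```python
import glob
import os


def find_missing_annexed_files(dataset_root, pattern="sub-*/anat/*_T1w.nii.gz"):
    """Return matching paths that are symlinks whose target does not exist.

    Assumes annexed files are symlinks (git-annex's default, "locked" mode);
    a dangling link then means the content has not been fetched yet.
    """
    full_pattern = os.path.join(glob.escape(dataset_root), pattern)
    missing = []
    for path in sorted(glob.glob(full_pattern)):
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            missing.append(path)
    return missing
```

An empty list from `find_missing_annexed_files("/path/to/ds002785")` would suggest that all T1w files matched by the pattern have their content present locally.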

raamana · 2022-07-11T23:16:58Z

now that I think about it, I realize I only installed one of the derivatives (freesurfer), and not the base BIDS dataset. I am now running datalad get sub-????/anat/* and will see whether the error reappears after the download is finished! My bad :)

raamana · 2022-07-11T23:30:50Z

I get the following, and it worked this time:

(base) $ 19:14:57 Quark ds002785 >>  datalad get sub-????/anat/*
get(ok): sub-0001/anat/sub-0001_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0067/anat/sub-0067_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0064/anat/sub-0064_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0135/anat/sub-0135_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0152/anat/sub-0152_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0193/anat/sub-0193_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0033/anat/sub-0033_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0213/anat/sub-0213_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0091/anat/sub-0091_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0149/anat/sub-0149_T1w.nii.gz (file) [from s3-PUBLIC...]
  [206 similar messages have been suppressed]
action summary:
  get (notneeded: 216, ok: 216)

i don't understand the notneeded part in the get (notneeded: 216 ...) message though?

although I see two issues:

it's very slow to traverse the dataset for some reason! All visualqc does is look for valid MRIs, which should be super fast, so I'm not sure what loop it's getting into while trying to index the */anat/*_T1w.nii.gz files
due to fully resolving the file path, real subject IDs (like sub-0001) are being replaced with an MD5 hash like MD5E-s6843657--a5192feb724d3a07d8f20724bfce9f47.nii.gz. This is an issue as we need to record the QC ratings against the subject IDs, so I'll have to figure out a better way to index BIDS datasets obtained via datalad. My current implementation works with plain BIDS datasets, without symlinks managed by datalad
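
To illustrate the second issue: the subject ID is only present in the unresolved path, so one option is to read it from the path components before any symlink is followed. This is a rough sketch, not what visualqc currently does; `subject_id_from_path` and the regular expression are hypothetical names and assumptions about BIDS-style `sub-<label>` folders.

```python
import re
from pathlib import PurePath

# Assumption: subject folders are named 'sub-<alphanumeric label>', as in BIDS.
SUBJECT_DIR = re.compile(r"^sub-[A-Za-z0-9]+$")


def subject_id_from_path(path):
    """Return the first 'sub-<label>' component of an unresolved path."""
    for part in PurePath(path).parts:
        if SUBJECT_DIR.match(part):
            return part
    raise ValueError("no 'sub-<label>' component in {}".format(path))
```

With this, `subject_id_from_path("sub-0001/anat/sub-0001_T1w.nii.gz")` gives `"sub-0001"`, while a resolved path inside .git/annex/objects raises `ValueError`, which shows why the ID has to be recorded before resolving.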

yarikoptic · 2022-07-11T23:51:08Z

Let's zoom tomorrow?

raamana · 2022-07-12T02:22:30Z

Sure! Tomorrow is a bit tough with dental appointments and other things but Thursday afternoon works. Or Friday?

yarikoptic · 2022-07-12T02:48:00Z

sure, just let me know the time ;-) Thu we have ReproNim coworking time 11-5pm which happens in NMIND gather town, so can meet there

raamana · 2022-07-12T14:33:11Z

we are trying to use the dataset on an M1 MacBook, and it appears installing datalad on it is not easy (and definitely not for a high school student)

i wish openfmri folks let us download the dataset, or parts of it, from a browser :). cc @effigies

i will check if AWS CLI works on M1 MacBook

yarikoptic · 2022-07-12T14:59:35Z

there is always trade off between "I want the flashiest latest cool gadget from a company which does not really care about science" and "I want a system for doing science" ;)

It is all on S3, you can use s3 clients to download straight from S3.

Re M1 -- should install rosetta and then git-annex should be installable from brew IIRC. some details here datalad/datalad#5701

effigies · 2022-07-12T15:12:53Z

> i wish openfmri folks let us download the dataset, or parts of it, from a browser

OpenNeuro permits downloading; I believe a recent Chrome or Firefox is required, since the download API used for such large datasets needs it. If you're still using legacy.openfmri.org, then I think there are tarballs, but these datasets are not kept in sync with OpenNeuro.

raamana · 2022-07-12T15:44:00Z

damn, that's good to know. I was always viewing it in Safari, and there was no indication at all that we could download it from a browser. I would suggest leaving a note asking folks to use Chrome or Firefox, instead of silently removing that option on Safari

raamana · 2022-07-12T15:47:46Z

it doesn't seem to work on firefox btw, at least for the 2 datasets I looked at

effigies · 2022-07-12T16:29:56Z

You're right, it looks like Mozilla is not implementing this API; for some reason I thought they had. Looks like Chrome, Edge and Opera do implement it. https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#browser_compatibility

raamana · 2022-07-14T17:31:34Z

Hi Yarik, I am available in the next few hours if you want to look into this issue.

yarikoptic · 2022-07-14T20:06:05Z

pinged you on twitter with url to nmind if you don't know

yarikoptic · 2022-07-29T19:03:39Z

ok, since zooming didn't happen, let me follow up on the original datalad-related issues from that last related comment by @raamana:

> i don't understand the notneeded part in get (notneeded: 216 ... message though?

most likely those 216 were already obtained

> it's very slow to traverse the dataset for some reason! All visualqc does is look for valid MRIs, which should be super fast, so I'm not sure what loop it's getting into while trying to index the */anat/*_T1w.nii.gz files

I have not looked inside: if the visualqc traversal also traverses .git -- you might like to "disable" that. FWIW, here is our simple "walker" which excludes vcs subfolders by default: https://github.com/dandi/dandi-cli/blob/1c947365311732943753e15199a57c9bfd2759bf/dandi/utils.py#L260

regardless of datalad, you might benefit from speeding up the walk by multithreading it -- we have that in https://github.com/dandi/dandi-cli/blob/master/dandi/support/threaded_walk.py but there we have not added any vcs folders exclusion yet (used only within zarr folders) -- filed dandi/dandi-cli#1086 to possibly harmonize.
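
For reference, excluding VCS folders during a walk can be sketched with the standard library alone. This is not the dandi-cli walker linked above, just a minimal stand-in using `os.walk`; the function name `find_files`, the suffix default and the set of excluded folder names are assumptions for illustration.

```python
import os

# Assumption: these are the version-control folders worth skipping.
VCS_DIRS = frozenset({".git", ".hg", ".svn", ".bzr"})


def find_files(root, suffix="_T1w.nii.gz", exclude_dirs=VCS_DIRS):
    """Yield paths under `root` ending in `suffix`, never descending into VCS folders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those folders,
        # so the (potentially huge) .git/annex/objects tree is never listed.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

Since `os.walk` does not follow directory symlinks by default and reports file symlinks under their own names, the yielded paths keep the sub-XXXX names rather than the annex keys.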

> due to fully resolving the file path, real subject IDs (like sub-0001) are being replaced with an MD5 hash like MD5E-s6843657--a5192feb724d3a07d8f20724bfce9f47.nii.gz. This is an issue as we need to record the QC ratings against the subject IDs, so I'll have to figure out a better way to index BIDS datasets obtained via datalad. My current implementation works with plain BIDS datasets, without symlinks managed by datalad

such "resolve to the death" plagues many things, including browsers, AFNI etc. Often they come up with a switch to "do not bother resolving" and since I do not know details here I can only arrogantly state "there should be no need to resolve symlinks since that would incorporate some ad-hoc assumption on their purpose. If there is such an ad-hoc assumption -- make it more explicit". So what is the assumption which makes you resolve the paths here? ;-)

raamana · 2022-08-10T16:50:53Z

thanks Yarik for the detailed notes. I was thinking of potentially excluding certain paths like .git etc but I was afraid of making any ad-hoc changes to file path management that might introduce funny behaviour across platforms

raamana · 2022-08-10T16:53:26Z

I resolve paths by default as one of the several best practices for file/path management -- I don't understand the case against resolving though, except in extreme situations of large number of layers of sym-linking (which is often not the case with most regular users)

yarikoptic · 2022-08-10T17:10:14Z

> I resolve paths by default as one of the several best practices for file/path management ...

could you provide a reference for such a best practice? My mileage goes against it ;-)

raamana · 2022-08-11T15:16:41Z

I guess we approach it with different experiences from the past :). one obvious rationale is to avoid depending on relative paths, which caused some issues for me before, esp. when the same tool is used to process different projects and datasets

yarikoptic · 2022-08-11T16:00:04Z

> one obvious rationale is to avoid depending on relative paths,

"relative path" (e.g., sub-01/blah.nii.gz) -> "absolute path" (e.g., /home/pradeep/favoritebids/sub-01/blah.nii.gz) -> "resolved path" (e.g., /tmp/junk/scannedyesterday.dat), so it seems you want "absolute paths" but are talking about "resolved paths" while skipping the "absolute" intermediate. Is that right?
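
The distinction above can be seen directly with Python's standard library: `os.path.abspath` only anchors a path to the current directory, while `os.path.realpath` also follows symlinks. A small sketch (the helper name `absolute_and_resolved` is just for illustration):

```python
import os


def absolute_and_resolved(path):
    """Return (absolute, resolved) forms of `path`.

    abspath() makes the path absolute and normalises it without touching the
    filesystem, so a symlink keeps its own name (e.g. sub-0001_T1w.nii.gz).
    realpath() additionally follows symlinks, so for an annexed file it ends
    in the git-annex key (MD5E-...) instead.
    """
    return os.path.abspath(path), os.path.realpath(path)
```

For a datalad-managed file, the first element would still end in the sub-XXXX file name, which is what is needed for recording ratings against subject IDs; only the second would point into .git/annex/objects.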
