-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: copy the RSNA anonymizer protocol #206
base: master
Are you sure you want to change the base?
Conversation
7281164
to
c2e831f
Compare
I've updated the PR to include just the deidentification directives from the RSNA anonymizer script, I added additional things that weren't in the first version of the PR and updated the commit message to reflect this. The most important changed: I've also pulled tag processing functions |
Processing directives from the 2022-03-08 version of the RSNA anonymizer software http://mirc.rsna.org/download/Anonymizer-installer.jar The jar file was unzipped, the unzipped jar has a file dicom-anonymizer.script with xml deidentification directives all `@remove` and `@empty` commands were extracted from the file using `grep`, and transformed into the corresponding REMOVE and BLANK lines using `sed`. Further, lines with @hashdate, @round, and @integer were extracted these correspond to dates and uids. The tags were REMOVEd and a ADD tag func: lookup was added for processing dates and UIDs. @ALL directives correspond to key identifiers, those were REMOVEd then ADDed as var: lookup. The Anonymizer code is distributed under the RSNA public license http://mirc.rsna.org/rsnapubliclicense.pdf which is I believe to be compatible with the MIT license used in deid. So this derivative work should satisfy the terms of that license to be included in deid.
c2e831f
to
d8ba04d
Compare
ADD PatientName var:patient_name | ||
ADD PatientID var:patient_id | ||
ADD PatientAge var:patient_age | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove empty space at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about adding func:process_uid
here? This function won't exist for most users. We do have plan to support deid provided functions (discussion was here) #203 but no implementation yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to remove the trailing empty lines. I think having the functions make sense provided the tags are removed first, at least with var: a missing lookup meant the tag was not added back to the file, I think that behaviour is ok for these. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't add functions to a config file that a user is not expecting to be there. Perhaps you can test commenting them out, and then we can re-add when #207 is implemented and we can use a function there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't quite understand, is this a difference between func:
and var:
? Because deid.dicom includes var:
's that are unexpected when working from within python (I haven't used the command line scripts). But I'm happy to comment them out anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Var and func both won't work without the user providing functions and variables in the lookup. E.g., I would need to define that function or a variable of a particular name and pass it to the function to perform the parsing. We generally want these core provided recipes to not require that extra customization - e.g., most people are going to try it and be surprised that there is an extra function or variable dependency!
However, if / when we have these deid provided functions, we will easily be able to add them to the defaults. Does that make sense? Some of the documentation examples of using func/var might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean they won't work - it appears to me that the default behaviour is to leave the tag unmodified, which is why for each ADD tag var:
and ADD tag func:
I preceded them with REMOVE tag
, that way if the function or variable isn't supplied the tag is removed. Quick example:
import os
import tempfile
from pydicom.dataset import FileDataset, FileMetaDataset
from pydicom.uid import UID
# Create some temporary filenames
suffix = '.dcm'
filename_little_endian = tempfile.NamedTemporaryFile(suffix=suffix).name
filename_big_endian = tempfile.NamedTemporaryFile(suffix=suffix).name
print("Setting file meta information...")
# Populate required values for file meta information
file_meta = FileMetaDataset()
file_meta.MediaStorageSOPClassUID = UID('1.2.840.10008.5.1.4.1.1.2')
file_meta.MediaStorageSOPInstanceUID = UID("1.2.3")
file_meta.ImplementationClassUID = UID("1.2.3.4")
print("Setting dataset values...")
# Create the FileDataset instance (initially no data elements, but file_meta
# supplied)
ds = FileDataset(filename_little_endian, {},
file_meta=file_meta, preamble=b"\0" * 128)
ds.PatientAge = "206"
from deid.config import DeidRecipe
from deid.dicom.parser import DicomParser
recipe = DeidRecipe()
recipe.deid["header"].append({'action' : 'ADD', 'field' : 'PatientAge', 'value' : 'var:new_age'})
dparser = DicomParser(ds, recipe = recipe)
dparser.parse()
ds
Retains the PatientAge of 206 and does not error.
recipe2 = DeidRecipe()
recipe2.deid["header"].append({'action' : 'REMOVE', 'field' : 'PatientAge'})
recipe2.deid["header"].append({'action' : 'ADD', 'field' : 'PatientAge', 'value' : 'var:new_age'})
dparser = DicomParser(ds, recipe = recipe2)
dparser.parse()
ds
shows the PatientAge field deleted
and finally
dparser.lookup["new_age"] = 102
dparser.parse()
ds
shows the modified age of 102
But your library your choice! I'm happy to comment those actions out until #207 is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I can be more clear - it will run but it will emit this error message:
Lines 54 to 56 in 2718b7e
if value_option not in item: | |
bot.warning("%s not found in item lookup." % (value_option)) | |
return None |
Hmm. So I guess I don't feel very strongly about this, so if your preference is to leave them, that's okay with me. But likely we will need to refactor a bit if/when we add default functions (do you agree we should provide some default to plug in to this recipe?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh interesting, I've never seen those warnings when working in the python repl!
Yes I think including defaults would be helpful, autogenerating new UIDs is a particularly appealing example. The RSNA anonymizer hashes the date, we could provide functionality like that as well if it was desired. I was just trying to stick to the spirit of their recipe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay so I'm okay if you want to leave the func/var lines. A few more tweaks before this is ready to merge:
- please bump the version in deid/version.py
- add a corresponding note to the CHANGELOG.md
- find a good place in the docs to mention this template. Perhaps on this page after the last section, provide a simple list of recipes provided (and description?) E.g.:
We provide a set of default and pre-propulated recpies for you to use in the deid/data
directory, including:
- deid.dicom: A default recipe that will make a best effort to provide header and image pixel filters based on experience and CTP.
- deid.dicom.ultrasound: intended for ultrasound
- deid.dicom.xray.chest: intended for chest xray
- [deid.dicom.rsna.anon](link here): your description
The above also has me thinking we might want to name the file according to the others, with deid.dicom as the first part to better identify it's purpose, and then perhaps remove the year so it's not dated? We can always derive the last update date from git.
Thanks @vsoch, I will make the changes suggested above, as soon as I get a chance. |
Sounds good! I'm giving the other folks mentioned in the issue a little bit to respond, and if they don't it probably means they don't have bandwidth and I'll make some time in a coming evening / weekend to implement these deid-provided functions. |
@cfhammill please see #208 - when this is merged you will be able to provide a func: with a deid defined function and it will work without a hitch. |
@cfhammill are you still interested in contributing this recipe? We've since added deid-provided replace uid functions ( |
yes I am still interested in contributing, thanks @vsoch. I'll try to find time in the next couple weeks to make the relevant changes. |
Sounds good! Thank you! |
Status? |
Description
I've added to the deid/data/deid.dicom protocol
a large number of header removal and small number of blanks.
These header removals/blanks were extracted from the 2022-03-08
version of the RSNA anonymizer software
http://mirc.rsna.org/download/Anonymizer-installer.jar
The jar file was unzipped, the unzipped jar has a file
dicom-anonymizer.script with xml deidentification directives.
all
@remove
and@empty
commands were extracted from the fileusing
grep
, and transformed into the corresponding REMOVE andBLANK lines using
sed
, copied in to a slightly modifiedversion of deid.dicom I've been modifying for work. The file was
post-processed using R to remove duplicated REMOVE directives.
The Anonymizer code is distributed under the RSNA public license
http://mirc.rsna.org/rsnapubliclicense.pdf which is I believe to be
compatible with the MIT license used in deid. So this derivative
work should satisfy the terms of that license to be included in
deid.
I've also updated the ADD directives to generate new UIDs as per
the example in the docs, and to copy back in the SOPClassUID field.
These can be removed as they may be too specialized to my use case.
Checklist
Open questions
Should the UID generating directives be removed?