-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Novelty Score Calculation Update #30
base: main
Are you sure you want to change the base?
Conversation
Updated novelty score calcualtions including integrations of: 1. Clinical Trials/ Clinical Approvals - MVP1 2. TDLs - MVP2 3. Gene Distinctiveness - MVP2
app/novelty/compute_novelty.py
Outdated
from known import find_known_results | ||
from extr_smile_molpro_by_id import mol_to_smile_molpro | ||
from mol_similarity import find_nearest_neighbors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three lines are not package imports, but local. They fail to be imported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I've imported the two necessary functions into the compute_novelty.py file to make it simpler. There won't be a need to import the three files anymore. I'll keep them in the repo in case there is future requirement of use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If those files are no longer being used, could you remove them please? If needed in the future, we can always reference them in GitHub history. Ideally any commented out code would also be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I have removed the files from the repo and the commented out code from compute_novelty.py
'Gene Distinctiveness', 'TDLs', 'novelty_score'] | ||
df_numpy = pd.DataFrame([["NAN"]*len(column_list)]*len(message['results']), columns = column_list) | ||
|
||
return df_numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final output used to only have drug
and novelty_score
, with drug
being the curie. Is that now Result ID
? Also, at least for the query I'm trying, every value is NaN. If that's expected, could we make the default value be 0 or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Since now we have MVP1 and MVP2 queries I changed the 'drug' to 'Result ID'. Can you send me the query? The lowest value is 0 for the novelty score. If it's showing NaN then there might be an issue I'd like to look a bit closer into. I'm assuming the "message"
key from the json response is being passed from the merged Response from the ARS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working with a smaller message, and a large message. The smaller message seems to be fine, but the larger message returns all NaNs. The CI PK for that larger message is d1e6d43f-cb3e-4c6b-a6bc-5393f576053f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the pk and it seems that the message gave an Error code of 422. It is probably why you're seeing ther NaN. I kept the NaN on purpose to distinguish between getting a 0 novelty score and when the code is erroring out. At the moment the code only works for messages with the code of 200 and shows NaNs otherwise. Do you want me to switch these out for 0s ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 422 error was given by the ARS, but that's not necessarily indicative of what the Appraiser did. I can pass that message to the Appraiser and it handles it fine, other than the novelty code, so I think something needs to be fixed in this code.
FYI, ALL fields are set to NaN, not just novelty_score.
FYI FYI, if the novelty code errors out, I'm already setting the default novelty to 0.0, so I think it would be a good idea to have that same default value in your code instead of a NaN (which I think is actually the string "NaN" instead of the value NaN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I've changed the default "NaN" strings (for instances when an error is encountered) to 0 for all entries in the DataFrame generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now the default value is 0, but ALL fields are still 0. I'm also not seeing any error messages come out of novelty, so it looks like everything went fine. In my opinion, Query ID
and Result ID
should always have values and not be 0, regardless of any errors.
Removed local imports from files and transferred functions from the files to compute_novelty.py
Due to code import in compute_novelty.py, removed files: known.py, molecular_similarity.py, extr_smile_molpro_by_id.py
Removed commented code from compute_novelty.py
Switched error cases to include 0s instead of "NAN" ion the Dataframe.
Updated novelty score calcualtions including integrations of: