
Novelty Score Calculation Update #30

Open · wants to merge 5 commits into base: main
Conversation

@pg427 (Contributor) commented Sep 26, 2024:

Updated novelty score calculations, integrating:

  1. Clinical Trials/ Clinical Approvals - MVP1
  2. TDLs - MVP2
  3. Gene Distinctiveness - MVP2

Comment on lines 7 to 9
from known import find_known_results
from extr_smile_molpro_by_id import mol_to_smile_molpro
from mol_similarity import find_nearest_neighbors
Collaborator:

These three are local files, not package imports, and they fail to import.

@pg427 (Contributor, Author):

Got it. I've moved the two necessary functions into compute_novelty.py to make things simpler, so the three files no longer need to be imported. I'll keep them in the repo in case they're needed in the future.
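The consolidation described here might look like the following sketch. The function names come from the quoted imports; the bodies are hypothetical placeholders, not the real implementations.

```python
# compute_novelty.py -- sketch of inlining the helpers that previously
# lived in known.py and extr_smile_molpro_by_id.py, instead of importing
# them as sibling modules. Bodies are illustrative placeholders only.

def find_known_results(message):
    # Formerly imported via `from known import find_known_results`.
    results = message.get("results", [])
    return [r for r in results if r.get("known")]

def mol_to_smile_molpro(mol_id):
    # Formerly imported via
    # `from extr_smile_molpro_by_id import mol_to_smile_molpro`.
    return f"SMILES:{mol_id}"
```

Inlining (or packaging the helpers properly) avoids the failure mode where sibling-file imports break when the module is run from a different working directory.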

Collaborator:

If those files are no longer being used, could you remove them please? If needed in the future, we can always reference them in GitHub history. Ideally any commented out code would also be deleted.

@pg427 (Contributor, Author) commented Sep 30, 2024:

Understood. I have removed the files from the repo and the commented-out code from compute_novelty.py.

'Gene Distinctiveness', 'TDLs', 'novelty_score']
df_numpy = pd.DataFrame([["NAN"]*len(column_list)]*len(message['results']), columns = column_list)

return df_numpy
Collaborator:

The final output used to have only drug and novelty_score, with drug being the CURIE. Is that now Result ID? Also, at least for the query I'm trying, every value is NaN. If that's expected, could we make the default value 0 or something?

@pg427 (Contributor, Author) commented Sep 27, 2024:

Yes. Since we now have MVP1 and MVP2 queries, I changed 'drug' to 'Result ID'. Can you send me the query? The lowest possible novelty score is 0, so if it's showing NaN there might be an issue I'd like to look into more closely. I'm assuming the "message" key from the JSON response is being passed from the merged Response from the ARS.

Collaborator:

I'm working with a smaller message and a larger message. The smaller one seems to be fine, but the larger one returns all NaNs. The CI PK for the larger message is d1e6d43f-cb3e-4c6b-a6bc-5393f576053f.

@pg427 (Contributor, Author) commented Sep 30, 2024:

I checked the PK, and it seems that message returned an error code of 422, which is probably why you're seeing the NaNs. I kept the NaN on purpose to distinguish a genuine 0 novelty score from cases where the code errors out. At the moment the code only works for messages with code 200 and shows NaNs otherwise. Do you want me to switch these out for 0s?
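The gating being described could be sketched as follows. The status-field layout, the column names (taken from the quoted diff), and the `score_message` stub are assumptions for illustration, not the PR's actual code.

```python
import pandas as pd

COLUMNS = ["Result ID", "Gene Distinctiveness", "TDLs", "novelty_score"]

def sentinel_frame(message, fill="NaN"):
    # One all-`fill` row per result, mirroring the default DataFrame
    # construction shown in the diff.
    n = len(message.get("results", []))
    return pd.DataFrame([[fill] * len(COLUMNS)] * n, columns=COLUMNS)

def score_message(message):
    # Placeholder for the real novelty computation (not reproduced here).
    raise NotImplementedError

def appraise(response):
    # Only score responses that came back with code 200; anything else
    # falls through to the sentinel frame, so an error is distinguishable
    # from a genuine novelty score of 0.
    if response.get("status") != 200:
        return sentinel_frame(response.get("message", {}))
    return score_message(response["message"])
```

Under this design, a 422 from the ARS produces a frame of sentinel values rather than raising inside the scorer.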

Collaborator:

The 422 error was given by the ARS, but that's not necessarily indicative of what the Appraiser did. I can pass that message to the Appraiser and it handles it fine, other than the novelty code, so I think something needs to be fixed in this code.

FYI, ALL fields are set to NaN, not just novelty_score.
FYI FYI, if the novelty code errors out, I'm already setting the default novelty to 0.0, so I think it would be a good idea to use that same default value in your code instead of a NaN (which I think is actually the string "NaN" rather than the value NaN).
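The string-versus-value distinction matters because downstream missing-value checks only recognize the real floating-point NaN, e.g.:

```python
import pandas as pd

# The string "NaN" is an ordinary object value; only the float NaN is
# treated as missing by pandas.
s = pd.Series(["NaN", float("nan"), 0.0])
print(pd.isna(s).tolist())  # only the middle entry is detected as missing
```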

@pg427 (Contributor, Author) commented Sep 30, 2024:

Ok. I've changed the default "NaN" strings (for instances when an error is encountered) to 0 for all entries in the DataFrame generated.

Collaborator:

So now the default value is 0, but ALL fields are still 0. I'm also not seeing any error messages come out of novelty, so it looks like everything went fine. In my opinion, Query ID and Result ID should always have values and not be 0, regardless of any errors.
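One way to honor that suggestion, sketched under the assumption that each result carries a usable identifier (the `id` key here is hypothetical): populate the identifier columns with real values and default only the metric columns to 0.

```python
import pandas as pd

def error_frame(message, query_id):
    # On error, keep real Query ID / Result ID values and zero only the
    # score columns. The "id" key on each result is an assumption about
    # the message layout; pandas broadcasts the scalar columns to match
    # the length of the result_ids list.
    result_ids = [r.get("id", "") for r in message.get("results", [])]
    return pd.DataFrame({
        "Query ID": query_id,
        "Result ID": result_ids,
        "Gene Distinctiveness": 0,
        "TDLs": 0,
        "novelty_score": 0,
    })
```

This keeps error rows traceable back to their results even when no score could be computed.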

Removed local imports and moved the needed functions into compute_novelty.py
Because their code was moved into compute_novelty.py, removed files: known.py, molecular_similarity.py, extr_smile_molpro_by_id.py
Removed commented-out code from compute_novelty.py
Switched error cases to use 0s instead of "NAN" in the DataFrame.