Small fixes to database pruning and updating #328
Apologies, hadn't seen #322 - I don't get notifications of issues as I used to for some reason. These were just the changes I needed to make to prepare data for @absternator. #304 is more a rumination on the nature of time and mortality at this point, but I'll pull across the changes from
@@ -753,7 +753,7 @@ def assign_query_hdf5(dbFuncs,
 storePickle(combined_seq, combined_seq, True, None, dists_out)

 # Clique pruning
-if model.type != 'lineage':
+if model.type != 'lineage' and os.path.isfile(ref_file_name):
Check this one against #322 – I think I remember changing it there perhaps. Might be best just to remove this change for now
I don't think it changes the behaviour if there is no reference file, as in the current GPS database - I suppose it just depends what we want the behaviour to be in such a situation.
@@ -333,7 +334,7 @@ def fit(self, X, max_components):
 else:
 y = self.assign(self.subsampled_X, max_batch_size = self.max_batch_size)
 self.within_label = findWithinLabel(self.means, y)
-self.between_label = findWithinLabel(self.means, y, 1)
+self.between_label = findBetweenLabel_bgmm(self.means, y)
Why this change in behaviour? This would change some of the examples I think
This harmonises the behaviour of BGMM and DBSCAN - identifying the between-strain cluster as that containing the largest number of points. Name is a bit ugly.
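For anyone following the thread, the rule described here (take the between-strain cluster to be the one with the most points assigned to it) can be sketched roughly as below. This is only an illustration under assumptions: `largest_cluster_label` is a hypothetical helper, not the actual `findBetweenLabel_bgmm` in PopPUNK, and it ignores details such as DBSCAN noise labels.

```python
from collections import Counter

def largest_cluster_label(assignments):
    """Return the cluster label that the most points were assigned to.

    Hypothetical helper for illustration; not PopPUNK's implementation.
    Ties are resolved in favour of the label seen first.
    """
    counts = Counter(assignments)
    if not counts:
        raise ValueError("no assignments given")
    label, _ = counts.most_common(1)[0]
    return label

# Toy assignment vector: component 1 holds most of the points
y = [0, 1, 1, 2, 1, 0, 1]
between_label = largest_cluster_label(y)
```

The idea is that the same counting rule can be applied to assignments from either model, which is what makes the two behave consistently.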
PopPUNK/utils.py
adj (float)
Distance by which to shift the interception point
Where is this now used? Also which distance, as in up or down the y-axis?
Used in line 117 in 77537af:
x_max_start, y_max_start = decisionBoundary(mean0, gradient, adj = -1*min_move)
unconstrained
with other refinement modes.
Better definition in 2054e6c.
Thank you for the comments @johnlees! I think I've addressed most of them - would it now make sense:
One final response in my comments, just on checking input is sorted
Let's merge this first. I still wanted to test a couple of things on that branch (should have time in a couple of weeks), then I can just resolve and merge problems on that one myself when it's ready.
Yep, good plan
Also makes sense!
--remove-samples, which did not work due to missing parentheses
prune_graph - e.g. for trimming a database, or for removing inappropriate samples only identified after visualisation (I assume networks code will be optimised in the future, so this is a little rough and ready)
unwords.py to no longer use sets - https://stackoverflow.com/questions/15837729/random-choice-from-set
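
As background to the move away from sets: `random.choice` indexes into its argument, so it needs a sequence, and passing a set raises a `TypeError`. A minimal sketch of the usual workaround follows; `pick_one` and the sample names are made up for illustration and are not taken from unwords.py.

```python
import random

def pick_one(items, rng=random):
    """Pick one element at random from any iterable of sortable items.

    Hypothetical helper: the items are sorted into a list first, because
    random.choice needs an indexable sequence and sets are not indexable.
    Sorting also keeps the result reproducible for a seeded generator.
    """
    pool = sorted(items)
    return rng.choice(pool)

names = {"sample_a", "sample_b", "sample_c"}  # illustrative values only
picked = pick_one(names, random.Random(42))
```

Holding the values in a list from the start avoids the conversion on every draw, which is presumably simpler when many draws are needed.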