-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add partial time-reading and conjugation to HERADataFastReader #869
base: main
Are you sure you want to change the base?
Conversation
Adds: - `keep_times` argument to `read_hera_hdf5`, which keeps certain times at the end of the read (not really partial I/O, but still better than `.select()` on a DC). - ability to get conjugate baselines in `read_hera_hdf5`. - more tests to ensure partial bls/pols/times is done right.
Codecov ReportBase: 97.07% // Head: 97.09% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #869 +/- ##
==========================================
+ Coverage 97.07% 97.09% +0.01%
==========================================
Files 20 20
Lines 8816 8977 +161
==========================================
+ Hits 8558 8716 +158
- Misses 258 261 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Just a handful of questions/comments.
hera_cal/io.py
Outdated
read_data=True, read_flags=False, read_nsamples=False, | ||
check=False, dtype=np.complex128, verbose=False): | ||
def read_hera_hdf5( | ||
filenames, bls=None, pols=None, keep_times: list[float] | None=None, full_read_thresh=0.002, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this style of type hinting. Why is it| None=None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following None is the default for the parameter. I agree it could look a litle silly, though I'm not sure how it would look best.
Arguments: | ||
filenames: list of files to read | ||
bls: list of (ant_1, ant_2, [polstr]) tuples to read out of files. | ||
Default: all bls common to all files. | ||
pols: list of pol strings to read out of files. Default: all, but is | ||
superceded by any polstrs listed in bls. | ||
keep_times: list of times to read out of files. Default: all. Note: all times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to implement some kind of tolerance here? I know pyuvdata has a tolerance for figuring out of times match, perhaps we can use their default under the hood?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think that should be doable. I'll add it in.
if bls is not None: | ||
antpairs = [bl[:2] for bl in bls] | ||
else: | ||
antpairs = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we ignoring polarization? If I only want to load one bl for one pol, shouldn't that be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you can do that. This is just getting the antpairs for when we check whether all the antpairs are present in the data.
hera_cal/io.py
Outdated
@@ -1366,22 +1415,25 @@ def _adapt_metadata(self, info_dict, skip_lsts=False): | |||
info_dict['data_antpos'] = {ant: info_dict['antpos'][ant] for ant in info_dict['data_ants']} | |||
info_dict['times'] = np.unique(info_dict['times']) | |||
info_dict['times_by_bl'] = {ap: info_dict['times'] for ap in info_dict['antpairs']} | |||
info_dict['times_by_bl'].update({(a2, a1): info_dict['times'] for (a1, a2) in info_dict['antpairs']}) | |||
# info_dict['times_by_bl'].update({(a2, a1): info_dict['times'] for (a1, a2) in info_dict['antpairs']}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't there originally -- I added it when I thought I needed to manually index all the conjugated bls. I'll remove it completely.
hera_cal/io.py
Outdated
@@ -1233,8 +1258,13 @@ def read_hera_hdf5(filenames, bls=None, pols=None, full_read_thresh=0.002, | |||
pols = list(pol_indices.keys()) | |||
bls = set(bl for bl in bls if len(bl) == 3) | |||
bls = bls.union([bl + (p,) for bl in bls_len2 for p in pols]) | |||
|
|||
# now, conjugate them if they're not in the data | |||
bls = {(i, j, p) if (i, j) in info['bls'] else (j, i, utils.conj_pol(p)) for (i, j, p) in bls} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be slightly cleaner with utils.reverse_bl()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
hera_cal/io.py
Outdated
|
||
# Now subselect the times. | ||
if keep_times is not None: | ||
time_mask = np.array([t in keep_times for t in info['times']]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above about tolerances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Adds:
keep_times
argument toread_hera_hdf5
, which keeps certain times at the end of the read (not really partial I/O, but still better than.select()
on a DC).read_hera_hdf5
.