Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add partial time-reading and conjugation to HERADataFastReader #869

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

steven-murray
Copy link
Contributor

Adds:

  • keep_times argument to read_hera_hdf5, which keeps certain times at the end of the read (not really partial I/O, but still better than .select() on a DC).
  • ability to get conjugate baselines in read_hera_hdf5.
  • more tests to ensure partial bls/pols/times is done right.

Adds:
  - `keep_times` argument to `read_hera_hdf5`, which keeps certain
    times at the end of the read (not really partial I/O, but still
    better than `.select()` on a DC).
  - ability to get conjugate baselines in `read_hera_hdf5`.
  - more tests to ensure partial bls/pols/times is done right.
@codecov
Copy link

codecov bot commented Jan 31, 2023

Codecov Report

Base: 97.07% // Head: 97.09% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (15d15ba) compared to base (6f6318f).
Patch coverage: 95.23% of modified lines in pull request are covered.

❗ Current head 15d15ba differs from pull request most recent head 7d9cf53. Consider uploading reports for the commit 7d9cf53 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #869      +/-   ##
==========================================
+ Coverage   97.07%   97.09%   +0.01%     
==========================================
  Files          20       20              
  Lines        8816     8977     +161     
==========================================
+ Hits         8558     8716     +158     
- Misses        258      261       +3     
Impacted Files Coverage Δ
hera_cal/io.py 98.45% <95.23%> (+0.11%) ⬆️
hera_cal/frf.py 97.79% <0.00%> (-0.20%) ⬇️
hera_cal/vis_clean.py 97.77% <0.00%> (+<0.01%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@jsdillon jsdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just a handful of questions/comments.

hera_cal/io.py Outdated
read_data=True, read_flags=False, read_nsamples=False,
check=False, dtype=np.complex128, verbose=False):
def read_hera_hdf5(
filenames, bls=None, pols=None, keep_times: list[float] | None=None, full_read_thresh=0.002,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this style of type hinting. Why is it| None=None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following None is the default for the parameter. I agree it could look a litle silly, though I'm not sure how it would look best.

Arguments:
filenames: list of files to read
bls: list of (ant_1, ant_2, [polstr]) tuples to read out of files.
Default: all bls common to all files.
pols: list of pol strings to read out of files. Default: all, but is
superceded by any polstrs listed in bls.
keep_times: list of times to read out of files. Default: all. Note: all times
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to implement some kind of tolerance here? I know pyuvdata has a tolerance for figuring out of times match, perhaps we can use their default under the hood?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think that should be doable. I'll add it in.

Comment on lines +1156 to +1159
if bls is not None:
antpairs = [bl[:2] for bl in bls]
else:
antpairs = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we ignoring polarization? If I only want to load one bl for one pol, shouldn't that be allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you can do that. This is just getting the antpairs for when we check whether all the antpairs are present in the data.

hera_cal/io.py Outdated
@@ -1366,22 +1415,25 @@ def _adapt_metadata(self, info_dict, skip_lsts=False):
info_dict['data_antpos'] = {ant: info_dict['antpos'][ant] for ant in info_dict['data_ants']}
info_dict['times'] = np.unique(info_dict['times'])
info_dict['times_by_bl'] = {ap: info_dict['times'] for ap in info_dict['antpairs']}
info_dict['times_by_bl'].update({(a2, a1): info_dict['times'] for (a1, a2) in info_dict['antpairs']})
# info_dict['times_by_bl'].update({(a2, a1): info_dict['times'] for (a1, a2) in info_dict['antpairs']})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't there originally -- I added it when I thought I needed to manually index all the conjugated bls. I'll remove it completely.

hera_cal/io.py Outdated
@@ -1233,8 +1258,13 @@ def read_hera_hdf5(filenames, bls=None, pols=None, full_read_thresh=0.002,
pols = list(pol_indices.keys())
bls = set(bl for bl in bls if len(bl) == 3)
bls = bls.union([bl + (p,) for bl in bls_len2 for p in pols])

# now, conjugate them if they're not in the data
bls = {(i, j, p) if (i, j) in info['bls'] else (j, i, utils.conj_pol(p)) for (i, j, p) in bls}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be slightly cleaner with utils.reverse_bl()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

hera_cal/io.py Outdated

# Now subselect the times.
if keep_times is not None:
time_mask = np.array([t in keep_times for t in info['times']])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about tolerances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants