Skip to content

Conversation

@t-brandt
Copy link
Contributor

@t-brandt t-brandt commented May 7, 2025

Resolves JP-3997

Closes #

This PR addresses the mosaic pipeline for both JWST and Roman. It demonstrates an approach to dramatically decrease the amount of time spent in computing coordinate transformations, which currently dominates runtime, with negligible loss of accuracy assuming the distortion corrections are all smooth. The PR itself is not intended to be merged straightaway; the intention is to change some combination of the routine adjusted here and a similar routine in drizzle
https://github.com/spacetelescope/drizzle/blob/63749f487b766e814edcafc50f5ceed1e470d71e/drizzle/utils.py#L10 ,
and to use a single routine if possible for maintainability.

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run regression tests with this branch installed ("git+https://github.com/<fork>/stcal@<branch>")
news fragment change types...
  • changes/<PR#>.apichange.rst: change to public API
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change

@codecov
Copy link

codecov bot commented May 7, 2025

Codecov Report

❌ Patch coverage is 81.48148% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.84%. Comparing base (17e1705) to head (bb9e687).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/stcal/outlier_detection/utils.py 81.48% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #367      +/-   ##
==========================================
- Coverage   89.89%   89.84%   -0.05%     
==========================================
  Files          65       65              
  Lines        9983     9994      +11     
==========================================
+ Hits         8974     8979       +5     
- Misses       1009     1015       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mcara
Copy link
Member

mcara commented Jun 6, 2025

I may be wrong but I am very concerned about this approach for the following reason: drizzle computes coordinates in the resampled image by interpolating pixmap values. Your approach would make drizzle use interpolated values to interpolate again (interpolate interpolated values).

Although the effects of a coarser grid may not be visible in science data, they are clearly visible in the weights of resampled images (from my past experience with a similar option in the drizzlepac that also allows adjusting coarseness of the grid).

@mcara
Copy link
Member

mcara commented Jun 6, 2025

There is also another issue: pixmap may and will contain both NaN at locations outside of the bounding box and, according to @nden, it may contain (i.e., it is not forbidden) NaN values even inside the bounding box. This PR does not seem to handle this case. Also, this can result in a wider border region along the bounding box to unusable due to a coarser grid near NaN values.

@schlafly
Copy link
Collaborator

I can't remember what accuracy Tim said he was targeting with this PR, but it was something like 10^-6 pixels. You're right that there is a double interpolation here, but the interpolation at this stage is so close to perfect that I don't think we need to worry about it.

Re the wider NaN region at the boundary, I guess so, but by construction this is starting at the edges of the bounding box, so I think this will similarly only happen at the 10^-6 pixel level.

For NaNs within the bounding box, I can see worse things happening, I agree. I don't think that's ever the case with Roman or the usual Webb imaging detectors, so I think there are enough cases where we'd want to set fast = True that this is still worth it. But for pathological cases we would be able to fall back to the fast = False mode.

I think I can imagine more complicated WCS boundaries for spectroscopic modes, but I am having trouble coming up with any existing examples with ndim = 2 and interior NaNs.

Maybe more broadly, this PR will not be good if the WCS is ill behaved; if the WCS has lots of discontinuities or very small-scale structure, the interpolation will fall down. I see interior NaNs as a kind of extreme ill-behavior. But since this PR does an evaluation every 10x10 pixels that structure has to be very small-scale to matter. And if one starts caring about structure on much smaller scales than that, presumably lots of other things start breaking as regular rectangular pixels becomes a poor approximation.

@t-brandt
Copy link
Contributor Author

My target accuracy with a 10x10 coarsening was something like 1e-4 pixels in Roman, at least with the placeholder distortion correction. The grid could be different: 3x3 or 4x4 would still offer a very large speedup. The error depends on the importance of nonlinear terms in the distortion solution. If these are quadratic to leading order, then a 1 pixel shift past a linear fit over 100 pixels would be a 1e-2 pixel shift over ten pixels, or a 0.0025 pixel linear interpolation error at the center of the nodes. That's actually a huge distortion: it would be a 1600 pixel shift over a Roman detector. If we are talking about 1 pixel shift over 1000 pixels, i.e. much weaker distortion, this will fall by two orders of magnitude. The error would also fall with the square of the oversampling factor, so if this is a concern, then 4x4 oversampling should get most of the speedup at a negligible cost to accuracy. It would make sense, to me, to have this coarsening be an input parameter.

I would expect the effects of this to be visible in the weights image depending on the coarseness used. At a high coarseness it will clearly be there, while I would expect it to be very faint with a coarseness of 4 or 5.

@schlafly
Copy link
Collaborator

@mcara , what do you think next steps are here? I think it will make a big difference for the mosaic step for Roman.

@mcara
Copy link
Member

mcara commented Sep 26, 2025

@schlafly Since this PR would not affect drizzle package itself and you are satisfied with it, I have no problem at all.

Copy link
Collaborator

@emolter emolter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might make sense to expose the interpolation factor as an optional input? That way we're covered if we realize that, say, one of the JWST modes needs a finer interp grid.

Could even consider replacing the "fast" flag such that interp<=1 falls back to the old "slow" version

@mcara
Copy link
Member

mcara commented Nov 18, 2025

@t-brandt I would discourage high interpolation orders because if the interpolation function is wiggly enough then the inverse may be a multivalued function which may lead to incorrect computations (I think in polygon intersections). Just something to be aware of.

Copy link
Member

@mcara mcara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. AstroDrizzle in the DrizzlePac has a parameter stepsize: https://drizzlepac.readthedocs.io/en/deployment/astrodrizzle.html. It really functions as fast from this PR: when stepsize=1 => fast=False. When stepsize>1 => fast=True and you can control how much "faster". We could have a parameter pixmap_step to do the same. No need for fast, IMO.
  2. I am worried about using cubic splines because of their tendency to wiggle. Inverse of pixmap is used to compute scan regions via polygon intersections (not directly by the resampling algorithm) and this wiggling is bound to create issues (for example, you may discover that some pixels near the edges of the images may miss). I kind of strongly believe we should stay with 1st order for now. Maybe do some investigation and see if interpolating functions (for the X and Y coordinates) are wiggly both for the current Roman distortions and also for realistic JWST models. If @schlafly thinks that it's OK to use 3rd order considering potential issues, I will approve it. Even then, I think we need a parameter that can select order.
  3. This change really needs a unit test on some typical distortion model. Make sure computed values with fast=True (or pixmap_step=5 or 10) are close to those with fast=False to within expected accuracy (1e-3? 1e-4?)

Copy link
Member

@mcara mcara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether my previous review comment will be preserved so I will repost it here (sorry if it happens to be a duplicate):

  1. AstroDrizzle in the DrizzlePac has a parameter stepsize: https://drizzlepac.readthedocs.io/en/deployment/astrodrizzle.html. It really functions as fast from this PR: when stepsize=1 => fast=False. When stepsize>1 => fast=True and you can control how much "faster". We could have a parameter pixmap_step to do the same. No need for fast, IMO.
  2. I am worried about using cubic splines because of their tendency to wiggle. Inverse of pixmap is used to compute scan regions via polygon intersections (not directly by the resampling algorithm) and this wiggling is bound to create issues (for example, you may discover that some pixels near the edges of the images may miss). I kind of strongly believe we should stay with 1st order for now. Maybe do some investigation and see if interpolating functions (for the X and Y coordinates) are wiggly both for the current Roman distortions and also for realistic JWST models. If @schlafly thinks that it's OK to use 3rd order considering potential issues, I will approve it. Even then, I think we need a parameter that can select order.
  3. This change really needs a unit test on some typical distortion model. Make sure computed values with fast=True (or pixmap_step=5 or 10) are close to those with fast=False to within expected accuracy (1e-3? 1e-4?)


grid = gwcs.wcstools.grid_from_bounding_box(bb)
return np.dstack(reproject(in_wcs, out_wcs)(grid[0], grid[1]))
if fast and len(in_shape) == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is len(in_shape) == 2 important here? How is the code different from the case of fast=False? That is, would the code handle more than 2 axes when false=False?

Comment on lines 300 to 301
x_coarse = np.linspace(x[0], x[-1], max(len(x)//10, 10))
y_coarse = np.linspace(y[0], y[-1], max(len(y)//10, 10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This automated way of computing step may lead to some very large steps, i.e., for a 4000x4000 image, step would be 400 pixels. Is that reasonable? What would be the accuracy for such a large step?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the grid seems to be mostly but not totally uniform: the last column/row can be quite narrow (as little as 1 pixel). I would suggest using 10 (or pixmap_step) as an initial approximation => number of nodes => uniform step (<=pixmap_step to err on the side of accuracy).

Copy link
Contributor Author

@t-brandt t-brandt Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the step would not be 400 pixels. The number of points is the maximum of 10 and len(x)//10: no fewer than 10 steps, and no more than about 10 pixel gaps between steps. Also, this grid is uniform with linspace: there will be fractional pixel values for interpolation. This is to ensure that the first and last pixels in each row/column are present so that the algorithm never extrapolates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are right: I forgot that the third argument in numpy.linspace is the number of steps and not the step length.

@schlafly
Copy link
Collaborator

I am worried about using cubic splines because of their tendency to wiggle. Inverse of pixmap is used to compute scan regions via polygon intersections (not directly by the resampling algorithm) and this wiggling is bound to create issues (for example, you may discover that some pixels near the edges of the images may miss). I kind of strongly believe we should stay with 1st order for now. Maybe do some investigation and see if interpolating functions (for the X and Y coordinates) are wiggly both for the current Roman distortions and also for realistic JWST models. If @schlafly thinks that it's OK to use 3rd order considering potential issues, I will approve it. Even then, I think we need a parameter that can select order.

For cases like this one where we're interpolating some high-order polynomial I'm not worried about cubic splines (and, indeed, I like that they're smooth with smooth derivatives). fast = True vs. a specified step size seems reasonable to me (with step size <= 1 meaning no interpolation?). It's also reasonable to make the order a parameter, leaving the default at Tim's choice.

@mcara
Copy link
Member

mcara commented Dec 16, 2025

I like that they're smooth with smooth derivatives

True, but smoothness does not imply (unique) invertibility.

@t-brandt
Copy link
Contributor Author

The interpolation is done between pixels in a resampled image and pixels in an input image. The fundamental assumption is that the mapping of pixels to coordinates on the sky is reasonably smooth without major wiggles. If it is not smooth and dominated by a linear term on a scale of tens of pixels and if there is any risk of non-invertibility, that would be quite the optical setup: I cannot conceive of such a situation for an imaging application.

@t-brandt
Copy link
Contributor Author

@mcara: I agree that the changes in this PR match stepsize in AstroDrizzle with order=1. I fully support making order and stepsize callable parameters. I will make those changes. I think we can put order=1 as the default and test order=3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants