Skip to content

Conversation

@eerovaher
Copy link
Member

I am submitting a finance request for work on astropy units parsing, in particular parsing function units and custom units. The latter is important for reading existing data files that might be using non-standard units.

@AnaGabela
Copy link
Contributor

Hi,
Can you let us know what is the hourly rate you will be charging? We will need that for the paperwork.

Also can you clarify who will be the payee? Will there be a subcontractor, do you have a name?

Thanks!
A

@eerovaher
Copy link
Member Author

Also can you clarify who will be the payee? Will there be a subcontractor, do you have a name?

I'd be doing the work myself.

@nstarman
Copy link
Member

Hi! I like this idea. it’s definitely important to fix the many problems with our unit parsing. One question I wanted to raise: in an earlier PR of yours (#17652) we talked about possibly moving away from formatter classes toward formatter instances and the many ways it would improve the unit formatters. Do you think it’s in scope for this work to include that transition, or would that be better handled separately?

@eerovaher
Copy link
Member Author

The main challenge with switching from formatter classes to formatter instances is how to avoid breaking formatters implemented in downstream packages. I would be willing to investigate that as a part of this project, if the astropy.units maintainers think it's worthwhile.

@nstarman
Copy link
Member

Yes, it's definitely challenging, some kind of deprecation would be necessary.
Pinging @mhvk for thoughts...

@mhvk
Copy link
Contributor

mhvk commented Nov 14, 2025

@nstarman - To change to formatter instances is one of those things that would be nice to have in theory (I know I am one of those who suggested it...), but I think in practice it will probably have relatively little impact for what will be quite a bit of work, especially with deprecations, etc. It doesn't seem obvious that it would be effort and money well spent.

More on the topic here, it is certainly a long-standing problem that function units can only be parsed by general and cds, but I should probably mention that one of the reasons FITS doesn't have it, is that the FITS standard committee was less than helpful in thinking through how arguably the most relevant units (mag(ST) and mag(AB)) should be formatted.

p.s. On the proposal itself, I think it is a nice (if modest) improvement, but it does seem to me that the equivalent of six 40-hour weeks is rather a lot for the described work. (For reference for just the function unit part of it, I don't remember exactly, but ensuring cds could represent Dex took about a day; I think the whole creation of the function units took me of order a week, maybe two including pondering it for quite a while before and maintenance/corrections after. That said, the last 5-10% of making something really consistent and nice takes considerably more than 5-10% of the total time.)

EDIT: explicit suggestion: if the proposal is meant to also cover (part of) continued general maintenance and cleanup, do include that.

@eerovaher
Copy link
Member Author

I agree that parsing function units shouldn't take too much time, but I don't expect designing the API for defining custom units to be a quick process. For example, we need to ensure custom units do not interfere needlessly with standard units, but we cannot simply forbid all interference either because users might be trying to read a non-standard file for which some overriding of standard units is necessary. And the very nature of custom units means there is little guidance from external standards. One more complication is that the custom unit functionality should be available through the astropy I/O interface and I'm not too familiar with that (I suspect @mhvk is the only person who might be sufficiently familiar with both astropy.io and astropy.units internals to take that on immediately).

@mhvk
Copy link
Contributor

mhvk commented Nov 14, 2025

Yes, the custom units might take a bit more time. My sense is that the basic infrastructure is actually in place (FITS files give warnings that tell what to do, but perhaps not very clearly or even misleadingly, see astropy/astropy#15313), so it might mostly be documenting it more clearly. But agreed that it needs time to even think through what actually goes wrong/is needed.

p.s. A sideways related long-standing wish-list item is for FITS headers to return quantities if they list a unit in their string -- and vice versa, allow quantities to be written to headers. See astropy/astropy#9332 and an aborted attempt at astropy/astropy#11849.

@eerovaher
Copy link
Member Author

My sense is that the basic infrastructure is actually in place...

Some of the basic infrastructure seems to in place for the happy path, but my impression is that cases off the happy path have not received enough attention, and those are the cases that stand out for the users.

@kelle
Copy link
Member

kelle commented Nov 18, 2025

It would be useful for this funding request to have a budget range.

@eerovaher
Copy link
Member Author

Responding to #494 (comment)

(FITS files give warnings that tell what to do, but perhaps not very clearly or even misleadingly, see astropy/astropy#15313), so it might mostly be documenting it more clearly.

A part of the issue is that astropy often responds to unit parsing problems by suggesting custom units should be defined even if the problem had nothing to do with them. For example, the "cds" format allows specifying numerical factors that are powers of 10:

>>> from astropy import units as u
>>> u.Unit("10-3m", format="cds")
Unit("0.001 m")

But if a different factor is used we get a verbose error message that is mostly about custom units:

>>> u.Unit("2-3m", format="cds")
Traceback (most recent call last):
  ...
ValueError: '2-3m' did not parse as cds unit: Only base ten exponents are allowed in CDS If
this is meant to be a custom unit, define it with 'u.def_unit'. To have it recognized inside a
file reader or other code, enable it with 'u.add_enabled_units'. For details, see
https://docs.astropy.org/en/latest/units/combining_and_defining.html

And setting parse_strict="silent" produces a mostly useless UnrecognizedUnit

>>> u.Unit("2-3m", format="cds", parse_strict="silent")
UnrecognizedUnit(2-3m)

A better behavior would be that

  • with parse_strict="raise" an error is raised, but the error message should not bring up custom units at all,
  • with parse_strict="warn" the warning message informs that the factor should be a power of 10, but the string is parsed anyways,
  • with parse_strict="silent" the string is parsed silently.

That might be straightforward to implement, but this would have to be done for each parser rule individually, and we have several different parsers, which does add up to quite a lot.

@dhomeier
Copy link
Contributor

dhomeier commented Nov 18, 2025

Agreed that UnrecognizedUnit is not helpful at all here, but custom unit does not seem so far off – it is not an allowed CDS unit, and there is no obvious general way how this should be represented. Your example has some potential for confusion, as I would intuitively expect "10-3m" to become milimetres, but would you interpret "2-3m" as something like "octimetres"? Realising that "10-4m" is permitted in CDS as well, you may indeed say let parse_strict!="silent" pass this as Unit("0.125 m"), but this would already get you in trouble for cases like "7-1m" (represent this as a fraction rather? how many significant digits...?)

@mhvk
Copy link
Contributor

mhvk commented Nov 18, 2025

My main worry would be that we spent a lot of time on fixing a corner case that in practice occurs rarely. The present message was an attempt at fixing an issue raised at astropy - maybe we should not worry about further fixes until we've got more concrete use cases (e.g., actual Vizier Catalogues that the current code cannot read, etc.)?

@AnaGabela
Copy link
Contributor

Please react to this comment to vote on this proposal (👍, 👎, or no reaction for +0).

@nstarman nstarman marked this pull request as ready for review November 26, 2025 13:51
@kelle
Copy link
Member

kelle commented Dec 8, 2025

The Cycle 5 funding request process has been hugely successful! On the downside, that means our funds are severely oversubscribed. Even after the Finance Committee and SPOC have taken into consideration community feedback/voting and alignment with the roadmap, there are still more funding requests than we can afford in 2026.

We would like to stretch the budget as far as possible, and to fund as many activities as possible, while making sure the Project remains volunteer-driven. Hence, we would like to know if this project will still meet its deliverables if your minimum budget is reduced by 25%, 50%, or 100%. Or if there’s some other minimum, feel free to specify that instead.

As a reminder, there will be more funding for 2027 and we expect the Cycle 6 call for 2027 funding requests to begin in the Fall of 2026.

Thank you for your engagement and understanding as we continue to optimize our funding and budgeting processes and the balance of volunteer vs funded work!

(@eerovaher )

@eerovaher
Copy link
Member Author

Improving astropy unit parsers is not a monolithic task and some of the possible improvements could be implemented without too much effort. For example, implementing function unit support for the "vounit" parser shouldn't be too complicated (that being said it could have been implemented years ago but nobody has actually done it). So if I get more money I will work on this more. If I get less money I will work on this less. If I get no money I will not work at all. I expect that if you give me tiny amounts of money then the administration effort will not be worthwhile, but that might be a bigger problem for you than for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants