-
-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fraction computation with large times produces strange timestamps which are not printed sanely #59
Comments
To be clear, I think part of the problem here is that the printed duration now reads quite strangely. 10518456960 minutes AND 972891790359s? This is quite unintuitive, since I would never expect to need to compute the actual number of minutes based on the number of seconds as well. In other words, I think it is confusing that the seconds value ever is greater than or equal to 60 when minutes are presented, and similarly for other units. |
Catching up here, here is a concrete program that fails that I believe should succeed: use jiff::{Span, Unit};
fn main() -> anyhow::Result<()> {
let span1: Span = "Pt843517081,1H".parse()?;
let iso = span1.to_string();
eprintln!("span1 serialized: {iso:?}");
let span2: Span = iso.parse()?;
assert_eq!(span1.total(Unit::Hour)?, span2.total(Unit::Hour)?);
Ok(())
} Output:
So this looks to me like there is a real bug here. Nice find! I'll investigate more later. But yes, there are some shenanigans happening here:
But I can't investigate more closely right now. I'll do so later. |
Because Jiff spans (like Temporal durations) represent each unit individually, and because ISO 8601 require the use of fractional seconds to represent sub-second units, there is an inherent loss of fidelity when spans are serialized to the ISO 8601 duration format. For example, a Jiff span like `1 second 2000 milliseconds` is distinct from `3 seconds` even though they represent the same amount of elapsed time. These types of spans are unbalanced. When a span like `1 second 2000 milliseconds` is ISO 8601 serialized, it's turned into `PT3s` because there simply is no way to represent greater-than-second durations in smaller-than-second units in the ISO 8601 duration format. So when `PT3s` is deserialized, Jiff of course has no idea that `1 second 2000 milliseconds` was actually intended, and thus treats it as `3 seconds`. On top of this, Jiff imposes fairly strict limits on the individual units of a `Span`. Other than nanoseconds, every individual unit can express the full range of time supported by Jiff (`-9999-01-01` through `9999-12-31`) and nothing more. So if one serializes a span to the ISO 8601 format with large millisecond, microsecond or nanosecond components, those have to be folded into the larger hour, minute or second units. This in turn can create a ISO 8601 duration whose hour, minute and second units exceed Jiff's unit limits. So in order to preserve the ability to at least roundtrip all Jiff spans (even if the individual unit values are lost), Jiff will automatically rebalance these "larger" units into smaller units at parse time. This is a big complicated mess and it turns out I got one part of this wrong: I was only re-balancing units at parse time when we parsed a fractional component. But in fact, we should be re-balancing spans even when there isn't a fractional component. Namely, the milliseconds, microseconds and nanosecond components can add up to a whole number of seconds, resulting in a whole number of seconds in the corresponding ISO 8601 duration. This bug was found by @addisoncrump's fuzzer. Nice work. Fixes #59
Because Jiff spans (like Temporal durations) represent each unit individually, and because ISO 8601 require the use of fractional seconds to represent sub-second units, there is an inherent loss of fidelity when spans are serialized to the ISO 8601 duration format. For example, a Jiff span like `1 second 2000 milliseconds` is distinct from `3 seconds` even though they represent the same amount of elapsed time. These types of spans are unbalanced. When a span like `1 second 2000 milliseconds` is ISO 8601 serialized, it's turned into `PT3s` because there simply is no way to represent greater-than-second durations in smaller-than-second units in the ISO 8601 duration format. So when `PT3s` is deserialized, Jiff of course has no idea that `1 second 2000 milliseconds` was actually intended, and thus treats it as `3 seconds`. On top of this, Jiff imposes fairly strict limits on the individual units of a `Span`. Other than nanoseconds, every individual unit can express the full range of time supported by Jiff (`-9999-01-01` through `9999-12-31`) and nothing more. So if one serializes a span to the ISO 8601 format with large millisecond, microsecond or nanosecond components, those have to be folded into the larger hour, minute or second units. This in turn can create a ISO 8601 duration whose hour, minute and second units exceed Jiff's unit limits. So in order to preserve the ability to at least roundtrip all Jiff spans (even if the individual unit values are lost), Jiff will automatically rebalance these "larger" units into smaller units at parse time. This is a big complicated mess and it turns out I got one part of this wrong: I was only re-balancing units at parse time when we parsed a fractional component. But in fact, we should be re-balancing spans even when there isn't a fractional component. Namely, the milliseconds, microseconds and nanosecond components can add up to a whole number of seconds, resulting in a whole number of seconds in the corresponding ISO 8601 duration. This bug was found by @addisoncrump's fuzzer. Nice work. Fixes #59
The following testcases trigger the observed behaviour:
Looking a bit further, it seems that the routine for parsing fractional time is a bit strange in the presence of large values. Specifically (for the case of minutes), this code region:
This code is explained by the following text:
It is unclear to me why the units would be backfilled to the smallest unit. Shouldn't this be implemented in the other direction, preferring larger units for larger values?
Regardless, the result is that the
nanos
value is set to something large. For the first testcase, the resulting computed timestamp is printed asPT10518456960m972891790359s
. This value is correct, but not parseable because the seconds unit is out of range:Note that in memory, this value is actually:
So it would appear that this issue is in part induced by the printer implementation.
This was caught by #57 because the printer emitted invalid timestamps.
The text was updated successfully, but these errors were encountered: