Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Care around datetime ranges with mixed tzifo / fix FiniteDatetimeRange.days #192

Open
Peter554 opened this issue Jan 24, 2025 · 8 comments
Assignees

Comments

@Peter554
Copy link
Contributor

Peter554 commented Jan 24, 2025

Problem summary

  • It's possible to construct FiniteDatetimeRanges with mixed tzinfo (where the start and end have different tzinfo).
  • This makes the FiniteDatetimeRange.days property confusing and error prone (Kraken: important for computation of daily standing charges). For example, is the below range 30 or 31 days?
RANGE = ranges.FiniteDatetimeRange(
    # This is also 2024-03-01T00:00:00 in TZ_UTC
    datetime.datetime(2024, 3, 1, tzinfo=TZ_LONDON),

    # This is 2024-04-01T00:00:00 in TZ_LONDON
    datetime.datetime(2024, 3, 31, hour=23, tzinfo=TZ_UTC),
)
  • We're unlikely to construct such ranges directly, but it's possible to create them accidentally via intersection/union operations. For example, if ranges coming from database records are typically going to be UTC, but other ranges are often created in localtime.
  • This issue was brought to the spotlight by some failing tests when trying to upgrade xocto in Kraken from v7.1.0 -> v7.1.1, but it's actually symptom of a pre-existing issue.

Problem in detail

See https://github.com/octoenergy/xocto/pull/191/files, which demonstrates:

  • For a FiniteDatetimeRange with mixed tzinfo the .days property is ambiguous.
  • (r1 & r2).days can be different from (r2 & r1).days.
  • (r1 | r2).days can be different from r1.days even when r2 == r1.
  • This demo is branched from v7.1.0, demonstrating that this issue existed before v7.1.1.

Proposed solution

  • [Breaking change] Remove the FiniteDatetimeRange.days property.
  • Instead, require callers to convert to a date range and compute days that way e.g. dt_range.localize(tz=...).as_date_range().days.
@Peter554 Peter554 self-assigned this Jan 24, 2025
@Peter554 Peter554 changed the title FiniteDatetimeRanges with mixed tzifo Disallow FiniteDatetimeRanges with mixed tzifo Jan 24, 2025
@benthorner
Copy link
Contributor

🐮 Thoughts about .days

This method is misleading for a range at time granularity: it should return a Decimal or be named e.g. whole_days.

Thoughts about ambiguity

This makes the FiniteDatetimeRange.days property ambiguous

Could you elaborate on this point? I can see some examples in the linked PR e.g.

# This range has inconsistent tzinfo.
# It's unclear how many days this represents.
range_utc = ranges.FiniteDatetimeRange(
    datetime.datetime(2024, 3, 1, tzinfo=TZ_LONDON),
    datetime.datetime(2024, 3, 31, hour=23, tzinfo=TZ_UTC),
)

I agree it's not clear - i.e. it's not easy to calculate - but I don't think that makes it ambiguous: the amount of elapsed time, in seconds, is unambiguous; thus the number of 24 hour periods is unambiguous.

Perhaps we're getting confused with "calendar days" e.g. 11AM today vs 3PM yesterday differs by 1 day on a calendar. The above example seems similar: it's only ambiguous in terms of calendar dates.

If we want to measure days in a particular timezone, arguably it's the responsibility of the developer to declare what the timezone is. In other words: we've just been too imprecise / lazy historically 😉.

Thoughts about & and |

The automatic timezone conversion here is arguably the problem e.g.

(r1 & r2) == (r2 & r1)
=> True

(r1 & r2).seconds
=> 2674800

(r2 & r1).seconds
=> 2678400

We've recently had a similar discussion about units e.g. 1 kW + 1000W == 2 kW == 2000 W - the intended unit is ambiguous. In both cases, arguably we just shouldn't allow this kind of mixing in operations.

Thoughts about proposal

I do agree mixed timezones is surprising and error-prone, so I'm in favour of checking they are the same:

  • On construction ➕.
  • In operations ✨.

This naturally forces us to be more precise in at least some places, so is consistent with my thoughts about ambiguity.

@Peter554
Copy link
Contributor Author

Peter554 commented Jan 27, 2025

@benthorner thanks for taking a look

This makes the FiniteDatetimeRange.days property ambiguous

Could you elaborate on this point? I can see some examples in the linked PR e.g.

# This range has inconsistent tzinfo.
# It's unclear how many days this represents.
range_utc = ranges.FiniteDatetimeRange(
    datetime.datetime(2024, 3, 1, tzinfo=TZ_LONDON),
    datetime.datetime(2024, 3, 31, hour=23, tzinfo=TZ_UTC),
)

It's only ambiguous in that the concept of "days" in general is ambiguous. According to its usage in Kraken this .days property is used within billing calculations to count calendar days. So in practice this .days property is intended to represent whole calendar days.


You can see an example of how this property is (misused?) in a billing calculation from this tiny snippet:

measurement_period = certificates.period & period
...
scale = Decimal(measurement_period.days).quantize(three_dp) / certificates.period.days

# scale is then used to scale a cost.
# The problem occurs because one of `certificates.period` and `period` are in UTC 
# and the other is in localtime. The resulting `measurement_period` has mixed tzinfo 
# (following xocto v7.1.1), and the `.days` property doesn't behave as expected (we end up 
# with the example range I gave above, and the code expects 31 days, but the result is actually 30 days).

I agree that .days in general is imprecisely named. What would you think to this idea?

  • Changing the name to range.whole_days.
  • And also, making it a method, that takes a tzinfo i.e. range.whole_days(tz=...)

@jarshwah
Copy link
Contributor

I'm not sure that "days" is ambiguous. The python standard library uses "days" as an integer, and I don't think that has been a point of confusion.

The confusion is.. date math in python does not consider TZ or fold attributes. It does a straight comparison between the date and time components only.

IMO we should probably enforce ranges ONLY being in UTC - though that'd take a lot more thought to confirm than I've actually given, and our range database types try to localise into the local timezone by default (a decision I made that I massively regret, and we should never use in Kraken!). Enforcing a consistent timezone (even if not UTC) is the next best and probably safest compatibility wise.

If enforcing a consistent timezone for our intersection/union/other operators means that the order of operations is no longer buggy, great. If not, then we should definitely fix that oddity as well.

@benthorner
Copy link
Contributor

The confusion is.. date math in python does not consider TZ or fold attributes

@jarshwah to me that's a secondary issue here. To me the problem is we are not clear about what we want:

  • FiniteDatetimePeriod(utc_dt, gbr_dt).days - "days" in whose calendar? UTC? GBR?
  • (utc_dt & gbr_dt).days - ditto

The python standard library uses "days" as an integer

Are we talking about the timedelta class here? Our ...Period class isn't really equivalent e.g. .seconds is all the seconds, not just the remainder after days and hours. @Peter554 I don't mind if we leave it as-is for now: it's tangential.

👉 Sounds like we may be in agreement:

  • Restrict ...Period boundaries to be the same timezone.
  • Require a consistent timezone in operations on periods.

Does that sound OK to everyone?

@Peter554
Copy link
Contributor Author

  • Restrict ...Period boundaries to be the same timezone.
  • Require a consistent timezone in operations on periods.

Does that sound OK to everyone?

@benthorner It does in theory sound ideal to me, however I created a PR for this change (#194) and tried installing it one Kraken. The number of errors is rather daunting so I think we might need to find a smaller change to fix this issue in the short term. Context: Upgrading xocto to v7.1.1 in Kraken is currently blocked since it surfaced this issue.

While ranges with mixed tzinfo are generally going to be confusing/error-prone, we're mainly talking about the .days property here (that's the main place I can see that it's leading to errors in Kraken). So I may try to address this just at the level of that property for now, making it a method accepting a tzinfo (.days(tz=...)). That's a smaller change and likely much faster to get through into Kraken.


and our range database types try to localise into the local timezone by default (a decision I made that I massively regret, and we should never use in Kraken!)

@jarshwah Sidetracking but this has me concerned since I've been encouraging people to use our custom range database types! If we regret that decision is it too late to change it? Can't we fix that? Is it worth opening a separate issue for this?

Peter554 added a commit that referenced this issue Jan 30, 2025
It's possible to create mixed TZ FiniteDatetimeRange's.
It probably shouldn't be, but it currently is and this is
unlikely to change too soon.
See #192

This makes the FiniteDatetimeRange.days poorly defined.
For example, should the below range be 31 or 30 days?

RANGE = ranges.FiniteDatetimeRange(
    # This is also 2024-03-01T00:00:00 in TZ_UTC
    datetime.datetime(2024, 3, 1, tzinfo=TZ_LONDON),
    # This is 2024-04-01T00:00:00 in TZ_LONDON
    datetime.datetime(2024, 3, 31, hour=23, tzinfo=TZ_UTC),
)

This PR updates FiniteDatetimeRange.days to require passing
a TZ, to remove this ambiguity.
@Peter554
Copy link
Contributor Author

Peter554 commented Jan 30, 2025

It does in theory sound ideal to me, however I created a PR for this change (#194) and tried installing it one Kraken. The number of errors is rather daunting so I think we might need to find a smaller change to fix this issue in the short term. Context: Upgrading xocto to v7.1.1 in Kraken is currently blocked since it surfaced this issue.

A possible quicker fix to unblock things -> #195

Update: Still quite a few errors trying to install this. Less, but still quite some.

Peter554 added a commit that referenced this issue Jan 30, 2025
In #187 the implementation of
FiniteDatetimeRange.intersection was optimised. This introduced a very
subtle change in behaviour. Previously when left.end was equal to
right.end then the right.end was favoured. In
#187 the implementation was
(unintentionally) changed to favour left.end.

This change reverses that, to again favour right.end.

This really shouldn't matter, but it can be signidicant when
the TZ of left and right are different. Eventually we'd like
to prevent this kind of mixed TZ behaviour (see
#192), but in the short
term it seems reasonable to make this tiny change to make things
consistent again with how they were before.
Peter554 added a commit that referenced this issue Jan 30, 2025
It's possible to create mixed TZ FiniteDatetimeRange's.
It probably shouldn't be, but it currently is and this is
unlikely to change too soon.
See #192

This makes the FiniteDatetimeRange.days poorly defined.
For example, should the below range be 31 or 30 days?

RANGE = ranges.FiniteDatetimeRange(
    # This is also 2024-03-01T00:00:00 in TZ_UTC
    datetime.datetime(2024, 3, 1, tzinfo=TZ_LONDON),
    # This is 2024-04-01T00:00:00 in TZ_LONDON
    datetime.datetime(2024, 3, 31, hour=23, tzinfo=TZ_UTC),
)

This PR updates FiniteDatetimeRange.days to require passing
a TZ, to remove this ambiguity.
Peter554 added a commit that referenced this issue Jan 30, 2025
In #187 the implementation of
FiniteDatetimeRange.intersection was optimised. This introduced a very
subtle change in behaviour. Previously when left.end was equal to
right.end then the right.end was favoured. In
#187 the implementation was
(unintentionally) changed to favour left.end.

This change reverses that, to again favour right.end.

This really shouldn't matter, but it can be signidicant when
the TZ of left and right are different. Eventually we'd like
to prevent this kind of mixed TZ behaviour (see
#192), but in the short
term it seems reasonable to make this tiny change to make things
consistent again with how they were before.
Peter554 added a commit that referenced this issue Jan 30, 2025
In #187 the implementation of
FiniteDatetimeRange.intersection was optimised. This introduced a very
subtle change in behaviour. Previously when left.end was equal to
right.end then the right.end was favoured. In
#187 the implementation was
(unintentionally) changed to favour left.end.

This change reverses that, to again favour right.end.

This really shouldn't matter, but it can be signidicant when
the TZ of left and right are different. Eventually we'd like
to prevent this kind of mixed TZ behaviour (see
#192), but in the short
term it seems reasonable to make this tiny change to make things
consistent again with how they were before.
Peter554 added a commit that referenced this issue Jan 30, 2025
In #187 the implementation of
FiniteDatetimeRange.intersection was optimised. This introduced a very
subtle change in behaviour. Previously when left.end was equal to
right.end then the right.end was favoured. In
#187 the implementation was
(unintentionally) changed to favour left.end.

This change reverses that, to again favour right.end.

This really shouldn't matter, but it can be signidicant when
the TZ of left and right are different. Eventually we'd like
to prevent this kind of mixed TZ behaviour (see
#192), but in the short
term it seems reasonable to make this tiny change to make things
consistent again with how they were before.
@djrobstep
Copy link

djrobstep commented Jan 31, 2025

Seems to me that the root cause of the problem is that datetime ranges are being used rather than date ranges.

Both have a clear, unambiguous definition of "days", it's muddying the two where the confusion creeps in.

- days between two datetimes = number of seconds between the two / number of secs in 24 hours, ie a float
- days between two dates = integer number of hops from one to the other along a calendar

Presumably what billing wants is actually the latter - in which case the conversion ought to be explicit.

So instead of days taking a timezone, how about allowing conversion to a date range in a specific time zone?

measurement_period.as_date_range(tz=...).days

That forces rounding etc to be dealt with explicitly (the existing method fails if either of the datetimes aren't midnight-aligned, so it would also need an option to allow this)

Fractional number of days is probably not what people usually want, so days() on the existing DateTimeRange method could be deprecated and replaced with fractional_days() to help avoid the footgun.

Thoughts?

Peter554 added a commit that referenced this issue Jan 31, 2025
In #187 the implementation of
FiniteDatetimeRange.intersection was optimised. This introduced a very
subtle change in behaviour. Previously when left.end was equal to
right.end then the right.end was favoured. In
#187 the implementation was
(unintentionally) changed to favour left.end.

This change reverses that, to again favour right.end.

This really shouldn't matter, but it can be signidicant when
the TZ of left and right are different. Eventually we'd like
to prevent this kind of mixed TZ behaviour (see
#192), but in the short
term it seems reasonable to make this tiny change to make things
consistent again with how they were before.
@Peter554
Copy link
Contributor Author

Peter554 commented Feb 2, 2025

@djrobstep Thanks for the feedback - yes this is likely what we want. There was also some internal discussion in slack that reached a similar conclusion. I'll updated this issue.

@Peter554 Peter554 changed the title Disallow FiniteDatetimeRanges with mixed tzifo Care around datetime ranges with mixed tzifo / fix FiniteDatetimeRange.days Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants