Range semantics #56479

LilithHafner · 2024-11-06T21:19:47Z

Theoretical question:
Is the range 0.0:0.1:1.0
A) a performance optimization that should behave identically to the vector [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] or
B) does it represent the 11 real numbers evenly spaced between 0 and 1, including both endpoints?

Practical implications:
Under A, diff(0:.1:1) is [0.1, 0.1, 0.09999999999999998, 0.10000000000000003, 0.09999999999999998, 0.09999999999999998, 0.09999999999999998, 0.10000000000000009, 0.09999999999999998, 0.09999999999999998] (current implementation)
Under B, a performance and accuracy improvement would be to define diff(x::AbstractRange) = fill(step(x), length(x)-1) which would return [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1].

Under A, searchsorted(1f8-10:1f8, 1f8) should return 5:9
Under B, searchsorted(1f8-10:1f8, 1f8) should return 9:9 (current implementation, see #34408)

Under A, collecting a range before performing an operation should never change results
Under B, a range can be more precise than the collected vector so it is possible for collect to introduce rounding errors which change results.

The text was updated successfully, but these errors were encountered:

mbauman · 2024-11-06T21:54:57Z

Even better, diff(A::AbstractRange) could be implemented as r[begin:end-1] .- r[begin+1:end], which gives back the range:

julia> rangediff(r) = r[begin:end-1] .- r[begin+1:end]
rangediff (generic function with 4 methods)

julia> rangediff(0:.1:1)
StepRangeLen(-0.1, 0.0, 10)

julia> println(collect(ans))
[-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1]

This is in keeping with how we do lots of range arithmetic — it's O(1) where possible, even if it means that the values are different than the collected versions would be. For example, (0:.1:1) ./ 3 is 0:1/30:1/3 instead of the division of the values the range generates:

julia> (0:.1:1) ./ 3
0.0:0.03333333333333333:0.3333333333333333

julia> (collect(0:.1:1) ./ 3) .- collect((0:.1:1) ./ 3)
11-element Vector{Float64}:
  0.0
  0.0
  0.0
 -1.3877787807814457e-17
  0.0
  0.0
 -2.7755575615628914e-17
 -2.7755575615628914e-17
  0.0
  0.0
  0.0

So what's the rule? I think it's gotta be a little gray:

If you can compute the operation with O(1) range operations directly on the begin / step / stop / len quad and return an O(1) object, then that's the right thing to do.
Otherwise, it's gotta work on the generated values

Where does this leave searching? No clue.

adienes · 2024-11-06T21:57:58Z

I have, in the past, though I do not remember the exact discussions, been informed of the following two stylized facts:

a range IS a vector, almost equal to its collected vector, but with some performance improvements on top
a range always includes its two endpoints

is it possible for a clean design to satisfy both? it may have to sacrifice "equally spaced."

andrewjradcliffe · 2024-11-07T02:08:33Z

Even better, diff(A::AbstractRange) could be implemented as r[begin:end-1] .- r[begin+1:end], which gives back the range:

julia> rangediff(r) = r[begin:end-1] .- r[begin+1:end]
rangediff (generic function with 4 methods)

julia> rangediff(0:.1:1)
StepRangeLen(-0.1, 0.0, 10)

julia> println(collect(ans))
[-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1]

I think you mean rangediff(r) = r[begin+1:end] .- r[begin:end-1] ^_^;;

oscardssmith · 2024-11-07T02:23:34Z

triage defers to a higher power.

StefanKarpinski · 2024-11-07T15:36:09Z

Lol, took me a while to realize I'm supposed to be the higher power.

mbauman · 2024-11-07T15:54:42Z

So there are more than two possible semantics:

There's the theoretical perfect spacing of 11 elements between 0.0 and 1.0 (or eleven increments of 1 starting from 1f8)
There's the internal implementation, pre-rounding.
There are potentially alternative implementations that try to work on what they think the maths behind the range are, based on the outputs of first(r), last(r), step(r), length(r).
There's the rounded element-wise getindex approximation

In the case of searchsorted, we currently use a mix of 3 and 4. This means that we can find 0.3 in 0:0:.1:1, but I can't find the theoretical 3//10 — instead, both 3//10 and 4//10 correctly land between 0.3 and 0.4. That works, and it's "just" an optimization. But it assumes that the rounded range outputs (4) are monotonically increasing, which is why it falls apart with 1f8:

julia> searchsorted(1f8:1f8+8, 100000000)
1:1

julia> searchsorted(1f8:1f8+8, 100000001)
2:1

julia> searchsorted(collect(1f8:1f8+8), 100000001)
6:5

I think the rule has got to be about what the particular function returns. search/find/etc return indices — indices that are specific to the given range. Thus those indices must be able to be used as indices in the range and get the values they found.

Conversely, simple range arithmetic returns other ranges. As long as the start/stop and length are correctly computed, I think it's ok for the collected variants to give slightly different numeric answers. It's not terribly unlike different summation algorithms, really.

mbauman · 2024-11-07T15:58:10Z

search/find/etc return indices — indices that are specific to the given range. Thus those indices must be able to be used as indices in the range and get the values they found.

~~Amusingly~~ Horrifyingly, though, I don't see any feasible way to get around this disconnect:

julia> r = 1f8:1f8+12
1.0f8:1.0f0:1.0000002f8

julia> idxs = searchsorted(collect(r), 100000000) # Assuming searchsorted returns the range outputs that match
1:5

julia> r[idxs]
1.0f8:1.0f0:1.0f8

julia> all(==(100000000), r[idxs])
true

julia> r[idxs] .- 100000000
0.0f0:1.0f0:4.0f0

julia> collect(ans)
5-element Vector{Float32}:
 0.0
 1.0
 2.0
 3.0
 4.0

LilithHafner added design Design of APIs or of the language itself needs decision A decision on this change is needed ranges Everything AbstractRange triage This should be discussed on a triage call labels Nov 6, 2024

oscardssmith assigned StefanKarpinski Nov 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range semantics #56479

Range semantics #56479

LilithHafner commented Nov 6, 2024

mbauman commented Nov 6, 2024

adienes commented Nov 6, 2024

andrewjradcliffe commented Nov 7, 2024

oscardssmith commented Nov 7, 2024

StefanKarpinski commented Nov 7, 2024

mbauman commented Nov 7, 2024

mbauman commented Nov 7, 2024 •

edited

Loading

Range semantics #56479

Range semantics #56479

Comments

LilithHafner commented Nov 6, 2024

mbauman commented Nov 6, 2024

adienes commented Nov 6, 2024

andrewjradcliffe commented Nov 7, 2024

oscardssmith commented Nov 7, 2024

StefanKarpinski commented Nov 7, 2024

mbauman commented Nov 7, 2024

mbauman commented Nov 7, 2024 • edited Loading

mbauman commented Nov 7, 2024 •

edited

Loading