
Commit cd848bc

GH-47734: [Python] Fix hypothesis timedelta bounds for duration/interval types (#48460)
### Rationale for this change

Unbounded hypothesis timedeltas overflow int64 storage when converted to `duration[ns]`; this adds safe bounds, as is already done for timestamps.

Judging from the code, I think the overflow happens here:

- https://github.com/apache/arrow/blob/203437b4d6848885de72f32bfb3017919373a736/python/pyarrow/tests/strategies.py#L144
- https://github.com/HypothesisWorks/hypothesis/blob/7288aa8f07f6ba61093b1eac6571d13632f31a54/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py#L347C5-L350

A simple example would be:

```python
import datetime

import pyarrow as pa

# timedelta.max is 999,999,999 days, far beyond what int64 nanoseconds can hold
pa.array([datetime.timedelta.max], type=pa.duration('ns'))
```

Disclaimer: I cannot reproduce the failure locally, so I can't confirm that the above is correct. There may be another cause, but setting the bounds is worthwhile in any event.

### What changes are included in this PR?

Explicitly set the bounds for `st.timedeltas` in hypothesis. For nanosecond units, the bounds are set to roughly 90% of the int64 capacity.

### Are these changes tested?

Passed in #48460 (comment).

### Are there any user-facing changes?

No, test-only.

* GitHub Issue: #47734

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
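The overflow claim and the "90% of the capacity" figure can be checked with plain integer arithmetic. The sketch below uses only the standard library; `timedelta_to_ns` is a hypothetical helper written for illustration, not a pyarrow or hypothesis API, and it assumes durations are stored as a signed 64-bit count of nanoseconds.

```python
import datetime

# Largest value a signed 64-bit integer can hold.
MAX_INT64 = 2**63 - 1

# Nanoseconds in one day.
NS_PER_DAY = 86_400 * 10**9


def timedelta_to_ns(td):
    """Hypothetical helper: exact nanosecond count of a timedelta."""
    return ((td.days * 86_400 + td.seconds) * 10**6 + td.microseconds) * 1_000


# datetime.timedelta.max expressed in nanoseconds.
max_td_ns = timedelta_to_ns(datetime.timedelta.max)

# Whole days representable in int64 nanoseconds.
int64_days_ns = MAX_INT64 // NS_PER_DAY

# Roughly 90% of that range, rounded down to whole days.
ninety_percent_days = int64_days_ns * 9 // 10

print(max_td_ns > MAX_INT64)   # True
print(int64_days_ns)           # 106751
print(ninety_percent_days)     # 96075
```

Under that assumption, 90% of the int64 nanosecond range works out to exactly the 96,075-day bound that appears in the diff, while `datetime.timedelta.max` exceeds the range by several orders of magnitude.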
1 parent ad9279e commit cd848bc


1 file changed (+18, -2 lines)


python/pyarrow/tests/strategies.py

Lines changed: 18 additions & 2 deletions
```diff
@@ -323,9 +323,25 @@ def arrays(draw, type, size=None, nullable=True):
         value = st.datetimes(timezones=st.just(tz), min_value=min_datetime,
                              max_value=max_datetime)
     elif pa.types.is_duration(ty):
-        value = st.timedeltas()
+        if ty.unit in ('s', 'ms'):
+            min_value = datetime.timedelta.min
+            max_value = datetime.timedelta.max
+        elif ty.unit == 'us':
+            max_int64 = 2**63 - 1
+            max_days = max_int64 // (86400 * 10**6)
+            min_value = datetime.timedelta(days=-max_days)
+            max_value = datetime.timedelta(days=max_days)
+        else:  # 'ns'
+            # Empirically tested value
+            min_value = datetime.timedelta(days=-96_075)
+            max_value = datetime.timedelta(days=96_075)
+        value = st.timedeltas(min_value=min_value, max_value=max_value)
     elif pa.types.is_interval(ty):
-        value = st.timedeltas()
+        # Empirically tested value
+        value = st.timedeltas(
+            min_value=datetime.timedelta(days=-96_075),
+            max_value=datetime.timedelta(days=96_075)
+        )
     elif pa.types.is_binary(ty) or pa.types.is_large_binary(ty):
         value = st.binary()
     elif pa.types.is_string(ty) or pa.types.is_large_string(ty):
```
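To see why each duration unit gets different bounds, the sketch below restates the branch logic from the diff as a standalone function and checks that both endpoints fit in int64 storage. `duration_bounds`, `fits_int64`, and `UNITS_PER_SECOND` are hypothetical helpers written for illustration only; they are not part of `strategies.py` or pyarrow, and the check assumes a duration is stored as a plain int64 count of its unit.

```python
import datetime

# Range of a signed 64-bit integer.
MAX_INT64 = 2**63 - 1
MIN_INT64 = -2**63

# Ticks per second for each duration unit string (assumption: int64 tick storage).
UNITS_PER_SECOND = {'s': 1, 'ms': 10**3, 'us': 10**6, 'ns': 10**9}


def duration_bounds(unit):
    """Hypothetical helper: (min, max) timedeltas, mirroring the diff's branches."""
    if unit in ('s', 'ms'):
        return datetime.timedelta.min, datetime.timedelta.max
    elif unit == 'us':
        max_days = MAX_INT64 // (86_400 * 10**6)
        return datetime.timedelta(days=-max_days), datetime.timedelta(days=max_days)
    else:  # 'ns'
        return datetime.timedelta(days=-96_075), datetime.timedelta(days=96_075)


def fits_int64(td, unit):
    """Hypothetical helper: does `td`, counted in `unit` ticks, fit in int64?"""
    # Exact microsecond count; timedelta has microsecond resolution.
    total_us = (td.days * 86_400 + td.seconds) * 10**6 + td.microseconds
    per_second = UNITS_PER_SECOND[unit]
    if per_second >= 10**6:
        # Same or finer unit: scale up exactly.
        ticks = total_us * (per_second // 10**6)
    else:
        # Coarser unit: floor-divide down.
        ticks = total_us // (10**6 // per_second)
    return MIN_INT64 <= ticks <= MAX_INT64


for unit in ('s', 'ms', 'us', 'ns'):
    low, high = duration_bounds(unit)
    print(unit, fits_int64(low, unit), fits_int64(high, unit))

# The unbounded default endpoint, checked against every unit.
print([fits_int64(datetime.timedelta.max, u) for u in ('s', 'ms', 'us', 'ns')])
# [True, True, False, False]
```

Under the int64-ticks assumption, every bounded endpoint fits, while `datetime.timedelta.max` overflows for `'us'` and `'ns'`. That is consistent with the diff keeping the full `timedelta` range for `'s'` and `'ms'` and narrowing it only for the two finer units.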
