Open
Labels: bug (Something isn't working)
Description
Describe the bug
If I try to parse a datetime column that contains some specific datetimes, like "2020-01-02 20:20:39", the format is not inferred correctly, so ToDatetime raises a RejectColumn exception.
More precisely, the datetimes that fail have the same digits in the year as in the hour and minute, e.g. year 2020 and time 20:20.
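The pattern described above can be checked mechanically. The following is a minimal sketch using only the standard library; the helper name `year_equals_hour_minute` is hypothetical, and it assumes every value uses the fixed layout "YYYY-MM-DD HH:MM:SS".

```python
def year_equals_hour_minute(ts: str) -> bool:
    """Return True if the four year digits equal the hour + minute digits.

    Hypothetical helper; assumes the fixed layout "YYYY-MM-DD HH:MM:SS".
    """
    year = ts[0:4]                       # e.g. "2020"
    hour_minute = ts[11:13] + ts[14:16]  # e.g. "20" + "20"
    return year == hour_minute


samples = [
    "2020-01-02 20:20:39",
    "1959-07-01 19:59:16",
    "2018-07-01 20:19:16",
]

# Map each sample to whether it shows the suspicious digit pattern.
flags = {ts: year_equals_hour_minute(ts) for ts in samples}
for ts, flag in flags.items():
    print(ts, flag)
```

Of these strings, the first two show the pattern and the third does not, so it may be worth testing each value separately to see which one triggers the failure.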
I was able to trace this down to _guess_datetime_format in ToDatetime, which then calls
from pandas._libs.tslibs.parsing import (
    guess_datetime_format as pd_guess_datetime_format,
)
pd_guess_datetime_format also fails to parse those specific datetimes. I am not sure whether the problem is in the pandas code or further upstream in dateutil.
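To check the pandas helper in isolation from skrub, something like the sketch below could be used. It calls the same private function with the import shown above; since this is a private pandas API, its location and behaviour may differ between pandas versions. No expected output is shown because the result depends on the installed version.

```python
from pandas._libs.tslibs.parsing import (
    guess_datetime_format as pd_guess_datetime_format,
)

# Strings taken from this report.
values = [
    "1959-07-01 19:59:16",
    "2018-07-01 20:19:16",
    "2020-01-02 20:20:39",
]

# The helper returns a strftime-style format string, or None when it
# cannot infer one.
guessed = {v: pd_guess_datetime_format(v) for v in values}
for v, fmt in guessed.items():
    print(f"{v!r} -> {fmt!r}")
```

A `None` for any of these values would point at the pandas helper (or something it depends on) rather than at skrub.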
Steps/Code to Reproduce
from skrub import ToDatetime
import pandas as pd
df = pd.Series(["1959-07-01 19:59:16", "2018-07-01 20:19:16"])
transformer = ToDatetime()
dt_series = transformer.fit_transform(df)
print(dt_series)
Expected Results
The series is converted to datetime.
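For reference, the expected values can be produced with pandas alone by passing the format explicitly, which skips format inference. This is only a sketch of the expected result, not of what ToDatetime does internally; the format string "%Y-%m-%d %H:%M:%S" is an assumption based on the sample values.

```python
import pandas as pd

series = pd.Series(["1959-07-01 19:59:16", "2018-07-01 20:19:16"])

# An explicit format avoids any guessing of the datetime layout.
expected = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S")
print(expected)
```

ToDatetime also accepts a format parameter, which might serve as a workaround; that has not been verified here.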
Actual Results
RejectColumn Traceback (most recent call last)
Cell In[21], line 7
5 df = pd.Series(["1959-07-01 19:59:16", "2018-07-01 20:19:16"])
6 transformer = ToDatetime()
----> 7 dt_series = transformer.fit_transform(df)
8 print(dt_series)
File ~/work/skrub/skrub/_apply_to_cols.py:175, in _wrap_add_check_single_column.<locals>.fit_transform(self, X, y, **kwargs)
172 @functools.wraps(f)
173 def fit_transform(self, X, y=None, **kwargs):
174 self._check_single_column(X, f.__name__)
--> 175 return f(self, X, y=y, **kwargs)
File ~/work/skrub/skrub/_to_datetime.py:395, in ToDatetime.fit_transform(***failed resolving arguments***)
393 datetime_format = self._get_datetime_format(column)
394 if datetime_format is None:
--> 395 raise RejectColumn(
396 f"Could not find a datetime format for column {sbd.name(column)!r}."
397 )
399 self.format_ = datetime_format
400 try:
RejectColumn: Could not find a datetime format for column None.
Versions
System:
python: 3.11.14 | packaged by conda-forge | (main, Oct 22 2025, 22:56:31) [Clang 19.1.7 ]
executable: /Users/rcap/work/skrub/.pixi/envs/dev/bin/python
machine: macOS-15.7.2-arm64-arm-64bit
Python dependencies:
sklearn: 1.8.0
pip: None
setuptools: 80.9.0
numpy: 2.3.5
scipy: 1.16.3
Cython: None
pandas: 2.3.3
matplotlib: 3.10.8
joblib: 1.5.2
threadpoolctl: 3.6.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
num_threads: 10
prefix: libopenblas
filepath: /Users/rcap/work/skrub/.pixi/envs/dev/lib/libopenblas.0.dylib
version: 0.3.30
threading_layer: openmp
architecture: VORTEX
user_api: openmp
internal_api: openmp
num_threads: 10
prefix: libomp
filepath: /Users/rcap/work/skrub/.pixi/envs/dev/lib/libomp.dylib
version: None
0.7.1