BUG - ToDatetime fails to parse some very specific datetimes #1835

@rcap107

Description

Describe the bug

When I parse a datetime column that contains certain specific datetimes, such as "2020-01-02 20:20:39", the format is not inferred correctly and ToDatetime raises a RejectColumn exception.

More precisely, the failing datetimes are those whose hour and minute, read together as four digits, equal the year: 2020 and 20:20 in the example above, or 1959 and 19:59 in the reproduction below.
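To check whether a particular value fits this pattern, a small hypothetical helper (hour_minute_matches_year is an illustrative name, not part of skrub or pandas) can compare the four-digit year with the hour and minute concatenated. It assumes the "%Y-%m-%d %H:%M:%S" layout used in this report and raises ValueError for anything else.

```python
from datetime import datetime


def hour_minute_matches_year(value: str) -> bool:
    """Hypothetical helper: True when HHMM equals YYYY.

    Assumes the "%Y-%m-%d %H:%M:%S" layout; raises ValueError otherwise.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    # Compare the rendered hour+minute with the rendered four-digit year.
    return parsed.strftime("%H%M") == parsed.strftime("%Y")


for value in ["2020-01-02 20:20:39", "1959-07-01 19:59:16", "2018-07-01 20:19:16"]:
    print(value, hour_minute_matches_year(value))
```

Of the two values in the reproduction below, it flags the first (1959 and 19:59) but not the second (2018 and 20:19).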

I traced this to _guess_datetime_format in ToDatetime, which calls

from pandas._libs.tslibs.parsing import (
    guess_datetime_format as pd_guess_datetime_format,
)

pd_guess_datetime_format also fails to infer a format for these datetimes. I am not sure whether the problem lies in the pandas code or further upstream in dateutil.
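One explanation consistent with this pattern, offered as an assumption rather than a reading of the pandas source: suppose the guesser parses the string and then greedily assigns each token to the first candidate directive whose strftime rendering matches it, trying a combined "%H%M" before "%Y". For "2020-01-02 20:20:39" the rendering of "%H%M" is "2020", which claims the year token, so "%Y" finds nothing left to match and the guess is rejected for lacking a year. The sketch below is a simplified, hypothetical reimplementation of that idea using only the standard library; naive_guess_format and the candidate list are illustrative, and datetime.fromisoformat stands in for dateutil's parser.

```python
import re
from datetime import datetime
from typing import List, Optional, Set

# Illustrative candidate directives, tried in this order. Listing the
# combined "%H%M" before "%Y" is the assumption that triggers the bug.
CANDIDATES = [
    ({"hour", "minute"}, "%H%M"),
    ({"year"}, "%Y"),
    ({"month"}, "%m"),
    ({"day"}, "%d"),
    ({"hour"}, "%H"),
    ({"minute"}, "%M"),
    ({"second"}, "%S"),
]


def naive_guess_format(value: str) -> Optional[str]:
    """Hypothetical greedy format guesser, for illustration only."""
    parsed = datetime.fromisoformat(value)  # stand-in for dateutil's parser
    # Split into digit runs and separators: ["2020", "-", "01", "-", ...].
    tokens = re.findall(r"\d+|\D+", value)
    guess: List[Optional[str]] = [None] * len(tokens)
    found: Set[str] = set()
    for attrs, directive in CANDIDATES:
        if attrs & found:
            continue  # attribute already placed by an earlier directive
        rendered = parsed.strftime(directive)
        # Claim the first still-unassigned token equal to the rendering.
        for i, token in enumerate(tokens):
            if guess[i] is None and token == rendered:
                guess[i] = directive
                found |= attrs
                break
    # A usable guess needs year, month and day.
    if not {"year", "month", "day"} <= found:
        return None
    # Every digit token must also have been assigned a directive.
    if any(g is None and t.isdigit() for g, t in zip(guess, tokens)):
        return None
    return "".join(g if g is not None else t for g, t in zip(guess, tokens))


for value in ["2020-01-02 20:20:39", "2021-01-02 20:20:39"]:
    print(value, "->", naive_guess_format(value))
```

Under these assumptions it returns None for "2020-01-02 20:20:39" but "%Y-%m-%d %H:%M:%S" for "2021-01-02 20:20:39", which differs only in the year. If pandas behaves similarly, dateutil is probably not at fault, since the collision would happen after parsing, when tokens are mapped back to directives.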

Steps/Code to Reproduce

from skrub import ToDatetime
import pandas as pd

df = pd.Series(["1959-07-01 19:59:16", "2018-07-01 20:19:16"])
transformer = ToDatetime()
dt_series = transformer.fit_transform(df)
print(dt_series)

Expected Results

The series is converted to datetime.
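The values themselves are well-formed; only the format inference fails. As a quick sanity check, the standard library parses both values from the reproduction once the format is given explicitly. By the same reasoning, passing the format to skrub, e.g. ToDatetime(format="%Y-%m-%d %H:%M:%S"), may work around the bug, assuming ToDatetime's format parameter bypasses inference; this workaround is untested here.

```python
from datetime import datetime

# Values from the reproduction above.
values = ["1959-07-01 19:59:16", "2018-07-01 20:19:16"]

# With an explicit format no inference is needed, and both values parse.
parsed = [datetime.strptime(v, "%Y-%m-%d %H:%M:%S") for v in values]
print(parsed)
```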

Actual Results

RejectColumn                              Traceback (most recent call last)
Cell In[21], line 7
      5 df = pd.Series(["1959-07-01 19:59:16", "2018-07-01 20:19:16"])
      6 transformer = ToDatetime()
----> 7 dt_series = transformer.fit_transform(df)
      8 print(dt_series)

File ~/work/skrub/skrub/_apply_to_cols.py:175, in _wrap_add_check_single_column.<locals>.fit_transform(self, X, y, **kwargs)
    172 @functools.wraps(f)
    173 def fit_transform(self, X, y=None, **kwargs):
    174     self._check_single_column(X, f.__name__)
--> 175     return f(self, X, y=y, **kwargs)

File ~/work/skrub/skrub/_to_datetime.py:395, in ToDatetime.fit_transform(***failed resolving arguments***)
    393 datetime_format = self._get_datetime_format(column)
    394 if datetime_format is None:
--> 395     raise RejectColumn(
    396         f"Could not find a datetime format for column {sbd.name(column)!r}."
    397     )
    399 self.format_ = datetime_format
    400 try:

RejectColumn: Could not find a datetime format for column None.

Versions

System:
    python: 3.11.14 | packaged by conda-forge | (main, Oct 22 2025, 22:56:31) [Clang 19.1.7 ]
executable: /Users/rcap/work/skrub/.pixi/envs/dev/bin/python
   machine: macOS-15.7.2-arm64-arm-64bit

Python dependencies:
      sklearn: 1.8.0
          pip: None
   setuptools: 80.9.0
        numpy: 2.3.5
        scipy: 1.16.3
       Cython: None
       pandas: 2.3.3
   matplotlib: 3.10.8
       joblib: 1.5.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 10
         prefix: libopenblas
       filepath: /Users/rcap/work/skrub/.pixi/envs/dev/lib/libopenblas.0.dylib
        version: 0.3.30
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 10
         prefix: libomp
       filepath: /Users/rcap/work/skrub/.pixi/envs/dev/lib/libomp.dylib
        version: None

skrub: 0.7.1

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
