BUG: always warn when on_bad_lines callable returns extra fields with index_col in read_csv (Python engine) (GH#61837) #62297

skalwaghe-56 · 2025-09-08T17:01:12Z

Closes #61837 (BUG: read_csv() on_bad_lines callable does not raise ParserWarning when index_col is set)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This PR fixes a regression in the Python-engine CSV parser when on_bad_lines is a callable: if the callable returned extra fields and index_col was set, no ParserWarning was raised.

Thanks!

skalwaghe-56 · 2025-09-10T10:43:39Z

@jbrockmendel @rhshadrach Could you please guide me on the next steps?

skalwaghe-56 · 2025-09-12T12:21:03Z

@rhshadrach @jorisvandenbossche When I ran the tests locally with these changes, one test XPASSed. I think it is related to #10153.
It's this test:

@pytest.mark.parametrize("dtype", [{"b": "category"}, {1: "category"}])
def test_categorical_dtype_single(all_parsers, dtype, request):
    # see gh-10153
    parser = all_parsers
    data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""
    expected = DataFrame(
        {"a": [1, 1, 2], "b": Categorical(["a", "a", "b"]), "c": [3.4, 3.4, 4.5]}
    )
    if parser.engine == "pyarrow":
        mark = pytest.mark.xfail(
            strict=False,
            reason="Flaky test sometimes gives object dtype instead of Categorical",
        )
        request.applymarker(mark)

    actual = parser.read_csv(StringIO(data), dtype=dtype)
    tm.assert_frame_equal(actual, expected)

Could you take a look at this, and at the PR as well?

Thanks!

pandas/io/parsers/python_parser.py

pandas/tests/io/parser/test_python_parser_only.py

pandas/io/parsers/base_parser.py

skalwaghe-56

I have fixed the tests as well, so CI should pass now.

rhshadrach

Looking good!

doc/source/whatsnew/v2.3.3.rst

- Always emit ParserWarning and drop extra fields when an on_bad_lines callable returns more elements than expected, regardless of index_col, in PythonParser._rows_to_cols. [GH#61837]
- Ensure non-bad rows are appended in the outer else branch so good lines are preserved.
- Add regression test pandas/tests/io/parser/test_python_parser_only.py::test_on_bad_lines_callable_warns_and_truncates_with_index_col covering index_col in [None, 0].

Closes pandas-dev#61837.
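
The rule described in this commit message can be illustrated with a small standalone sketch. This is not the code in PythonParser._rows_to_cols. The function name `rows_to_cols_sketch`, its parameters, and the local `ParserWarning` class are hypothetical stand-ins chosen so the example runs without pandas. It assumes a row counts as "bad" only when it has more fields than expected.

```python
import warnings


class ParserWarning(Warning):
    """Local stand-in for pandas.errors.ParserWarning (assumption for this sketch)."""


def rows_to_cols_sketch(rows, expected_len, on_bad_lines):
    """Hypothetical, simplified model of the behaviour described above.

    Good rows are always kept. Rows with too many fields are passed to the
    callable; if the callable still returns too many fields, a warning is
    emitted and the extras are dropped, independent of any index column.
    """
    kept = []
    for row in rows:
        if len(row) <= expected_len:
            # Not a bad line: preserve it unchanged.
            kept.append(list(row))
            continue
        new_row = on_bad_lines(row)
        if new_row is None:
            # The callable chose to skip this bad line.
            continue
        if len(new_row) > expected_len:
            # Warn and truncate instead of silently keeping extra fields.
            warnings.warn(
                f"Callable returned {len(new_row)} fields, expected "
                f"{expected_len}; extra fields were dropped.",
                ParserWarning,
                stacklevel=2,
            )
            new_row = new_row[:expected_len]
        kept.append(list(new_row))
    return kept
```

Calling `rows_to_cols_sketch([["1", "2"], ["3", "4", "5"]], 2, lambda r: r)` keeps the first row as is, truncates the second to two fields, and emits one warning.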

skalwaghe-56 force-pushed the fix-issue-61837 branch 3 times, most recently from f579800 to d77afef, on September 10, 2025 10:43

simonjayhawkins added the Bug and IO CSV (read_csv, to_csv) labels on Sep 10, 2025

skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 7009e84 to 0729267, on September 12, 2025 12:14

skalwaghe-56 force-pushed the fix-issue-61837 branch 4 times, most recently from 02e9bd2 to 7f303f7, on September 16, 2025 16:40

rhshadrach requested changes Sep 16, 2025

skalwaghe-56 force-pushed the fix-issue-61837 branch from 7f303f7 to f6887a2 on September 17, 2025 17:00

rhshadrach requested changes Sep 17, 2025

pandas/io/parsers/python_parser.py (outdated)

skalwaghe-56 force-pushed the fix-issue-61837 branch from f6887a2 to 014e05f on September 18, 2025 12:15

skalwaghe-56 requested a review from rhshadrach September 18, 2025 12:15

skalwaghe-56 commented Sep 18, 2025

skalwaghe-56 force-pushed the fix-issue-61837 branch 2 times, most recently from e1f405e to 2fa7f70, on September 20, 2025 09:00

rhshadrach requested changes Sep 20, 2025

pandas/io/parsers/python_parser.py (outdated)

doc/source/whatsnew/v2.3.3.rst (outdated)

skalwaghe-56 added 2 commits September 23, 2025 21:04

DOC: whatsnew entry for on_bad_lines regression fix (GH#61837)

c103dcc

skalwaghe-56 force-pushed the fix-issue-61837 branch from 2fa7f70 to c103dcc on September 23, 2025 15:34

skalwaghe-56 requested a review from rhshadrach September 23, 2025 15:35
