BUG: always warn when on_bad_lines callable returns extra fields with index_col in read_csv (Python engine) (GH#61837) #62297
@jbrockmendel @rhshadrach Could you please guide me further?
@rhshadrach @jorisvandenbossche When I ran the tests locally with these changes, one test XPASSed. I think it is related to #10153:

@pytest.mark.parametrize("dtype", [{"b": "category"}, {1: "category"}])
def test_categorical_dtype_single(all_parsers, dtype, request):
    # see gh-10153
    parser = all_parsers
    data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""
    expected = DataFrame(
        {"a": [1, 1, 2], "b": Categorical(["a", "a", "b"]), "c": [3.4, 3.4, 4.5]}
    )
    if parser.engine == "pyarrow":
        mark = pytest.mark.xfail(
            strict=False,
            reason="Flaky test sometimes gives object dtype instead of Categorical",
        )
        request.applymarker(mark)

    actual = parser.read_csv(StringIO(data), dtype=dtype)
    tm.assert_frame_equal(actual, expected)

Could you take a look at this, and at the PR as well? Thanks!
I have fixed the tests as well, so the CI should pass now.
Looking good!
- Always emit ParserWarning and drop extra fields when an on_bad_lines callable returns more elements than expected, regardless of index_col, in PythonParser._rows_to_cols. [GH#61837]
- Ensure non-bad rows are appended in the outer else branch so good lines are preserved.
- Add regression test pandas/tests/io/parser/test_python_parser_only.py::test_on_bad_lines_callable_warns_and_truncates_with_index_col covering index_col in [None, 0].

Closes pandas-dev#61837.
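To make the behaviour described in the commit message concrete, here is a minimal, pandas-free sketch of the warn-and-truncate logic. It is not the actual body of PythonParser._rows_to_cols: the function name `rows_to_cols_sketch`, the `expected_len` parameter, the warning text, and the local `ParserWarning` class (a stand-in for `pandas.errors.ParserWarning`) are all assumptions made for illustration.

```python
import warnings


class ParserWarning(Warning):
    """Local stand-in for pandas.errors.ParserWarning (assumption: keeps the sketch pandas-free)."""


def rows_to_cols_sketch(rows, expected_len, on_bad_lines):
    """Hypothetical, simplified model of the behaviour in the commit message.

    `expected_len` stands for whatever row width the parser expects; the
    real parser derives it from the header and index_col.
    """
    kept = []
    for row in rows:
        if len(row) > expected_len:
            # Bad line: let the user callable decide what to do with it.
            new_row = on_bad_lines(row)
            if new_row is None:
                continue  # the callable asked to skip this line
            if len(new_row) > expected_len:
                # Always warn and truncate, with or without an index column.
                warnings.warn(
                    "callable returned more fields than expected; "
                    "dropping the extra fields",  # illustrative message only
                    ParserWarning,
                    stacklevel=2,
                )
                new_row = new_row[:expected_len]
            kept.append(new_row)
        else:
            # Good line: appended unchanged (the "outer else branch").
            kept.append(row)
    return kept


rows = [["1", "2"], ["3", "4", "5"], ["6", "7"]]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The callable returns the bad line untouched, so it is still too long.
    result = rows_to_cols_sketch(rows, expected_len=2, on_bad_lines=lambda r: r)

print(result)
print([w.category.__name__ for w in caught])
```

In this sketch the good rows pass through untouched, the over-long row is cut back to two fields, and exactly one warning is recorded for it. A callable that returns None drops the line without any warning.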
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
This PR fixes a regression in the CSV parsers when using on_bad_lines as a callable.
Thanks!