Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Make test_sql.py parallelizable #60378

Open
1 of 3 tasks
WillAyd opened this issue Nov 20, 2024 · 3 comments
Open
1 of 3 tasks

TST: Make test_sql.py parallelizable #60378

WillAyd opened this issue Nov 20, 2024 · 3 comments
Assignees
Labels
good first issue IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite

Comments

@WillAyd
Copy link
Member

WillAyd commented Nov 20, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

test_sql.py must be run on a single thread now, because tests re-use the same table names. This can cause a race condition when different parametrizations of a test run on different threads

Feature Description

Add a uuid or something else to the table names in the test_sql.py module to disambiguate

Alternative Solutions

status quo

Additional Context

No response

@WillAyd WillAyd added Testing pandas testing functions or related to the test suite IO SQL to_sql, read_sql, read_sql_query good first issue labels Nov 20, 2024
@UmbertoFasci
Copy link
Contributor

take

@WillAyd WillAyd changed the title TST: Make test_sql.py serializable TST: Make test_sql.py parallelizable Nov 20, 2024
@UmbertoFasci
Copy link
Contributor

@WillAyd I am about halfway through the tests. I am generating a unique table uuid when indicated while maintaining the original context through the prefix.

Before:

@pytest.mark.parametrize("conn", all_connectable)
def test_read_table_columns(conn, request, test_frame1):
    # test columns argument in read_table
    conn_name = conn
    if conn_name == "sqlite_buildin":
        request.applymarker(pytest.mark.xfail(reason="Not Implemented"))

    conn = request.getfixturevalue(conn)
    sql.to_sql(test_frame1, "test_frame", conn)

    cols = ["A", "B"]

    result = sql.read_sql_table("test_frame", conn, columns=cols)
    assert result.columns.tolist() == cols

After made parallelizable:

@pytest.mark.parametrize("conn", all_connectable)
def test_read_table_columns(conn, request, test_frame1):
    # test columns argument in read_table
    conn_name = conn
    if conn_name == "sqlite_buildin":
        request.applymarker(pytest.mark.xfail(reason="Not Implemented"))

    conn = request.getfixturevalue(conn)
    table_uuid = f"test_frame_{uuid.uuid4().hex}"
    sql.to_sql(test_frame1, table_uuid, conn)

    cols = ["A", "B"]

    result = sql.read_sql_table(table_uuid, conn, columns=cols)
    assert result.columns.tolist() == cols

Let me know if you would like this done in a different fashion.

@WillAyd
Copy link
Member Author

WillAyd commented Nov 21, 2024

Seems reasonable. Probably worth a helper function in the module to not have to repeat the same code in each function, but what you have looks like its headed in the right direction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

2 participants