pytest collection appears to stall/slow down/jam up when some third-party libraries are used; add function to ignore specific modules #12722

Closed
geofire opened this issue Aug 19, 2024 · 3 comments

@geofire

geofire commented Aug 19, 2024

pytest 8.3.2
Python 3.11
Windows 10

Hi all!

I was tossing up whether this should be a bug report or a feature request, as I couldn't work out from the documentation whether the following is expected behaviour. It took a solid day of troubleshooting to figure this one out. I have found a workaround, so it's not a showstopper, but it was difficult and very confusing to troubleshoot.

What's the problem?

pytest will, perhaps by design, scan through and 'collect' some (or all?) of the third-party libraries used by function x when x itself is imported into a test.

This gives the impression that:

  • pytest collection has stalled when collecting tests in a large project,
  • pytest flat out doesn't work properly when only one test exists, and
  • pytest is really frustrating to use with PyCharm (and other automated test runners) when it takes upwards of 20 seconds to collect tests before each run.

Current behaviour (as of pytest 8.3.2)

Note: I've used the awesome pytest-richtrace library (in --verbose mode) to help me figure this issue out, as there didn't appear to be a similar feature in pytest for making its collection activity verbose.

The library I can reliably reproduce this issue with is arcgis. https://pypi.org/project/arcgis/

arcgis is Esri's ArcGIS API for Python, allowing Python code to interact with Esri's Enterprise and Online geospatial systems without needing to write a mess of boilerplate REST API code. arcgis is a package I don't maintain, and don't need to test directly.

The same behaviour occurs both in PyCharm and when invoking pytest manually from the command line with python -m pytest.

Example

# app.py
import os
from arcgis import GIS  # Connection to a Portal server instance

def connect(domain, context):
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"

# common.py
def create_wigwam():
    # Do stuff here.
    return True
Tests
# tests/test_app.py
from app import connection_url

def test_connection_url():
    pass
# tests/test_common.py
# another random test unrelated to app.py in the same directory
from common import create_wigwam

def test_create_wigwam():
    assert True

Note that:

  • connect() is not called,
  • there isn't a test for connect(),
  • connect() isn't imported into test_app.py,
  • the connection_url() function does not call connect(), and therefore does not call arcgis.GIS, and
  • test_app.py essentially does nothing other than import connection_url from app.py

I have also explicitly excluded venv and site-packages as directories in pytest.ini.

As written, the code above causes pytest to traverse the arcgis package inside the virtual environment that runs it. According to pytest-richtrace, however, pytest doesn't appear to collect anything from that package or from the venv directory, and it seems to ignore the os package entirely.

Using pytest-richtrace I saw the following behaviour:

hook: pytest_collection
    session: <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=0>
...
hook: pytest_collectstart          tests/test_app.py
INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.   # This is calling into the virtual environment.
INFO:numexpr.utils:NumExpr defaulting to 8 threads. 
    # Stall here after the above line is printed to console for a period of time, in my case at least 20 seconds, no other feedback is given.
hook: pytest_itemcollected         tests/test_app.py::test_connection_url
hook: pytest_collection_modifyitems
hook: pytest_collection_finish

Simply commenting out the import line in test_app.py:

from app import connection_url

prevents pytest from traversing into arcgis. All other tests complete almost instantly.

I haven't checked whether arcgis has any tests, though the collection process certainly doesn't pick any up.

Describe the solution you'd like

  • Is this expected behaviour?
  • If expected behaviour:
    • Add a command-line or configuration flag that tells pytest to exclude certain libraries from collection (excluding directories doesn't do this).
    • Make collection more verbose with the --verbose flag, so that collection issues are much easier to troubleshoot:
      • Print the directory and file being examined for collection
      • Include collection timings when the --durations=0 flag is passed
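In the meantime, a rough approximation of collection timings can be put together in a conftest.py using pytest's existing pytest_collectstart, pytest_collectreport and pytest_terminal_summary hooks. The sketch below times each collected node and prints the slowest ones at the end of the run; collect_durations and the section title are illustrative names only.

```python
# conftest.py (sketch): record how long pytest spends collecting each node.
# For a test module this includes importing it, and therefore everything it
# imports at module level (such as arcgis via app.py).
import time

_collect_started: dict[str, float] = {}
collect_durations: dict[str, float] = {}


def pytest_collectstart(collector):
    # Called just before pytest starts collecting `collector`.
    _collect_started[collector.nodeid] = time.perf_counter()


def pytest_collectreport(report):
    # Called once collection of that node has finished (or failed).
    started = _collect_started.pop(report.nodeid, None)
    if started is not None:
        collect_durations[report.nodeid] = time.perf_counter() - started


def pytest_terminal_summary(terminalreporter):
    terminalreporter.section("slowest collection")
    slowest = sorted(collect_durations.items(), key=lambda item: item[1], reverse=True)
    for nodeid, seconds in slowest[:10]:
        # The session node has an empty nodeid.
        terminalreporter.write_line(f"{seconds:8.3f}s  {nodeid or '<session>'}")
```

Durations for different node types can overlap, so treat the numbers as a pointer to the slow module rather than an exact breakdown; in this report the tests/test_app.py entry would be expected to absorb the time spent importing arcgis.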

Workaround solution

Using MagicMock from unittest.mock allows pytest to traverse into app.py without also traversing into arcgis, which stops the stall without commenting out code or refactoring unnecessarily:

# test_app.py
import sys
from unittest.mock import MagicMock

# Must run before `from app import connection_url` so app.py picks up the stub.
sys.modules['arcgis'] = MagicMock()

# No other mocking code is needed, as this MagicMock completely substitutes arcgis when under test.
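A variant of the same workaround, sketched under the assumption that the stub should apply to every test in the directory, is to install it from a conftest.py, which pytest imports before the test modules alongside it. The STUBBED_MODULES tuple and install_stubs() helper are hypothetical, and arcgis.gis is listed only to show that submodules need their own entries.

```python
# conftest.py (sketch): pytest imports conftest.py files before the test
# modules next to them, so stubs installed here exist before any test does
# `from app import connection_url`.
import sys
from unittest.mock import MagicMock

# Hypothetical list. A MagicMock is not a package (it has no __path__), so
# every dotted path app.py imports from needs its own entry.
STUBBED_MODULES = ("arcgis", "arcgis.gis")


def install_stubs(names=STUBBED_MODULES) -> list[str]:
    """Put a MagicMock in sys.modules for each name not already imported.

    Returns the names that were actually stubbed.
    """
    installed = []
    for name in names:
        if name not in sys.modules:
            sys.modules[name] = MagicMock(name=name)
            installed.append(name)
    return installed


install_stubs()
```

The trade-off is that the stub stays in sys.modules for the whole session, so a test that genuinely needs arcgis would receive the MagicMock instead.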

Many thanks!

@RonnyPfannschmidt
Member

Based on the provided information, it seems like importing the library is unreasonably expensive.

Please validate whether importing lazily removes the stall.

@geofire
Author

geofire commented Aug 20, 2024

Hi @RonnyPfannschmidt,

Lazy loading certainly appears to bypass the stall (with mocking code commented out in the test):

# app.py
import os
# from arcgis import GIS  # Moved from here into connect()

def connect(domain, context):
    from arcgis import GIS  # Lazily import GIS only when connect() runs
    portal = GIS(url=connection_url(domain, context), username=os.getenv('USERNAME'), password=os.getenv('PASSWORD'))
    return portal

def connection_url(domain, context):
    if domain == 'arcgis.com':
        return None  # API defaults to arcgis.com if None used as parameter
    else:
        return f"https://{domain}/{context}"

pytest-richtrace doesn't show pytest traversing into arcgis.
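Moving the import into connect() is the simplest form of lazy loading. If a module-level name is preferred, the standard library's importlib.util.LazyLoader can defer a module's top-level code until its first attribute access. The sketch below adapts the recipe from the importlib documentation; lazy_import is a hypothetical helper name.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return module `name`, deferring execution of its top-level code until
    the first attribute access (adapted from the importlib LazyLoader recipe)."""
    if name in sys.modules:
        return sys.modules[name]  # already imported: nothing left to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; its code has not run yet
    return module
```

In app.py that would mean arcgis = lazy_import("arcgis") at module level and arcgis.GIS(...) inside connect(). A module-level from arcgis import GIS would defeat it, since reading GIS triggers the real import straight away.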

@Zac-HD
Member

Zac-HD commented Oct 30, 2024

Unfortunately this is not something Pytest can solve; it's just that importing a module executes all the top-level code and import arcgis is unreasonably slow.

Some kind of lazy imports might help in your situation but Pytest can't solve it for you.
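The point about top-level code can be reproduced without arcgis. The sketch below writes two throwaway modules to a temporary directory: a hypothetical slow_lib whose top level sleeps to simulate an expensive import, and a hypothetical demo_app that imports it at module level the way app.py imports arcgis. It then times importing demo_app twice.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical stand-in for a slow library such as arcgis: its top-level
# code sleeps to simulate expensive work done at import time.
SLOW_LIB_SOURCE = """\
import time
time.sleep(0.3)  # simulated expensive top-level work

def GIS(*args, **kwargs):
    return "portal"
"""

# Hypothetical stand-in for app.py: it imports the slow library at module
# level, even though connection_url() never uses it.
APP_SOURCE = """\
from slow_lib import GIS

def connection_url(domain, context):
    if domain == "arcgis.com":
        return None
    return f"https://{domain}/{context}"
"""


def timed_import(name: str) -> float:
    """Import `name` and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


def demo() -> tuple[float, float]:
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "slow_lib.py").write_text(SLOW_LIB_SOURCE)
        Path(tmp, "demo_app.py").write_text(APP_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # First import runs demo_app's top level, which imports slow_lib,
            # which runs its own top level (including the sleep).
            first = timed_import("demo_app")
            # Second import is a lookup in sys.modules: no module code runs.
            second = timed_import("demo_app")
        finally:
            sys.path.remove(tmp)
            for name in ("demo_app", "slow_lib"):
                sys.modules.pop(name, None)
    return first, second
```

Calling demo() should report a first import of at least the simulated 0.3 seconds (demo_app pulls in slow_lib even though connection_url() never touches it) and a much faster second import, because Python caches modules in sys.modules. In the pytest case, that first expensive import happens while test_app.py is being collected.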

Zac-HD closed this as completed Oct 30, 2024