Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky test_memoize_path_dir ? #55

Open
yarikoptic opened this issue Dec 7, 2021 · 9 comments
Open

flaky test_memoize_path_dir ? #55

yarikoptic opened this issue Dec 7, 2021 · 9 comments
Assignees

Comments

@yarikoptic
Copy link
Member

Fresh run https://github.com/con/fscacher/runs/4440200422?check_suite_focus=true failed on Mac

____________________________ test_memoize_path_dir _____________________________

cache = <fscacher.cache.PersistentCache object at 0x1074818e0>
tmp_path = PosixPath('/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0')

    def test_memoize_path_dir(cache, tmp_path):
        calls = []
    
        @cache.memoize_path
        def memoread(path, arg, kwarg=None):
            calls.append([path, arg, kwarg])
            total_size = 0
            with os.scandir(path) as entries:
                for e in entries:
                    if e.is_file():
                        total_size += e.stat().st_size
            return total_size
    
        def check_new_memoread(arg, content, expect_new=False):
            ncalls = len(calls)
            assert memoread(path, arg) == content
            assert len(calls) == ncalls + 1
            assert memoread(path, arg) == content
            assert len(calls) == ncalls + 1 + int(expect_new)
    
        fname = "foo"
        path = tmp_path / fname
    
        with pytest.raises(IOError):
            memoread(path, 0)
        # and again
        with pytest.raises(IOError):
            memoread(path, 0)
        assert len(calls) == 2
    
        path.mkdir()
        (path / "a.txt").write_text("Alpha")
        (path / "b.txt").write_text("Beta")
    
        t0 = time.time()
        try:
            # unless this computer is too slow -- there should be less than
            # cache._min_dtime between our creating the file and testing,
            # so we would force a direct read:
>           check_new_memoread(0, 9, True)

.tox/py/lib/python3.9/site-packages/fscacher/tests/test_cache.py:204: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arg = 0, content = 9, expect_new = True

    def check_new_memoread(arg, content, expect_new=False):
        ncalls = len(calls)
        assert memoread(path, arg) == content
        assert len(calls) == ncalls + 1
        assert memoread(path, arg) == content
>       assert len(calls) == ncalls + 1 + int(expect_new)
E       AssertionError: assert 3 == ((2 + 1) + 1)
E        +  where 3 = len([['/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0/foo', 0, ...rivate/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0/foo', 0, None]])
E        +  and   1 = int(True)

.tox/py/lib/python3.9/site-packages/fscacher/tests/test_cache.py:183: AssertionError

although was all green before AFAIK.

@jwodder
Copy link
Member

jwodder commented Dec 7, 2021

@yarikoptic I'm not sure what the cause is, but I've created #56 so we can at least see what's going on in case it happens again.

@jwodder
Copy link
Member

jwodder commented Dec 7, 2021

@yarikoptic As far as I can tell, when this line is called, the first & second memoread() both occur after _min_dtime has elapsed since modifying the path, yet the total runtime of check_new_memoread() is less than _min_dtime. This suggests that too much time is elapsing between the last filesystem write and the call to time.time().

The curious part is that this happened twice in a row, but with different test functions.

@yarikoptic
Copy link
Member Author

#56 was merged, and awhile back @jwodder said that tests didn't fail for awhile and indeed https://github.com/con/fscacher/actions is all green. Closing

@yarikoptic
Copy link
Member Author

now reported to fail in debian too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1079398 . I insofar failed to reproduce locally to troubleshoot further etc.

@yarikoptic yarikoptic reopened this Aug 23, 2024
@hosiet
Copy link

hosiet commented Nov 16, 2024

In Debian I added a timeout (e.g., 5s) before check_new_memoread() invocation, and the test would succeed. This is often an indication of synchronization problems.

I don't quite understand the assumption as written here:

        try:
            # unless this computer is too slow -- there should be less than
            # cache._min_dtime between our creating the file and testing,
            # so we would force a direct read:

I mean the code should never wish the computer to be fast enough to achieve atomicity. According to my understanding, something similar to #60 still applies.

@yarikoptic
Copy link
Member Author

yarikoptic commented Nov 18, 2024

Thank you @hosiet for taking care about this package/issue in Debian and looking into it! I have now eyeballed the code etc, and I still have no logical (as code logic goes) explanation besides

  • precision of time.time() is too coarse/rounded...
  • test server runs ntp client or alike and we manage to hit time adjustment
  • edited/added: slightly different clocks used for filesystem timestamping and time().

either of which leads to our subsequent re-check, based on time.time() in except of the test https://github.com/con/fscacher/blob/master/src/fscacher/tests/test_cache.py#L210

    except AssertionError:  # pragma: no cover
        # if computer is indeed slow (happens on shared CIs) we might fail
        # because distance is too short
        if time.time() - t0 < cache._min_dtime:
            raise  # if we were quick but still failed -- legit

is True (since exception is raised overall), whenever presumably similar (but with abs though!) check in the code at https://github.com/con/fscacher/blob/master/src/fscacher/cache.py#L239

    def modified_in_window(self, min_dtime):
        if self.last_modified is None:
            return False
        else:
            return abs(time.time() - self.last_modified * 1e-9) < min_dtime

returns False . There .last_modified is based off file stat .st_mtime_ns and thus in nano seconds.

So either precision of time. isn't good enough to catch the diff or ...

NB presumably since potentially could be False due to last_modifed being None but that shouldn't happen here! (?)

  • unclear why I added abs there on time diff in my initial prototype in dandi/dandi-cli@7ac63bd -- nothing in code/commit comment :-/

Here is the log record from that failed debian run

    def check_new_memoread(arg, content, expect_new=False):
        ncalls = len(calls)
        assert memoread(path, arg) == content
        assert len(calls) == ncalls + 1
        assert memoread(path, arg) == content
E       AssertionError: assert 3 == ((2 + 1) + 1)
E        +  where 3 = len([[PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'), 0, None], [PosixPath('/tmp/pytest-of-hosiet/...t_memoize_path_dir0/foo'), 0, None], [PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'), 0, None]])
E        +  and   1 = int(True)

fscacher/tests/test_cache.py:184: AssertionError
------------------------------ Captured log call -------------------------------
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:183 Cannot fingerprint /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo: [Errno 2] No such file or directory: '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:146 Calling <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> directly since no fingerprint for '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:183 Cannot fingerprint /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo: [Errno 2] No such file or directory: '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:146 Calling <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> directly since no fingerprint for '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:157 Calling memoized version of <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> for /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo
DEBUG    fscacher.cache:cache.py:100 Running original <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> on PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo')
Level 1  fscacher.cache:cache.py:166 Returning value 9
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:157 Calling memoized version of <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> for /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo
Level 1  fscacher.cache:cache.py:166 Returning value 9
=========================== short test summary info ============================
FAILED fscacher/tests/test_cache.py::test_memoize_path_dir - AssertionError: ...

which shows that the first assert memoread(path, arg) == content is fine, so files are created and all is good. That should ensure that we do get last_modifed assigned and thus it is not None. And then we see that we are calling memoized version which should happen when that time diff check at https://github.com/con/fscacher/blob/master/src/fscacher/cache.py#L153 is returning False, i.e. we did "escape" that time window, that raises initial AssertionError which must have been caught and "re-analyzed" in except where the check returned True, i.e. stating that it did pass less than ...

PS sorry - leaving unfinished, need to get to a meeting, will come back to it later

yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Nov 18, 2024
It might be able to provide explanation for a flaky test observed on debian
build systems etc: con#55
yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Nov 19, 2024
It might be able to provide explanation for a flaky test observed on debian
build systems etc: con#55
@yarikoptic
Copy link
Member Author

I treat the fact that addition of sleep(5) before running the actual check helped, as an indication that there is small but sufficient discrepancy in clocks used by time.time() and filesystem stamping for mtime etc...

I wondered if you @hosiet were lucky to more or less reliably reproduce such an issue some way? I am proposing to add more logging to get a better idea on what is going on. If you can't reproduce it either, I would prefer to upload such a version, without patches, to debian to catch the bug by its limbs:

@hosiet
Copy link

hosiet commented Nov 19, 2024

I can make a test build-and-upload in Debian with the current fscacher git HEAD version, and remove my custom patch in this test build. If such build still yields the test error, I can show you the updated error message. Will that help you with the debug?

@yarikoptic
Copy link
Member Author

Sure, thanks in advance! no need for HEAD version. Grab 0.4.3 we released from pypi https://pypi.org/project/fscacher/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants