flaky test_memoize_path_dir ? #55

yarikoptic · 2021-12-07T15:48:00Z

Fresh run https://github.com/con/fscacher/runs/4440200422?check_suite_focus=true failed on Mac

____________________________ test_memoize_path_dir _____________________________

cache = <fscacher.cache.PersistentCache object at 0x1074818e0>
tmp_path = PosixPath('/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0')

    def test_memoize_path_dir(cache, tmp_path):
        calls = []
    
        @cache.memoize_path
        def memoread(path, arg, kwarg=None):
            calls.append([path, arg, kwarg])
            total_size = 0
            with os.scandir(path) as entries:
                for e in entries:
                    if e.is_file():
                        total_size += e.stat().st_size
            return total_size
    
        def check_new_memoread(arg, content, expect_new=False):
            ncalls = len(calls)
            assert memoread(path, arg) == content
            assert len(calls) == ncalls + 1
            assert memoread(path, arg) == content
            assert len(calls) == ncalls + 1 + int(expect_new)
    
        fname = "foo"
        path = tmp_path / fname
    
        with pytest.raises(IOError):
            memoread(path, 0)
        # and again
        with pytest.raises(IOError):
            memoread(path, 0)
        assert len(calls) == 2
    
        path.mkdir()
        (path / "a.txt").write_text("Alpha")
        (path / "b.txt").write_text("Beta")
    
        t0 = time.time()
        try:
            # unless this computer is too slow -- there should be less than
            # cache._min_dtime between our creating the file and testing,
            # so we would force a direct read:
>           check_new_memoread(0, 9, True)

.tox/py/lib/python3.9/site-packages/fscacher/tests/test_cache.py:204: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arg = 0, content = 9, expect_new = True

    def check_new_memoread(arg, content, expect_new=False):
        ncalls = len(calls)
        assert memoread(path, arg) == content
        assert len(calls) == ncalls + 1
        assert memoread(path, arg) == content
>       assert len(calls) == ncalls + 1 + int(expect_new)
E       AssertionError: assert 3 == ((2 + 1) + 1)
E        +  where 3 = len([['/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0/foo', 0, ...rivate/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_memoize_path_dir0/foo', 0, None]])
E        +  and   1 = int(True)

.tox/py/lib/python3.9/site-packages/fscacher/tests/test_cache.py:183: AssertionError

although was all green before AFAIK.

The text was updated successfully, but these errors were encountered:

jwodder · 2021-12-07T16:37:51Z

@yarikoptic I'm not sure what the cause is, but I've created #56 so we can at least see what's going on in case it happens again.

jwodder · 2021-12-07T17:14:08Z

@yarikoptic As far as I can tell, when this line is called, the first & second memoread() both occur after _min_dtime has elapsed since modifying the path, yet the total runtime of check_new_memoread() is less than _min_dtime. This suggests that too much time is elapsing between the last filesystem write and the call to time.time().

The curious part is that this happened twice in a row, but with different test functions.

yarikoptic · 2022-03-25T20:54:46Z

#56 was merged, and awhile back @jwodder said that tests didn't fail for awhile and indeed https://github.com/con/fscacher/actions is all green. Closing

yarikoptic · 2024-08-23T18:21:20Z

now reported to fail in debian too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1079398 . I insofar failed to reproduce locally to troubleshoot further etc.

hosiet · 2024-11-16T15:59:11Z

In Debian I added a timeout (e.g., 5s) before check_new_memoread() invocation, and the test would succeed. This is often an indication of synchronization problems.

I don't quite understand the assumption as written here:

        try:
            # unless this computer is too slow -- there should be less than
            # cache._min_dtime between our creating the file and testing,
            # so we would force a direct read:

I mean the code should never wish the computer to be fast enough to achieve atomicity. According to my understanding, something similar to #60 still applies.

yarikoptic · 2024-11-18T16:04:39Z

Thank you @hosiet for taking care about this package/issue in Debian and looking into it! I have now eyeballed the code etc, and I still have no logical (as code logic goes) explanation besides

precision of time.time() is too coarse/rounded...
test server runs ntp client or alike and we manage to hit time adjustment
edited/added: slightly different clocks used for filesystem timestamping and time().

either of which leads to our subsequent re-check, based on time.time() in except of the test https://github.com/con/fscacher/blob/master/src/fscacher/tests/test_cache.py#L210

    except AssertionError:  # pragma: no cover
        # if computer is indeed slow (happens on shared CIs) we might fail
        # because distance is too short
        if time.time() - t0 < cache._min_dtime:
            raise  # if we were quick but still failed -- legit

is True (since exception is raised overall), whenever presumably similar (but with abs though!) check in the code at https://github.com/con/fscacher/blob/master/src/fscacher/cache.py#L239

    def modified_in_window(self, min_dtime):
        if self.last_modified is None:
            return False
        else:
            return abs(time.time() - self.last_modified * 1e-9) < min_dtime

returns False . There .last_modified is based off file stat .st_mtime_ns and thus in nano seconds.

So either precision of time. isn't good enough to catch the diff or ...

NB presumably since potentially could be False due to last_modifed being None but that shouldn't happen here! (?)

unclear why I added abs there on time diff in my initial prototype in dandi/dandi-cli@7ac63bd -- nothing in code/commit comment :-/

Here is the log record from that failed debian run

    def check_new_memoread(arg, content, expect_new=False):
        ncalls = len(calls)
        assert memoread(path, arg) == content
        assert len(calls) == ncalls + 1
        assert memoread(path, arg) == content
E       AssertionError: assert 3 == ((2 + 1) + 1)
E        +  where 3 = len([[PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'), 0, None], [PosixPath('/tmp/pytest-of-hosiet/...t_memoize_path_dir0/foo'), 0, None], [PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'), 0, None]])
E        +  and   1 = int(True)

fscacher/tests/test_cache.py:184: AssertionError
------------------------------ Captured log call -------------------------------
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:183 Cannot fingerprint /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo: [Errno 2] No such file or directory: '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:146 Calling <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> directly since no fingerprint for '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:183 Cannot fingerprint /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo: [Errno 2] No such file or directory: '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:146 Calling <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> directly since no fingerprint for '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:157 Calling memoized version of <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> for /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo
DEBUG    fscacher.cache:cache.py:100 Running original <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> on PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo')
Level 1  fscacher.cache:cache.py:166 Returning value 9
Level 5  fscacher.cache:cache.py:140 Dereferenced PosixPath('/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo') into '/tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo'
DEBUG    fscacher.cache:cache.py:157 Calling memoized version of <function test_memoize_path_dir.<locals>.memoread at 0x7fb6aabbc680> for /tmp/pytest-of-hosiet/pytest-0/test_memoize_path_dir0/foo
Level 1  fscacher.cache:cache.py:166 Returning value 9
=========================== short test summary info ============================
FAILED fscacher/tests/test_cache.py::test_memoize_path_dir - AssertionError: ...

which shows that the first assert memoread(path, arg) == content is fine, so files are created and all is good. That should ensure that we do get last_modifed assigned and thus it is not None. And then we see that we are calling memoized version which should happen when that time diff check at https://github.com/con/fscacher/blob/master/src/fscacher/cache.py#L153 is returning False, i.e. we did "escape" that time window, that raises initial AssertionError which must have been caught and "re-analyzed" in except where the check returned True, i.e. stating that it did pass less than ...

PS sorry - leaving unfinished, need to get to a meeting, will come back to it later

It might be able to provide explanation for a flaky test observed on debian build systems etc: con#55

yarikoptic · 2024-11-19T00:01:13Z

I treat the fact that addition of sleep(5) before running the actual check helped, as an indication that there is small but sufficient discrepancy in clocks used by time.time() and filesystem stamping for mtime etc...

I wondered if you @hosiet were lucky to more or less reliably reproduce such an issue some way? I am proposing to add more logging to get a better idea on what is going on. If you can't reproduce it either, I would prefer to upload such a version, without patches, to debian to catch the bug by its limbs:

Log when file timestamp unexpectedly in the future + more information in the failing test_memoize_path_dir #98

hosiet · 2024-11-19T15:21:51Z

I can make a test build-and-upload in Debian with the current fscacher git HEAD version, and remove my custom patch in this test build. If such build still yields the test error, I can show you the updated error message. Will that help you with the debug?

yarikoptic · 2024-11-19T15:45:49Z

Sure, thanks in advance! no need for HEAD version. Grab 0.4.3 we released from pypi https://pypi.org/project/fscacher/

yarikoptic assigned jwodder Dec 7, 2021

jwodder mentioned this issue Dec 17, 2021

Capture all logs during tests #56

Merged

This was referenced Dec 17, 2021

BF?(TST): call os.sync to flush changes so _min_dtime works out #60

Closed

BF?(TST): call os.sync to flush FS before write so hopefully _min_dtime works #61

Closed

yarikoptic closed this as completed Mar 25, 2022

yarikoptic reopened this Aug 23, 2024

yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Nov 18, 2024

ENH: log (at debug) cases where file time stamp is in the future

33dd183

It might be able to provide explanation for a flaky test observed on debian build systems etc: con#55

yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Nov 19, 2024

ENH: log (at debug) cases where file time stamp is in the future

b1032ad

It might be able to provide explanation for a flaky test observed on debian build systems etc: con#55

yarikoptic mentioned this issue Nov 19, 2024

Log when file timestamp unexpectedly in the future + more information in the failing test_memoize_path_dir #98

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flaky test_memoize_path_dir ? #55

flaky test_memoize_path_dir ? #55

yarikoptic commented Dec 7, 2021

jwodder commented Dec 7, 2021

jwodder commented Dec 7, 2021 •

edited

Loading

yarikoptic commented Mar 25, 2022

yarikoptic commented Aug 23, 2024

hosiet commented Nov 16, 2024

yarikoptic commented Nov 18, 2024 •

edited

Loading

yarikoptic commented Nov 19, 2024

hosiet commented Nov 19, 2024

yarikoptic commented Nov 19, 2024

flaky test_memoize_path_dir ? #55

flaky test_memoize_path_dir ? #55

Comments

yarikoptic commented Dec 7, 2021

jwodder commented Dec 7, 2021

jwodder commented Dec 7, 2021 • edited Loading

yarikoptic commented Mar 25, 2022

yarikoptic commented Aug 23, 2024

hosiet commented Nov 16, 2024

yarikoptic commented Nov 18, 2024 • edited Loading

yarikoptic commented Nov 19, 2024

hosiet commented Nov 19, 2024

yarikoptic commented Nov 19, 2024

jwodder commented Dec 7, 2021 •

edited

Loading

yarikoptic commented Nov 18, 2024 •

edited

Loading