Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtime cache misbehaving on nested backups #1860

Closed
gleb opened this issue Nov 19, 2016 · 26 comments
Closed

mtime cache misbehaving on nested backups #1860

gleb opened this issue Nov 19, 2016 · 26 comments
Assignees
Labels
Milestone

Comments

@gleb
Copy link

gleb commented Nov 19, 2016

I am seeing borg mtime cache consistently become ineffective during my backups. I am suspecting it has something to do with how these are structured. This is just a guess, but I am hoping somebody will recognize the pattern.

Given file hierarchy:

/foo
/foo/a
/foo/b/

I am backing it up as 3 archives:

  • /foo/a
  • /foo/b
  • then /foo <- here borg cache becomes ineffective and it rescans all the files

I am running with export BORG_FILES_CACHE_TTL=10000
Borg Version: 1.0.7-0ubuntu1.16.04.1

This seems related to #1430 but the fix for that was already included in 1.0.7

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 19, 2016

Hi @gleb,

some questions:

"I am seeing borg mtime cache consistently become ineffective during my backups." - how do you see that?

Does the archive of /foo include /foo/a/* and /foo/b/* also?

Also, please give the full command sequence you use.

@gleb
Copy link
Author

gleb commented Nov 19, 2016

I am using the very helpful --filter=AME I believe this shows the files that weren't reported as not-changed by the cache. They show up as A. These correspond to the extra time/coil whine/HD light of sequential IO. But the data is deduplicated, and is not backed up again.

# excludes that apply to all archives
BORG_EXCLUDE_OPTS="--exclude-caches --keep-tag-files --exclude */.bundle/gems"

BORG="borg create --compression=lz4 --show-rc --stats --list --filter=AME -v ${BORG_EXCLUDE_OPTS}"

Yes, all the backups are recursive. Actually a more complete description of directory structure would be something like this

  • /foo/{1,2,3,4,5,6,7,8}/ <- 8 subdirectories are backed up as separate archives
  • /foo/{more directories and some files} <- these are not backed up independently
  • /foo <- this backs up foo including all subdirectories and shows the problem
  • /bar, etc <- more subdirectores are backed up
  • / <- everything is backed up including everything above

@ThomasWaldmann
Copy link
Member

Please check with borg list if the full archived pathes of the files you suspect being chunked/hashed again matches completely.

The cache key is H(path) with H being a hash function, so we only get a hit if the path is exactly the same.

@ThomasWaldmann
Copy link
Member

Maybe related to #909 and maybe fixed by #1842.

@gleb
Copy link
Author

gleb commented Nov 19, 2016

Yes, paths are the same. In fact for given file the text that borg list outputs in /foo and /foo/1 archives is identical.

@gleb
Copy link
Author

gleb commented Nov 19, 2016

In hash function above is path a relative path passed in or true/canonicalized file path? It doesn't affect this issue, but is good to know.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 19, 2016

No, i just checked that: the hash function gets the full, absolute path of the file.

key = id_hash(os.path.join(self.cwd, path)

The path join is a bit strange, but it creates the full absolute path of the file, even if you invoke borg create repo::archive relpath.

@ThomasWaldmann
Copy link
Member

I modified borg/archive.py so one sees the pathes used for the path_hash (ph: ...):

        if not is_special_file:
            print("ph:", os.path.join(self.cwd, path))  # for debugging
            path_hash = self.key.id_hash(os.path.join(self.cwd, path).encode('utf-8', 'surrogateescape'))
            ids = cache.file_known_and_unchanged(path_hash, st, ignore_inode)

@ThomasWaldmann
Copy link
Member

Besides path not matching, it could be also that borg thinks the file is modified.
This is the case if mtime, inode number or size changed.
What filesystems do you back up?
Is there something "touching" your files in between (modifying mtime)?

@gleb
Copy link
Author

gleb commented Nov 19, 2016

No, nothing is touching the files. I mean /foo/{1-8} show no changed files, and immediately archiving /foo afterwards does. ext4, very vanilla, single file system.

It's as if using the cache in /foo/{1-8} invalidates the cache.

Is there a python oneliner I can use to dump the cache matching a particular path?

Also, I verified that BORG_FILES_CACHE_TTL=10000 makes it into /proc/(pid)/environ

@gleb
Copy link
Author

gleb commented Nov 19, 2016

To my great surprise I was able to reproduce this problem on a standalone test case on the first try:

set -x
export BORG_FILES_CACHE_TTL=10000 # infinite effectively
BORG_EXCLUDE_OPTS="--exclude-caches --keep-tag-files --exclude */.bundle/gems"
BORG="borg create --compression=lz4 --show-rc --stats --list --filter=AME -v ${BORG_EXCLUDE_OPTS}"

rm -r test-repo
rm -r backup-test
export BORG_PASSPHRASE='test'
export BORG_REPO="test-repo"
borg init -e keyfile test-repo
mkdir backup-test

mkdir backup-test/{1,2,3,4,5,6,7,8,9,10,11,12}
touch backup-test/{1,2,3,4,5,6,7,8,9,10,11,12}/file.txt
touch backup-test/{1,2,3,4,5,6,7,8,9,10,11,12}/file2.txt
touch backup-test/another_file.txt

# run this block more than once
$BORG ::1_{now} backup-test/1 2>&1 | egrep '^A '
$BORG ::1_{now} backup-test/2 2>&1 | egrep '^A '
$BORG ::1_{now} backup-test/3 2>&1 | egrep '^A '
$BORG ::1_{now} backup-test/4 2>&1 | egrep '^A '
$BORG ::all_{now} backup-test/ 2>&1 | egrep '^A '

@gleb
Copy link
Author

gleb commented Nov 19, 2016

Even weirder, this is timing related. By sticking sleep in different places in this script I am getting different output.

Here's a simplified script:

set -x
export BORG_FILES_CACHE_TTL=10000 # infinite effectively
BORG="borg create --list --filter=AME -v"

rm -r test-repo
rm -r backup-test
export BORG_PASSPHRASE='test'
export BORG_REPO="test-repo"
borg init -e keyfile test-repo
mkdir backup-test

# populate data
mkdir backup-test/{1,2,3,4}
touch backup-test/{1,2,3,4}/file.txt
touch backup-test/{1,2,3,4}/file2.txt
touch backup-test/another_file.txt

echo "first run"
$BORG ::1_{now} backup-test/1 2>&1 | egrep '^A '
$BORG ::1_{now} backup-test/2 2>&1 | egrep '^A '
$BORG ::all_{now} backup-test 2>&1 | egrep '^A '

sleep 5

echo "second run"
$BORG ::1_{now} backup-test/1 2>&1 | egrep '^A '
$BORG ::1_{now} backup-test/2 2>&1 | egrep '^A '
$BORG ::all_{now} backup-test 2>&1 | egrep '^A '

@ThomasWaldmann
Copy link
Member

ok, I'll try to reproduce / debug this.

@ThomasWaldmann ThomasWaldmann self-assigned this Nov 19, 2016
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 19, 2016

Slightly cleaned up and extended script:

#!/bin/bash
export BORG_REPO="test-repo"
BORG="borg create --list -v"

borg delete ::
borg init -e none ::

# populate data
rm -r backup-test
mkdir backup-test
mkdir backup-test/{1,2,3,4}
touch backup-test/{1,2,3,4}/file.txt
touch backup-test/{1,2,3,4}/file2.txt
touch backup-test/another_file.txt

# this is to avoid the "latest mtime" issue, see FAQ.
# "A" status for latest-mtime.txt is EXPECTED!
sleep 2
touch backup-test/{1,2,3,4}/latest-mtime.txt

echo "first run"
$BORG ::1_{now} backup-test/1 2>&1
$BORG ::1_{now} backup-test/2 2>&1
$BORG ::all_{now} backup-test 2>&1

sleep 5

echo "second run"
$BORG ::1_{now} backup-test/1 2>&1
$BORG ::1_{now} backup-test/2 2>&1
$BORG ::all_{now} backup-test 2>&1

@ThomasWaldmann
Copy link
Member

Output:

++ export BORG_REPO=test-repo
++ BORG_REPO=test-repo
++ BORG='borg create --list -v'
++ borg delete ::
You requested to completely DELETE the repository *including* all archives it contains:
1_2016-11-19 20:43:13.982546         Sat, 2016-11-19 20:43:13
1_2016-11-19 20:43:14.256657         Sat, 2016-11-19 20:43:14
all_2016-11-19 20:43:14.532489       Sat, 2016-11-19 20:43:14
1_2016-11-19 20:43:19.810054         Sat, 2016-11-19 20:43:19
1_2016-11-19 20:43:20.082959         Sat, 2016-11-19 20:43:20
all_2016-11-19 20:43:20.414305       Sat, 2016-11-19 20:43:20
Type 'YES' if you understand this and want to continue: YES
++ borg init -e none ::
++ rm -r backup-test
++ mkdir backup-test
++ mkdir backup-test/1 backup-test/2 backup-test/3 backup-test/4
++ touch backup-test/1/file.txt backup-test/2/file.txt backup-test/3/file.txt backup-test/4/file.txt
++ touch backup-test/1/file2.txt backup-test/2/file2.txt backup-test/3/file2.txt backup-test/4/file2.txt
++ touch backup-test/another_file.txt
++ sleep 2
++ touch backup-test/1/latest-mtime.txt backup-test/2/latest-mtime.txt backup-test/3/latest-mtime.txt backup-test/4/latest-mtime.txt

++ echo 'first run'
first run
++ borg create --list -v '::1_{now}' backup-test/1
A backup-test/1/file.txt
A backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
++ borg create --list -v '::1_{now}' backup-test/2
A backup-test/2/file.txt
A backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
++ borg create --list -v '::all_{now}' backup-test
U backup-test/1/file.txt
U backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
U backup-test/2/file.txt
U backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
A backup-test/3/file.txt
A backup-test/3/file2.txt
A backup-test/3/latest-mtime.txt
d backup-test/3
A backup-test/4/file.txt
A backup-test/4/file2.txt
A backup-test/4/latest-mtime.txt
d backup-test/4
A backup-test/another_file.txt
d backup-test
++ sleep 5

++ echo 'second run'
second run
++ borg create --list -v '::1_{now}' backup-test/1
U backup-test/1/file.txt
U backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
++ borg create --list -v '::1_{now}' backup-test/2
U backup-test/2/file.txt
U backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
++ borg create --list -v '::all_{now}' backup-test
U backup-test/1/file.txt
U backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
U backup-test/2/file.txt
U backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
U backup-test/3/file.txt
U backup-test/3/file2.txt
A backup-test/3/latest-mtime.txt
d backup-test/3
U backup-test/4/file.txt
U backup-test/4/file2.txt
A backup-test/4/latest-mtime.txt
d backup-test/4
U backup-test/another_file.txt
d backup-test

As you see, only latest-mtime.txt has A status and that is expected, see FAQ.

So the only remaining question is if your original problem was also just with the "latest mtime" files?

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 19, 2016

I'ld appreciate if you base your script changes on my modified script. I had a reason why I removed all the grep and filter and crypto etc.

And I don't want to repeat same work again.

Also, if some sleeps do not matter, remove them. One sleep definitely matters, it is the one before touching latest-mtime.txt.

@ThomasWaldmann
Copy link
Member

(also remove all posts above that are superceded by your next update)

@gleb
Copy link
Author

gleb commented Nov 19, 2016

I modified your script until it showed the problem. The key seems to be adding touch backup-test/latest-mtime.txt (which also happens to model my production environment better). I also added set -x and removed repo with rm -r test-repo to simplify running the script - I don't believe these changes to be material.

I am happy to redo this if necessary based on your feedback.

modified   borg_cache_bug
@@ -1,2 +1,3 @@
 #!/bin/bash
+set -x
 export BORG_REPO="test-repo"
@@ -4,3 +5,3 @@ BORG="borg create --list -v"

-borg delete ::
+rm -r test-repo
 borg init -e none ::
@@ -19,2 +20,5 @@ sleep 2
 touch backup-test/{1,2,3,4}/latest-mtime.txt
+sleep 2
+touch backup-test/latest-mtime.txt  # key to changing behavior
+sleep 2  # optional doesn't change behavior

@@ -25,3 +29,3 @@ $BORG ::all_{now} backup-test 2>&1

-sleep 5
+sleep 5 # optional, doesn't change behavior

Script:

#!/bin/bash
set -x
export BORG_REPO="test-repo"
BORG="borg create --list -v"

rm -r test-repo
borg init -e none ::

# populate data
rm -r backup-test
mkdir backup-test
mkdir backup-test/{1,2,3,4}
touch backup-test/{1,2,3,4}/file.txt
touch backup-test/{1,2,3,4}/file2.txt
touch backup-test/another_file.txt

# this is to avoid the "latest mtime" issue, see FAQ.
# "A" status for latest-mtime.txt is EXPECTED!
sleep 2
touch backup-test/{1,2,3,4}/latest-mtime.txt
sleep 2
touch backup-test/latest-mtime.txt  # key to changing behavior
sleep 2  # optional doesn't change behavior

echo "first run"
$BORG ::1_{now} backup-test/1 2>&1
$BORG ::1_{now} backup-test/2 2>&1
$BORG ::all_{now} backup-test 2>&1

sleep 5 # optional, doesn't change behavior

echo "second run"
$BORG ::1_{now} backup-test/1 2>&1
$BORG ::1_{now} backup-test/2 2>&1
$BORG ::all_{now} backup-test 2>&1

Output:

+ export BORG_REPO=test-repo
+ BORG_REPO=test-repo
+ BORG='borg create --list -v'
+ rm -r test-repo
+ borg init -e none ::
+ rm -r backup-test
+ mkdir backup-test
+ mkdir backup-test/1 backup-test/2 backup-test/3 backup-test/4
+ touch backup-test/1/file.txt backup-test/2/file.txt backup-test/3/file.txt backup-test/4/file.txt
+ touch backup-test/1/file2.txt backup-test/2/file2.txt backup-test/3/file2.txt backup-test/4/file2.txt
+ touch backup-test/another_file.txt
+ sleep 2
+ touch backup-test/1/latest-mtime.txt backup-test/2/latest-mtime.txt backup-test/3/latest-mtime.txt backup-test/4/latest-mtime.txt
+ sleep 2
+ touch backup-test/latest-mtime.txt
+ sleep 2
+ echo 'first run'
first run
+ borg create --list -v '::1_{now}' backup-test/1
A backup-test/1/file.txt
A backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
+ borg create --list -v '::1_{now}' backup-test/2
A backup-test/2/file.txt
A backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
+ borg create --list -v '::all_{now}' backup-test
U backup-test/1/file.txt
U backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
U backup-test/2/file.txt
U backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
A backup-test/3/file.txt
A backup-test/3/file2.txt
A backup-test/3/latest-mtime.txt
d backup-test/3
A backup-test/4/file.txt
A backup-test/4/file2.txt
A backup-test/4/latest-mtime.txt
d backup-test/4
A backup-test/another_file.txt
A backup-test/latest-mtime.txt
d backup-test
+ sleep 5
+ echo 'second run'
second run
+ borg create --list -v '::1_{now}' backup-test/1
U backup-test/1/file.txt
U backup-test/1/file2.txt
U backup-test/1/latest-mtime.txt
d backup-test/1
+ borg create --list -v '::1_{now}' backup-test/2
U backup-test/2/file.txt
U backup-test/2/file2.txt
U backup-test/2/latest-mtime.txt
d backup-test/2
+ borg create --list -v '::all_{now}' backup-test
A backup-test/1/file.txt
A backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
A backup-test/2/file.txt
A backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
U backup-test/3/file.txt
U backup-test/3/file2.txt
U backup-test/3/latest-mtime.txt
d backup-test/3
U backup-test/4/file.txt
U backup-test/4/file2.txt
U backup-test/4/latest-mtime.txt
d backup-test/4
U backup-test/another_file.txt
A backup-test/latest-mtime.txt
d backup-test

@gleb
Copy link
Author

gleb commented Nov 19, 2016

Also, I verified that timestamps do not change during this script by running stat backup-test/1/file.txt as so:

modified   borg_cache_bug
@@ -24,2 +24,4 @@ sleep 2  # optional doesn't change behavior

+stat backup-test/1/file.txt
+
 echo "first run"
@@ -32,4 +34,8 @@ sleep 5 # optional, doesn't change behavior
 echo "second run"
+stat backup-test/1/file.txt
 $BORG ::1_{now} backup-test/1 2>&1
+stat backup-test/1/file.txt
 $BORG ::1_{now} backup-test/2 2>&1
+stat backup-test/1/file.txt
 $BORG ::all_{now} backup-test 2>&1
+stat backup-test/1/file.txt

@ThomasWaldmann
Copy link
Member

the "A" for the 2 latest-mtime files is easy to explain: their cache entry was killed by the first 2 borg commands in the second run.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 19, 2016

Guess I've found the issue: the "latest mtime seen" is not updated for unchanged files, only for added/modified.

So, if it is never updated, it is at 0 (its initial value) and that kills all generation 0 cache entries.

@enkore enkore added the bug label Nov 19, 2016
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Nov 19, 2016
bug:
if no files were added/modified, _newest_mtime stayed at its initial 0 value.
when saving the files cache, this killed all age 0 entries.

Now using None as initial value, so we can spot that circumstance.

The 2 ** 63 - 1 value is just so it is MAX_INT on a 64bit platform, for better performance.
It can be easily increased when y2262 is coming.
@ThomasWaldmann ThomasWaldmann added this to the 1.0.9rc1 milestone Nov 19, 2016
@ThomasWaldmann
Copy link
Member

See PR #1864.

@ThomasWaldmann
Copy link
Member

Script output:

++ export BORG_REPO=test-repo
++ BORG_REPO=test-repo
++ BORG='borg create --list -v'
++ rm -r test-repo
++ borg init -e none ::
++ rm -r backup-test
++ mkdir backup-test
++ mkdir backup-test/1 backup-test/2 backup-test/3 backup-test/4
++ touch backup-test/1/file.txt backup-test/2/file.txt backup-test/3/file.txt backup-test/4/file.txt
++ touch backup-test/1/file2.txt backup-test/2/file2.txt backup-test/3/file2.txt backup-test/4/file2.txt
++ touch backup-test/another_file.txt
++ sleep 2
++ touch backup-test/1/latest-mtime.txt backup-test/2/latest-mtime.txt backup-test/3/latest-mtime.txt backup-test/4/latest-mtime.txt
++ sleep 2
++ touch backup-test/latest-mtime.txt
++ sleep 2
++ echo 'first run'
first run
++ borg create --list -v '::1_{now}' backup-test/1
A backup-test/1/file.txt
A backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
++ borg create --list -v '::1_{now}' backup-test/2
A backup-test/2/file.txt
A backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
++ borg create --list -v '::all_{now}' backup-test
U backup-test/1/file.txt
U backup-test/1/file2.txt
A backup-test/1/latest-mtime.txt
d backup-test/1
U backup-test/2/file.txt
U backup-test/2/file2.txt
A backup-test/2/latest-mtime.txt
d backup-test/2
A backup-test/3/file.txt
A backup-test/3/file2.txt
A backup-test/3/latest-mtime.txt
d backup-test/3
A backup-test/4/file.txt
A backup-test/4/file2.txt
A backup-test/4/latest-mtime.txt
d backup-test/4
A backup-test/another_file.txt
A backup-test/latest-mtime.txt
d backup-test
++ sleep 5
++ echo 'second run'
second run
++ borg create --list -v '::1_{now}' backup-test/1
U backup-test/1/file.txt
U backup-test/1/file2.txt
U backup-test/1/latest-mtime.txt
d backup-test/1
++ borg create --list -v '::1_{now}' backup-test/2
U backup-test/2/file.txt
U backup-test/2/file2.txt
U backup-test/2/latest-mtime.txt
d backup-test/2
++ borg create --list -v '::all_{now}' backup-test
U backup-test/1/file.txt
U backup-test/1/file2.txt
U backup-test/1/latest-mtime.txt
d backup-test/1
U backup-test/2/file.txt
U backup-test/2/file2.txt
U backup-test/2/latest-mtime.txt
d backup-test/2
U backup-test/3/file.txt
U backup-test/3/file2.txt
U backup-test/3/latest-mtime.txt
d backup-test/3
U backup-test/4/file.txt
U backup-test/4/file2.txt
U backup-test/4/latest-mtime.txt
d backup-test/4
U backup-test/another_file.txt
A backup-test/latest-mtime.txt
d backup-test

ThomasWaldmann added a commit that referenced this issue Nov 19, 2016
fix determination of newest mtime, fixes #1860
@ThomasWaldmann
Copy link
Member

#1864 merged.

@ThomasWaldmann
Copy link
Member

@gleb thanks for reporting!

@gleb
Copy link
Author

gleb commented Nov 19, 2016

@ThomasWaldmann thank you for fixing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants