dsync/dcmp: support symlinks targets changes #618

Open: wants to merge 1 commit into main

rezib (Contributor) commented Dec 20, 2024

Update dsync and dcmp to detect changes in symlink targets. When a symlink's target differs between source and destination, dcmp reports the symlinks as different and dsync updates the destination symlink.

Note that this works for dsync both with and without -c, --contents.

A new function, mfu_compare_symlinks(), is introduced in the library API to factor out the logic shared by all commands.
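The comparison that mfu_compare_symlinks() factors out can be illustrated with a small shell sketch. It assumes (this PR does not spell it out) that two symlinks are considered equal when they store the same target string, as printed by readlink, whether or not that target exists. same_symlink_target is a hypothetical name; the real function is C code in the mpifileutils library.

```shell
# same_symlink_target SRC DST
# Hypothetical helper, not part of mpifileutils. Assumes links are compared
# by the target string stored in the link, not by the file it resolves to.
# Exit status: 0 if both links store the same target, 1 if the targets
# differ, 2 if either path cannot be read as a symlink.
same_symlink_target() {
    # readlink (without -f) prints the stored target and fails on non-links
    src_target=$(readlink "$1") || return 2
    dst_target=$(readlink "$2") || return 2
    [ "$src_target" = "$dst_target" ]
}
```

In the symlink.sh scenario, same_symlink_target ../data/from/link2 ../data/to/link2 returns 1 after the source target is changed, and 0 once the destination link has been updated.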

Method

To reproduce the bug and validate the patch, I wrote this small script, symlink.sh:

#!/bin/sh

# Prepare test environment
DATA_DIR=../data

# Remove previous data if existing
rm -rf "${DATA_DIR}/from" "${DATA_DIR}/to"

echo "→ Initializing testing data"
mkdir -p "${DATA_DIR}/from"
echo foo > "${DATA_DIR}/from/file1"
echo bar > "${DATA_DIR}/from/file2"
ln -s file1 "${DATA_DIR}/from/link1"
ln -s file2 "${DATA_DIR}/from/link2"

echo
echo "→ Launching first dsync"
mpirun -N 2 "${HOME}/bin/dsync" "${DATA_DIR}/from" "${DATA_DIR}/to"

# Change one symlink target: link2 now points to file1 instead of file2
rm "${DATA_DIR}/from/link2"
ln -s file1 "${DATA_DIR}/from/link2"

echo
echo "→ Launching first dcmp"
mpirun -N 2 "${HOME}/bin/dcmp" "${DATA_DIR}/from" "${DATA_DIR}/to"

echo
echo "→ Launching second dsync"
mpirun -N 2 "${HOME}/bin/dsync" "${DATA_DIR}/from" "${DATA_DIR}/to"

echo
echo "→ Launching second dcmp"
mpirun -N 2 "${HOME}/bin/dcmp" "${DATA_DIR}/from" "${DATA_DIR}/to"
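The script leaves it to the reader to spot the stale link in the ls listings. A check along these lines could be appended to symlink.sh to make the result explicit. check_symlinks is a hypothetical helper, not an mpifileutils tool; it assumes file names contain no newlines and that readlink (GNU coreutils or BSD) is available.

```shell
# check_symlinks FROM TO
# Hypothetical helper (not part of mpifileutils). Reports every symlink under
# FROM whose counterpart under TO is missing, is not a symlink, or stores a
# different target. Returns 0 if all match, 1 on any mismatch, 2 if FROM is
# not a directory. Assumes file names contain no newlines.
check_symlinks() {
    [ -d "$1" ] || return 2
    # find -type l matches the links themselves without following them;
    # paths are printed relative to FROM, e.g. "./link2".
    (cd "$1" && find . -type l) | (
        status=0
        while IFS= read -r rel; do
            src_target=$(readlink "$1/$rel")
            if [ ! -L "$2/$rel" ]; then
                echo "missing or not a symlink: $2/$rel"
                status=1
            else
                dst_target=$(readlink "$2/$rel")
                if [ "$dst_target" != "$src_target" ]; then
                    echo "target mismatch: $rel -> $src_target (source), $dst_target (destination)"
                    status=1
                fi
            fi
        done
        # explicit subshell, so this exit only ends the loop's subshell
        exit "$status"
    )
}
```

Based on the ls listings in this PR, calling check_symlinks "${DATA_DIR}/from" "${DATA_DIR}/to" at the end of the script would report link2 and return 1 on the unpatched build, and return 0 with the patch applied.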

Before

Script output:

$ sh symlink.sh 
→ Initialiazing testing data

→ Launching first dsync
[2024-12-20T08:36:36] Walking source path
[2024-12-20T08:36:36] Walking /home/remi/git/data/from
[2024-12-20T08:36:36] Walked 5 items in 0.001 secs (4545.934 items/sec) ...
[2024-12-20T08:36:36] Walked 5 items in 0.001 seconds (4353.793 items/sec)
[2024-12-20T08:36:36] Walking destination path
[2024-12-20T08:36:36] Walking /home/remi/git/data/to
[2024-12-20T08:36:36] [0] [/home/remi/git/mpifileutils/src/common/mfu_flist_walk.c:516] ERROR: Failed to stat: '/home/remi/git/data/to' (errno=2 No such file or directory)
[2024-12-20T08:36:36] Walked 0 items in 0.001 secs (0.000 items/sec) ...
[2024-12-20T08:36:36] Walked 0 items in 0.001 seconds (0.000 items/sec)
[2024-12-20T08:36:36] Started   : Dec-20-2024, 08:36:36
[2024-12-20T08:36:36] Completed : Dec-20-2024, 08:36:36
[2024-12-20T08:36:36] Seconds   : 0.000
[2024-12-20T08:36:36] Items     : 0
[2024-12-20T08:36:36] Item Rate : 0 items in 0.000295 seconds (0.000000 items/sec)
[2024-12-20T08:36:36] Copying items to destination
[2024-12-20T08:36:36] Copying to /home/remi/git/data/to
[2024-12-20T08:36:36] Items: 5
[2024-12-20T08:36:36]   Directories: 1
[2024-12-20T08:36:36]   Files: 2
[2024-12-20T08:36:36]   Links: 2
[2024-12-20T08:36:36] Data: 8.000 B (4.000 B per file)
[2024-12-20T08:36:36] Creating 1 directories
[2024-12-20T08:36:36] Creating 4 files.
[2024-12-20T08:36:36] Copying data.
[2024-12-20T08:36:36] Copy data: 8.000 B (8 bytes)
[2024-12-20T08:36:36] Copy rate: 693.124 B/s (8 bytes in 0.012 seconds)
[2024-12-20T08:36:36] Syncing data to disk.
[2024-12-20T08:36:36] Sync completed in 0.011 seconds.
[2024-12-20T08:36:36] Setting ownership, permissions, and timestamps.
[2024-12-20T08:36:36] Updated 5 items in 0.000 seconds (25281.382 items/sec)
[2024-12-20T08:36:36] Syncing directory updates to disk.
[2024-12-20T08:36:36] Sync completed in 0.002 seconds.
[2024-12-20T08:36:36] Started: Dec-20-2024,08:36:36
[2024-12-20T08:36:36] Completed: Dec-20-2024,08:36:36
[2024-12-20T08:36:36] Seconds: 0.024
[2024-12-20T08:36:36] Items: 5
[2024-12-20T08:36:36]   Directories: 1
[2024-12-20T08:36:36]   Files: 2
[2024-12-20T08:36:36]   Links: 2
[2024-12-20T08:36:36] Data: 8.000 B (8 bytes)
[2024-12-20T08:36:36] Rate: 326.574 B/s (008 bytes in 0.024 seconds)
[2024-12-20T08:36:36] Updating timestamps on newly copied files
[2024-12-20T08:36:36] Completed updating timestamps
[2024-12-20T08:36:36] Completed sync

→ Launching first dcmp
[2024-12-20T08:36:37] Walking source path
[2024-12-20T08:36:37] Walking /home/remi/git/data/from
[2024-12-20T08:36:37] Walked 5 items in 0.001 secs (4468.399 items/sec) ...
[2024-12-20T08:36:37] Walked 5 items in 0.001 seconds (4269.239 items/sec)
[2024-12-20T08:36:37] Walking destination path
[2024-12-20T08:36:37] Walking /home/remi/git/data/to
[2024-12-20T08:36:37] Walked 5 items in 0.001 secs (7051.838 items/sec) ...
[2024-12-20T08:36:37] Walked 5 items in 0.001 seconds (6870.501 items/sec)
[2024-12-20T08:36:37] Comparing items
[2024-12-20T08:36:37] Comparing file contents
[2024-12-20T08:36:37] Started   : Dec-20-2024, 08:36:37
[2024-12-20T08:36:37] Completed : Dec-20-2024, 08:36:37
[2024-12-20T08:36:37] Seconds   : 0.010
[2024-12-20T08:36:37] Items     : 5
[2024-12-20T08:36:37] Item Rate : 5 items in 0.010120 seconds (494.092677 items/sec)
[2024-12-20T08:36:37] Bytes read: 16.000 B (16 bytes)
[2024-12-20T08:36:37] Byte Rate : 1.544 KiB/s (016 bytes in 0.010 seconds)
Number of items that exist in both directories: 5 (Src: 5 Dest: 5)
Number of items that exist only in one directory: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same type: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different types: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same content: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different contents: 0 (Src: 0 Dest: 0)

→ Launching second dsync
[2024-12-20T08:36:38] Walking source path
[2024-12-20T08:36:38] Walking /home/remi/git/data/from
[2024-12-20T08:36:38] Walked 5 items in 0.001 secs (6070.945 items/sec) ...
[2024-12-20T08:36:38] Walked 5 items in 0.001 seconds (5802.053 items/sec)
[2024-12-20T08:36:38] Walking destination path
[2024-12-20T08:36:38] Walking /home/remi/git/data/to
[2024-12-20T08:36:38] Walked 5 items in 0.001 secs (9796.104 items/sec) ...
[2024-12-20T08:36:38] Walked 5 items in 0.001 seconds (9540.583 items/sec)
[2024-12-20T08:36:38] Comparing file sizes and modification times of 2 items
[2024-12-20T08:36:38] Started   : Dec-20-2024, 08:36:38
[2024-12-20T08:36:38] Completed : Dec-20-2024, 08:36:38
[2024-12-20T08:36:38] Seconds   : 0.000
[2024-12-20T08:36:38] Items     : 2
[2024-12-20T08:36:38] Item Rate : 2 items in 0.000189 seconds (10564.682267 items/sec)
[2024-12-20T08:36:38] Updating timestamps on newly copied files
[2024-12-20T08:36:38] Completed updating timestamps
[2024-12-20T08:36:38] Completed sync

→ Launching second dcmp
[2024-12-20T08:36:39] Walking source path
[2024-12-20T08:36:39] Walking /home/remi/git/data/from
[2024-12-20T08:36:39] Walked 5 items in 0.002 secs (2566.423 items/sec) ...
[2024-12-20T08:36:39] Walked 5 items in 0.002 seconds (2456.619 items/sec)
[2024-12-20T08:36:39] Walking destination path
[2024-12-20T08:36:39] Walking /home/remi/git/data/to
[2024-12-20T08:36:39] Walked 5 items in 0.001 secs (3797.583 items/sec) ...
[2024-12-20T08:36:39] Walked 5 items in 0.001 seconds (3672.312 items/sec)
[2024-12-20T08:36:39] Comparing items
[2024-12-20T08:36:39] Comparing file contents
[2024-12-20T08:36:39] Started   : Dec-20-2024, 08:36:39
[2024-12-20T08:36:39] Completed : Dec-20-2024, 08:36:39
[2024-12-20T08:36:39] Seconds   : 0.001
[2024-12-20T08:36:39] Items     : 5
[2024-12-20T08:36:39] Item Rate : 5 items in 0.000552 seconds (9062.173762 items/sec)
[2024-12-20T08:36:39] Bytes read: 16.000 B (16 bytes)
[2024-12-20T08:36:39] Byte Rate : 28.319 KiB/s (016 bytes in 0.001 seconds)
Number of items that exist in both directories: 5 (Src: 5 Dest: 5)
Number of items that exist only in one directory: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same type: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different types: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same content: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different contents: 0 (Src: 0 Dest: 0)

Contents of the source and destination trees:

$ ls ../data/from/ -lh
total 8,0K
-rw-rw-r-- 1 remi remi 4 déc.  20 08:36 file1
-rw-rw-r-- 1 remi remi 4 déc.  20 08:36 file2
lrwxrwxrwx 1 remi remi 5 déc.  20 08:36 link1 -> file1
lrwxrwxrwx 1 remi remi 5 déc.  20 08:36 link2 -> file1
$ ls ../data/to/ -lh
total 8,0K
-rw-rw-r-- 1 remi remi 4 déc.  20 08:36 file1
-rw-rw-r-- 1 remi remi 4 déc.  20 08:36 file2
lrwxrwxrwx 1 remi remi 5 déc.  20 08:36 link1 -> file1
lrwxrwxrwx 1 remi remi 5 déc.  20 08:36 link2 -> file2

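You can see that link2 in the destination still points to file2 despite the second dsync run, while it points to file1 in the source. Neither dcmp run reports a difference, and the second dsync copies nothing.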

After

Script output:

$ sh symlink.sh 
→ Initialiazing testing data

→ Launching first dsync
[2024-12-20T08:34:33] Walking source path
[2024-12-20T08:34:33] Walking /home/remi/git/data/from
[2024-12-20T08:34:33] Walked 5 items in 0.003 secs (1832.340 items/sec) ...
[2024-12-20T08:34:33] Walked 5 items in 0.003 seconds (1760.092 items/sec)
[2024-12-20T08:34:33] Walking destination path
[2024-12-20T08:34:33] Walking /home/remi/git/data/to
[2024-12-20T08:34:33] [0] [/home/remi/git/mpifileutils/src/common/mfu_flist_walk.c:516] ERROR: Failed to stat: '/home/remi/git/data/to' (errno=2 No such file or directory)
[2024-12-20T08:34:33] Walked 0 items in 0.002 secs (0.000 items/sec) ...
[2024-12-20T08:34:33] Walked 0 items in 0.002 seconds (0.000 items/sec)
[2024-12-20T08:34:33] Started   : Dec-20-2024, 08:34:33
[2024-12-20T08:34:33] Completed : Dec-20-2024, 08:34:33
[2024-12-20T08:34:33] Seconds   : 0.001
[2024-12-20T08:34:33] Items     : 0
[2024-12-20T08:34:33] Item Rate : 0 items in 0.000680 seconds (0.000000 items/sec)
[2024-12-20T08:34:33] Copying items to destination
[2024-12-20T08:34:33] Copying to /home/remi/git/data/to
[2024-12-20T08:34:33] Items: 5
[2024-12-20T08:34:33]   Directories: 1
[2024-12-20T08:34:33]   Files: 2
[2024-12-20T08:34:33]   Links: 2
[2024-12-20T08:34:33] Data: 8.000 B (4.000 B per file)
[2024-12-20T08:34:33] Creating 1 directories
[2024-12-20T08:34:33] Creating 4 files.
[2024-12-20T08:34:33] Copying data.
[2024-12-20T08:34:33] Copy data: 8.000 B (8 bytes)
[2024-12-20T08:34:33] Copy rate: 631.445 B/s (8 bytes in 0.013 seconds)
[2024-12-20T08:34:33] Syncing data to disk.
[2024-12-20T08:34:33] Sync completed in 0.016 seconds.
[2024-12-20T08:34:33] Setting ownership, permissions, and timestamps.
[2024-12-20T08:34:33] Updated 5 items in 0.000 seconds (25938.453 items/sec)
[2024-12-20T08:34:33] Syncing directory updates to disk.
[2024-12-20T08:34:33] Sync completed in 0.001 seconds.
[2024-12-20T08:34:33] Started: Dec-20-2024,08:34:33
[2024-12-20T08:34:33] Completed: Dec-20-2024,08:34:33
[2024-12-20T08:34:33] Seconds: 0.032
[2024-12-20T08:34:33] Items: 5
[2024-12-20T08:34:33]   Directories: 1
[2024-12-20T08:34:33]   Files: 2
[2024-12-20T08:34:33]   Links: 2
[2024-12-20T08:34:33] Data: 8.000 B (8 bytes)
[2024-12-20T08:34:33] Rate: 253.288 B/s (008 bytes in 0.032 seconds)
[2024-12-20T08:34:33] Updating timestamps on newly copied files
[2024-12-20T08:34:33] Completed updating timestamps
[2024-12-20T08:34:33] Completed sync

→ Launching first dcmp
[2024-12-20T08:34:34] Walking source path
[2024-12-20T08:34:34] Walking /home/remi/git/data/from
[2024-12-20T08:34:34] Walked 5 items in 0.001 secs (5848.090 items/sec) ...
[2024-12-20T08:34:34] Walked 5 items in 0.001 seconds (5429.672 items/sec)
[2024-12-20T08:34:34] Walking destination path
[2024-12-20T08:34:34] Walking /home/remi/git/data/to
[2024-12-20T08:34:34] Walked 5 items in 0.001 secs (9674.211 items/sec) ...
[2024-12-20T08:34:34] Walked 5 items in 0.001 seconds (9396.977 items/sec)
[2024-12-20T08:34:34] Comparing items
[2024-12-20T08:34:34] Comparing file contents
[2024-12-20T08:34:34] Started   : Dec-20-2024, 08:34:34
[2024-12-20T08:34:34] Completed : Dec-20-2024, 08:34:34
[2024-12-20T08:34:34] Seconds   : 0.010
[2024-12-20T08:34:34] Items     : 5
[2024-12-20T08:34:34] Item Rate : 5 items in 0.010105 seconds (494.798970 items/sec)
[2024-12-20T08:34:34] Bytes read: 16.000 B (16 bytes)
[2024-12-20T08:34:34] Byte Rate : 1.546 KiB/s (016 bytes in 0.010 seconds)
Number of items that exist in both directories: 5 (Src: 5 Dest: 5)
Number of items that exist only in one directory: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same type: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different types: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same content: 4 (Src: 4 Dest: 4)
Number of items that exist in both directories and have different contents: 1 (Src: 1 Dest: 1)

→ Launching second dsync
[2024-12-20T08:34:35] Walking source path
[2024-12-20T08:34:35] Walking /home/remi/git/data/from
[2024-12-20T08:34:35] Walked 5 items in 0.001 secs (5686.063 items/sec) ...
[2024-12-20T08:34:35] Walked 5 items in 0.001 seconds (5300.288 items/sec)
[2024-12-20T08:34:35] Walking destination path
[2024-12-20T08:34:35] Walking /home/remi/git/data/to
[2024-12-20T08:34:35] Walked 5 items in 0.001 secs (9298.987 items/sec) ...
[2024-12-20T08:34:35] Walked 5 items in 0.001 seconds (9030.993 items/sec)
[2024-12-20T08:34:35] Comparing file sizes and modification times of 4 items
[2024-12-20T08:34:35] Started   : Dec-20-2024, 08:34:35
[2024-12-20T08:34:35] Completed : Dec-20-2024, 08:34:35
[2024-12-20T08:34:35] Seconds   : 0.000
[2024-12-20T08:34:35] Items     : 4
[2024-12-20T08:34:35] Item Rate : 4 items in 0.000298 seconds (13433.908529 items/sec)
[2024-12-20T08:34:35] Deleting items from destination
[2024-12-20T08:34:35] Removing 1 items
[2024-12-20T08:34:35] Removed 1 items in 0.000 seconds (3086.277 items/sec)
[2024-12-20T08:34:35] Copying items to destination
[2024-12-20T08:34:35] Copying to /home/remi/git/data/to
[2024-12-20T08:34:35] Items: 1
[2024-12-20T08:34:35]   Directories: 0
[2024-12-20T08:34:35]   Files: 0
[2024-12-20T08:34:35]   Links: 1
[2024-12-20T08:34:35] Data: 0.000 B (0.000 B per file)
[2024-12-20T08:34:35] Creating 1 files.
[2024-12-20T08:34:35] Copying data.
[2024-12-20T08:34:35] Copy data: 0.000 B (0 bytes)
[2024-12-20T08:34:35] Copy rate: 0.000 B/s (0 bytes in 0.000 seconds)
[2024-12-20T08:34:35] Syncing data to disk.
[2024-12-20T08:34:35] Sync completed in 0.012 seconds.
[2024-12-20T08:34:35] Setting ownership, permissions, and timestamps.
[2024-12-20T08:34:35] Updated 1 items in 0.000 seconds (6755.935 items/sec)
[2024-12-20T08:34:35] Syncing directory updates to disk.
[2024-12-20T08:34:35] Sync completed in 0.003 seconds.
[2024-12-20T08:34:35] Started: Dec-20-2024,08:34:35
[2024-12-20T08:34:35] Completed: Dec-20-2024,08:34:35
[2024-12-20T08:34:35] Seconds: 0.016
[2024-12-20T08:34:35] Items: 1
[2024-12-20T08:34:35]   Directories: 0
[2024-12-20T08:34:35]   Files: 0
[2024-12-20T08:34:35]   Links: 1
[2024-12-20T08:34:35] Data: 0.000 B (0 bytes)
[2024-12-20T08:34:35] Rate: 0.000 B/s (000 bytes in 0.016 seconds)
[2024-12-20T08:34:35] Updating timestamps on newly copied files
[2024-12-20T08:34:35] Completed updating timestamps
[2024-12-20T08:34:35] Completed sync

→ Launching second dcmp
[2024-12-20T08:34:36] Walking source path
[2024-12-20T08:34:36] Walking /home/remi/git/data/from
[2024-12-20T08:34:36] Walked 5 items in 0.002 secs (2460.492 items/sec) ...
[2024-12-20T08:34:36] Walked 5 items in 0.002 seconds (2283.881 items/sec)
[2024-12-20T08:34:36] Walking destination path
[2024-12-20T08:34:36] Walking /home/remi/git/data/to
[2024-12-20T08:34:36] Walked 5 items in 0.001 secs (3496.357 items/sec) ...
[2024-12-20T08:34:36] Walked 5 items in 0.001 seconds (3455.165 items/sec)
[2024-12-20T08:34:36] Comparing items
[2024-12-20T08:34:36] Comparing file contents
[2024-12-20T08:34:36] Started   : Dec-20-2024, 08:34:36
[2024-12-20T08:34:36] Completed : Dec-20-2024, 08:34:36
[2024-12-20T08:34:36] Seconds   : 0.000
[2024-12-20T08:34:36] Items     : 5
[2024-12-20T08:34:36] Item Rate : 5 items in 0.000320 seconds (15621.143530 items/sec)
[2024-12-20T08:34:36] Bytes read: 16.000 B (16 bytes)
[2024-12-20T08:34:36] Byte Rate : 48.816 KiB/s (016 bytes in 0.000 seconds)
Number of items that exist in both directories: 5 (Src: 5 Dest: 5)
Number of items that exist only in one directory: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same type: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different types: 0 (Src: 0 Dest: 0)
Number of items that exist in both directories and have the same content: 5 (Src: 5 Dest: 5)
Number of items that exist in both directories and have different contents: 0 (Src: 0 Dest: 0)

Contents of the source and destination trees:

$ ls ../data/from/ -lh
total 8,0K
-rw-rw-r-- 1 remi remi 4 déc.  20 08:34 file1
-rw-rw-r-- 1 remi remi 4 déc.  20 08:34 file2
lrwxrwxrwx 1 remi remi 5 déc.  20 08:34 link1 -> file1
lrwxrwxrwx 1 remi remi 5 déc.  20 08:34 link2 -> file1
$ ls ../data/to/ -lh
total 8,0K
-rw-rw-r-- 1 remi remi 4 déc.  20 08:34 file1
-rw-rw-r-- 1 remi remi 4 déc.  20 08:34 file2
lrwxrwxrwx 1 remi remi 5 déc.  20 08:34 link1 -> file1
lrwxrwxrwx 1 remi remi 5 déc.  20 08:34 link2 -> file1

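You can see that link2 now points to file1 after the second dsync run. The first dcmp reports one item with different contents, the second dsync removes and recreates that link, and the second dcmp reports no remaining differences.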
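The After log shows how the second dsync handles the changed link: it removes one item from the destination ("Removing 1 items") and then copies one link ("Links: 1"). That observable effect on a single link can be reproduced in shell as below. resync_link is a hypothetical helper for illustration only; it does not reflect how dsync is implemented, and it does not set ownership, permissions, or timestamps, which dsync also handles.

```shell
# resync_link SRC DST
# Hypothetical helper (not part of mpifileutils): make DST a symlink that
# stores the same target as the symlink SRC, by removing DST and creating it
# again, as the log shows dsync doing for link2.
resync_link() {
    src_target=$(readlink "$1") || return 2
    # Nothing to do if DST is already a link with the same stored target
    if [ -L "$2" ] && [ "$(readlink "$2")" = "$src_target" ]; then
        return 0
    fi
    # rm -f removes the link itself (never its target) and does not fail if
    # DST is absent; it refuses directories, so real data is not deleted
    rm -f "$2" && ln -s "$src_target" "$2"
}
```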

Note

I would like to emphasize that this work is sponsored by @cea-hpc.

fix #412


Signed-off-by: Rémi Palancher <[email protected]>
@ofaaland ofaaland requested a review from carbonneau1 January 7, 2025 03:23
rezib (Contributor, Author) commented Jan 22, 2025

Is there anything I can do to help you merge this PR?

Development

Successfully merging this pull request may close these issues.

dcp and dsync: handle when destination link exists with different target
2 participants