Embedded calls to find in cleanUsersData.php do not scale up #19

@jmfernandez

In the logs emitted by cleanUsersData.php on other openVRE deployments I have found messages like the following:

Looking for temporary data (/gpfs/vre/userdata//*/*/.tmp/)
-- find /gpfs/vre/userdata//*/*/.tmp/  -maxdepth 1  -mtime +7 -type f  -exec rm -fv {} \;
sh: 1: find: Argument list too long
-- find /gpfs/vre/userdata//*/*/.tmp/  -maxdepth 1  -mtime +7 -type d -name '?*_[0-9]*'  -exec rm -Rfv {} \;
sh: 1: find: Argument list too long
-- find /gpfs/vre/userdata//*/*/.tmp/  -maxdepth 1  -size +2G -mtime -7 -type f -name '*tar.gz'  -exec rm -fv {} \;
sh: 1: find: Argument list too long

These messages appear when the user data directory of that openVRE deployment contains too many entries: the shell expands the wildcards into an argument list larger than the system allows a program to receive on the command line. These are the offending lines:

//$cmd="find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -mtime +".$caduca_strict." -type f -exec rm -fv {} \\;";
$cmd = "find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -mtime +".$caduca_strict." -type f ";
if ($dry_run === false) {
$cmd.= " -exec rm -fv {} \\;";
}
print "-- $cmd\n";
system($cmd);
// find and delete directories in .tmp named like '?*_[0-9]*' and older than 'caduca_strict'
//$cmd="find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -mtime +".$caduca_strict." -type d -name '?*_[0-9]*' -exec rm -Rfv {} \\;";
$cmd = "find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -mtime +".$caduca_strict." -type d -name '?*_[0-9]*' ";
if ($dry_run === false) {
$cmd.= " -exec rm -Rfv {} \\;";
}
print "-- $cmd\n";
system($cmd);
// find and delete TAR files in .tmp bigger than 2GB // TODO: no need of it when issue with 'massive TARs' is solved
//$cmd="find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -size +2G -mtime -".$caduca_strict." -type f -name '*tar.gz' -exec rm -fv {} \\;";
$cmd = "find ".$GLOBALS['dataDir']."/*/*/".$GLOBALS['tmpUser_dir']." -maxdepth 1 -size +2G -mtime -".$caduca_strict." -type f -name '*tar.gz' ";
if ($dry_run === false) {
$cmd.= " -exec rm -fv {} \\;";
}
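
The failure can be illustrated outside PHP. The following is a minimal sketch, assuming a POSIX shell and a throwaway sandbox directory (the `sandbox` path and the `u1`, `u2`, `p1`, `p2` names are hypothetical, not taken from any deployment). It shows that the shell, not find, expands the wildcard, so find receives one starting point per matching directory, and it prints the system limit that a large deployment eventually exceeds.

```shell
#!/bin/sh
# Hypothetical sandbox standing in for $GLOBALS['dataDir']; not the real /gpfs path.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/u1/p1/.tmp" "$sandbox/u2/p1/.tmp" "$sandbox/u2/p2/.tmp"

# The shell expands the pattern before find is started:
# each matching directory becomes a separate argument.
set -- "$sandbox"/*/*/.tmp/
expanded_args=$#
echo "find would receive $expanded_args starting points"

# System limit on the combined size of arguments and environment, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# Clean up the sandbox.
rm -rf "$sandbox"
```

With enough user and project directories, the combined length of the expanded paths passes that limit and the command is never executed, which is consistent with the "Argument list too long" lines in the log.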

This issue can be fixed with something similar to the following, where an outer find locates the temporary directories and pipes them to xargs, which runs a nested find on each one, so no wildcard expansion happens:

    if(substr($GLOBALS['tmpUser_dir'], -1) == "/") {
        $tmpuser_dir = substr($GLOBALS['tmpUser_dir'], 0, -1);
    } else {
        $tmpuser_dir = $GLOBALS['tmpUser_dir'];
    }

    $cmd = "find ".$GLOBALS['dataDir']." -type d -name ".$tmpuser_dir." -print0 | xargs -0 -I{.} find {.} -maxdepth 1  -mtime +".$caduca_strict." -type f ";
    if ($dry_run === false) {
        $cmd.= " -exec rm -fv {} \\;";
    }
    print "-- $cmd\n";
    system($cmd);

    $cmd = "find ".$GLOBALS['dataDir']." -type d -name ".$tmpuser_dir." -print0 | xargs -0 -I{.} find {.} -maxdepth 1  -mtime +".$caduca_strict." -type d -name '?*_[0-9]*' ";
    if ($dry_run === false) {
        $cmd.= " -exec rm -Rfv {} \\;";
    }
    print "-- $cmd\n";
    system($cmd);

    $cmd = "find ".$GLOBALS['dataDir']." -type d -name ".$tmpuser_dir." -print0 | xargs -0 -I{.} find {.} -maxdepth 1  -size +2G -mtime -".$caduca_strict." -type f -name '*tar.gz' ";
    if ($dry_run === false) {
        $cmd.= " -exec rm -fv {} \\;";
    }
    print "-- $cmd\n";
    system($cmd);
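
The first of the pipelines above can be tried in isolation before touching cleanUsersData.php. This is only a sketch under stated assumptions: GNU findutils and coreutils are available (`touch -d`, `-print0`, `xargs -0`), and the sandbox layout, the file names and the `data_dir`, `tmp_name` and `days` variables are hypothetical stand-ins for `$GLOBALS['dataDir']`, `$GLOBALS['tmpUser_dir']` and `$caduca_strict`. It also adds `-mindepth 3 -maxdepth 3` to the outer find, which is not part of the proposal above, to mirror the two-level `/*/*/` layout of the original wildcard.

```shell
#!/bin/sh
# Sketch only: sandbox paths are hypothetical; GNU findutils and coreutils are assumed.
data_dir=$(mktemp -d)   # stands in for $GLOBALS['dataDir']
tmp_name=".tmp"         # stands in for $GLOBALS['tmpUser_dir'] without the trailing slash
days=7                  # stands in for $caduca_strict

# Two users, one project each, one stale file and one fresh file.
mkdir -p "$data_dir/user1/proj1/$tmp_name" "$data_dir/user2/proj1/$tmp_name"
touch -d '10 days ago' "$data_dir/user1/proj1/$tmp_name/old.log"
touch "$data_dir/user2/proj1/$tmp_name/new.log"

# The outer find walks the tree itself, so the shell never expands a wildcard.
# -mindepth/-maxdepth 3 restrict matches to dataDir/<user>/<project>/.tmp.
# Dry run: the inner find only prints; appending -exec rm -fv {} \; would delete.
matches=$(find "$data_dir" -mindepth 3 -maxdepth 3 -type d -name "$tmp_name" -print0 \
    | xargs -0 -I{.} find {.} -maxdepth 1 -mtime +"$days" -type f)
echo "$matches"

# Clean up the sandbox.
rm -rf "$data_dir"
```

Without the depth limits, the outer find would also descend into every project and match a directory named `.tmp` at any depth, which is broader than the original `/*/*/.tmp/` pattern. Note also that `xargs -I` starts one inner find per temporary directory, so the run time grows with the number of projects.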
