Improve file management regarding orphaned files #10274
jonasraoni started this conversation in Proposals
Comment: Before working on this, I'd like us to generally refactor/reposition our file management to consistently use Flysystem. We currently use it only partially. This might allow us to (better) support remote filesystems, and may present its own options for garbage collection, checking/matching, error handling, etc.
Problem
Due to incomplete operations (e.g. after unexpected exceptions or outages), orphaned files might be left on disk, and it's difficult to tell which files can safely be removed.
Discussion
- The temporary_files and files tables can both be leveraged to improve the situation.
- The temporary_files entity has more useful information than its files counterpart. Both can be merged into a single files entity to simplify code maintenance.
- … the files table and linked through an ID.
- … files can also be included in the cleanup.
- … the /tmp folder, which might be too small, might not be cleaned up often, and is perhaps a bit unsafe (shared storage).
Possible workflow
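As a minimal sketch of the merged files table and of the two steps under "Creating a temporary file", here is a Python/sqlite3 version. Python and sqlite3 are used purely for illustration. The column names and the helpers temporary_path and create_temporary_file are hypothetical. As proposed, a NULL path marks a temporary file.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical merged "files" table: one row per file, temporary or permanent.
# A NULL path marks a temporary file (a boolean flag column would also work).
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    file_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    mime       TEXT NOT NULL,
    size       INTEGER NOT NULL,
    user_id    INTEGER,
    created_at REAL NOT NULL,  -- Unix timestamp
    path       TEXT            -- NULL while the file is temporary
)
"""


def temporary_path(files_dir, file_id):
    # Derived from the ID alone, so no path needs to be reserved or stored.
    return os.path.join(files_dir, "temporary", str(file_id))


def create_temporary_file(db, files_dir, data, name, mime, user_id=None):
    # Step 1: record the metadata. If this fails, nothing has been written to disk.
    with db:  # commits on success, rolls back on an exception
        cur = db.execute(
            "INSERT INTO files (name, mime, size, user_id, created_at, path) "
            "VALUES (?, ?, ?, ?, ?, NULL)",
            (name, mime, len(data), user_id, time.time()),
        )
    file_id = cur.lastrowid
    # Step 2: write the bytes. If this fails, the row keeps its NULL path
    # and is left for the garbage collector.
    target = temporary_path(files_dir, file_id)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(data)
    return file_id


# Usage
files_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
fid = create_temporary_file(db, files_dir, b"hello", "hello.txt", "text/plain", user_id=1)
```

The row is committed before any bytes are written. A crash between the two steps therefore leaves only a row with a NULL path, which is exactly what the cheap garbage collection pass targets.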
Creating a temporary file
1. Create an entry in the files table with metadata (name, mime, size, user, date, ...). The path can be left as null to identify it as a temporary file (a flag field is also OK). If it fails: no orphan data will remain.
2. Store the file at {$files_dir}/temporary/{$file_id}. By using a static path, we won't need to reserve paths or update the database with paths (generating a true UUID to be used as the filename also works). If it fails: the database entry will be garbage collected later on.
Promoting a temporary file
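The promotion step that follows could look roughly like the sketch below. It uses Python/sqlite3, a trimmed-down hypothetical files table, and a hypothetical promote() helper that names the final file {file_id}{extension} under public/. The database UPDATE is treated as the point where the entry stops being temporary.

```python
import os
import shutil
import sqlite3
import tempfile

# Hypothetical, trimmed-down "files" table (a NULL path means temporary).
SCHEMA = "CREATE TABLE files (file_id INTEGER PRIMARY KEY, name TEXT NOT NULL, path TEXT)"


def promote(db, files_dir, file_id, extension=""):
    """Move a temporary file to its final location and record that path."""
    source = os.path.join(files_dir, "temporary", str(file_id))
    # Hypothetical naming scheme: <files_dir>/public/<file_id><extension>
    final = os.path.join(files_dir, "public", f"{file_id}{extension}")
    os.makedirs(os.path.dirname(final), exist_ok=True)
    # Caveat: a crash between this move and the commit below would leave the
    # file under public/ with a NULL-path entry; only a directory scan finds it.
    shutil.move(source, final)
    try:
        with db:  # the entry only stops being temporary once this commits
            db.execute("UPDATE files SET path = ? WHERE file_id = ?", (final, file_id))
    except sqlite3.Error:
        # Undo the move so the entry is still a plain temporary file that the
        # garbage collector can handle.
        shutil.move(final, source)
        raise
    return final


# Usage: a temporary entry with its bytes already in place.
files_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
with db:
    db.execute("INSERT INTO files (file_id, name, path) VALUES (1, 'cover.jpg', NULL)")
os.makedirs(os.path.join(files_dir, "temporary"))
with open(os.path.join(files_dir, "temporary", "1"), "wb") as fh:
    fh.write(b"jpeg bytes")
final_path = promote(db, files_dir, 1, ".jpg")
```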
1. Move the file and update the files entry's path to its final destination (e.g. {$files_dir}/public/{$file_id_or_name_or_uuid}.jpg). If it fails: it's still a temporary file and will be garbage collected.
Creating a permanent file out of another file/stream
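One possible approach to the step that follows is sketched below. It is an assumption, not the proposal's exact order of operations. A UUID file name (one of the naming options mentioned above) makes the final path known before the row exists, so the INSERT and the write can share one database transaction. The helper name create_permanent_file and the .part suffix are hypothetical.

```python
import io
import os
import shutil
import sqlite3
import tempfile
import uuid

# Hypothetical, trimmed-down "files" table (a NULL path means temporary).
SCHEMA = "CREATE TABLE files (file_id INTEGER PRIMARY KEY, name TEXT NOT NULL, path TEXT)"


def create_permanent_file(db, files_dir, stream, name, extension=""):
    """Write a stream straight to its final location and register it."""
    # A UUID file name makes the final path known before the row exists.
    final = os.path.join(files_dir, "public", f"{uuid.uuid4()}{extension}")
    partial = final + ".part"  # hypothetical suffix for in-progress writes
    os.makedirs(os.path.dirname(final), exist_ok=True)
    try:
        with db:  # the INSERT is committed only if the file reached its final path
            cur = db.execute("INSERT INTO files (name, path) VALUES (?, ?)", (name, final))
            with open(partial, "wb") as fh:
                shutil.copyfileobj(stream, fh)
            os.replace(partial, final)  # atomic rename on the same POSIX filesystem
    except Exception:
        # A failure inside the with block rolls the INSERT back;
        # also remove anything that was written to disk.
        for leftover in (partial, final):
            if os.path.exists(leftover):
                os.remove(leftover)
        raise
    return cur.lastrowid, final


# Usage
files_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
file_id, path = create_permanent_file(
    db, files_dir, io.BytesIO(b"%PDF-1.7"), "article.pdf", ".pdf"
)
```

No ordering is fully atomic across the database and the filesystem. A crash after os.replace() but before the commit would still leave a file that only a directory scan would catch.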
1. … the files table using the final path (e.g. {$files_dir}/public/{$file_id_or_name_or_uuid}.jpg). If it fails: no orphan data will remain.
Garbage collector
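The three passes listed next could be sketched as follows. Everything here is hypothetical. The submission_files table stands in for whatever tables reference files. The one-day grace period in collect_temporary is an assumption the proposal doesn't state: it keeps the first pass from deleting temporary files that are still being uploaded or promoted. The second and third passes only report candidates and leave deletion to the user.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical schema: the merged files table plus one example table that
# references it (a real installation would have many referencing tables).
SCHEMA = """
CREATE TABLE files (file_id INTEGER PRIMARY KEY, path TEXT, created_at REAL NOT NULL);
CREATE TABLE submission_files (submission_id INTEGER, file_id INTEGER REFERENCES files);
"""
REFERENCING = [("submission_files", "file_id")]  # hypothetical (table, column) links


def collect_temporary(db, files_dir, max_age_seconds=86400):
    """Pass 1 (cheap, frequent): delete temporary entries older than a grace period."""
    cutoff = time.time() - max_age_seconds
    rows = db.execute(
        "SELECT file_id FROM files WHERE path IS NULL AND created_at < ?", (cutoff,)
    ).fetchall()
    for (file_id,) in rows:
        tmp = os.path.join(files_dir, "temporary", str(file_id))
        if os.path.exists(tmp):  # the write may never have happened
            os.remove(tmp)
        with db:
            db.execute("DELETE FROM files WHERE file_id = ?", (file_id,))
    return [file_id for (file_id,) in rows]


def find_unlinked(db):
    """Pass 2 (expensive, infrequent): permanent entries nothing refers to."""
    conditions = " AND ".join(
        f"NOT EXISTS (SELECT 1 FROM {table} r WHERE r.{column} = f.file_id)"
        for table, column in REFERENCING
    ) or "1"
    query = (
        "SELECT f.file_id FROM files f "
        f"WHERE f.path IS NOT NULL AND {conditions} ORDER BY f.file_id"
    )
    return [file_id for (file_id,) in db.execute(query)]


def find_untracked(db, files_dir):
    """Pass 3 (manual): files on disk that no entry in the files table accounts for."""
    known = {path for (path,) in db.execute("SELECT path FROM files WHERE path IS NOT NULL")}
    known |= {
        os.path.join(files_dir, "temporary", str(file_id))
        for (file_id,) in db.execute("SELECT file_id FROM files WHERE path IS NULL")
    }
    untracked = []
    for root, _dirs, names in os.walk(files_dir):
        for name in names:
            full = os.path.join(root, name)
            if full not in known:
                untracked.append(full)
    return sorted(untracked)


# Usage: one stale temporary file, one linked file, one unlinked file, one stray file.
files_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
now = time.time()
for sub in ("temporary", "public"):
    os.makedirs(os.path.join(files_dir, sub))
with db:
    db.execute("INSERT INTO files VALUES (1, NULL, ?)", (now - 7 * 86400,))
    db.execute("INSERT INTO files VALUES (2, ?, ?)", (os.path.join(files_dir, "public", "2.jpg"), now))
    db.execute("INSERT INTO files VALUES (3, ?, ?)", (os.path.join(files_dir, "public", "3.pdf"), now))
    db.execute("INSERT INTO submission_files VALUES (10, 2)")
for sub, name in (("temporary", "1"), ("public", "2.jpg"), ("public", "3.pdf"), ("public", "stray.bin")):
    open(os.path.join(files_dir, sub, name), "wb").close()

removed = collect_temporary(db, files_dir)  # [1]
unlinked = find_unlinked(db)                # [3]
untracked = find_untracked(db, files_dir)   # only public/stray.bin
```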
- … NULL path. This could be executed often, as there shouldn't be an ever-growing number of entries.
- Scan the files table and identify which entries are not linked to anything. This is more expensive to run, so it should be executed infrequently.
- … the files table, and offer the user a way to review and delete the entries. This is more resource-intensive and should be executed manually (or perhaps once a year, with a report such as "There might be 30GB of useless files on your installation").
Drawbacks