Welcome to fpart Discussions! #52
Replies: 4 comments 3 replies
-
Hello, is there any option for fpart multithreading? It would also be nice to have an option for fpart to make parts just by file/directory count, without size calculation.
-
And about fpsync+rsync: on slow networks I found strange rsync behaviour. Sometimes the process freezes in D-state and, for some reason, changes its PID.
-
Hi,
If you're using fpart to feed a file transfer program, you don't have to
wait for fpart to finish before starting to sync. Once the first partition
is done, you can start sending data. fpsync works like this, as do some
other data schleppers that use fpart as the chunking agent
(parsyncfp2/pfp2). The problem I think you're seeing is the zillions of
tiny files (zotfiles) in a dir where the time to stat each zotfile can
become overwhelming at the rsync level. pfp2 (dev version / I wrote it)
addresses this by asynchronously pre-tarchiving the zotfiles, sending the
tarchive, then untar'ing on the remote side. It's especially useful on
long-latency transfers but useful even on LANs (depending on how much CPU
you can dedicate to tar/gzip).
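The pre-tarchiving trick described above boils down to a tar pipe: bundle the zotfiles into one stream so the receiver unpacks an archive instead of stat()ing every file. A minimal local sketch (the remote host name in the comment is hypothetical):

```shell
# Stream many small files as a single tar archive; the receiver only
# unpacks a stream instead of handling each file individually.
set -eu
src=$(mktemp -d); dst=$(mktemp -d)
mkdir "$src/zot"
for i in 1 2 3; do printf 'file %s\n' "$i" > "$src/zot/f$i"; done

# Local pipe shown here; over the network you would insert ssh, e.g.:
#   tar -C "$src" -czf - zot | ssh remotehost "tar -C /dst -xzf -"
tar -C "$src" -cf - zot | tar -C "$dst" -xf -

ls "$dst/zot"
```

Whether to compress (gzip or otherwise) depends on how much CPU you can spare versus link bandwidth, as noted above.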
harry
…On Thu, Apr 11, 2024 at 6:23 AM AlexanderDorozhkin ***@***.***> wrote:
Hello,
Is there any option for fpart multithreading?
When I'm syncing TB-sized folders with tons of subfolders and billions of
files, fpart works too slowly (hours): it calculates space for each folder
for fair partitioning. As a result, fpart runs much longer than the actual
file copy.
As a workaround I use the pretty fast 'locar' tool to scan dirs and use its
output as input for fpart (dir-only mode), without space calculation.
Not as fair, but overall sync time for my case is 2x faster.
Would be nice to have an option for fpart to make parts just by
files/directories number without size calculation.
--
Harry Mangalam
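The workaround quoted above (pre-scanning with a fast crawler, then chunking by file count only) can be sketched with standard tools; split(1) stands in for the chunking step here, and fpart itself can read a pre-built list via -i and cap files per partition via -f (check your fpart version's man page):

```shell
# Scan once (locar or find), then cut the list into fixed-count chunks
# with no per-file size accounting at partitioning time.
set -eu
src=$(mktemp -d); out=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$src/f$i"; done
find "$src" -type f | sort > "$out/filelist.txt"
split -l 2 "$out/filelist.txt" "$out/chunk."   # 2 files per chunk
wc -l "$out/chunk."*
```

With 5 files and 2 per chunk this yields three chunk files; the resulting partitions are unbalanced by size, which is the "not as fair" trade-off mentioned above.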
-
Hello Alexander,

Fpart multithreading is not available. To be honest, I am not sure we would gain much in that area, as fpart is not CPU-bound but mostly waiting for IOs while stat()ing files; multiprocessing/multithreading would probably even add latency (think of the ITC/IPC needed to orchestrate the crawling... just to queue even more IO requests). Last but not least, parallel multithreading would break crawling order (fpart needs to be depth-first to allow certain options) as well as run reproducibility (we probably want to generate the same partitions from one call to another).

Fpart's "fair partitioning" you mention is limited in live mode (the mode used by fpsync): it just caps the size/number of files of partitions while performing crawling. It does not involve the creation of optimized/balanced partitions (as done in non-live mode). That should be very fast.

As Harry wrote, fpsync starts transferring generated partitions from the beginning, as soon as they are available. If rsync child processes can't easily cope with your many, many small files, maybe try to reduce the number of files per partition? You can also try the cpio or tar tools for your initial (incremental) syncs.

About your second question, it is weird that a "frozen" process (waiting for IO) is seen as completed by fpart; it should not be. Have you tried the latest fpart (1.6.0)?
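The two suggestions above (smaller partitions, and cpio/tar for the initial pass) map onto fpsync flags. A hedged sketch with made-up host and paths; -n sets concurrent sync jobs, -f caps files per partition, and -m selects the copy tool on fpsync versions that support it (check fpsync(1) on your system):

```shell
# Hypothetical invocations only; "backup01" and the paths are made up.
smaller_parts="fpsync -n 8 -f 1000 /data/src/ backup01:/data/dst/"
tar_initial="fpsync -m tar -n 8 /data/src/ backup01:/data/dst/"
echo "$smaller_parts"
echo "$tar_initial"
```

Lowering -f means each rsync child handles fewer tiny files per run, at the cost of more process startups.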
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.