- top
- future ideas - list of dreams which will probably never happen
- design
- up2k - quick outline of the up2k protocol
- why not tus - I didn't know about tus
- why chunk-hashes - a single sha512 would be better, right?
- list of chunk-sizes - specific chunksizes are enforced
- hashed passwords - regarding the curious decisions
- http api
- event hooks - on writing your own hooks
- hook effects - hooks can cause intentional side-effects
- assumptions
- sfx repack - reduce the size of an sfx by removing features
- building
- dev env setup
- just the sfx
- build from release tarball - uses the included prebuilt webdeps
- complete release
- debugging
- music playback halting on phones - mostly fine on android
- discarded ideas
list of dreams which will probably never happen
- the JS is a mess -- a preact rewrite would be nice
- preferably without build dependencies like webpack/babel/node.js, maybe a python thing to assemble js files into main.js
- good excuse to look at using virtual lists (browsers start to struggle when folders contain over 5000 files)
- maybe preact / vdom isn't the best choice, could just wait for the Next Big Thing
- the UX is a mess -- a proper design would be nice
- very organic (much like the python/js), everything was an afterthought
- true for both the layout and the visual flair
- something like the tron board-room ui (or most other hollywood ones, like ironman) would be 💯
- would preferably keep the information density, just more organized yet not too boring
- some of the python files are way too big
- `up2k.py` ended up doing all the file indexing / db management
- `httpcli.py` should be separated into modules in general
quick outline of the up2k protocol, see uploading for the web-client
- the up2k client splits a file into an "optimal" number of chunks
- 1 MiB each, unless that becomes more than 256 chunks
- tries 1.5M, 2M, 3, 4, 6, ... until <= 256 chunks or size >= 32M
- client posts the list of hashes, filename, size, last-modified
- server creates the `wark`, an identifier for this upload
- `sha512( salt + filesize + chunk_hashes )`
- and a sparse file is created for the chunks to drop into
- client sends a series of POSTs, with one or more consecutive chunks in each
- header entries for the chunk-hashes (comma-separated) and wark
- server writes chunks into place based on the hash
- client does another handshake with the hashlist; server replies with OK or a list of chunks to reupload
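The handshake inputs above can be sketched roughly as follows. This is only an illustration: the newline-joined serialization, the untruncated hex digest, the salt value and the helper names `hash_chunks` / `make_wark` are assumptions made for the example, not taken from the copyparty source.

```python
import hashlib


def hash_chunks(data: bytes, chunksize: int) -> list[str]:
    """Return one sha512 hex digest per chunk of `data`."""
    return [
        hashlib.sha512(data[ofs : ofs + chunksize]).hexdigest()
        for ofs in range(0, len(data), chunksize)
    ]


def make_wark(salt: str, filesize: int, chunk_hashes: list[str]) -> str:
    """sha512( salt + filesize + chunk_hashes ); joining with newlines is an assumption."""
    parts = [salt, str(filesize)] + chunk_hashes
    return hashlib.sha512("\n".join(parts).encode("utf-8")).hexdigest()


data = b"x" * (3 * 1024 * 1024 + 5)  # a little over 3 MiB of dummy data
hashes = hash_chunks(data, 1024 * 1024)  # 1 MiB chunks, so 4 hashes
wark = make_wark("hypothetical-salt", len(data), hashes)
print(len(hashes), wark[:16])
```

Because the identifier depends on the salt, the filesize and every chunk hash, the same file always maps to the same identifier on a given server, which is what lets an interrupted upload be resumed.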
up2k has saved a few uploads from becoming corrupted in-transfer already;
- caught an android phone on wifi red-handed in wireshark with a bitflip; however, bup with https would probably have noticed as well (thanks to tls also functioning as an integrity check)
- also stopped someone from uploading because their ram was bad
regarding the frequent server log message during uploads;
`6.0M 106M/s 2.77G 102.9M/s n948 thank 4/0/3/1 10042/7198 00:01:09`
- this chunk was `6 MiB`, uploaded at `106 MiB/s`
- on this http connection, `2.77 GiB` transferred, `102.9 MiB/s` average, `948` chunks handled
- client says `4` uploads OK, `0` failed, `3` busy, `1` queued, `10042 MiB` total size, `7198 MiB` and `00:01:09` left
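If you want to graph or grep these lines, the field layout described above can be picked apart with a small parser. This is a sketch which assumes the line always has exactly the nine whitespace-separated fields shown in the example; `parse_up2k_log` and the dictionary keys are names invented here.

```python
def parse_up2k_log(line: str) -> dict:
    """Name the nine whitespace-separated fields of the upload status line."""
    (chunk_size, chunk_speed, conn_total, conn_avg,
     chunks, _thank, counts, sizes, eta) = line.split()
    ok, failed, busy, queued = (int(n) for n in counts.split("/"))
    total_mib, left_mib = (int(n) for n in sizes.split("/"))
    return {
        "chunk_size": chunk_size,          # size of this chunk
        "chunk_speed": chunk_speed,        # speed of this chunk
        "conn_total": conn_total,          # transferred on this http connection
        "conn_avg_speed": conn_avg,        # average speed on this connection
        "chunks_handled": int(chunks.lstrip("n")),
        "uploads": {"ok": ok, "failed": failed, "busy": busy, "queued": queued},
        "total_mib": total_mib,
        "left_mib": left_mib,
        "eta": eta,
    }


info = parse_up2k_log(
    "6.0M 106M/s 2.77G 102.9M/s n948 thank 4/0/3/1 10042/7198 00:01:09"
)
print(info["chunks_handled"], info["uploads"], info["eta"])
```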
I didn't know about tus when I made this, but:
- up2k has the advantage that it supports parallel uploading of non-contiguous chunks straight into the final file -- tus does a merge at the end which is slow and taxing on the server HDD / filesystem (unless i'm misunderstanding)
- up2k has the slight disadvantage of requiring the client to hash the entire file before an upload can begin, but this has the benefit of immediately skipping duplicate files
- and the hashing happens in a separate thread anyways so it's usually not a bottleneck
a single sha512 would be better, right?
this was due to `crypto.subtle` not yet providing a streaming api (or the option to seed the sha512 hasher with a starting hash)
as a result, the hashes are much less useful than they could have been (search the server by sha512, provide the sha512 in the response http headers, ...)
however it allows for hashing multiple chunks in parallel, greatly increasing upload speed from fast storage (NVMe, raid-0 and such)
- both the browser uploader and the commandline one do this now, allowing for fast uploading even from plaintext http
hashwasm would solve the streaming issue but reduces hashing speed for sha512 (xxh128 does 6 GiB/s), and it would make old browsers and iphones unsupported
- blake2 might be a better choice since xxh is non-cryptographic, but that gets ~15 MiB/s on slower androids
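The parallelism argument can be illustrated in Python: since every chunk has its own independent hash, the chunks can be handed to a pool of workers in any order. This is a sketch of the idea only, not the browser or commandline client; `hash_chunks_parallel` and the worker count are made up for the example. It relies on `hashlib` releasing the GIL while hashing larger buffers, so threads are enough here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha512_hex(chunk: bytes) -> str:
    return hashlib.sha512(chunk).hexdigest()


def hash_chunks_parallel(data: bytes, chunksize: int, workers: int = 4) -> list[str]:
    """Hash each chunk independently on a thread pool."""
    chunks = [data[i : i + chunksize] for i in range(0, len(data), chunksize)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so hashes line up with chunk offsets
        return list(pool.map(sha512_hex, chunks))


data = bytes(range(256)) * 8192  # 2 MiB of dummy data
parallel = hash_chunks_parallel(data, 512 * 1024)
sequential = [
    sha512_hex(data[i : i + 512 * 1024]) for i in range(0, len(data), 512 * 1024)
]
print(parallel == sequential)
```

A single sha512 over the whole file could not be split up like this, because each block of input depends on the hasher state left behind by the previous one.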
specific chunksizes are enforced depending on total filesize
each pair of filesize/chunksize is the largest filesize which will use its listed chunksize; a 512 MiB file will use chunksize 2 MiB, but if the file is one byte larger than 512 MiB then it becomes 3 MiB
for the purpose of performance (or dodging arbitrary proxy limitations), it is possible to upload combined and/or partial chunks using stitching and/or subchunks respectively
filesize (bytes) | filesize | chunksize (bytes) | chunksize |
---|---|---|---|
268 435 456 | 256 MiB | 1 048 576 | 1.0 MiB |
402 653 184 | 384 MiB | 1 572 864 | 1.5 MiB |
536 870 912 | 512 MiB | 2 097 152 | 2.0 MiB |
805 306 368 | 768 MiB | 3 145 728 | 3.0 MiB |
1 073 741 824 | 1.0 GiB | 4 194 304 | 4.0 MiB |
1 610 612 736 | 1.5 GiB | 6 291 456 | 6.0 MiB |
2 147 483 648 | 2.0 GiB | 8 388 608 | 8.0 MiB |
3 221 225 472 | 3.0 GiB | 12 582 912 | 12 MiB |
4 294 967 296 | 4.0 GiB | 16 777 216 | 16 MiB |
6 442 450 944 | 6.0 GiB | 25 165 824 | 24 MiB |
137 438 953 472 | 128 GiB | 33 554 432 | 32 MiB |
206 158 430 208 | 192 GiB | 50 331 648 | 48 MiB |
274 877 906 944 | 256 GiB | 67 108 864 | 64 MiB |
412 316 860 416 | 384 GiB | 100 663 296 | 96 MiB |
549 755 813 888 | 512 GiB | 134 217 728 | 128 MiB |
824 633 720 832 | 768 GiB | 201 326 592 | 192 MiB |
1 099 511 627 776 | 1.0 TiB | 268 435 456 | 256 MiB |
1 649 267 441 664 | 1.5 TiB | 402 653 184 | 384 MiB |
2 199 023 255 552 | 2.0 TiB | 536 870 912 | 512 MiB |
3 298 534 883 328 | 3.0 TiB | 805 306 368 | 768 MiB |
4 398 046 511 104 | 4.0 TiB | 1 073 741 824 | 1.0 GiB |
6 597 069 766 656 | 6.0 TiB | 1 610 612 736 | 1.5 GiB |
8 796 093 022 208 | 8.0 TiB | 2 147 483 648 | 2.0 GiB |
13 194 139 533 312 | 12.0 TiB | 3 221 225 472 | 3.0 GiB |
17 592 186 044 416 | 16.0 TiB | 4 294 967 296 | 4.0 GiB |
26 388 279 066 624 | 24.0 TiB | 6 442 450 944 | 6.0 GiB |
35 184 372 088 832 | 32.0 TiB | 8 589 934 592 | 8.0 GiB |
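The table can be reproduced with a short loop that walks the chunksize sequence 1, 1.5, 2, 3, 4, 6, 8, ... MiB. This is a reconstruction from the rows above rather than a quote of the server code: the function name `chunksize_for` is invented, and the 4096-chunk ceiling for chunksizes of 32 MiB and up is inferred from the table (128 GiB / 32 MiB = 4096).

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def chunksize_for(filesize: int) -> int:
    """Return the chunksize (in bytes) that the table assigns to `filesize`."""
    chunksize = MiB
    stepsize = MiB // 2
    while True:
        for mul in (1, 2):
            nchunks = (filesize + chunksize - 1) // chunksize  # integer ceil
            # below 32 MiB: at most 256 chunks; from 32 MiB: at most 4096 (inferred)
            if nchunks <= 256 or (chunksize >= 32 * MiB and nchunks <= 4096):
                return chunksize
            chunksize += stepsize
            stepsize *= mul  # the step doubles on every second increase


print(chunksize_for(512 * MiB) // MiB, chunksize_for(512 * MiB + 1) // MiB)  # 2 3
```

The stepping pattern (add the step, then double it every other round) is what produces the alternating x1.5 and x1.33 growth seen in the chunksize column.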
regarding the curious decisions
there is a static salt for all passwords;
- because most copyparty APIs allow users to authenticate using only their password, the username is unknown at that point, so per-account salts are impossible
- the drawback of this is that an attacker can bruteforce all accounts in parallel, however most copyparty instances only have a handful of accounts in the first place, and it can be compensated by increasing the hashing cost anyways
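A minimal sketch of why the static salt makes password-only login possible: with one salt for everyone, the server can hash the incoming password once and look the result up directly. Here pbkdf2 from the standard library stands in for whichever hashing algorithm is actually configured; the salt, the cost, the account names and the helper names are all hypothetical.

```python
import hashlib

STATIC_SALT = b"hypothetical-server-wide-salt"  # assumption: one salt for every account
COST = 50_000  # pbkdf2 iterations; raising this slows down parallel bruteforce


def hash_password(password: str) -> str:
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), STATIC_SALT, COST)
    return digest.hex()


# keyed by password hash, since requests carry a password but no username
accounts = {hash_password("hunter2"): "ed", hash_password("wark"): "guest"}


def lookup(password: str):
    """Return the account name for this password, or None if nobody has it."""
    return accounts.get(hash_password(password))


print(lookup("hunter2"), lookup("wrong"))
```

With per-account salts the server would instead have to try the password against every account's salt in turn, multiplying the hashing cost of each login by the number of accounts.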
- table-column `params` = URL parameters; `?foo=bar&qux=...`
- table-column `body` = POST payload
- method `jPOST` = json post
- method `mPOST` = multipart post
- method `uPOST` = url-encoded post
- `FILE` = conventional HTTP file upload entry (rfc1867 et al, filename in `Content-Disposition`)

authenticate using header `Cookie: cppwd=foo` or url param `&pw=foo`
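The two authentication styles look like this when built with `urllib` from the standard library. This is a sketch which only constructs the requests and never sends them; the server address, the `/music/` path and the helper names are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:3923"  # hypothetical server address


def with_cookie(path: str, params: str, password: str) -> Request:
    """Authenticate with the header `Cookie: cppwd=<password>`."""
    return Request(f"{BASE}{path}?{params}", headers={"Cookie": f"cppwd={password}"})


def with_url_param(path: str, params: str, password: str) -> Request:
    """Authenticate by appending `&pw=<password>` to the URL parameters."""
    return Request(f"{BASE}{path}?{params}&{urlencode({'pw': password})}")


req_a = with_cookie("/music/", "ls", "foo")
req_b = with_url_param("/music/", "ls", "foo")
print(req_a.full_url, req_a.get_header("Cookie"))
print(req_b.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the request; the cookie variant keeps the password out of the URL, and therefore out of anything that records URLs.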
method | params | result |
---|---|---|
GET | `?ls` | list files/folders at URL as JSON |
GET | `?ls&dots` | list files/folders at URL as JSON, including dotfiles |
GET | `?ls=t` | list files/folders at URL as plaintext |
GET | `?ls=v` | list files/folders at URL, terminal-formatted |
GET | `?lt` | in listings, use symlink timestamps rather than targets |
GET | `?b` | list files/folders at URL as simplified HTML |
GET | `?tree=.` | list one level of subdirectories inside URL |
GET | `?tree` | list one level of subdirectories for each level until URL |
GET | `?tar` | download everything below URL as a gnu-tar file |
GET | `?tar=gz:9` | ...as a gzip-level-9 gnu-tar file |
GET | `?tar=xz:9` | ...as an xz-level-9 gnu-tar file |
GET | `?tar=pax` | ...as a pax-tar file |
GET | `?tar=pax,xz` | ...as an xz-level-1 pax-tar file |
GET | `?zip` | ...as a zip file |
GET | `?zip=dos` | ...as a WinXP-compatible zip file |
GET | `?zip=crc` | ...as an MSDOS-compatible zip file |
GET | `?tar&w` | pregenerate webp thumbnails |
GET | `?tar&j` | pregenerate jpg thumbnails |
GET | `?tar&p` | pregenerate audio waveforms |
GET | `?shares` | list your shared files/folders |
GET | `?dls` | show active downloads (do this as admin) |
GET | `?ups` | show recent uploads from your IP |
GET | `?ups&filter=f` | ...where URL contains `f` |
GET | `?ru` | show all recent uploads |
GET | `?ru&filter=f` | ...where URL contains `f` |
GET | `?ru&j` | ...as json |
GET | `?mime=foo` | specify return mimetype `foo` |
GET | `?v` | render markdown file at URL |
GET | `?v` | open image/video/audio in mediaplayer |
GET | `?txt` | get file at URL as plaintext |
GET | `?txt=iso-8859-1` | ...with specific charset |
GET | `?th` | get image/video at URL as thumbnail |
GET | `?th=opus` | convert audio file to 128kbps opus |
GET | `?th=caf` | ...in the iOS-proprietary container |

method | body | result |
---|---|---|
jPOST | `{"q":"foo"}` | do a server-wide search; see the [🔎] search tab raw field for syntax |

method | params | body | result |
---|---|---|---|
jPOST | `?tar` | `["foo","bar"]` | download folders `foo` and `bar` inside URL as a tar file |

method | params | result |
---|---|---|
POST | `?copy=/foo/bar` | copy the file/folder at URL to /foo/bar |
POST | `?move=/foo/bar` | move/rename the file/folder at URL to /foo/bar |
method | params | body | result |
---|---|---|---|
PUT | | (binary data) | upload into file at URL |
PUT | `?j` | (binary data) | ...and reply with json |
PUT | `?ck` | (binary data) | upload without checksum gen (faster) |
PUT | `?ck=md5` | (binary data) | return md5 instead of sha512 |
PUT | `?gz` | (binary data) | compress with gzip and write into file at URL |
PUT | `?xz` | (binary data) | compress with xz and write into file at URL |
mPOST | | `f=FILE` | upload `FILE` into the folder at URL |
mPOST | `?j` | `f=FILE` | ...and reply with json |
mPOST | `?ck` | `f=FILE` | ...and disable checksum gen (faster) |
mPOST | `?ck=md5` | `f=FILE` | ...and return md5 instead of sha512 |
mPOST | `?replace` | `f=FILE` | ...and overwrite existing files |
mPOST | `?media` | `f=FILE` | ...and return medialink (not hotlink) |
mPOST | | `act=mkdir`, `name=foo` | create directory `foo` at URL |
POST | `?delete` | | delete URL recursively |
POST | `?eshare=rm` | | stop sharing a file/folder |
POST | `?eshare=3` | | set expiration to 3 minutes |
jPOST | `?share` | (complicated) | create temp URL for file/folder |
jPOST | `?delete` | `["/foo","/bar"]` | delete `/foo` and `/bar` recursively |
uPOST | | `msg=foo` | send message `foo` into server log |
mPOST | | `act=tput`, `body=TEXT` | overwrite markdown document at URL |
upload modifiers:

http-header | url-param | effect |
---|---|---|
`Accept: url` | `want=url` | return just the file URL |
`Accept: json` | `want=json` | return upload info as json; same as `?j` |
`Rand: 4` | `rand=4` | generate random filename with 4 characters |
`Life: 30` | `life=30` | delete file after 30 seconds |
`CK: no` | `ck` | disable serverside checksum (maybe faster) |
`CK: md5` | `ck=md5` | return md5 checksum instead of sha512 |
`CK: sha1` | `ck=sha1` | return sha1 checksum |
`CK: sha256` | `ck=sha256` | return sha256 checksum |
`CK: b2` | `ck=b2` | return blake2b checksum |
`CK: b2s` | `ck=b2s` | return blake2s checksum |

- `life` only has an effect if the volume has a lifetime, and the volume lifetime must be greater than the file's
- server behavior of `msg` can be reconfigured with `--urlform`
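As an illustration of the header form of the upload modifiers, the sketch below builds (but does not send) a `PUT` request with `urllib`. The target URL and the helper `build_upload` are invented for the example; only the header names and values come from the table.

```python
from typing import Optional
from urllib.request import Request


def build_upload(
    url: str,
    payload: bytes,
    rand: Optional[int] = None,
    life: Optional[int] = None,
    want: Optional[str] = None,
    ck: Optional[str] = None,
) -> Request:
    """Build a PUT upload, setting only the modifier headers that were asked for."""
    headers = {}
    if rand is not None:
        headers["Rand"] = str(rand)  # random filename with this many characters
    if life is not None:
        headers["Life"] = str(life)  # delete the file after this many seconds
    if want:
        headers["Accept"] = want     # "url" or "json"
    if ck:
        headers["CK"] = ck           # "no", "md5", "sha1", "sha256", "b2", "b2s"
    return Request(url, data=payload, method="PUT", headers=headers)


req = build_upload(
    "http://127.0.0.1:3923/inc/hello.txt",  # hypothetical upload target
    b"hello",
    rand=4,
    life=30,
    want="url",
    ck="md5",
)
# urllib stores header names as "Ck" etc.; http header names are case-insensitive
print(req.get_method(), sorted(req.header_items()))
```

The same modifiers could be given as url-params instead (`?want=url&rand=4&life=30&ck=md5`), which is handy for clients that cannot set custom headers.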
method | params | result |
---|---|---|
GET | `?reload=cfg` | reload config files and rescan volumes |
GET | `?scan` | initiate a rescan of the volume which provides URL |
GET | `?stack` | show a stacktrace of all threads |

method | params | result |
---|---|---|
GET | `?pw=x` | logout |
GET | `?grid` | ui: show grid-view |
GET | `?imgs` | ui: show grid-view with thumbnails |
GET | `?grid=0` | ui: show list-view |
GET | `?imgs=0` | ui: show list-view |
GET | `?thumb` | ui, grid-mode: show thumbnails |
GET | `?thumb=0` | ui, grid-mode: show icons |
on writing your own hooks
hooks can cause intentional side-effects, such as redirecting an upload into another location, or creating+indexing additional files, or deleting existing files, by returning json on stdout
- `reloc` can redirect uploads before/after uploading has finished, based on filename, extension, file contents, uploader ip/name etc.
- `idx` informs copyparty about a new file to index as a consequence of this upload
- `del` tells copyparty to delete an unrelated file by vpath

for these to take effect, the hook must be defined with the `c1` flag; see example reloc-by-ext
a subset of effect types are available for a subset of hook types,
- most hook types (xbu/xau/xbr/xar/xbd/xad/xm) support `idx` and `del` for all http protocols (up2k / basic-uploader / webdav), but not ftp/tftp/smb
- most hook types will abort/reject the action if the hook returns nonzero, assuming flag `c` is given, see examples reject-extension and reject-mimetype
- `xbu` supports `reloc` for all http protocols (up2k / basic-uploader / webdav), but not ftp/tftp/smb
- `xau` supports `reloc` for basic-uploader / webdav only, not up2k or ftp/tftp/smb
- so clients like sharex are supported, but not dragdrop into browser
to trigger indexing of files `/foo/1.txt` and `/foo/bar/2.txt`, a hook can `print(json.dumps({"idx":{"vp":["/foo/1.txt","/foo/bar/2.txt"]}}))` (and replace "idx" with "del" to delete instead)
- note: paths starting with `/` are absolute URLs, but you can also do `../3.txt` relative to the destination folder of each uploaded file
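Wrapped into a small script, the `idx` / `del` reply described above could look like the following. This is a sketch: the helper name `effect` is invented, how the hook receives its input is left out entirely, and the `reloc` payload is not covered because its format is not described here.

```python
import json


def effect(kind: str, vpaths: list[str]) -> str:
    """Build the json a hook prints on stdout to request an idx or del side-effect."""
    if kind not in ("idx", "del"):
        raise ValueError("this sketch only covers idx and del")
    return json.dumps({kind: {"vp": vpaths}})


if __name__ == "__main__":
    # absolute vpaths; relative ones such as "../3.txt" are also accepted per the note
    print(effect("idx", ["/foo/1.txt", "/foo/bar/2.txt"]))
```

Remember that the reply is only acted upon if the hook was defined with the `c1` flag.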
- outgoing replies will always fit in one packet
- if a client mentions any of our services, assume it's not missing any
- always answer with all services, even if the client only asked for a few
- not-impl: probe tiebreaking (too complicated)
- not-impl: unicast listen (assume avahi took it)
reduce the size of an sfx by removing features
if you don't need all the features, you can repack the sfx and save a bunch of space; all you need is an sfx and a copy of this repo (nothing else to download or build, except if you're on windows then you need msys2 or WSL)
- `393k` size of original sfx.py as of v1.1.3
- `310k` after `./scripts/make-sfx.sh re no-cm`
- `269k` after `./scripts/make-sfx.sh re no-cm no-hl`

the features you can opt to drop are
- `cm`/easymde, the "fancy" markdown editor, saves ~89k
- `hl`, prism, the syntax highlighter, saves ~41k
- `fnt`, source-code-pro, the monospace font, saves ~9k
- `dd`, the custom mouse cursor for the media player tray tab, saves ~2k

for the `re`pack to work, first run one of the sfx'es once to unpack it
note: you can also just download and run /scripts/copyparty-repack.sh -- this will grab the latest copyparty release from github and do a few repacks; works on linux/macos (and windows with msys2 or WSL)
you need python 3.9 or newer due to type hints
the rest is mostly optional; if you need a working env for vscode or similar
python3 -m venv .venv
. .venv/bin/activate
pip install jinja2 strip_hints # MANDATORY
pip install argon2-cffi # password hashing
pip install pyzmq # send 0mq from hooks
pip install mutagen # audio metadata
pip install pyftpdlib # ftp server
pip install partftpy # tftp server
pip install impacket # smb server -- disable Windows Defender if you REALLY need this on windows
pip install Pillow pyheif-pillow-opener pillow-avif-plugin # thumbnails
pip install pyvips # faster thumbnails
pip install psutil # better cleanup of stuck metadata parsers on windows
pip install black==21.12b0 click==8.0.2 bandit pylint flake8 isort mypy # vscode tooling
if you just want to modify the copyparty source code (py/html/css/js) then this is the easiest approach
build the sfx using any of the following examples:
./scripts/make-sfx.sh # regular edition
./scripts/make-sfx.sh fast # build faster (worse js/css compression)
./scripts/make-sfx.sh gz no-cm # gzip-compressed + no fancy markdown editor
uses the included prebuilt webdeps
if you downloaded a release source tarball from github (for example copyparty-1.6.15.tar.gz so not the autogenerated one) you can build it like so,
python3 -m pip install --user -U build setuptools wheel jinja2 strip_hints
bash scripts/run-tests.sh python3 # optional
python3 -m build
if you are unable to use `build`, you can use the old setuptools approach instead,
python3 -m pip install --user -U setuptools wheel jinja2
python3 setup.py build
# you now have a wheel which you can install. or extract and repackage:
python3 setup.py install --skip-build --prefix=/usr --root=$HOME/pe/copyparty
also builds the sfx so skip the sfx section above
WARNING: `rls.sh` has not yet been updated with the docker-images and arch/nix packaging
does everything completely from scratch, straight from your local repo
in the `scripts` folder:
- run `make -C deps-docker` to build all dependencies
- run `./rls.sh 1.2.3` which uploads to pypi + creates github release + sfx
mostly fine on android, but still haven't found a way to massage iphones into behaving well
- conditionally starting/stopping mp.fau according to mp.au.readyState <3 or <4 doesn't help
- loop=true doesn't work, and manually looping mp.fau from an onended also doesn't work (it does nothing)
- assigning fau.currentTime in a timer doesn't work, as safari merely pretends to assign it
- on ios 16.7.7, mp.fau can sometimes make everything visibly work correctly, but no audio is actually hitting the speakers
can be reproduced with `--no-sendfile --s-wr-sz 8192 --s-wr-slp 0.3 --rsp-slp 6` and then play a collection of small audio files with the screen off, `ffmpeg -i track01.cdda.flac -c:a libopus -b:a 128k -segment_time 12 -f segment smol-%02d.opus`
- optimization attempts which didn't improve performance
- remove brokers / multiprocessing stuff; https://github.com/9001/copyparty/tree/no-broker
- reduce the nesting / indirections in `HttpCli` / `httpcli.py`
- nearly zero benefit from stuff like replacing all the `self.conn.hsrv` with a local `hsrv` variable
- single sha512 across all up2k chunks?
- crypto.subtle cannot into streaming, would have to use hashwasm, expensive
- separate sqlite table per tag
- performance fixed by skipping some indexes (`+mt.k`)
- audio fingerprinting
- only makes sense if there can be a wasm client and that doesn't exist yet (except for olaf which is agpl hence counts as not existing)
- `os.copy_file_range` for up2k cloning
- almost never hit this path anyways
- up2k partials ui
- feels like there isn't much point
- cache sha512 chunks on client
- too dangerous -- overtaken by turbo mode
- comment field
- nah
- look into android thumbnail cache file format
- absolutely not
- indexedDB for hashes, cfg enable/clear/sz, 2gb avail, ~9k for 1g, ~4k for 100m, 500k items before autoeviction
- blank hashlist when up-ok to skip handshake
- too many confusing side-effects
- hls framework for Someone Else to drop code into :^)
- probably not, too much stuff to consider -- seeking, start at offset, task stitching (probably np-hard), conditional passthru, rate-control (especially multi-consumer), session keepalive, cache mgmt...