
Flawed Architecture: The Code That the Patches Are Applied To Is Expected to Be Available at a Single URL #144

Open
martinvahi opened this issue Aug 18, 2017 · 2 comments

Comments

@martinvahi

martinvahi commented Aug 18, 2017

The Flaw/Bug

Currently the patching Makefiles seem to contain a
mechanism for downloading the source from a single URL.
For example, at least one of the 2017_08_18 versions
of the libevent package
contains the following lines:

UPSTREAM=https://github.com/libevent/libevent/releases/download/release-2.0.22-stable/libevent-2.0.22-stable.tar.gz
TARBALL=$(notdir $(UPSTREAM))

# ... some Makefile code omitted to make this citation shorter

dl/$(TARBALL):
    mkdir -p dl
    ../scripts/fetch.sh ${UPSTREAM} dl/$(TARBALL)

If, for whatever reason, the single URL becomes unavailable,
the rumprun-packages collection, which is meant to remain usable
over a long period of time, becomes broken.
If there are dependencies between the rumprun-packages, then
the absence of a package near the root of the dependency tree
makes the whole tree broken and unavailable.

Proposed System

A smarter solution is to identify the patchable code by a
SHA256 or other secure hash of a tar file that contains the code.
That way it does not matter where the code gets downloaded from,
and different users can even use "warez-like" file sharing networks,
i.e. BitTorrent-like solutions, for downloading the packages.
The packaging system should just record the size and the secure hash
of the tar file that contains the patchable code.
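As a sketch of the idea, a downloader could try a list of mirrors and accept whichever copy matches the recorded hash. This is only an illustration under my own assumptions, not existing rumprun-packages code; the function names, the mirror arguments, and the use of curl and sha256sum are all made up for the example:

```shell
#!/bin/sh
# Succeed only if the file's SHA256 matches the recorded one.
verify_tarball() {
    file=$1; expected=$2
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Try each mirror base URL in turn; keep the first download whose
# hash matches, so it does not matter which server the bytes came from.
fetch_tarball() {
    file=$1; expected=$2; shift 2
    for base in "$@"; do
        curl -fsSL -o "$file" "$base/$file" || continue
        if verify_tarball "$file" "$expected"; then
            return 0
        fi
        rm -f "$file"    # hash mismatch: discard and try the next mirror
    done
    return 1
}
```

With such a check in place, the UPSTREAM variable of the current Makefiles could hold any number of interchangeable mirrors instead of one canonical URL.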

A Simple Bootstrapping System

The central repository, at this day and age the very Git repository that
this bug report is attached to, should contain a plain text file
with at most one URL per line. Those URLs refer to other text files
all around the world, on the servers of different volunteers,
who serve the files out using plain HTTP/HTTPS.
The text files on the volunteers' servers (hereafter: local_list_of_files) contain
RELATIVE FILE PATHS of the tar files that contain the patchable code.

Demo

My Declaration of Interests/Biases

I'm very biased in writing this bug report, because
I have my own small project called Silktorrent, which
ended up being a personal self-education endeavor in which I
tried to develop base technology for totally censorship-proof web
applications, including software package distribution. Part of the Silktorrent
use case is that the Internet is totally offline and people can exchange
files only with USB sticks, possibly with "mail-pigeon drones" that
transport the USB sticks. The concept of defining files through
their location, a URL, does not make sense in that scenario,
and as it turns out, P2P file sharing systems identify files
not by their location, a URL, but through the secure hash of the file.

The end result is a terribly inefficiently written Bash-and-Ruby script (archival copy;
if no command line arguments are given, the script prints usage instructions)
that, despite all the "bloat", does essentially only one thing:
it creates a tar file and renames the tar file according to its secure hash.

(Actually, the script can also "unpack" the tar files,
verify the file name format without reading the file, and
salt the tar file at creation time, which makes it possible to "package" the same "payload" into
tar files that have different hashes. That forces censors to
download at least parts of the tar files to see whether they contain censored
material; at some point the downloading and checking should
overwhelm the censoring system.)
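The essential create-salt-and-rename step can be sketched in a few lines of plain shell. This is only an illustration of the idea, not the actual Silktorrent code; pack_dir, the .salt file name, and the output naming are invented for the example:

```shell
#!/bin/sh
# Package a directory into a tar file named after the tar file's own SHA256.
# A random salt file is added first, so the same payload can be packaged
# into tar files that have different hashes.
pack_dir() {
    dir=$1
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$dir/.salt"
    tar -cf payload.tar "$dir"
    hash=$(sha256sum payload.tar | awk '{print $1}')
    mv payload.tar "$hash.tar"
    printf '%s.tar\n' "$hash"    # print the content-addressed file name
}
```

Anyone holding the resulting file can re-hash it and compare the result against the file name, with no need to trust the server it came from.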

There's no need to use the Silktorrent script for the rumprun-packages,
because a simpler tar-file creation and renaming implementation
will probably work just fine, but I use my script for this demo.

The Core of the Demo

At some "central" location (in quotes because it does not need to be, and should not be, central),
there is a list of URLs to text files that list relative file paths.
An example (archival copy) of one such URL:

https://demodrome.softf1.com/rumpkernel_org/distribution_demo_001/list_of_relative_file_paths.txt

By combining a mirror's URL with the relative file paths
in its "list_of_relative_file_paths.txt",
the URLs of the tar files can be derived.
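The combination step is just string concatenation of a mirror's base URL with each relative path from its list file. A minimal sketch, where resolve_url and resolve_all are hypothetical helper names, not code from the demo:

```shell
#!/bin/sh
# Join a mirror's base URL with one relative file path from the list file.
resolve_url() {
    base=${1%/}              # drop a trailing slash, if any
    printf '%s/%s\n' "$base" "$2"
}

# For every path in a local_list_of_files, print the full candidate URL.
resolve_all() {
    base=$1; list=$2
    while IFS= read -r path; do
        [ -n "$path" ] && resolve_url "$base" "$path"
    done < "$list"
}
```

Each derived URL is only a candidate location; the downloaded file would still be accepted or rejected based on its secure hash.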

Thank You for reading my comment.

@anttikantee
Member

It is impossible to have a flawed architecture when there is no architecture at all.

An architect's position is available. Run your scheme by the mailing list and, in all likelihood, start pushing. The only requirement is that there may not be usability regressions (especially for non-developers).

@martinvahi
Author

Thank You for the answer and for the encouragement.

I'll need to write the downloader first, regardless of whether the proposal
gets accepted or rejected later. As regards usability regressions,
could You please tell me what operating systems the build scripts must run on?

So far all of my code has run only on Linux and BSD, and
it has been occasionally tested on Cygwin, but Cygwin is
not something that many people have available, and even on Linux
not all people have the newest Ruby installed.

The other issue with my proposed scheme is that it uses the
nice, fast, secure hash console tools that are available on Linux and BSD
but might not be available on a Cygwin installation.
In terms of dependency tree completeness, the most reliable solution that I'm aware of is
to create a VirtualBox appliance, because then everything can be pre-tested and installed, but the
problem with VirtualBox appliances is that they are huge, in my case
about 22 GiB minimum. In practice I ran into trouble with one of my clients,
because the ~100 GiB virtual appliance was difficult to download.
Another, in my view even more serious, issue with VirtualBox
appliances is that they assume x86 CPU based hardware,
but x86 CPUs boot non-Windows operating systems only
at the mercy of Microsoft,
and AMD and Intel CPUs also have microcode update
capability, which sounds like a huge security hole to me. Add to that the
issue that both AMD and Intel have been making quite an
effort to keep the market free of other x86 manufacturers
(VIA must have been an accident of the AMD/Intel lawyers),
and the various non-x86 CPUs are pretty much the future of the
hardware that requires security and can be used
for privacy-respecting applications. x86-specific
VirtualBox appliances will not run on non-x86 hardware,
unless the hardware has some x86 mode like the former
Crusoe CPUs and the Elbrus had.

Build systems tend to be special-purpose application software
that, like application software, has its dependencies.
The reason my Silktorrent script
is such a slow monster is that I first wanted to make it as
portable and "free of dependencies" as possible and thought
that if I wrote it in very old Bash, it should be pretty
"foolproof" in terms of lacking dependencies. After all,
the core of it just creates a tar file and renames it.
When I ran into the checks and the string processing code,
I thought that I would use some very old, "standard"
command line tools like Awk and gawk, but
that was a mistake, because it turned out that
BSD and Linux installations have different Awks, and
due to the various Awk and command line related
quirks the Ruby code ended up being simpler and
more portable than the Awk code. So the fast,
quick-to-boot Awk calls were replaced with
slow-to-boot Ruby interpreter launches, and the
slowness of the script comes from the repeated re-initialization
of the Ruby interpreter, which allocates about 40 MiB
of RAM at every start-up. The end result, the 2017_03 Silktorrent script,
rigorously adheres to a speed optimization ANTI-pattern.
My conclusion from that experience, in the context of Rumpkernel.org
"usability", is that the use of Bash or other "simple and light"
tools leads to a heavy and unoptimized solution, and
it is smarter to build any build/test/make scripts
with something more capable, "heavy", from
the very start.
In my case the preferred language is Ruby,
for which I have my own libraries (under the BSD license).

As regards development methodology,
I believe in (at least I prefer) a setup where no person
is required to modify another person's code, and
within one's own code people use whatever they want, as long
as the resource usage of their code fits the project-specific limits.
Code reviews are OK, depending on how they are carried out,
including how much freedom people are given; that is,
I can adhere to other people's style requirements, depending on what they are,
but I certainly find it very stupid to require others to follow my style preferences.
However, generally I do not believe in manual code reviews.
