Big rewrite
1. Foliage takes as input a complete description of the index, where
   source distributions and revisions come with a timestamp. This allows
   us to recreate the entire index in a reproducible way.

2. Added an experimental command to import an index from a Hackage (as
   downloaded with Cabal). This was originally a testing/development
   need but there might be different use cases.
andreabedini committed Mar 28, 2022
1 parent 5ce3fc0 commit 080197e
Showing 30 changed files with 1,022 additions and 1,785 deletions.
4 changes: 4 additions & 0 deletions NOTES-HRT.md
@@ -0,0 +1,4 @@
# Hackage Repo Tool

It looks like the update re-adds cabal files to the index based on
timestamp but package.json on content?
173 changes: 77 additions & 96 deletions README.md
@@ -5,141 +5,122 @@ A hash-friendly Haskell Package Repository.
Foliage is a tool to create custom or private Haskell package repositories,
in a fully reproducible way.

## Background

The problem of build reproducibility in the Haskell ecosystem has been discussed
many times. Hackage does not natively offer a way to pin down the files it
serves.

Although there are workarounds to obtain a fixed repository (e.g. by
truncating the index file, which is append only) I think we can solve this
at the root.

## Main idea

_Like GitHub Pages but for Haskell Packages_

A "Hackage repository" is a collection of files (source distributions, cabal
files, public keys and signatures).
A "Hackage repository" is a collection of source distributions and cabal
files. In addition, Hackage has implemented [The Update
Framework (TUF)](https://theupdateframework.com) and the repository also
includes cryptographic metadata (public keys and signatures).

These files are commonly served by Hackage proper, that is the central
deployment of [hackage-server](https://github.com/haskell/hackage-server/).

Foliage explores the idea of serving this content as a static website,
which is generated programmatically from a small set of input files.
Foliage explores the idea of creating and serving this content as a static
website, generated programmatically from textual input files.

Both the input files and the generated repository can be stored in a git
repository and referred to via stable URL corresponding to commit hashes.
## Example

Foliage expects a folder `_sources` with a subfolder per package name and
version.

## Example
E.g.

An input file could look like the following
```
_sources
└── typed-protocols
    └── 0.1.0.0
        └── meta.toml
```

The file `meta.toml` describes a package and looks like this

```toml
[[sources]]
url = "https://.../source1.tar.gz"

[[sources]]
url = "https://.../source2.tar.gz"
subdirs = [
"a",
"b",
"c"
]
timestamp = 2022-03-28T07:57:10Z
url = 'https://github.com/input-output-hk/ouroboros-network/tarball/d2d219a86cda42787325bb8c20539a75c2667132'
subdir = 'typed-protocols' # optional
```
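As a rough sketch, the layout and `meta.toml` for the `typed-protocols` example could be created from a shell as follows. It uses only the `timestamp`, `url` and `subdir` fields from the example; running it from the directory that contains `_sources` is an assumption.

```shell
# Create the per-package folder (package name, then version) under _sources.
pkg_dir="_sources/typed-protocols/0.1.0.0"
mkdir -p "$pkg_dir"

# Write meta.toml. The timestamp is a bare TOML datetime, not a quoted string.
cat > "$pkg_dir/meta.toml" <<'EOF'
timestamp = 2022-03-28T07:57:10Z
url = 'https://github.com/input-output-hk/ouroboros-network/tarball/d2d219a86cda42787325bb8c20539a75c2667132'
subdir = 'typed-protocols' # optional
EOF
```

Pinning the tarball URL to a full commit hash, as in the example, means the input refers to one specific revision of the source repository.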

This file basically mirrors the functionality of
[`source-repository-package`](https://cabal.readthedocs.io/en/3.6/cabal-project.html#specifying-packages-from-remote-version-control-locations)
in Cabal.
Foliage will download the source url for each package (assumed to be a
tarball), decompress it, make a source distribution and take the cabal
file.

For each source (and each subdir, if any is specified), foliage will
download the tarball and make a sdist. Foliage will then use the
hackage-repo-tool to create an on-disk repository (e.g. in `_repo`) from
the collected packages. Additionally, one can specify revisions to each
package version.
After all packages have been processed, foliage will create a repository,
including the index and the TUF metadata. With the input above foliage will
produce the following:

```
_repo
├── 01-index.tar
├── 01-index.tar.gz
├── index
│   └── typed-protocols
│       └── 0.1.0.0
│           ├── package.json
│           └── typed-protocols.cabal
├── mirrors.json
├── package
│   └── typed-protocols-0.1.0.0.tar.gz
├── root.json
├── snapshot.json
└── timestamp.json
```

* `typed-protocols-0.1.0.0.tar.gz` is obtained by running
`cabal sdist` on the repository (and, optionally, subfolder) specified in
`meta.toml`.
* `typed-protocols.cabal` is extracted from the repository.
* `01-index.tar` will include the cabal files and signed target file, using
the timestamp in `meta.toml`.
```bash
$ TZ=UTC tar tvf _repo/01-index.tar
-rw-r--r-- foliage/foliage 1627 2022-03-28 07:57 typed-protocols/0.1.0.0/typed-protocols.cabal
-rw-r--r-- foliage/foliage 833 2022-03-28 07:57 typed-protocols/0.1.0.0/package.json
```
* The TUF files (`mirrors.json`, `root.json`, `snapshot.json`,
`timestamp.json`) are signed and contain reasonable defaults.

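Since the index entries take their timestamps from `meta.toml`, one way to check reproducibility is to build the repository twice from the same input and compare the two index tarballs. The sketch below is a hypothetical helper, not part of foliage, and it assumes the two builds were written to separate output directories. Whether two builds are byte-identical in practice is something to verify, not something this snippet guarantees.

```shell
# same_index DIR_A DIR_B
# Succeeds only if both repositories contain byte-identical 01-index.tar files.
same_index() {
  cmp -s "$1/01-index.tar" "$2/01-index.tar"
}

# Hypothetical usage, with _repo_a and _repo_b being two separate outputs:
#   same_index _repo_a _repo_b && echo "index is identical"
```

`cmp -s` prints nothing and reports the result through its exit status, which makes the helper easy to use in a CI step.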
## Revisions

Foliage supports cabal file revisions. Adding the following snippet to a
package's `meta.toml` makes foliage look for a cabal file in
`<pkgName>/<pkgVersion>/revisions/1.cabal`.

```
[[revisions]]
number = 1
timestamp = 2022-03-22T14:15:00+00:00
```

The revised cabal file will enter the index with the timestamp provided in
`meta.toml`.

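The two steps (placing the revised cabal file and recording it in `meta.toml`) can be sketched as a small shell helper. `add_revision` is a hypothetical name, not a foliage command, and the sketch assumes the `revisions` folder sits next to `meta.toml` inside the package's folder under `_sources`.

```shell
# add_revision PKG_DIR CABAL_FILE NUMBER TIMESTAMP
# Copies CABAL_FILE to PKG_DIR/revisions/NUMBER.cabal and appends a matching
# [[revisions]] entry to PKG_DIR/meta.toml.
add_revision() {
  pkg_dir=$1
  cabal_file=$2
  number=$3
  timestamp=$4
  mkdir -p "$pkg_dir/revisions"
  cp "$cabal_file" "$pkg_dir/revisions/$number.cabal"
  printf '\n[[revisions]]\nnumber = %s\ntimestamp = %s\n' \
    "$number" "$timestamp" >> "$pkg_dir/meta.toml"
}

# Hypothetical usage, where revised.cabal is the edited cabal file:
#   add_revision _sources/typed-protocols/0.1.0.0 revised.cabal 1 2022-03-22T14:15:00+00:00
```

The timestamp is supplied by hand here because it becomes part of the input, so rerunning foliage later places the revision at the same point in the index.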
## Using the repository with cabal

The resulting repository can then be served through HTTPS and used with
cabal, e.g. in a `cabal.project`
cabal, e.g. in a `cabal.project`:

```
repository packages.example.org
  url: https://packages.example.org/
  secure: True
```

Alternatively, cabal can read the repository directly off disk
Alternatively, cabal can read the repository directly off disk:

```
repository packages.example.org
  url: file:///path/to/_repo
  secure: True
```

**Note:** The package id (package name + package version) is unknown at
download time and only known after looking at the cabal file. This is the
reason package names and versions do not show in the input file. Foliage
ensures two sources do not provide colliding package ids.

**Note:** Hackage implements [The Update
Framework](https://theupdateframework.io) which requires a set of public
and private keys. Foliage can either generate a new set of keys or reuse a
pre-existing one. Cabal can either trust a repository at first use or
verify signatures against public keys obtained separately.

## GitHub

Foliage can make use of several features supported by GitHub to further advance automation.

1. GitHub has long supported accessing git repositories via HTTPS. E.g. one can access a blob in a git repo through the following URL.

https://raw.githubusercontent.com/{owner}/{repo}/{ref}/path

where `ref` can either be a commit hash or a branch name.

2. GitHub also offers URLs of tarballs for repos at a given commit, e.g.

https://github.com/Quid2/flat/tarball/ee59880f47ab835dbd73bea0847dab7869fc20d8

Afaik, these tarballs might not be entirely immutable (TODO)

3. GitHub offers URLs for tagged releases (these tarballs are supposed to be immutable).

4. GitHub Actions can be used to automate the generation

5. (Perhaps optional) GitHub Pages supports publishing a git branch over HTTP.

This means we automatically have a stable URL for any package whose source is available on GitHub.
Also, the generated repository can be committed to a git branch and be immediately available to cabal through HTTPS.

E.g.

This configuration

https://github.com/andreabedini/byo-hackage/blob/933760117a3800366b420b07c8c887c1313e2b22/packages.tsv

(warning old TSV format)

Generated this repo https://github.com/andreabedini/byo-hackage/tree/1e8c5184836acb05972dfff00ac8edca575e1df1

Which can be given to cabal like this

```
repository my-hackage-repo
  url: https://raw.githubusercontent.com/andreabedini/byo-hackage/1e8c5184836acb05972dfff00ac8edca575e1df1
  secure: True
```

## To infinity and Beyond

One can think of more features

- A pretty website could be automatically generated along with the
repository, with a list of packages, their versions, metadata, etc.
- The input file itself could be automatically generated, e.g. from all
tagged releases in a GitHub organisation, making it a turn-key Hackage
repository for any GitHub organisation.

## Author

- Andrea Bedini (@andreabedini)