
Feedback/proposal: separate concerns more and provide a safe wrapper around the car format #416

@Jorropo

Note that this mostly excludes indexes from the picture: I haven't used or needed them, so I can't comment well on their API.

The APIs are either too low level, requiring consumers to have a copy of the car spec at hand, or they sit a level above the car format and require consumers to provide a bunch of features they might not have.

APIs that are too low level to be used without a copy of the car spec:

The things above are useful, but I don't think they are enough to claim this library can be used to easily decode car files.
It's like trying to use encoding/json when the only thing you can call is json.Decoder.Token.

APIs that are too high level and provide features and types that are not needed to interact with the car format:

Those are useful, but they are specialised helper functions: if I am not creating a car file from a random access CID block interface (github.com/ipfs/go-ipld-format.NodeGetter), or if I am not using https://pkg.go.dev/github.com/ipfs/go-merkledag or https://pkg.go.dev/github.com/ipld/go-ipld-prime, I cannot use them.

Things I think are good:

  • CarReader
    It is simple and has one job (provide an iterator that reads from an io.Reader and returns blocks as they are found in the carv1 stream), with a sane, safe API. It does not require consumers to understand deep things about the carv1 spec.
  • BlockReader
    Same as above

Streaming a carv1 body from a stream of blocks.

This can't be found in either the v1 or v2 packages.
You have to write this code:

util.LdWrite(writer, block.Cid().Bytes(), block.RawData())

Which is impossible to figure out for any newcomer without a deep read and exploration of the car spec or by looking up some code that already does this.

Note that it's also really easy to mess up: the ...[]byte might lead you to think you can do this, for example: util.LdWrite(writer, block.Cid().Bytes(), block.RawData(), block2.Cid().Bytes(), block2.RawData()) but no, this does not follow the car spec and will silently be serialised incorrectly.

I get why this API exists and I can see edge cases where it would be useful, but I don't think it is acceptable as the only way to stream a stream of blocks.


Things I think would make this better:

Provide a util.LdWrite-free way to stream a car body.
An API like this would be enough:

// BlockWriter streams blocks to an io.Writer.
type BlockWriter struct{/* ... */}

func NewBlockWriter(w io.Writer, roots []cid.Cid, opts ...WriteOption) (*BlockWriter, error) {/* ... */}

func (bw *BlockWriter) Write(b blocks.Block) error {/* ... */}

// WriteFromReader allows for zero copy through [io.ReaderFrom] or [io.WriterTo].
func (bw *BlockWriter) WriteFromReader(c cid.Cid, r io.Reader) error {/* ... */}

I would also move the helpers and lower level functions away into different packages. Given the current state, creating a new package like github.com/ipld/go-car/simple that bundles easy, safe wrappers around the car format sounds simpler (no need to have a tool rewrite consumers to a new import path).


Somewhat out of scope notes:

It is impossible to do anything allocation free. A random example about reading blocks:
It would be nice if BlockReader objects had a Peek() (cid []byte, block []byte, err error) method. The difference is that it would use bufio.Reader.Peek and return slices pointing into bufio.Reader's internal buffer, which allows reading a block without allocation.


Just so you know, I'll make those changes to github.com/ipfs/boxo/car and provide a lighter API (just expose BlockReader and BlockWriter).
