Prioritizing unixfs-v2 #19

mikeal · 2019-01-22T21:15:15Z

Files in IPFS are currently encoded using the old dag-pb and often with CIDv0.

For some time we've been working on a specification for encoding files using dag-cbor (or any IPLD codec that supports the full data model).

There were numerous reasons we started the transition away from dag-pb, including ease of development and performance. The longer we put off completing this transition in IPFS, the more "old" data we'll be creating. Additionally, a lot of the performance work we might do in IPFS may end up getting thrown out in this transition, since it's based on the old encoding system.

We expect the unixfs-v2 spec to continue to evolve over time based on feedback from implementations. However, we now have one independent implementation and think it's time for IPFS to begin adopting it and working with the IPLD team on incorporating any feedback into the spec.

Given our workload and the limited resources in IPLD, we'd like to know what priority this has in the IPFS project and where it should fit in the roadmap and OKRs so that we can appropriately support IPFS's adoption.

momack2 · 2019-02-02T10:14:25Z

This is a good question - thanks for raising it. Would love thoughts from @Stebalien @alanshaw @daviddias on the extent they see this contributing to go/js goals (and the extent to which work we're doing now might be deprecated by this transition)

alanshaw · 2019-02-04T13:09:01Z

It's currently part of a P2 milestone in the JS roadmap and I literally just (ROUGHLY) timelined this to be completed in Q3.

That said, which implementation is ready? JS? We should get it in earlier if so, and people can start playing with it and then switch the default when we're happy?

mikeal · 2019-02-04T18:03:10Z

@alanshaw there’s an independent implementation in JS, but it wouldn’t be good for more than a reference point. It wasn’t designed to be integrated into IPFS; it was only designed to test that the spec was implementable. For instance, it doesn’t use js-ipld or any of the other interfaces we tend to use. It also needs to be updated to the next iteration of the spec, which is path based instead of node based.

That said, if you give me a list of the interfaces it needs to use and what functionality you’d like to see from the implementation, I can go off and write another implementation fairly quickly. I could even get this into the Q2 OKRs if necessary.

eocarragain · 2019-02-04T20:31:25Z

As of this commit the 2019 IPFS roadmap has: "Go and JS IPFS enable modern IPFS data formats (UnixFSv2, CIDv1, raw blocks) by default and in a reproducible way"

The last part is also relevant to recent discussions on reproducible file imports here: ipld/legacy-unixfs-v2#15

momack2 · 2019-03-20T06:20:33Z

@Stebalien says this would be great to happen in Q2 - this could be a great excuse to integrate go-ipld-prime into go-ipfs. Aiming to have this as optional / in heavy testing by EOQ puts us on a really good trajectory toward 1.0. Note, we'd want to switch to Rabin (or alt) chunking at the same time. I guess the question is whether @warpfork would be freeing up for this given his deep expertise/thinking.

warpfork · 2019-03-20T09:45:02Z

Great goal, I'm onboard -- and simultaneous switch to (new) Rabin would indeed be ideal -- 50/50 on if that's actually reachable by early summer. But I'd be perfectly happy to be reaching in that direction.

lidel · 2019-05-21T19:39:51Z

Big 👍 for reproducible file imports being shipped with unixfsv2.
Lack of future-proofing in this area is a constant pain point and a source of bad rep discouraging adoption.

Sidenote: apart from storing metadata in unixfsv2 by default (ipld/legacy-unixfs-v2#15), we need tools to be able to deterministically freeze/reproduce all parameters during import (eg. ipfs add --fmt=<fmtstr>, as noted in ipld/legacy-unixfs-v2#15 (comment))

mikeal · 2019-05-23T15:34:54Z

We found a rather elegant way to handle the chunker part of this w/ the IPLD type system. We can easily choose the same chunker when updating a file based on its Binary Type.

However, we need IPFS to have non-configurable logic on which chunker to choose for new files to make this entirely deterministic. We’ll also need the file metadata IPFS adds to files and directories to be consistent and non-configurable for the same reasons.

As far as the spec goes, it doesn’t guarantee determinism because so many things can be done optionally, but we can guarantee determinism in the way IPFS produces unixfsv2 files/directories if we are willing to remove the configuration.

mikeal · 2019-05-23T19:44:20Z

Oh, another thing I’ve said to people but probably haven’t written down yet.

There is no “ideal chunker” for every file you come across. Different files will have different optimal chunkers. Rabin produces far more small chunks than you would like with compressed media, and doesn’t provide any useful de-duplication there. Compressed media ideally has a chunker that understands the compression algorithm and can chunk at the keyframe boundaries, which makes range requests in the file operate much more efficiently.
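
For readers who haven't looked at content-defined chunking, here is a minimal sketch of the idea behind chunkers like Rabin. It is not Rabin fingerprinting and not the chunker in go-ipfs or js-ipfs: it swaps in a simpler gear-style rolling hash, and the table, the `cdc_chunks` name, and the `bits` parameter are all made up for illustration. Cut points depend only on the last 32 bytes of content, so an insertion near the start of a file should leave most later chunks unchanged.

```python
import random
from typing import List

# Hypothetical lookup table for a gear-style rolling hash (an assumption,
# not the polynomial used by Rabin fingerprinting). Seeded for repeatability.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, bits: int = 8) -> List[bytes]:
    """Cut wherever the top `bits` bits of the rolling hash are all zero.

    Average chunk size is roughly 2**bits bytes. No minimum or maximum
    chunk size is enforced, which a real chunker would need.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The left shift pushes old bytes out after 32 steps, so the hash
        # only reflects the most recent 32 bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# Usage: prepend a few bytes and count the chunks the two versions share.
rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = b"inserted prefix" + original
shared = set(cdc_chunks(original)) & set(cdc_chunks(edited))
print(len(shared), "chunks shared")
```

A fixed-size chunker would shift every boundary after the insertion and share essentially nothing. The random input here only demonstrates that boundaries resynchronize after an edit; it says nothing about how chunk sizes behave on real compressed media.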

At some point we’re going to want a content-type-dependent chooser that selects specific chunkers for compressed media, Rabin for text, and a fixed-size chunker for everything else (or maybe Rabin, not sure what the profile here is).
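
A rough sketch of what such a chooser could look like, assuming the content type is already known as a MIME-style string. The function name and the chunker names it returns are hypothetical, and treating `video/*` and `audio/*` as "compressed media" is a simplification. The point is only that the choice is a pure function of the content type, with nothing for the user to configure.

```python
def choose_chunker(content_type: str) -> str:
    """Pick a chunker name from the content type alone (no tunable knobs)."""
    main_type = content_type.split("/", 1)[0].strip().lower()
    if main_type in ("video", "audio"):
        # Hypothetical codec-aware chunker that cuts at keyframe boundaries.
        return "media-aware"
    if main_type == "text":
        return "rabin"
    # Everything else falls back to fixed-size chunks in this sketch.
    return "fixed-size"


for ct in ("video/mp4", "text/plain", "application/octet-stream"):
    print(ct, "->", choose_chunker(ct))
```

Because the mapping has no parameters, two importers that agree on the content type would agree on the chunker, which is the property the determinism discussion in this thread is after.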

warpfork · 2019-05-26T13:10:39Z

So if we verify content downloaded (and we should [by running the hashing function locally and seeing if it matches with the query, combined with filesize in manifest]) then we’ll get false negatives.
-- from https://discuss.status.im/t/ipfs-alternatives-snt-utility/1228/5

Mega +1 from me on this too. I sometimes describe this concern as "IPFS is content-addressable on read... and content-plus-a-bunch-of-flags-addressable on write"... which is an issue that ranges from merely terrifying to an outright blocker depending on the application. "content-plus-a-bunch-of-flags-addressable" gives up a lot of the benefits content-addressability promises in the first place!
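
A toy illustration of the "flags on write" problem, under loud assumptions: this is not the UnixFS DAG layout and the result is not a CID. `toy_root` is a made-up function that hashes fixed-size chunks with SHA-256 and then hashes the concatenated digests. Even so, it shows how one import parameter changes the address of identical bytes.

```python
import hashlib
from typing import List


def fixed_chunks(data: bytes, size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_root(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk digests.

    A stand-in for a real file DAG root, for illustration only.
    """
    leaf_digests = [hashlib.sha256(c).digest() for c in fixed_chunks(data, chunk_size)]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


data = b"the same bytes, imported twice" * 1000
root_a = toy_root(data, 256 * 1024)  # one importer's chunk size
root_b = toy_root(data, 1024)        # another importer's chunk size
print(root_a == root_b)  # prints False: same content, different address
```

Anyone who re-imports the file to verify it against `root_a` without knowing the chunk size gets a mismatch, which is the false negative described in the quote above.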

There are lots of parts of the IPFS stack where it's perfectly sensible for libraries to be designed to be super configurable... but in IPFS as an application as a whole, we should be getting significantly less configurable for lots of these things, because too much flexibility is the source of this problem. It can be (seemingly paradoxically) better for the ecosystem as a whole if we don't expose so many knobs that it lets the ecosystem fracture itself based on relatively inconsequential twiddlings of those knobs!

we need IPFS to have non-configurable logic on which chunker to choose for new files to make this entirely deterministic.

👍👍👍

github-actions · 2023-10-09T00:06:29Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

mikeal mentioned this issue Aug 8, 2019

UnixFS Reboot ipld/legacy-unixfs-v2#28

Closed

github-actions bot added the Stale label Oct 9, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 15, 2023
