Prioritizing unixfs-v2 #19

Closed
mikeal opened this issue Jan 22, 2019 · 11 comments

@mikeal

mikeal commented Jan 22, 2019

Files in IPFS are currently encoded using the old dag-pb and often with CIDv0.

For some time we've been working on a specification for encoding files using dag-cbor (or any IPLD codec that supports the full data model).

There were numerous reasons we started the transition away from dag-pb, including ease of development and performance. The longer we put off completing this transition in IPFS, the more "old" data we'll be creating. Additionally, a lot of performance work we might do in IPFS may end up getting thrown out in this transition since it's based on the old encoding system.

We expect the unixfs-v2 spec to continue to evolve over time based on feedback from implementations. However, we now have one independent implementation and think it's time for IPFS to begin adopting it and working with the IPLD team on incorporating any feedback into the spec.

Given our workload and the limited resources in IPLD, we'd like to know what priority this has in the IPFS project and where it should fit in the roadmap and OKRs so that we can appropriately support IPFS' adoption.

@momack2
Contributor

momack2 commented Feb 2, 2019

This is a good question - thanks for raising it. Would love thoughts from @Stebalien @alanshaw @daviddias on the extent they see this contributing to go/js goals (and the extent to which work we're doing now might be deprecated by this transition).

@alanshaw
Member

alanshaw commented Feb 4, 2019

It's currently part of a P2 milestone in the JS roadmap and I literally just (ROUGHLY) timelined this to be completed in Q3.

That said, which implementation is ready? JS? We should get it in earlier if so, and people can start playing with it and then switch the default when we're happy?

@mikeal
Author

mikeal commented Feb 4, 2019

@alanshaw there’s an independent implementation in JS, but it wouldn’t be good for more than a reference point. It wasn’t designed to be integrated into IPFS; it was only designed to test that the spec was implementable. For instance, it doesn’t use js-ipld or any of the other interfaces we tend to use. It also needs to be updated to the next iteration of the spec so that it is path based instead of node based.

That said, if you give me a list of the interfaces it needs to use and what functionality you’d like to see from the implementation, I can go off and write another implementation fairly quickly. I could even get this in the Q2 OKRs if necessary.

@eocarragain

As of this commit the 2019 IPFS roadmap has: "Go and JS IPFS enable modern IPFS data formats (UnixFSv2, CIDv1, raw blocks) by default and in a reproducible way"

The last part is also relevant to recent discussions on reproducible file imports here: ipld/legacy-unixfs-v2#15

@momack2
Contributor

momack2 commented Mar 20, 2019

@Stebalien says this would be great to happen in Q2 - this could be a great excuse to integrate go-ipld-prime into go-ipfs. Aiming to have this as optional / in heavy testing by EOQ puts us on a really good trajectory toward 1.0. Note, we'd want to switch to Rabin (or alt) chunking at the same time. I guess the question is whether @warpfork would be freeing up for this given his deep expertise/thinking.

@warpfork
Member

Great goal, I'm onboard -- and simultaneous switch to (new) Rabin would indeed be ideal -- 50/50 on if that's actually reachable by early summer. But I'd be perfectly happy to be reaching in that direction.

@lidel
Member

lidel commented May 21, 2019

Big 👍 for reproducible file imports being shipped with unixfsv2.
Lack of future-proofing in this area is a constant pain point and a source of bad rep discouraging adoption.

Sidenote: apart from storing metadata in unixfsv2 by default (ipld/legacy-unixfs-v2#15), we need tools to be able to deterministically freeze/reproduce all parameters during import (e.g. ipfs add --fmt=<fmtstr>, as noted in ipld/legacy-unixfs-v2#15 (comment)).

@mikeal
Author

mikeal commented May 23, 2019

We found a rather elegant way to handle the chunker part of this w/ the IPLD type system. We can easily choose the same chunker when updating a file based on its Binary Type.

However, we need IPFS to have non-configurable logic on which chunker to choose for new files to make this entirely deterministic. We’ll also need the file metadata IPFS adds to files and directories to be consistent and non-configurable for the same reasons.

As far as the spec goes, it doesn’t guarantee determinism because so many things can be done optionally, but we can guarantee determinism in the way IPFS produces unixfsv2 files/directories if we are willing to remove the configuration.

@mikeal
Author

mikeal commented May 23, 2019

Oh, another thing I’ve said to people but probably haven’t written down yet.

There is no “ideal chunker” for every file you come across. Different files will have different optimal chunkers. Rabin produces far more small chunks than you would like with compressed media, and doesn’t provide any useful de-duplication there. Compressed media ideally has a chunker that understands the compression algorithm and can chunk on the keyframe boundaries, which makes range requests in the file operate much more efficiently.
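For readers less familiar with why a Rabin-style chunker helps on editable data at all, here is a minimal sketch. It is not Rabin: it substitutes CRC32 over a sliding window for the Rabin fingerprint, and `WINDOW`, `MASK`, and the function names are assumptions made up for illustration. It only shows the general idea that boundaries chosen from content survive an insertion, while fixed-size boundaries do not.

```python
import random
import zlib

WINDOW = 16   # bytes of context that decide a boundary (assumed value)
MASK = 0x1F   # cut when the low 5 bits of the hash are zero (~32-byte average)


def fixed_chunks(data, size=32):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data):
    """Toy content-defined chunker: cut after byte i when the hash of the
    last WINDOW bytes matches the mask. CRC32 stands in for a Rabin
    fingerprint and is recomputed per position instead of rolled."""
    chunks, start = [], 0
    for i in range(len(data)):
        window = data[max(0, i - WINDOW + 1):i + 1]
        if zlib.crc32(window) & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(a, b):
    """Number of distinct chunks that appear in both chunk lists."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = b"X" + data  # insert a single byte at the front

    print("fixed-size chunks shared:", shared(fixed_chunks(data), fixed_chunks(edited)))
    print("content-defined chunks shared:", shared(cdc_chunks(data), cdc_chunks(edited)))
```

The one-byte insertion shifts every fixed-size boundary, so the fixed-size chunks stop lining up, while the content-defined boundaries re-synchronize once the window has moved past the edit. That benefit depends on the data containing repeated or lightly edited regions, which is the part compressed media does not give you.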

At some point we’re going to want a content-type-dependent chooser that selects specific chunkers for compressed media, rabin for text, and a fixed-size chunker for everything else (or maybe rabin, not sure what the profile here is).
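A rough sketch of what such a chooser could look like, assuming the content type is already known as a MIME string. The chunker names (`"format-aware"`, `"rabin"`, `"fixed-size"`), the list of compressed types, and `choose_chunker` itself are hypothetical and not part of any IPFS or IPLD API. The point is only that the mapping is a pure function of the content type with no user-facing configuration.

```python
# Hypothetical list of compressed media types; a real chooser would need
# a far more complete table and probably content sniffing.
COMPRESSED_MEDIA = {
    "video/mp4",
    "video/webm",
    "audio/mpeg",
    "image/jpeg",
}


def choose_chunker(content_type):
    """Return a chunker name for a MIME type.

    No flags and no configuration: the same content type always maps to
    the same chunker, which is what makes the import reproducible.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = content_type.split(";", 1)[0].strip().lower()
    if base in COMPRESSED_MEDIA:
        return "format-aware"   # chunk on boundaries the codec understands
    if base.startswith("text/"):
        return "rabin"          # content-defined chunking for editable text
    return "fixed-size"         # default for everything else


if __name__ == "__main__":
    for ct in ("video/mp4", "text/plain; charset=utf-8", "application/octet-stream"):
        print(ct, "->", choose_chunker(ct))
```

Whether the fallback should be fixed-size or rabin is the open question from the comment above; the sketch picks fixed-size only to have something concrete.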

@warpfork
Member

"So if we verify content downloaded (and we should [by running the hashing function locally and seeing if it matches with the query, combined with filesize in manifest]) then we’ll get false negatives."
-- from https://discuss.status.im/t/ipfs-alternatives-snt-utility/1228/5

Mega +1 from me on this too. I sometimes describe this concern as "IPFS is content-addressable on read... and content-plus-a-bunch-of-flags-addressable on write"... which is an issue that varies from merely terrifying to an outright blocker depending on application. "content-plus-a-bunch-of-flags-addressable" gives up a lot of the benefits content-addressability promises in the first place!
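The effect is easy to reproduce with a toy model. The sketch below is not the real IPFS DAG layout or CID computation: `toy_root` is a made-up helper that hashes each chunk with SHA-256 and then hashes the concatenated chunk hashes. It only illustrates that the same bytes imported with a different chunk size produce a different root, so a verifier that re-imports with different flags sees a mismatch.

```python
import hashlib


def toy_root(data, chunk_size):
    """Hash each chunk, then hash the concatenation of the chunk hashes.

    A stand-in for a Merkle root; real unixfs roots also depend on
    layout, codec, and CID version.
    """
    leaves = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(leaves)).hexdigest()


if __name__ == "__main__":
    data = b"hello world, hello ipfs" * 100

    publisher = toy_root(data, chunk_size=256)
    verifier = toy_root(data, chunk_size=1024)

    # Same bytes, different import flag, different address.
    print("publisher root:", publisher)
    print("verifier root: ", verifier)
    print("match:", publisher == verifier)
```

With the chunk size fixed rather than exposed as a knob, both parties would compute the same root from the same bytes.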

There are lots of parts of the IPFS stack where it's perfectly sensible for libraries to be designed to be super configurable... but in IPFS as an application as a whole, we should be getting significantly less configurable for lots of these things, because too much flexibility is the source of this problem. It can be (seemingly paradoxically) better for the ecosystem as a whole if we don't expose so many knobs that it lets the ecosystem fracture itself based on relatively inconsequential twiddlings of those knobs!

"we need IPFS to have non-configurable logic on which chunker to choose for new files to make this entirely deterministic." (quoting @mikeal)

👍👍👍

@github-actions

github-actions bot commented Oct 9, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 9, 2023
@github-actions github-actions bot closed this as not planned Oct 15, 2023

6 participants