CID length and identity hashes #21

Stebalien · 2018-06-08T19:40:04Z

Moved from: ipfs/kubo#4918 as this isn't go-ipfs specific and will affect the spec.

Basically, we'd like to allow inlining small blocks into CIDs (using the identity hash function) for performance reasons. However, the larger the block we allow to be inlined, the less user friendly CIDs get. Unfortunately, we have to pick a "default inlined size" up front or we'll end up changing a bunch of hashes later.

Open questions:

Do we have a hard limit? That is, do we say that all CIDs must be shorter than X?
What should be the maximum size of CIDs created by default?

@whyrusleeping @kevina @diasdavid @vmx @kyledrake

Stebalien · 2018-06-08T19:54:43Z

@kevina's options:

Don't set a hard limit on CID digest size, but by default id hashes will have a maximum digest length (and thus content length) of 64 bytes.
Set a hard limit of 128 bytes on digest length (to keep things from getting too out of hand, but also to not artificially limit our options) but limit id hashes to 64 bytes by default.
Set a hard limit of 64 bytes on digest length and thus limit id hashes to this length

@Stebalien's additional options:

Soft limit of 38 bytes for the entire CID. That'll allow a base32 encoded CID to fit in a domain name segment.
Soft limit of 42 bytes for the entire CID. That's what we use for inlining peer IDs (although we may want to reduce this to 38 given the DNS restriction).

Unfortunately, my options limit the utility of this feature. However, they do increase the usability.
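
For illustration, the arithmetic behind the 38-byte figure can be checked with a short Go sketch. It assumes the text form is unpadded base32 plus a one-character multibase prefix, and that a DNS label may hold at most 63 characters; `base32Len` is a hypothetical helper, not part of any IPFS library.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// maxDNSLabel is the longest label (dot-separated segment) DNS allows.
const maxDNSLabel = 63

// base32Len returns the length of the unpadded base32 text form of an
// n-byte binary CID, plus one character for the multibase prefix.
func base32Len(n int) int {
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	return 1 + enc.EncodedLen(n)
}

func main() {
	for _, n := range []int{36, 38, 39, 42} {
		l := base32Len(n)
		fmt.Printf("%d-byte CID -> %d chars, fits in a DNS label: %v\n",
			n, l, l <= maxDNSLabel)
	}
}
```

Under these assumptions a 38-byte CID encodes to 62 characters and fits, while 39 bytes already needs 64 and 42 bytes needs 69.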

kevina · 2018-06-08T20:09:37Z

It is important to note that the 64 bytes comes from the maximum output size of modern crypto hashes (512 bits). If we set the limit lower than this, we will prevent the option of using all the output bits.

Stebalien · 2018-06-08T20:13:43Z

We definitely can't set a lower hard limit, but we could set a lower auto-inline limit. That really depends on how likely we feel we are to move to a larger hash sometime soon.

vmx · 2018-06-13T08:57:33Z

I don't have a strong opinion on the limits. Though I'm not really sure about this whole data inlining. It will make the whole stack more complicated. Currently it's always "CID + data" and then it will become "CID + maybe data, depending on the CID". This is a huge change that a lot of components need to learn about.

whyrusleeping · 2018-06-13T09:58:03Z

@vmx I'm not entirely sure what you mean. Conceptually, it's pretty simple. We're just allowing the 'hash' function of the CID to be f(x) = x. Everything else works exactly the same. The thing this enables, though, is a cool optimization where we don't have to actually store data for CIDs using this particular 'hash' function.

vmx · 2018-06-13T10:07:06Z

@whyrusleeping I'm thinking in code. Currently it's:

1. get request via CID
2. ask storage for this CID
3. return the thing the storage returned

Then a new step between 1 and 2 is introduced:

check if there's inlined data
- if yes, return that
- else go on with 2.

I'm not sure how bad this really is. If you all think that's not really a big deal, that's fine for me :)

Stebalien · 2018-06-13T18:27:19Z

So, our plan is to just modify the block service to "do the right thing". That is, when you try to put a block with a CID that uses the identity hash, it'll just throw it away. When you try to get the block, it'll extract it from the CID.

Currently we have to create indirect, large CIDs even for really tiny objects, files, and directories.
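
A minimal sketch of that block service behaviour in Go, using only the standard library. The type and function names (`blockService`, `inlineCID`, `inlinedData`) are hypothetical and this is not the go-ipfs implementation; it assumes a binary CIDv1 laid out as varint version, varint codec, then a multihash (varint hash code, varint digest length, digest), with `0x00` as the identity hash code and `0x55` as the raw codec.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	cidV1    = 0x01 // CID version 1
	rawCodec = 0x55 // multicodec "raw"
	identity = 0x00 // multihash "identity"
)

// inlineCID builds a binary CIDv1 whose identity "digest" is the data itself.
func inlineCID(data []byte) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(data)))
	cid := []byte{cidV1, rawCodec, identity}
	cid = append(cid, lenBuf[:n]...)
	return append(cid, data...)
}

// inlinedData returns the block embedded in cid if cid uses the identity hash.
func inlinedData(cid []byte) ([]byte, bool) {
	rest := cid
	var fields [4]uint64 // version, codec, hash code, digest length
	for i := range fields {
		v, w := binary.Uvarint(rest)
		if w <= 0 {
			return nil, false
		}
		fields[i], rest = v, rest[w:]
	}
	if fields[0] != cidV1 || fields[2] != identity || fields[3] != uint64(len(rest)) {
		return nil, false
	}
	return rest, true
}

// blockService is a toy in-memory store keyed by binary CID.
type blockService struct {
	store map[string][]byte
}

// Put discards blocks whose CID already carries the data.
func (s *blockService) Put(cid, data []byte) {
	if _, ok := inlinedData(cid); ok {
		return
	}
	s.store[string(cid)] = data
}

// Get extracts inlined data from the CID, otherwise asks the store.
func (s *blockService) Get(cid []byte) ([]byte, bool) {
	if data, ok := inlinedData(cid); ok {
		return data, true
	}
	data, ok := s.store[string(cid)]
	return data, ok
}

func main() {
	s := &blockService{store: map[string][]byte{}}
	data := []byte("hi")
	cid := inlineCID(data)
	s.Put(cid, data)
	got, ok := s.Get(cid)
	fmt.Println(ok, bytes.Equal(got, data), len(s.store))
}
```

Callers keep the usual put/get interface; only the block service needs to know that an identity-hash CID never touches storage.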

richardschneider · 2018-06-14T06:00:27Z

It was easy to change my block service to support getting an inlined CID.

However, I have some concerns when putting.

1. this should be optional/experimental behaviour. Otherwise, pre-existing tests fail because the CID is different
2. as others have commented, a default limit for inlining is needed

kevina · 2018-06-14T06:41:31Z

> this should be optional/experimental behaviour. Otherwise, pre-existing tests fail because the CID is different

Why?

richardschneider · 2018-06-14T06:48:46Z

@kevina putting a small block in a test, blockService.Put(byte[] { 0x01 }), generates a different CID if CID inlining is enabled. By definition CID v1 must be used, whereas without CID inlining, CID v0 can be used.

kevina · 2018-06-14T06:53:17Z

@richardschneider the automatic use of identity hashes will require a command-line flag that is not enabled by default. See ipfs/kubo#4910 for a proof-of-concept implementation.

richardschneider · 2018-06-14T07:18:37Z

@kevina Thanks, I did not know about the --id-hash-limit option. So question 1 is answered.

Does the limit specify the number of bytes in (1) the data block size or (2) the identity hash digest size or (3) the CID binary length?

Also, is id an alias for the identity hash algorithm?

kevina · 2018-06-14T07:25:47Z

> Does the limit specify the number of bytes in (1) the data block size or (2) the identity hash digest size or (3) the CID binary length?

(2) the identity hash digest size

> Also, is id an alias for the identity hash algorithm?

Yes
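
To make answer (2) concrete, here is a hedged Go sketch of how a digest-size limit could select the hash function. The `multihash` helper is hypothetical, the inclusive `<=` comparison is an assumption (the thread does not say whether the limit is inclusive), and the fallback to sha2-256 (multihash code `0x12`) is only chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const (
	identityCode = 0x00 // multihash "identity"
	sha256Code   = 0x12 // multihash "sha2-256"
)

// multihash returns the identity multihash when the data fits within limit
// bytes, and a sha2-256 multihash otherwise. For the identity hash the digest
// is the data, so the limit on digest size is also a limit on block size.
// Assumes limit < 128 so the length fits in a single varint byte.
func multihash(data []byte, limit int) []byte {
	if len(data) <= limit {
		return append([]byte{identityCode, byte(len(data))}, data...)
	}
	sum := sha256.Sum256(data)
	return append([]byte{sha256Code, sha256.Size}, sum[:]...)
}

func main() {
	const limit = 64 // default discussed earlier in this thread
	for _, size := range []int{1, 64, 65} {
		mh := multihash(make([]byte, size), limit)
		fmt.Printf("%d-byte block -> hash code 0x%02x, multihash length %d\n",
			size, mh[0], len(mh))
	}
}
```

Note that at the 64-byte limit the inlined multihash (66 bytes) is almost twice the size of the sha2-256 one (34 bytes), which is the usability trade-off raised at the top of the issue.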

richardschneider · 2018-06-14T07:39:43Z

Thanks @kevina. You have provided enough info for me to implement the putting. Cheers!

richardschneider · 2018-06-14T08:53:32Z

What should be the behavior when the block service remove is called with an inline Cid?

I'm thinking this is a no-op and no error is returned.

Stebalien · 2018-06-14T18:16:08Z

> I'm thinking this is a no-op and no error is returned.

I agree. Personally, I'd prefer it if we moved to making deletes idempotent everywhere, for both performance and usability reasons.

richardschneider · 2018-06-15T03:01:50Z

@kevina Would you mind commenting on richardschneider/net-ipfs-engine#20?

Stebalien assigned daviddias and whyrusleeping Jun 8, 2018

richardschneider mentioned this issue Jun 14, 2018

Alias for HashingAlgorithm name richardschneider/net-ipfs-core#63

Closed

Stebalien mentioned this issue Jun 14, 2018

Length Limit for CID/Multihash? ipfs/kubo#4918

Closed

Stebalien mentioned this issue Aug 23, 2018

Add support for inlinling via the id-hash ipfs/kubo#5281

Merged

ianopolous mentioned this issue Jul 9, 2020

Size limit of identity hash multiformats/multihash#130

Open

en0ma mentioned this issue Jul 19, 2022

failed to compute piece commitment - blockstore: block not found application-research/estuary#330

Closed

rphair mentioned this issue Dec 1, 2023

CIP-0100? | Governance Metadata cardano-foundation/CIPs#556

Merged

daviddias removed their assignment Jan 29, 2024
