Make base32 CIDv1 the default for go-ipfs #4143

kyledrake · 2017-08-15T01:36:37Z

I understand there's a switch to CIDv1 soon. I think go-ipfs should use lowercased base32 (rfc4648 - no padding - highest letter) as the default multibase.

The reason this encoding is preferable: it's the one encoding that will work with subdomains (RFC1035 + RFC1123). The restrictions are: case-insensitive, a-z0-9 and at most 63 bytes.
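To make the constraint concrete, here is a minimal Go sketch (not go-ipfs code) that encodes 36 placeholder bytes, the size of a binary CIDv1 with a sha2-256 multihash, as lowercase unpadded RFC 4648 base32 and checks the result against the label rules. The names `lowerBase32` and `isDNSLabel` are made up for illustration, and the check only covers the lowercase form that browsers normalize hostnames to.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// lowerBase32 is RFC 4648 base32 with a lowercase alphabet and no padding.
var lowerBase32 = base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)

// isDNSLabel reports whether s can be used as a single lowercase DNS label:
// 1 to 63 bytes, only a-z, 0-9 and '-', with no leading or trailing '-'.
func isDNSLabel(s string) bool {
	if len(s) == 0 || len(s) > 63 {
		return false
	}
	if s[0] == '-' || s[len(s)-1] == '-' {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-') {
			return false
		}
	}
	return true
}

func main() {
	// Placeholder bytes standing in for a binary CIDv1 (version + codec + multihash).
	raw := make([]byte, 36)
	// "b" is the multibase prefix for lowercase base32.
	label := "b" + lowerBase32.EncodeToString(raw)
	fmt.Println(len(label), isDNSLabel(label))

	// A base58 CIDv0 contains uppercase letters, so it fails the lowercase check.
	fmt.Println(isDNSLabel("QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn"))
}
```

With the one-character multibase prefix the label comes to 59 characters, which leaves a few bytes of headroom under the 63-byte limit.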

For a slight increase in length, you reap enormous benefits:

The ability to do proper security origins for the HTTP gateway with subdomains (cidv1abcde.dweb.link). This is very important if we want to handle reports with Google's safe browsing system (which is designed for origins). With the current design, all content is on the same browser origin, and a single phishing/malware report on any of the IPFS gateways (hosted by us or someone else) will make web browsers block every single thing on the origin with a giant red warning message until it's cleared up with Google (which from experience can take several days!)
Root paths are in the right place, which dramatically improves compatibility with existing web sites that tend to do a lot of this:
<img src="/rootimg.jpg">
Allows us to register dweb.link (and ipfs.io, etc.) to the Public Suffix List, which will prevent the sandboxed content from reading/manipulating cookies on the parent domain (and on other cidv1 subdomains).
Opens up the ability for go-ipfs to do HTTP Host Header parsing and automatic Let's Encrypt support (if we wanted to), so anyone can set up a public IPFS gateway without additional software. Once Let's Encrypt gets their wildcard cert domains shipped (Dec 2017), this could be a fully automated process. Otherwise something like nginx would be needed (I could write an example nginx.conf that people could use for it).

It should use lowercase base32 characters by default, so that it's consistent with subdomain usage (all the browsers will force lowercase). IIRC the RFC doesn't care if it's lowercased, I think people just default to upper case for legacy reasons.

Obviously an abstraction layer could be written that converts between base32 and something else for use with web gateways and then have a different default, but I think it would be less confusing for end users to use one default: the one that will let origins in browsers work.

This approach shouldn't be a problem for webextension plugins, but @lidel feel free to chime in.

Further reading: https://github.com/neocities/hshca

lidel · 2017-08-15T13:39:04Z

Good arguments, especially the one about GoogleSafeBrowsing's false-positives for public gateways 👍

CIDv1 format is strongly related to discussion at ipfs/in-web-browsers: Tackle identifying origins with (or without?) fs: paths.

I was unable to find a definitive, final decision on which exact encoding will be used, apart from @lgierth initially pondering "base16 or base32" and @samholmes suggesting base32 with Crockford's Encoding.

Was the decision made elsewhere?
If not, this ticket provides a good opportunity to do so 🔧

daviddias · 2017-08-15T13:42:49Z

Thank you for creating this issue, @kyledrake. I agree with your proposal, we can take the opportunity that we are bringing CID to the world for the first time to get base32 as the new default.

If not handled correctly internally (i.e. using the string format instead of the binary format), it will add significant overhead, but that is just something we can change internally to make sure that we use memory efficiently.

ghost · 2017-08-15T13:48:10Z

> I was unable to find definitive, final decision on which exact encoding will be used apart from @lgierth initially pondering "base16 or base32" and @samholmes suggesting base32 with Crockford's Encoding.

I'm strongly in favour of making base32 the general default encoding for CIDs everywhere. We need base32 for the ipfs:// URL scheme, and it'd suck if people had to deal with different CID encodings, or even have to use converter tools.

ghost · 2017-08-15T13:48:55Z

And, I think we haven't had any decision on it -- we just stuck with base58 as that was the original encoding used from the beginning.

daviddias · 2017-08-15T13:56:51Z

> we just stuck with base58 as that was the original encoding used from the beginning.

Yeah, that was pretty much how the decision got made. Still in time to change though.

ghost · 2017-08-15T13:58:00Z

> Still in time to change though.

Well I'm all for it :):)

samholmes · 2017-08-15T23:33:36Z

Notes on Base 32 Encoding

What would it take to get a new base added to the multibase table? Specifically, what would it take to add Crockford's Encoding to the table? As of commenting, it appears RFC4648 and z-base-32 are the only base 32 encodings included in the multibase spec.

An added reason to push for Crockford's Base32 is that it meets the same criteria as Base58Check, the base58 encoding Bitcoin uses for bitcoin addresses:

// Why base-58 instead of standard base-64 encoding?
// - Don't want 0OIl characters that look the same in some fonts and
// could be used to create visually identical looking account numbers.
// - A string with non-alphanumeric characters is not as easily accepted as an account number.
// - E-mail usually won't line-break if there's no punctuation to break at.
// - Doubleclicking selects the whole number as one word if it's all alphanumeric.

It seems to me that Crockford's Base32 naturally fits the same goals as Base58Check, with the added feature of being case-insensitive.
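For comparison, a small Go sketch that encodes the same bytes with the RFC 4648 alphabet and with Crockford's symbol set (digits plus letters, leaving out I, L, O and U). The variable names are illustrative only. It maps the Crockford alphabet onto the standard library's `encoding/base32` and does not implement Crockford's lenient decoding (treating O as 0, I and L as 1) or its optional check symbols.

```go
package main

import (
	"encoding/base32"
	"fmt"
	"strings"
)

// crockfordAlphabet is Crockford's base32 symbol set: digits first, then
// letters with I, L, O and U left out.
const crockfordAlphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

var (
	// rfc4648 is the standard alphabet (A-Z, 2-7) without padding.
	rfc4648 = base32.StdEncoding.WithPadding(base32.NoPadding)
	// crockford reuses the stdlib encoder with Crockford's alphabet.
	crockford = base32.NewEncoding(crockfordAlphabet).WithPadding(base32.NoPadding)
)

func main() {
	data := []byte("ipfs")

	// Both outputs have the same length; only the symbols differ.
	fmt.Println(strings.ToLower(rfc4648.EncodeToString(data)))
	fmt.Println(strings.ToLower(crockford.EncodeToString(data)))
}
```

Either alphabet stays within a-z0-9 once lowercased, so the choice between them does not affect whether the result fits in a subdomain.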

ghost · 2017-08-15T23:40:08Z

Let's use whatever base32 variant Javascript and (less important) other programming languages use as their default base32. (I assume it's Crockford's)

samholmes · 2017-08-15T23:53:01Z

Notes on URLs and URI Schemes

From what I can tell, there is no obvious direction for solving issues surrounding URI Schemes and browser origin policies coupled with them thus far. However, my rough proposal is up for further commenting.

However, my inclination is to specify an alternative format and standard from URI. Then, leave it up to implementations to bridge this new format to a purposed URI scheme. Although a hack at the implementation level, it would open up an opportunity to re-think what a web address could be. Maybe a multiresource standard should be defined and added to the multiformats basket?

samholmes · 2017-08-16T00:05:55Z

@lgierth I don't know if there is a default base32 encoding in Javascript. If you would consider Javascript's native toString Number method:

var a = []; for (var i = 0; i < 32; i++) a.push((i).toString(32))
console.log(a);
// (32) ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v"]

It appears toString uses Base 32 Encoding with Extended Hex Alphabet from RFC4648.

Other than this, the Javascript community modules include many variants of base32 encodings; among them is Crockford's. So, it's safe to say that it's not an obscure encoding at the least.

kevina · 2017-08-16T00:30:44Z

@samholmes

> What would it take to get a new base added to the multibase table?

I am not sold that Crockford's Base32 is better than rfc4648 (which we already use in the flatfs datastore), but I don't see any problem with adding an entry to the table and implementing it in go-multibase. Step one would be to open an issue here: https://github.com/multiformats/multibase/issues.

kyledrake · 2017-08-16T00:32:00Z

Support for crockford base32 (base32check?) seems fairly widespread:

Since there seems to be a strong preference for it, I hereby revise the proposal to use crockford base32.

kyledrake · 2017-08-16T00:58:46Z

Worth noting is that there are several different flavors of base32, including (my personal favorite) one that Nintendo games used that was designed to avoid profanity.

I'm kind of indifferent as to which version gets used. I chose RFC because it's a standard, it's been around a while, nginx-misc-module supports it, and it probably has the widest support across all programming languages. My only strong preference here is that it's a variation most programming languages already support, so we can minimize devs having to re-invent wheels.

Crockford seems to fit the bill more-or-less as well as RFC, which is my rationale for being OK with using it.

@kevina would you have very strong objections to crockford being used by default by go-ipfs with cidv1?

kevina · 2017-08-16T01:22:14Z

@kyledrake concerning some of the issues (in particular the use of cidv1abcde.dweb.link), have a look at #1678 (comment). The full issue is rather long but contains lots of useful context on why we currently use /ipfs/Qm.../hash.

kevina · 2017-08-18T20:48:14Z

@kyledrake if we switch to using Base32 I do not have any strong objection to using crockford over the RFC one. The only reason I would choose the RFC one is because it is a standard and more likely to have an implementation available as part of the language.

What I do have a slightly stronger objection to is switching from Base 58 to Base 32, due to the increase in length. Let's see how the length progresses with the various proposed changes:

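| What | Length (chars) | Increase vs. CidV0 | Example |
|---|---|---|---|
| CidV0 (sha2-256, base58btc) | 46 | | `QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn` |
| CidV1 (sha2-256, base58btc) | 49 | +6.5% | `zdj7WbTaiJT1fgatdet9Ei9iDB5hdCxkbVyhyh8YTUnXMiwYi` |
| CidV1 (sha2-256, Base32) | 59 | +28% | `BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354` |
| CidV1 (Blake2b-256, base58btc) | 52 | +13% | `zDMZof1kvswQMT8txrmnb3JGBuna6qXCTry6hSifrkZEd6VmHbBm` |
| CidV1 (Blake2b-256, Base32) | 62 | +35% | `BAFYKBZACEBUGFUTJIR6QIE7APO5SHPRY32RUWFI762UYTD5G3U2GK7TPSCNDQ` |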

So if we ultimately go with CidV1 using Blake2b-256 and Base32 as the default, the length of the Cid string will increase by 35%. That is a non-trivial amount, as opposed to the (I think) original plan of switching to CidV1 using the same sha256 hash, which provides a minor increase of 3 characters or 6.5%.
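The Base32 lengths above can be reproduced with a bit of arithmetic. This is a rough Go sketch under my own assumptions about the binary layout (1 byte version, 1 byte codec, multihash header, 32-byte digest, with the Blake2b-256 multihash code taking a 3-byte varint); the function names are invented for the example.

```go
package main

import "fmt"

// base32Len returns the length of an unpadded base32 string for n bytes,
// plus one character for the multibase prefix.
func base32Len(n int) int {
	return (n*8+4)/5 + 1
}

// increasePct returns the growth in percent over the 46-character CidV0 string.
func increasePct(length int) float64 {
	return float64(length-46) / 46 * 100
}

func main() {
	// Assumed binary CidV1 sizes: version + codec + multihash header + digest.
	sha256CID := 1 + 1 + 2 + 32  // header: 1-byte hash code + 1-byte length
	blake2bCID := 1 + 1 + 4 + 32 // header: 3-byte varint hash code + 1-byte length

	for _, n := range []int{sha256CID, blake2bCID} {
		l := base32Len(n)
		fmt.Printf("%d bytes -> %d chars (+%.0f%%)\n", n, l, increasePct(l))
	}
}
```

Under those assumptions the sketch arrives at 59 and 62 characters, the same figures as the two Base32 rows.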

However, if everyone else is okay with this length increase I am not going to block a move to using Blake2b-256 and/or Base32.

samholmes · 2017-08-18T21:41:28Z

> However, if everyone else is okay with this length increase I am not going to block a move to using Blake2b-256 and/or Base32.

I'm okay with the increase in length. If I am not mistaken, the trade-off would be making it easier to use the same CID within an /ipfs/<CID> address and a URL address. 😃

ghost · 2017-08-29T04:18:46Z

I'm very comfortable trading increased length for increased portability. Not making base32 or base16 the default means that the browser UX of IPFS will suffer. Even if we skip the <hash>.dweb.link idea, we need base32 CIDs for ipfs://<hash>, and not being able to paste CIDs from go-ipfs into the browser would be a little catastrophe :(

> Support for crockford base32 (base32check?) seems fairly widespread:

I'd be more interested in what stdlib-type libraries use, rather than some individual's library. Random data points: golang's encoding/base32 uses RFC 4648, and the coreutils base32 command does too.

Could someone check what other important libraries and tools use, so that we get a small survey?

ghost · 2017-08-29T04:20:55Z

On a different note, we should default to lowercase base32 for readability (and of course accept reading both uppercase and lowercase).

kevina · 2017-08-29T04:28:53Z

Any objections I have to the increased length are mild. It's just that things like increases in length can creep up on you, and a few years later we could end up with keys 2-3 times the length of the original. Not saying it will happen, but I want to explain where my (mild) objection is coming from.

@lgierth why won't our existing base (base58btc) work if you go with ipfs://<hash>? A pointer to another issue or documentation is fine.

kevina · 2017-08-29T04:30:31Z

Also, I agree with using lowercase as the default.

Stebalien · 2017-08-29T04:32:43Z

In general, browsers assume that security origins are case-insensitive. With ipfs://hash, hash is the security origin.

ghost · 2017-08-29T04:35:12Z

As per the WHATWG URL spec, the hash in ipfs://<hash> is a domain, which needs to be a valid label according to RFC 1035.

That's why @kyledrake made hshca
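A tiny Go sketch of why that matters for base58btc. It assumes, as a simplification, that the browser lowercases whatever sits in the host position; `asHost` and `survivesHost` are made-up helpers, not a real URL parser. The two CIDs are the examples from the length table earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// asHost mimics the case folding applied to the host part of a URL.
// This is a simplification for illustration only.
func asHost(cid string) string {
	return strings.ToLower(cid)
}

// survivesHost reports whether a CID string is unchanged after case folding,
// meaning it can still be decoded back to the same bytes.
func survivesHost(cid string) bool {
	return asHost(cid) == cid
}

func main() {
	base58 := "QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn"
	base32 := strings.ToLower("BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354")

	fmt.Println(survivesHost(base58)) // false: case carries information in base58
	fmt.Println(survivesHost(base32)) // true: lowercase base32 is already folded
}
```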

kevina · 2017-08-29T04:41:46Z

@Stebalien @lgierth thanks

kevina · 2017-08-29T07:48:03Z

So, if we do make base32 the default so that we can represent them on the domain component of the URL, the question I have is: How will we reference CidV0 objects, since we can't completely eliminate them?

I created an issue to discuss this: ipfs/go-cid#34.

lidel · 2022-04-08T00:30:51Z

go-ipfs 0.12 shipped the blockstore migration from full CID to Multihash keys, removing the final obstacle.
We can now switch produced CIDs to be CIDv1 by default! See Migration to CIDv1 (default base32) specs#247 (comment).

@schomatis mind prioritizing this one and picking it up after you finish the current work?

This is an important UX change that is a long time due.
The entire ecosystem will be grateful for cleaning this up and producing CIDv1 by default. 🙏

Specific tasks would be to switch below places to create and output CIDv1 (instead of the current CIDv0):

MFS (ipfs files + default empty MFS root created on ipfs init)
legacy ipfs object new|put|patch (I know we deprecated those, but we have to do this for completeness, otherwise we will see people producing CIDv0 with them)
ipfs add – poc in test: ipfs add with CIDv1 as default #8185 (just to see which tests fail, ok to close it and open a new PR)

My thoughts:

This is a bigger chunk of work, but well scoped, and you could work on this without being blocked by others, or waiting for reviews.
These commands already have tests, after you flip default they need to be refactored to use CIDv1.
Given that ipfs add breaks most of the tests, you can focus on other tasks first.
Fine to do it all in a single PR or have separate ones for MFS, add, and other commands. Whatever works better for you.

olizilla · 2022-04-08T10:56:20Z

@lidel If I'm reading this right, the greatest change since the invention of CIDs could be scheduled to land in go-ipfs 0.13... an obviously haunted release number. In light of this, the next release of go-ipfs should be v1 for parity with the CID version number. 😎

lidel · 2022-04-08T12:49:16Z

@olizilla this won't make the cut for 0.13, but I like the idea of bumping semver to 1.0.0 when we are ready to ship CIDv1 everywhere, will think about it around 0.14+ 👍

ps. before 1.x we also need to discuss renaming the project to something other than the generic "go-ipfs"

schomatis · 2022-04-13T13:48:27Z

@lidel Can I have a simple example of what would a CID change look like in ipfs files, please? Like what is the current behavior and what would you expect after the change. I'm still grappling with what are we aiming for in terms of UX.

schomatis · 2022-04-13T14:11:05Z

(Trying to interpret the spirit of the change, but can be way off. This is just something concrete to start the discussion from.) In the case of ipfs init for example, the CID that gets stored in the datastore under /local/filesroot is an empty UnixFS directory created with the go-merkledag tooling NodeWithData, which creates a ProtoNode without a CID builder. This in turn is interpreted by its CidBuilder() as using CIDv0 by default if it hasn't been specified.

We would then want to refactor that library to use CIDv1 by default? Or am I going too deep in the change and instead we would need to find a bypass in MFS/UnixFS to avoid this tooling and make sure they set the CIDv1 everywhere? The first seems the cleanest if we're transitioning to CIDv1 everywhere but might cause unexpected behavior to other consumers. The second has less impact but will be messier and harder to maintain (thinking off the top of my head, need to look closer into the code).

I can still be going too deep and we just need to find a spot to make the CIDv0-to-CIDv1 change in the command output itself, and still remain with CIDv0 internally (this also goes in the direction of an even less impactful change, but again with the sense of a messier approach, at least at first sight, of having to catch and intercept these exposures of our internal CIDs to change them for the user).

lidel · 2022-04-13T19:01:21Z

@schomatis thank you for looking into this.

Switching to v1CidPrefix in CidBuilder() sounds like a good way to ensure the ecosystem moves to CIDv1, but could also block go-ipfs work for a while. Let's discuss this approach independently in ipfs/go-merkledag#86.

My suggestion is to look for a less invasive approach – maybe call SetCidBuilder after NodeWithData is created by go-ipfs? We already do that in some places like core/commands/files.go – we should have a "default builder" hardcoded in go-ipfs somewhere, so we are independent of default in go-merkledag.

If that is too messy, what if we manually execute filesChcidCmd as the last step in ipfs init (to switch MFS root to CIDv1)?

schomatis · 2022-04-14T12:56:33Z

I agree with taking a middle-of-the-road approach. I think I'll go with setting a CIDv1 alternative in merkledag itself (without forcing the current default) and start switching to that wherever it makes sense (like in UnixFS when creating the default directory that ends up being used in ipfs init), with the idea of making that alternative visible to anyone who is planning to create CIDs. I'll experiment a bit with this and get back to you.

If I understand correctly (please confirm this), we seek as part of this transition to store CIDs v1 internally, not just showing them externally to the user.

lidel · 2023-04-25T21:52:37Z

Relevant libraries moved to boxo and have clear maintainers and ownership (IPFS Stewards):

We should be able to update defaults to CIDv1 with raw-leaves=true as the new default there.

lidel · 2023-09-07T12:53:53Z

Switching to --cid-version 1 by default implies enabling --raw-leaves by default, which will change the DAG chunks and the multihash of the final CID.

It feels like a good opportunity to ALSO enable --inline with --inline-limit=32, to have one release with "breaking change" instead of two.

lidel · 2024-05-14T17:28:36Z

#10421 got merged and is scheduled to ship with Kubo 0.29.

It introduced Import configuration section which allows customizing CID and UnixFS flags for commands like ipfs add.

There are also two new profiles with cidv0 and cidv1 presets:

ipfs config profile apply legacy-cid-v0 will make ipfs add always produce CIDv0 (solidifying current behavior).

Right now, it sets:

kubo/config/profile.go, lines 211 to 214 in 099ce9c:

    c.Import.CidVersion = *NewOptionalInteger(0)
    c.Import.UnixFSRawLeaves = False
    c.Import.UnixFSChunker = *NewOptionalString("size-262144")
    c.Import.HashFunction = *NewOptionalString("sha2-256")

ipfs config profile apply test-cid-v1 allows switching to CIDv1 by default with 1 MiB block size (potentially the future default, matching CIDs produced by services like https://web3.storage, and cutting down the number of CIDs that need to be announced to Amino DHT)

Feedback about what should go into test-cid-v1 is welcome. Right now, it sets four flags:

kubo/config/profile.go, lines 222 to 225 in 099ce9c:

    c.Import.CidVersion = *NewOptionalInteger(1)
    c.Import.UnixFSRawLeaves = True
    c.Import.UnixFSChunker = *NewOptionalString("size-1048576")
    c.Import.HashFunction = *NewOptionalString("sha2-256")

bumblefudge · 2024-10-24T09:16:20Z

> Feedback about what should go into test-cid-v1 is welcome

I'm trying to solve this on two different levels. On the spec level, I feel like every single parameter of DAG production needs to be defined for the kubo-CIDv1 and kubo-CIDv0 profiles, even if the CLI doesn't expose each of those parameters as flags (it's totally OK if it never does!). On the kubo level, it's probably better if many of these parameters aren't exposed as flags, and it seems uncontroversially acceptable if special-snowflake use cases that define their own IPLD dialect ("off-profile" is the new "off-spec", voids your warranty, etc), and that change some of these low-level DAG-shape settings, find that they cannot recreate their CIDs with kubo from the same inputs. I'm hoping the IPIP outlining what a "complete" profile is lists all these "implicit" (in kubo, at least) DAG params and chunking params, and that the formal, independently-reimplementable definition of these profiles includes their values, even if changing any of them requires forking kubo or helia or otherwise going off-roading.

lidel · 2024-11-07T23:51:37Z

Yes, specs will be a laborious lift, but I think what we could do in Kubo is to "freeze defaults" by writing them to JSON config (Import section) when initial ipfs init is called, and also for existing users who have it missing from config.

This way we can switch new instances to CIDv1 without impacting existing users that depend on CID being deterministic.

Going forward, if we make any new knob that impacts the produced CID, we would add it there and also freeze the current default before changing it for new users.

This way users who are oblivious to all the complexity will still get "the same CID" and if they start getting different one on a different Kubo instance, they can DIFF config, spot the difference, and fix problem themselves.
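A minimal sketch of the "freeze defaults" idea in Go, to make the mechanics concrete. It is not Kubo code: `ImportConfig`, `freezeDefaults` and `ptr` are hypothetical, the JSON keys are assumed to mirror the field names from the profiles above, and the legacy values are copied from the legacy-cid-v0 profile.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImportConfig is a hypothetical, simplified stand-in for the Import section.
// Pointer fields distinguish "not set" (nil) from an explicit value.
type ImportConfig struct {
	CidVersion      *int    `json:"CidVersion,omitempty"`
	UnixFSRawLeaves *bool   `json:"UnixFSRawLeaves,omitempty"`
	UnixFSChunker   *string `json:"UnixFSChunker,omitempty"`
	HashFunction    *string `json:"HashFunction,omitempty"`
}

// ptr returns a pointer to v, which keeps the literals below short.
func ptr[T any](v T) *T { return &v }

// freezeDefaults fills every unset field from def, so that a later change of
// built-in defaults does not alter the CIDs this node produces.
func freezeDefaults(cfg *ImportConfig, def ImportConfig) {
	if cfg.CidVersion == nil {
		cfg.CidVersion = def.CidVersion
	}
	if cfg.UnixFSRawLeaves == nil {
		cfg.UnixFSRawLeaves = def.UnixFSRawLeaves
	}
	if cfg.UnixFSChunker == nil {
		cfg.UnixFSChunker = def.UnixFSChunker
	}
	if cfg.HashFunction == nil {
		cfg.HashFunction = def.HashFunction
	}
}

func main() {
	// Values taken from the legacy-cid-v0 profile quoted earlier.
	legacy := ImportConfig{
		CidVersion:      ptr(0),
		UnixFSRawLeaves: ptr(false),
		UnixFSChunker:   ptr("size-262144"),
		HashFunction:    ptr("sha2-256"),
	}

	// An existing user's config with only one explicit setting.
	var existing ImportConfig
	if err := json.Unmarshal([]byte(`{"CidVersion": 1}`), &existing); err != nil {
		panic(err)
	}

	freezeDefaults(&existing, legacy)

	out, err := json.MarshalIndent(existing, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Explicit user choices win and only the gaps are filled, so two nodes that produce different CIDs would show the difference in a plain diff of their configs.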

samholmes mentioned this issue Aug 15, 2017

Tackle identifying origins with (or without?) dweb: paths ipfs/in-web-browsers#6

Closed

JustinDrake mentioned this issue Aug 17, 2017

Consider using base32 CIDv1 OpenBazaar/openbazaar-go#634

Closed

kyledrake mentioned this issue Aug 28, 2017

Dynamic DNS and Let's Encrypt certificates for secure websockets ipfs/notes#252

Open

kevina mentioned this issue Aug 29, 2017

Provide Way of Representing CidV0 objects in an alternative base ipfs/go-cid#34

Closed

whyrusleeping added the need/community-input Needs input from the wider community label Aug 29, 2017

daviddias mentioned this issue Aug 31, 2017

Update the spec from the implementation. multiformats/cid#14

Merged

lidel assigned schomatis Apr 8, 2022

lidel mentioned this issue Apr 13, 2022

Switch default CidBuilder to v1CidPrefix ipfs/go-merkledag#86

Open

This was referenced Apr 14, 2022

feat(unixfs): use cidv1 by default #8886

Draft

Update ipfs object to CIDv1 #8887

Closed

lidel modified the milestones: go-ipfs 0.13, go-ipfs 0.14 Apr 15, 2022

sergiimk mentioned this issue Apr 17, 2022

Consider going back to base16 hash encoding open-data-fabric/open-data-fabric#23

Closed

schomatis mentioned this issue Apr 21, 2022

feat(test): migrate CIDv0-v1 tool #8898

Draft

Jorropo mentioned this issue May 14, 2022

add --cid-version=1 --raw-leaves=false isn't hash coherent with add --cid-version=0 #8974

Closed

3 tasks

BigLep modified the milestones: kubo 0.14, kubo 0.15 Jul 22, 2022

lidel unassigned schomatis Nov 22, 2022

lidel mentioned this issue Dec 13, 2022

ipfs should default to CIDv1 #9494

Closed

3 tasks

RubenKelevra mentioned this issue Mar 26, 2023

Switching to CIDv1 by default sameer/git-lfs-ipfs#11

Open

lidel mentioned this issue Apr 25, 2023

Adding a file/folder from disk should use CIDv1 by default. ipfs/ipfs-desktop#2361

Open

lidel mentioned this issue May 8, 2024

config: introduce Import section #10421

Merged

dmaretskyi mentioned this issue Jun 10, 2024

DXN spec dxos/dxos#6953

Closed

lidel mentioned this issue Nov 8, 2024

feat: default MFS to CIDv1 ipfs/ipfs-desktop#2527

Draft
