105 changes: 78 additions & 27 deletions bip-0093.mediawiki
@@ -66,16 +66,17 @@ efficient to read out loud, write, type or to put into QR codes.</ref> format ca
===codex32===

A codex32 string is similar to a bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
It reuses the base-32 character set from BIP-0173, and consists of:
It reuses the base-32 character set from BIP-0173, is at most 94 characters long, and consists of:

* A human-readable part, which is the string "ms" (or "MS").
* A separator, which is always "1".
* A data part which is in turn subdivided into:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
*** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
** An identifier consisting of 4 bech32 characters.
** A share index, which is any bech32 character. Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret").
** A payload which is a sequence of up to 74 bech32 characters. (However, see '''Long codex32 Strings''' below for an exception to this limit.)
** A share index, which is any bech32 character.
*** Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section '''Unshared Secret''').
** A payload which is a sequence of up to 72 bech32 characters. (However, see '''Long codex32''' below for an exception to this limit.)
** A checksum which consists of 13 bech32 characters as described below.

As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
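
The field layout listed above can be sketched as a small splitter. This is a minimal illustration only: the function name <code>split_codex32</code> and the returned dictionary keys are hypothetical, only regular (13 character checksum) strings are handled, and the checksum is split off but not verified.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def split_codex32(s):
    """Split a regular codex32 string into its fields (checksum NOT verified)."""
    # A codex32 string must be entirely uppercase or entirely lowercase.
    if s != s.lower() and s != s.upper():
        raise ValueError("mixed-case string")
    s = s.lower()
    # Human-readable part "ms" followed by the separator "1".
    if not s.startswith("ms1"):
        raise ValueError("missing 'ms1' prefix")
    data = s[3:]
    # threshold (1) + identifier (4) + share index (1) + checksum (13)
    if len(s) > 94 or len(data) < 6 + 13:
        raise ValueError("length out of range for a regular codex32 string")
    if any(c not in CHARSET for c in data):
        raise ValueError("character outside the bech32 set")
    threshold, identifier, index = data[0], data[1:5], data[5]
    if threshold not in "023456789":
        raise ValueError("threshold must be '0' or a digit from '2' to '9'")
    if threshold == "0" and index != "s":
        raise ValueError("threshold '0' requires share index 's'")
    return {
        "threshold": threshold,
        "identifier": identifier,
        "index": index,
        "payload": data[6:-13],
        "checksum": data[-13:],
    }
```

A caller would still need to validate the checksum as described in the '''Checksum''' section before trusting any of the returned fields.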
@@ -148,13 +149,6 @@ We do not specify how an implementation should implement error correction. Howev

When the share index of a valid codex32 string (converted to lowercase) is the letter "s", we call the string a codex32 secret.

The secret is decoded by converting the payload to bytes:

* Translate the characters to 5 bits values using the bech32 character table from BIP-0173, most significant bit first.
* Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, and is discarded.

Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.

For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
We recommend using the digit "0" for the threshold parameter in this case.
The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different secrets in cases where they have more than one.
@@ -198,13 +192,50 @@ A secret seed is a codex32 encoding of:
* The data-part values:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
** An identifier consisting of 4 bech32 characters.
*** We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed and share set the user may need to disambiguate.
*** Note that the identifier SHOULD be distinct for every master seed and share set the user may need to disambiguate.
*** Implementations MAY derive identifier characters by computing the bech32-encoded BIP-0032 master key fingerprint<ref>'''Why use the BIP-0032 master [https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki#user-content-Key_identifiers key fingerprint] for the identifier?'''
This has less than 1 in 10<sup>6</sup> chance of collision for small numbers of master seeds.
It enables implementations to help users locate shares for a given master seed, since the fingerprint appears in backups, PSBTs (Partially Signed Bitcoin Transactions), and descriptor key origin information.
A known fingerprint also detects errors and corrects erasures in the identifier, complementing the codex32 checksum, and has less than a 1 in 10<sup>6</sup> chance of failing to detect malicious tampering after recovery.
</ref>(s) of the following data:
**** For a fresh secret seed: the master seed bytes.
**** For a reshared master seed: the master share set<ref>'''Why are all ''k'' initial shared strings concatenated to get a reshare fingerprint identifier?'''
If fewer shares were used, a ''k''-1 adversary could learn that the secret was reshared, implying the existence of other backups, or the identifier might not change if new sets reused shares.
Previous share headers are included for uniqueness. If shares were interpolated or XOR-combined, the payload order would be lost and identifiers could be reused for different sets. Shares are alphabetized so the same set derives the same identifier.
A pubkey fingerprint is more work to grind the last string against than a simple hash.</ref>, encoded as UTF-8 bytes (see '''Generating Shares'''): <source lang="python">master_share_set = ''.join(sorted(set_of_k_initial_shared_strings)).lower().encode()</source>
***# From the appropriate "seed" data above, create a BIP32 master node and get the master fingerprint: <code>BIP32.from_seed(data).get_fingerprint()</code>
***# Start with the bits of the fingerprint, most significant bit per byte first.
***# Re-arrange those bits into 4 groups of 5, and discard the 12 bits at the end.
***# Translate those bits to characters using the bech32 character table from BIP-0173.
** The share index "s".
** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
*** The master seed length in bytes MUST be a multiple of 4.
*** Start with the bits of the master seed, most significant bit per byte first.
*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
**** Note that deterministic implementations SHOULD derive padding bits using a CRC<ref>'''Why use a CRC for padding?'''
The CRC adds a trivial amount of bit error detection and makes the incomplete group less predictable than zero-padding.
The recommended CRC is CRC-<code>w</code>, where <code>w = pad_bits_needed</code>, <code>poly = 1 << w | 3</code>, <code>init = 1</code>, <code>const = 1 << w - 1</code>, <code>refIn = false</code>, and <code>refOut = false</code></ref>.
*** Translate those bits to characters using the bech32 character table from BIP-0173.
** A valid checksum in accordance with the Checksum section.
** A valid checksum in accordance with the '''Checksum''' section.
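
The two bit-regrouping steps above (fingerprint to identifier, and master seed to payload) can be sketched as follows. The helper names are hypothetical, the 4-byte fingerprint is taken as an input rather than derived from a BIP-0032 master node, and the padding bits are passed in by the caller (defaulting to zero for illustration only, which is not the CRC padding recommended above).

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def identifier_from_fingerprint(fingerprint):
    """Map a 4-byte BIP-0032 fingerprint to a 4-character identifier."""
    if len(fingerprint) != 4:
        raise ValueError("fingerprint must be 4 bytes")
    # Bits of the fingerprint, most significant bit per byte first.
    bits = "".join(f"{b:08b}" for b in fingerprint)
    # Keep 4 groups of 5 bits; the trailing 12 bits are discarded.
    return "".join(CHARSET[int(bits[i:i + 5], 2)] for i in range(0, 20, 5))


def seed_to_payload(seed, pad=0):
    """Convert master seed bytes to bech32 payload characters.

    `pad` holds the padding bits as an integer (assumption: chosen by the
    caller; zero here is purely illustrative).
    """
    if not 16 <= len(seed) <= 64 or len(seed) % 4:
        raise ValueError("seed must be 16 to 64 bytes and a multiple of 4")
    bits = "".join(f"{b:08b}" for b in seed)
    pad_len = -len(bits) % 5  # bits needed to complete the last group of 5
    if pad_len:
        if pad >> pad_len:
            raise ValueError("padding value does not fit in the padding bits")
        bits += f"{pad:0{pad_len}b}"
    return "".join(CHARSET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))
```

For a 16-byte seed this yields 26 payload characters, of which the last carries 2 padding bits.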

'''Decoding'''

Software interpreting a codex32-encoded master seed:
* MUST verify that the human-readable part is "ms".
* SHOULD interpret the first six data-part characters as the header.
* Convert the payload characters to bytes:
** Translate the characters to 5-bit values using the bech32 character table from BIP-0173, most significant bit first.
** Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, and is discarded.
** There MUST be between 16 and 64 groups, which are interpreted as the bytes of the master seed.
** The master seed MUST be a multiple of 4 bytes.

Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.
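
A minimal sketch of the payload-to-bytes rules above, assuming the payload has already been extracted and the checksum verified elsewhere. The function name <code>payload_to_seed</code> is hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def payload_to_seed(payload):
    """Convert codex32 payload characters to master seed bytes."""
    # Each character becomes a 5-bit value, most significant bit first.
    # str.index raises ValueError for characters outside the bech32 set.
    bits = "".join(f"{CHARSET.index(c):05b}" for c in payload.lower())
    extra = len(bits) % 8
    # The incomplete trailing group must be 4 bits or less; it is discarded
    # and, unlike BIP-0173, is not required to be all zeros.
    if extra > 4:
        raise ValueError("incomplete group longer than 4 bits")
    usable = len(bits) - extra
    seed = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    if not 16 <= len(seed) <= 64:
        raise ValueError("master seed must be 16 to 64 bytes")
    if len(seed) % 4:
        raise ValueError("master seed length must be a multiple of 4 bytes")
    return seed
```

Two payloads that differ only in their padding bits decode to the same seed, which is why the padding cannot be relied on to carry information.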

Decoders SHOULD enforce known-length restrictions on master seeds.

As a result of the previous rules, secret seeds cannot be between 94 and 100 characters long, and their length modulo 8 cannot be 1.
Regular codex32-encoded master seeds are always between 48 and 93 characters long, and their length modulo 8 cannot be 4 or 7.
Long codex32-encoded master seeds are always between 101 and 127 characters long, and their length modulo 8 cannot be 3 or 6.
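
The length claims above can be checked by enumerating every permitted seed size. This sketch assumes 9 characters of prefix and header (<code>ms1</code>, threshold, identifier, share index), a regular payload limit of 72 characters, and that the long format is used only when the payload does not fit the regular one; the function name is hypothetical.

```python
def secret_seed_lengths():
    """Return (regular, long) lists of possible secret seed string lengths."""
    regular, long_ = [], []
    for seed_bytes in range(16, 65, 4):       # 16..64 bytes, multiples of 4
        payload = -(-8 * seed_bytes // 5)     # ceil(bits / 5) characters
        if payload <= 72:
            regular.append(9 + payload + 13)  # 13-character checksum
        else:
            long_.append(9 + payload + 15)    # 15-character checksum
    return regular, long_


regular, long_ = secret_seed_lengths()
print(regular)  # [48, 54, 61, 67, 74, 80, 86, 93]
print(long_)    # [101, 108, 114, 120, 127]
```

Under these assumptions no length falls between 94 and 100, and the residues modulo 8 match the restrictions stated above.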

===Recovering Secret===

@@ -259,8 +290,9 @@ def ms32_recover(shares):

===Generating Shares===

If we already have ''k'' valid codex32 strings such that:
When the threshold parameter of a valid codex32 string is not the digit "0", we call the string a codex32 shared string.

If we already have ''k'' valid codex32 shared strings such that:
* All strings have the same threshold value ''k'', the same identifier, and the same length
* All of the share index values are distinct

@@ -270,21 +302,33 @@ The newly derived share will have the provided share index.
Once a user has generated ''n'' shares, they may discard the codex32 secret (if it exists).
The ''n'' shares form a ''k'' of ''n'' Shamir's secret sharing scheme of a codex32 secret.

'''Decoding'''

Software interpreting a codex32 shared string:
* MUST verify that the human-readable part is "ms".
* MAY convert the codex32 shared string to bytes:
** Translate the threshold parameter minus 2 to a 3-bit value, most significant bit first.
** Translate the share index and payload characters to 5-bit values using the bech32 character table from BIP-0173, most significant bit first.
** Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be zero-padded to 8 bits, and MUST be retained.
** There MUST be between 18 and 66 groups, which are interpreted as the minimal bytes to recover the master seed unambiguously.

Note that unlike the decoding process in BIP-0173, we do NOT discard the incomplete group.
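
The optional byte conversion for a shared string can be sketched as below. The function name <code>share_to_bytes</code> is hypothetical, and the fields are assumed to have been split out and checksum-verified beforehand.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def share_to_bytes(threshold, index, payload):
    """Pack threshold, share index and payload of a shared string into bytes."""
    k = int(threshold)
    if not 2 <= k <= 9:
        raise ValueError("a shared string has a threshold from 2 to 9")
    # Threshold minus 2 as a 3-bit value, most significant bit first.
    bits = f"{k - 2:03b}"
    # Share index and payload characters as 5-bit values.
    bits += "".join(f"{CHARSET.index(c):05b}" for c in (index + payload).lower())
    # The incomplete trailing group is zero-padded to 8 bits and retained.
    bits += "0" * (-len(bits) % 8)
    out = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    if not 18 <= len(out) <= 66:
        raise ValueError("expected between 18 and 66 bytes")
    return out
```

For a 128-bit share (26 payload characters) this packs 3 + 5 + 130 = 138 bits into 18 bytes, with the threshold and share index sharing the first byte.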

There are two ways to create an initial set of ''k'' valid codex32 strings, depending on whether the user already has an existing secret to split.

====For a fresh secret====

In the case that the user wishes to generate a fresh secret, the user generates random initial shares, as follows:

# Choose a bitsize, between 128 and 512, which must be a multiple of 8
# Choose a bitsize, between 128 and 512, which must be a multiple of 32
# Choose a threshold value ''k'' between 2 and 9, inclusive
# Choose a 4 bech32 character identifier
#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate
# ''k'' many times, generate a random share by:
## Take the next available letter from the bech32 alphabet, in alphabetical order, as <code>a</code>, <code>c</code>, <code>d</code>, ..., to be the share index
## Set the first nine characters to be the prefix <code>ms1</code>, the threshold value ''k'', the 4-character identifier, and then the share index
## Choose the next ceil(''bitsize'' / 5) characters uniformly at random
## Generate a valid checksum in accordance with the Checksum section, and append this to the resulting shares
## Generate a valid checksum in accordance with the '''Checksum''' section, and append this to the resulting shares

The result will be ''k'' distinct shares, all with the same initial 8 characters, and a distinct share index as the 9th character.
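
The steps above, up to but not including the checksum, can be sketched as follows. The function name is hypothetical, Python's <code>secrets</code> module stands in for whatever source of randomness the user trusts, and the returned strings are incomplete until a valid checksum from the '''Checksum''' section is appended.

```python
import secrets

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def fresh_share_prefixes(k, identifier, bitsize):
    """Return k random initial shares WITHOUT their checksums."""
    if not 2 <= k <= 9:
        raise ValueError("threshold must be between 2 and 9")
    if not 128 <= bitsize <= 512 or bitsize % 32:
        raise ValueError("bitsize must be 128..512 and a multiple of 32")
    identifier = identifier.lower()
    if len(identifier) != 4 or any(c not in CHARSET for c in identifier):
        raise ValueError("identifier must be 4 bech32 characters")
    # Letters of the bech32 alphabet in alphabetical order: a, c, d, e, ...
    letters = sorted(c for c in CHARSET if c.isalpha())
    n_chars = -(-bitsize // 5)  # ceil(bitsize / 5) payload characters
    shares = []
    for index in letters[:k]:
        payload = "".join(secrets.choice(CHARSET) for _ in range(n_chars))
        shares.append(f"ms1{k}{identifier}{index}{payload}")
    return shares
```

With <code>k = 3</code> and a 128-bit size, this produces three 35-character strings with share indices <code>a</code>, <code>c</code> and <code>d</code>, each awaiting its 13-character checksum.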

@@ -300,17 +344,18 @@ The conversion process consists of:
#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every set of shares the user may need to disambiguate
# Set the share index to <code>s</code>
# Set the payload to a bech32 encoding of the secret data, padded with arbitrary bits
# Generate a valid checksum in accordance with the Checksum section
# Generate a valid checksum per the '''Checksum''' section

Along with the codex32 secret, the user must generate ''k''-1 other codex32 shares, each with the same threshold value, the same identifier, and a distinct share index.
These shares should be generated as described in the "fresh secret" section.
These shares should be generated as described in the '''fresh secret''' section.

The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 strings from which additional shares can be derived as described above.
The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 shared strings from which additional shares can be derived as described above.
We call this set of ''k'' strings a "master share set", if all ''k''-1 shares are generated as per the '''fresh secret''' section.

===Long codex32===

The 13 character checksum design only supports up to 80 data characters.
Excluding the threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes.
Excluding the HRP, threshold, identifier and index characters, this limits the payload to 72 characters or 45 bytes.
While this is enough to support the 32-byte advised size of BIP-0032 master seeds, BIP-0032 allows seeds to be up to 64 bytes in size.
We define a long codex32 format to support these longer seeds by defining an alternative checksum.

@@ -348,10 +393,11 @@ random errors.

A long codex32 string follows the same specification as a regular codex32 string with the following changes.

* The payload is a sequence of between 75 and 103 bech32 characters.
* The length is between 97 and 1024 characters.
* The payload is a sequence of up to 1001 bech32 characters.
* The checksum consists of 15 bech32 characters as defined above.

A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 data characters.
A codex32 string with a length of 95 or 96 characters is never legal, because a regular codex32 string is limited to 94 characters and a long codex32 string is at least 97 characters.

Generation of long shares and recovery of the long secret from long shares proceeds in exactly the same way as for regular shares with the <code>ms32_interpolate</code> function.

@@ -369,9 +415,13 @@ This fact allows the header data to be covered by the checksum.
The checksum size and identifier size have been chosen so that the encoding of 128-bit master seeds and shares fit within 48 characters.
This is a standard size for many common seed storage formats, which has been popularized by the 12 four-letter word format of the BIP-0039 mnemonic.

The 13 character checksum is adequate to correct 4 errors in up to 93 characters (80 characters of data and 13 characters of the checksum).
The 13 character checksum is adequate to correct 4 errors in up to 93 characters (80 characters of HRP and data,<ref>'''Why cover HRP characters?'''
Under the [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#cite_note-5 BIP-0173 error model] that HRP errors
only change the low 5 bits (like changing an alphabetical character into another), errors are restricted to the ''[low hrp] [data]''
part, which is at most 93 characters; thus all error correction properties (see appendix) remain applicable.</ref> and 13 characters of the checksum).
We can correct up to 8 erasures (errors with known locations), and up to 13 consecutive errors (burst errors).
Beyond that, our code is guaranteed to detect up to 8 errors.
Beyond that, our code is guaranteed
to detect up to 8 errors.
More generally, any number of random errors will be detected with overwhelming (1 - 2<sup>-65</sup>) probability. However, the checksum does not protect against maliciously constructed errors.
These parameters are slightly better than those of the checksum used in SLIP-0039.

@@ -382,9 +432,7 @@ While we could use the 15 character checksum for both cases, we prefer to keep t
We only guarantee to correct 4 characters no matter how long the string is.
Longer strings mean more chances for transcription errors, so shorter strings are better.

The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret.
At this length, the prefix <code>MS1</code> is not covered by the checksum.
This is acceptable because the checksum scheme itself requires you to know that the <code>MS1</code> prefix is being used in the first place.
The longest codex32-encoded master seed using the regular 13 character checksum is 93 characters and corresponds to a 352-bit seed.
If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected <code>MS1</code> prefix.

===Not BIP-0039 Entropy===
@@ -419,7 +467,10 @@ The main advantage of this alternative approach would be that wallets could give
In practice, we do not expect users to switch back and forth between backup formats, and instead just generate a fresh master seed using Codex32.

Seeing little value with BIP-0039 compatibility (English-only), all the difficulties with BIP-0039 language choice, not to mention the PBKDF2 overhead of using BIP-0039, we think it is best to abandon BIP-0039 and encode BIP-0032 master seeds directly.
Our approach is semi-convertible with BIP-0039's 512-bit master seeds (in all languages, see Backwards Compatibility) and fully interconvertible with SLIP-39 encoded master seeds or any other encoding of BIP-0032 master seeds.
Our approach is semi-convertible with BIP-0039's 512-bit master seeds (in all languages, see '''Backwards Compatibility''') and fully interconvertible with SLIP-39 encoded master seeds or any other encoding of BIP-0032 master seeds.

'''Footnotes'''
<references />

==Backwards Compatibility==
