-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Would it make sense to extend the ULID to include an optional checksum? #88
Comments
From specification:
Seems to me that there are spare 2bits in the textual ULID representation that could be used for storing checksum. Care would need to be taken to not encode the checksum into first digit(s) to prevent it from breaking sortability. |
Since ULID is fixed size, you likely could just use the length to figure out if there is a checksum payload. You could use the 2 bits to encode if there is 0 no checksum or if there is 1 or more digits of checksumming etc... This post from a crockford32 implementer said that the standard itself doesn't have a way to distinguish if there is a checksum or not however. starulli/crockford32#1 (comment) |
If checksum is really needed by an application, one could use the Luhn Mod N Algorithm, which is an extension to the Luhn algorithm, which in turn "is used to validate a variety of identification numbers, such as credit card numbers, IMEI numbers, National Provider Identifier numbers in the United States" etc. This is a Javascript example extracted from Wikipedia: // Crockford's Base32
const codePoints = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
function numberOfValidInputCharacters() {
return codePoints.length;
}
function codePointFromCharacter(character) {
return codePoints.indexOf(character);
}
function characterFromCodePoint(codePoint) {
return codePoints.charAt(codePoint);
}
function generateCheckCharacter(input) {
let factor = 2;
let sum = 0;
let n = numberOfValidInputCharacters();
// Starting from the right and working leftwards is easier since
// the initial "factor" will always be "2".
for (let i = input.length - 1; i >= 0; i--) {
let codePoint = codePointFromCharacter(input.charAt(i));
let addend = factor * codePoint;
// Alternate the "factor" that each "codePoint" is multiplied by
factor = (factor == 2) ? 1 : 2;
// Sum the digits of the "addend" as expressed in base "n"
addend = (Math.floor(addend / n)) + (addend % n);
sum += addend;
}
// Calculate the number that must be added to the "sum"
// to make it divisible by "n".
let remainder = sum % n;
let checkCodePoint = (n - remainder) % n;
return characterFromCodePoint(checkCodePoint);
}
generatedULID = "0123456789ZZZZZZZZZZZZZZZZ"; // generatedULID = ulid()
truncatedULID = generatedULID.substr(0, 25); // delete the last char of the random component
checkCharacter = generateCheckCharacter(truncatedULID);
checkCharacterULID = truncatedULID + checkCharacter;
console.log("Generated ULID: " + generatedULID);
console.log("Check Character ULID: " + checkCharacterULID); Output of the example:
Note that the last character of the original ULID, generated with Of course, by doing this, 5 bits will be "wasted". Still, it will be more advantageous than UUIDv7, which uses 6 fixed bits to mark its structure. For Monotonic ULIDs, the first character of the random component should be deleted instead of the last one. So the statement that removes 1 character could be changed from this: // delete the last character of the random component
truncatedULID = generatedULID.substr(0, 25); to this: // delete the FIRST character of the random component
truncatedULID = generatedULID.substr(0, 10) + generatedULID.substr(11, 15); EDIT: fixed the base alphabet used in the constant |
Thanks for the example. I noticed that the code you provided is not actually Crockford's Base32 but is actually more of a generic base36, but you got the point across on the idea of modifying the ULID spec to add checksum support. Did a bit of calculation and it's more like you would waste 3bits if you do a single character truncation of the last char of the textual ULID string.
Eitherway, bit iffy of just changing the spec to store a checksum intended for humans to human communication into it, even if its only 3bits wasted. As it would increase the complexity of implementation and potential edge cases. Plus those who already uses it would be forced to figure out how to migrate their ID to the new format. |
Sorry. I forgot to copy and paste the correct alphabet from the spec README: 0123456789ABCDEFGHJKMNPQRSTVWXYZ. Fixed. |
Ah not a problem. Anyway I've had a read around and I understand the checksum implementation a bit more now. https://www.johndcook.com/blog/2018/12/28/check-sums-and-error-detection/ The check char is actually the ULID as a number but modulo by 37 and appended with a slightly extended character set (*, ~, $, =, and U) as explained in this quote below from Johnd Cook.
He was able to prove that this prime number choice actually has a reason in terms of detecting "wrong-symbol and transposed-symbol errors". Anyway, I would be proposing that if we have So in short the textual format I can now propose is this (where
hence this format proposal
Just realised that the checksum character |
This would be useful in the case that the ULID is communicated over the phone or by handwriting...
Standardising the form would make it easier to recognise when it's being used and throw it away upon insertion into a database system... as well as how to regenerate it again and verify it's integrity when transmitted manually.
Looks like Crockford's Base32 which you are using does have some provision for appending an optional check symbol... however it does not provide a way to disambiguate between a checksummed data vs a non checksummed data... but for our purpose since the ULID is a fixed length... you could assume any extra data appended is the check symbol... anyway best to clarify that in the spec as an implementation recommendation
The text was updated successfully, but these errors were encountered: