Skip to content

doc: PL/pgSQL to convert between bytea to crockford base32 #6

@coolaj86

Description

@coolaj86

My ultimate goal is to convert between UUIDv4 and UUIDv7 to a more friendly prefixed base32 (like what Stripe and many other API providers do).

There may be minor tweaks that could be done (arrays vs substrings vs maps) that could yield significant gains for different use cases (ours is just short 128-bit IDs, not large blobs).

PL/pgSQL function to encode bytea to crockford base32

CREATE
OR REPLACE function bytea_to_crockford_base32 (input_bytea bytea) returns TEXT language plpgsql AS $$
DECLARE
    crockford_chars TEXT := '0123456789abcdefghjkmnpqrstvwxyz';
    input_len INT := LENGTH(input_bytea);
    result TEXT := '';
    byte INTEGER;
    bits INT := 0;
    value INTEGER := 0;
    buffer TEXT[] := '{}';  -- Use an array to accumulate characters
BEGIN
    FOR i IN 0..input_len-1 LOOP
        -- Get each byte and process its 8 bits
        byte := get_byte(input_bytea, i);
        bits := bits + 8;
        value := (value << 8) + byte;

        -- While we have at least 5 bits, map it to a Crockford character
        WHILE bits >= 5 LOOP
            bits := bits - 5;
            buffer := array_append(buffer, substr(crockford_chars, 1 + ((value >> bits) & 31), 1));
        END LOOP;
    END LOOP;

    -- Handle any remaining bits (pad to 5 bits if necessary)
    IF bits > 0 THEN
        buffer := array_append(buffer, substr(crockford_chars, 1 + ((value << (5 - bits)) & 31), 1));
    END IF;

    -- Join all characters in the buffer into a single result string
    result := array_to_string(buffer, '');

    RETURN result;
END;
$$
;

PL/pgSQL function to decode crockford base32 to bytea

CREATE
OR REPLACE function crockford_base32_to_bytea (input_text TEXT) returns bytea AS $$
DECLARE
    decoded BYTEA := '\x';  -- Initialize as an empty bytea string.
    crockford_map TEXT := '0123456789abcdefghjkmnpqrstvwxyz';  -- Crockford Base32 Alphabet (without I, L, O, and U)
    bits_accum INTEGER := 0;  -- Accumulates the bits as we decode.
    bits_count INTEGER := 0;  -- Keeps track of how many bits we have accumulated.
    ch CHAR;
    value INT;
BEGIN
    -- Loop through each character in the input string.
    FOR i IN 1..length(input_text) LOOP
        ch := lower(substr(input_text, i, 1));  -- Extract and uppercase the character.
        -- Get the corresponding value from the crockford_map.
        value := position(ch IN crockford_map) - 1;

        IF value = -1 THEN
            RAISE EXCEPTION 'Invalid character in base32 string: %', ch;
        END IF;

        -- Accumulate the value's bits into the accumulator.
        bits_accum := (bits_accum << 5) | value;
        bits_count := bits_count + 5;

        -- If we've accumulated at least 8 bits, extract a byte.
        WHILE bits_count >= 8 LOOP
            decoded := decoded || chr((bits_accum >> (bits_count - 8)) & 255)::BYTEA;
            bits_count := bits_count - 8;
        END LOOP;
    END LOOP;

    -- If there are leftover bits that don't form a full byte, ignore them (padding).
    RETURN decoded;
END;
$$ language plpgsql
;

Alternate bytea_to_crockford_base32

This is probably not as performant as the other, but it also works.

CREATE
OR REPLACE function bytea_to_crockford_base32_naive (input_bytea bytea) returns TEXT language plpgsql AS $$
DECLARE
    crockford_chars TEXT[] := '{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,j,k,m,n,p,q,r,s,t,v,w,x,y,z}';
    input_len INT;
    result TEXT := '';
    byte INTEGER;
    bits INT := 0;
    value INTEGER := 0;
BEGIN
    input_len := LENGTH(input_bytea);

    FOR i IN 1..input_len LOOP
        -- Get each byte and process its 8 bits
        byte := get_byte(input_bytea, i - 1);
        bits := bits + 8;
        value := (value << 8) + byte;

        -- While we have at least 5 bits, map it to a Crockford character
        WHILE bits >= 5 LOOP
            bits := bits - 5;
            result := result || crockford_chars[1 + ((value >> bits) & 31)];
        END LOOP;
    END LOOP;

    -- Handle any remaining bits (pad to 5 bits if necessary)
    IF bits > 0 THEN
        result := result || crockford_chars[1 + ((value << (5 - bits)) & 31)];
    END IF;

    RETURN result;
END;
$$
;

License

The code snippets and documentation in this issue are licensed under the CC0-1.0 (Public Domain), as follows:

PostgreSQL Crockford Base32 - convert between bytea and crockford base32

Authored in 2024 by AJ ONeal aj@therootcompany.com
To the extent possible under law, the author(s) have dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along with
this software. If not, see https://creativecommons.org/publicdomain/zero/1.0/

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions