add Word method with tests #177

Merged
merged 2 commits into from
Nov 9, 2024


CannibalVox
Contributor

In another project I have been tasked with recording a series of boolean values and then outputting them 6 bits at a time as ASCII characters. bitset makes the recording very convenient, but outputting the data is surprisingly inconvenient. Ultimately I'll likely be forced to pull the underlying bytes and perform a lot of the same logic BitSet performs in order to pull 6-bit numbers from the array of uint64s.

This PR adds a method Word that allows the consumer to pull data from the BitSet 64 bits at a time beginning at a requested index. This is very helpful for cases where the consumer needs numerical data constructed from bits that isn't aligned with the 64-bit boundaries between elements of the underlying array.
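To illustrate what reading 64 bits from an arbitrary bit index involves, here is a minimal sketch over a plain `[]uint64`. The function name `wordAt` and its out-of-range behavior (missing bits read as zero) are assumptions made for illustration; this is not the code merged in this PR.

```go
package main

import "fmt"

// wordAt is a hypothetical helper: it returns the 64 bits that start at
// bit index i in words, treating bits past the end of the slice as zero.
func wordAt(words []uint64, i uint) uint64 {
	idx := int(i >> 6) // which uint64 holds bit i
	off := i & 63      // position of bit i inside that uint64
	if idx >= len(words) {
		return 0
	}
	w := words[idx] >> off
	// If the read is not aligned, the high bits come from the next element.
	if off != 0 && idx+1 < len(words) {
		w |= words[idx+1] << (64 - off)
	}
	return w
}

func main() {
	words := []uint64{^uint64(0), 1}
	// Top 4 bits of words[0] followed by the lowest bit of words[1].
	fmt.Printf("%#x\n", wordAt(words, 60)) // prints 0x1f
}
```

An unaligned read costs two shifts and one OR on top of the index arithmetic, which is why a consumer can then slice the result into 6-bit groups with nothing more than a shift and a mask per group.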

@lemire
Member

lemire commented Nov 5, 2024

This PR is of high quality and we can merge this, no problem.

But let me ask questions first:

> In another project I have been tasked with recording a series of boolean values and then outputting them 6 bits at a time in ascii characters. bitset makes the recording very convenient, but outputting the data is surprisingly inconvenient.

Basically, you are doing base64 encoding? Right? Maybe it is a variation on base64 but it ought to be quite similar.

Doing base64 encoding one character at a time is slow. Typically, one does it 4 characters at a time... at least... that is, you take in 24 bits (so 3 bytes) and you output four characters. Maybe at the very end, if you have fewer than 3 bytes, you do something special.

In any case, I do not see how the function you are proposing helps in this problem.

Note that I do not object to the PR... Happy to merge it, but it would be nice to understand the motivation.
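For readers unfamiliar with the grouping mentioned above (3 bytes in, 4 characters out), a minimal sketch follows. The helper name `encodeTriple` is hypothetical; the alphabet is the standard base64 one, and the result is compared with Go's `encoding/base64` as a sanity check. Padding of a short final group is deliberately left out.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Standard base64 alphabet (64 symbols).
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// encodeTriple is a hypothetical helper that packs three bytes into a
// 24-bit value and emits four 6-bit symbols, most significant group first.
func encodeTriple(b0, b1, b2 byte) [4]byte {
	n := uint32(b0)<<16 | uint32(b1)<<8 | uint32(b2)
	return [4]byte{
		alphabet[(n>>18)&63],
		alphabet[(n>>12)&63],
		alphabet[(n>>6)&63],
		alphabet[n&63],
	}
}

func main() {
	out := encodeTriple('M', 'a', 'n')
	// Both values should agree.
	fmt.Println(string(out[:]), base64.StdEncoding.EncodeToString([]byte("Man")))
}
```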

@CannibalVox
Contributor Author

Is there a good way of pulling 24 bits from a BitSet that I'm maybe not aware of?

@CannibalVox
Contributor Author

It's actually sixel rendering, so while the output strongly resembles base64 encoding, the input involves a lot of BitSet manipulation.

@lemire
Member

lemire commented Nov 5, 2024

> Is there a good way of pulling 24 bits from a BitSet that I'm maybe not aware of?

Well, you create the following helper function...

func uint64SliceAsByteSlice(slice []uint64) []byte {
	header := *(*reflect.SliceHeader)(unsafe.Pointer(&slice))
	header.Len *= 8
	header.Cap *= 8
	result := *(*[]byte)(unsafe.Pointer(&header))
	runtime.KeepAlive(&slice)
	return result
}

This function is effectively constant time (free).

And then you can do

 mybytes := uint64SliceAsByteSlice(b.set)

Again, no allocation and hardly any compute.

That's a slice of bytes. Now any three bytes is 24 bits.

    // Iterate over the slice three bytes at a time
    i := 0
    for ; i+2 < len(mybytes); i += 3 {
            triple := mybytes[i : i+3]
            fmt.Printf("Bytes %d-%d: %v\n", i+1, i+3, triple)
    }
    // you have 0, 1 or 2 bytes leftover...
    // you can make a slice out of it... mybytes[i:]

That does not seem too hard?

Note that my code is not real code, I just typed it... it is untested. But the idea is sound.
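Since the snippet above is explicitly untested, here is a runnable sketch of the same idea. It swaps in `unsafe.Slice` (available since Go 1.17) for the `reflect.SliceHeader` manipulation, which is a substitution on my part and not what the comment proposes. The name `asBytes` is hypothetical, and the byte order of the view follows the host machine (little-endian on amd64 and arm64).

```go
package main

import (
	"fmt"
	"unsafe"
)

// asBytes is a hypothetical helper that reinterprets a []uint64 as a []byte
// sharing the same memory. No copy is made, so writes through one slice are
// visible through the other.
func asBytes(words []uint64) []byte {
	if len(words) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&words[0])), len(words)*8)
}

func main() {
	words := []uint64{0xEFCDAB}
	data := asBytes(words)

	// Walk the view three bytes (24 bits) at a time.
	i := 0
	for ; i+2 < len(data); i += 3 {
		fmt.Printf("bytes %d-%d: %v\n", i, i+2, data[i:i+3])
	}
	// 0, 1 or 2 bytes remain and need separate handling.
	fmt.Printf("leftover: %v\n", data[i:])
}
```

Because a `uint64` holds 8 bytes and 8 is not a multiple of 3, the 3-byte groups drift across element boundaries; the byte view hides that, at the cost of depending on host endianness.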

@lemire
Member

lemire commented Nov 5, 2024

If it helps, I could add uint64SliceAsByteSlice to the library (write a function that returns a slice of 8-bit words).

@CannibalVox
Contributor Author

// 4 shifts, 1 and, 1 or
word := b.Word(index)

pixel1 := word & 63
pixel2 := (word >> 6) & 63
pixel3 := (word >> 12) & 63
pixel4 := (word >> 18) & 63

Total: 7 shifts, 5 ands, 1 or

mybytes := uint64SliceAsByteSlice(b.Bytes())

pixel1 := mybytes[index] & 63
pixel2 := (mybytes[index] >> 6) | ((mybytes[index+1] & 15) << 2)
pixel3 := (mybytes[index+1] >> 4) | ((mybytes[index+2] & 3) << 4)
pixel4 := mybytes[index+2] >> 2

5 shifts, 3 ands, 2 ors

Performance-wise, there's not much of a difference from my pov and I definitely prefer the first snippet
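As a sanity check on the two approaches compared above, the following sketch extracts four 6-bit values from the same 24 bits both ways and prints them. The function names `sixelsFromWord` and `sixelsFromBytes` are hypothetical, and the word is built with `encoding/binary` in little-endian order so the comparison does not depend on the host.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sixelsFromWord splits the low 24 bits of w into four 6-bit values,
// lowest bits first.
func sixelsFromWord(w uint64) [4]byte {
	return [4]byte{
		byte(w & 63),
		byte((w >> 6) & 63),
		byte((w >> 12) & 63),
		byte((w >> 18) & 63),
	}
}

// sixelsFromBytes does the same from three little-endian bytes.
func sixelsFromBytes(b []byte) [4]byte {
	return [4]byte{
		b[0] & 63,
		// 2 high bits of b[0], then 4 low bits of b[1] at positions 2..5.
		(b[0] >> 6) | ((b[1] & 15) << 2),
		// 4 high bits of b[1], then 2 low bits of b[2] at positions 4..5.
		(b[1] >> 4) | ((b[2] & 3) << 4),
		b[2] >> 2,
	}
}

func main() {
	raw := []byte{0xAB, 0xCD, 0xEF, 0, 0, 0, 0, 0}
	w := binary.LittleEndian.Uint64(raw)
	fmt.Println(sixelsFromWord(w))    // prints [43 54 60 59]
	fmt.Println(sixelsFromBytes(raw)) // prints [43 54 60 59]
}
```

For sixel output, each 6-bit value is then offset by 63 (the character `?`) to obtain a printable character.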

@lemire
Member

lemire commented Nov 5, 2024

@CannibalVox

Your function will be efficient, I expect. It wasn't my concern. If you are happy with this PR, then great.

I have two minor comments. Please consider them.

@lemire
Member

lemire commented Nov 5, 2024

Running tests. We shall merge once green.

@CannibalVox
Contributor Author

Is this all good?

@lemire lemire merged commit 417751b into bits-and-blooms:master Nov 9, 2024
15 checks passed