Bit packing #162
base: master
Some general comments (I'll try to do a proper review some time tomorrow):
Hi Joshua. Thanks for the initial review. Let me rework some of this. The original intention behind the bit-packing format was to be just as opinionated as the original input format requirements, to avoid trying to be everything for everyone. I had originally thought of something a bit more flexible to deal with the wide range of potential input formats, but thought it better to keep it slightly simpler. Still, allow me to present the rough outline of the idea. Using a reworked
This PR primarily permits the input data to be bit-packed. We often find that vendors provide us with byte-oriented data, single-bit-packed data, and sometimes nibble-packed (4-bit) data. It seems that bit-packed data should be a valid input format rather than forcing all data (especially multi-bit-oriented data) to be transformed prior to input.
There are several changes that were made to help this along, and the majority of the changes are in `shared/utils.h`.

Of note, the following larger changes:

- In `shared/utils.h`, the `read_file` and `read_file_subset` functions were refactored to be a single function. They were doing the same thing and therefore duplicating logic. The only difference being the subset length and index being passed in one vs. the other.
- In `read_file_subset`, more complete error-handling added in typical goto error-handling fashion. This ensures more complete and less error-prone error handling and resource recovery.
- `rawsymbols`, `symbols` and `bsymbols` are more thoroughly documented in the function. This is important because originally the function assumed the input byte data was the same as `rawsymbols` (which is not the case for bit-packed data).
- In `shared/utils.h`, a change was made to the `seed()` function to permit deterministic seeds instead of relying on `/dev/urandom` thereby aiding in deterministic regression testing.
- In the `transpose_main.cpp` code, bit-packed data is transposed by exploiting the `bsymbols` data.

Technically, the bit-unpacking code in `read_file_subset()` will operate correctly on any sample bit alignment, but this was artificially restricted to be 1, 2, 4 or 8 bits on purpose due to ambiguity on sample alignment.

A test harness was built to help ensure that the bit-oriented data input functions the same as the byte-oriented (masked) data.
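As a rough illustration of the unpacking described above, the following sketch expands packed bytes into one sample per byte for widths of 1, 2, 4 or 8 bits. It is not the code from this PR: the function name `unpack_samples` and the `msb_first` flag are hypothetical, and the order of samples within a byte is exactly the alignment ambiguity mentioned above, so it is left as an explicit parameter here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the PR's code): expand packed bytes into one
// sample per output byte. Only widths that divide 8 evenly are accepted,
// mirroring the 1, 2, 4 or 8 bit restriction described above.
std::vector<std::uint8_t> unpack_samples(const std::vector<std::uint8_t>& packed,
                                         unsigned bits_per_sample,
                                         bool msb_first = true) {
    if (bits_per_sample != 1 && bits_per_sample != 2 &&
        bits_per_sample != 4 && bits_per_sample != 8) {
        throw std::invalid_argument("bits_per_sample must be 1, 2, 4 or 8");
    }
    const unsigned per_byte = 8 / bits_per_sample;
    const unsigned mask = (1u << bits_per_sample) - 1u;

    std::vector<std::uint8_t> samples;
    samples.reserve(packed.size() * per_byte);
    for (std::uint8_t byte : packed) {
        for (unsigned i = 0; i < per_byte; ++i) {
            // Assumption: msb_first means the first sample occupies the
            // highest-order bits of the byte; otherwise the lowest-order bits.
            const unsigned shift = msb_first
                ? 8 - bits_per_sample * (i + 1)
                : bits_per_sample * i;
            samples.push_back(static_cast<std::uint8_t>((byte >> shift) & mask));
        }
    }
    return samples;
}
```

Under these assumptions, the byte `0xA5` read as 4-bit samples yields `0xA, 0x5` when `msb_first` is true and `0x5, 0xA` when it is false.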
Even if the bit-packing PR is considered superfluous, some of these fixes should be considered since they simplify the code and make unit-testing a little bit easier.
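To show why a deterministic seed helps with testing, here is a minimal sketch of the idea, assuming C++17. The name `choose_seed` and its signature are hypothetical and do not reflect the actual `seed()` function in `shared/utils.h`: a caller-supplied value is returned unchanged, and `/dev/urandom` is only consulted when no value is given.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <stdexcept>

// Hypothetical sketch (not the PR's seed() signature): return the caller's
// fixed seed when one is supplied, otherwise read 8 bytes from /dev/urandom.
std::uint64_t choose_seed(std::optional<std::uint64_t> fixed_seed = std::nullopt) {
    if (fixed_seed) {
        return *fixed_seed;  // deterministic path for regression tests
    }
    std::ifstream urandom("/dev/urandom", std::ios::binary);
    std::uint64_t value = 0;
    if (!urandom.read(reinterpret_cast<char*>(&value), sizeof value)) {
        throw std::runtime_error("could not read /dev/urandom");
    }
    return value;
}
```

A regression test can then pass the same fixed value on every run and expect identical results from any randomized step that depends on it.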
(Note that there were a number of whitespace changes due to use of space-tab expansions.)
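For readers unfamiliar with the goto error-handling style mentioned for `read_file_subset`, the pattern looks roughly like the sketch below. This is not the PR's function: `read_bytes` and its parameters are hypothetical, and it only illustrates how a single cleanup label releases every resource on all exit paths.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch of goto-style cleanup (not the PR's read_file_subset):
// read `length` bytes starting at `offset` into a freshly allocated buffer.
// Returns true on success; on failure *out is left as nullptr.
bool read_bytes(const char* path, long offset, std::size_t length,
                unsigned char** out) {
    // All locals are declared before the first goto so that no jump
    // crosses an initialization, which C++ forbids.
    bool ok = false;
    std::FILE* file = nullptr;
    unsigned char* buffer = nullptr;

    *out = nullptr;
    file = std::fopen(path, "rb");
    if (file == nullptr) goto cleanup;
    if (std::fseek(file, offset, SEEK_SET) != 0) goto cleanup;

    buffer = static_cast<unsigned char*>(std::malloc(length ? length : 1));
    if (buffer == nullptr) goto cleanup;
    if (std::fread(buffer, 1, length, file) != length) goto cleanup;

    *out = buffer;     // ownership passes to the caller, who must free() it
    buffer = nullptr;  // prevents the cleanup path from freeing it
    ok = true;

cleanup:
    std::free(buffer);  // free(nullptr) is a no-op
    if (file != nullptr) std::fclose(file);
    return ok;
}
```

Every failure jumps to the same label, so adding a new resource means adding one release line rather than updating each early return.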