Backing via [u8; N] #7
I'm not happy with this
The first two might be improved by refactoring and optimizing them. |
Ok the generated and inlined code is better than expected: https://godbolt.org/z/4n3Mv6fK8 |
Yeah, I would've been very surprised if the generated code ended up any worse than that. The lack of any/all panics is especially good to see!
Well that's certainly unfortunate. Here's an idea though: see if you can const-ify it without relying on helper functions (i.e: just emit the required code on a per accessor basis). If that works, you can start streamlining + optimizing that code. e.g: by pre-computing which if-statements apply to which field, and manually emitting only the case you care about. Of course, having that feature be stable would be optimal... but I'm sure that with a bit of tasteful code contortions, it should be workable without it as well. |
Ok here is the first workaround, moves instead of mut refs. The generated code, however, is less optimal. Even with optimizations, it copies a lot more. |
I'll try to poke around this code a bit. Maybe I can spot some easy wins / optimizations. In the meantime, here's a rewritten
#[inline(never)]
fn bit_print(v: &[u8]) {
    fn putchar(c: u8) {
        extern "C" {
            fn putchar(c: u8);
        }
        unsafe { putchar(c) }
    }
    for b in v {
        let mut b = *b;
        for _ in 0..8 {
            putchar(if b & 0x80 == 0 { b'0' } else { b'1' });
            b <<= 1;
        }
        putchar(b' ');
    }
    putchar(b'\n');
}
|
I played around with it a bit, and it seems that you can get pretty dang close to identical by helping the Rust compiler reason about the moves. See https://godbolt.org/z/WTs8boW9f
In a nutshell: the following construct seems to iron out the codegen issues quite nicely:
// new "inner" function
fn set_flag_copy(x: [u8; 4], flag: bool) -> [u8; 4] {
    let src = [flag as u8];
    bit_copy(x, 3, &src, 0, 1)
}
// same API as what you had
fn set_flag(x: &mut [u8; 4], flag: bool) {
    *x = set_flag_copy(*x, flag)
}
Oh, and for posterity: here's a godbolt I was playing around with that compares (streamlined versions of) the non-const and my new const versions against each other: https://godbolt.org/z/oaWxT73Yz |
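For anyone following along, here is a minimal sketch of the move-in/move-out pattern described above. The body of `bit_copy` is not shown in this thread, so the implementation below is an assumption: a simple bit-at-a-time loop with LSB-first bit numbering, and an argument order inferred from the call `bit_copy(x, 3, &src, 0, 1)`. The `CLEARED` constant is only there to illustrate compile-time evaluation.

```rust
// Hypothetical stand-in for the PR's `bit_copy`: copies `len` bits from `src`
// (starting at bit `src_off`) into `dst` (starting at bit `dst_off`).
// Assumption: LSB-first numbering, i.e. bit i is bit (i % 8) of byte (i / 8).
// The array is taken and returned by value, so no `&mut` is needed in const code.
pub const fn bit_copy<const N: usize>(
    mut dst: [u8; N],
    dst_off: usize,
    src: &[u8],
    src_off: usize,
    len: usize,
) -> [u8; N] {
    let mut i = 0;
    while i < len {
        let s = src_off + i;
        let d = dst_off + i;
        let bit = (src[s / 8] >> (s % 8)) & 1;
        let mask = 1u8 << (d % 8);
        // Clear the destination bit, then set it to the source bit.
        dst[d / 8] = (dst[d / 8] & !mask) | (bit << (d % 8));
        i += 1;
    }
    dst
}

// "Inner" function: moves the buffer in and out.
pub const fn set_flag_copy(x: [u8; 4], flag: bool) -> [u8; 4] {
    let src = [flag as u8];
    bit_copy(x, 3, &src, 0, 1)
}

// Same API as before: the `&mut` wrapper just reassigns the whole array.
pub fn set_flag(x: &mut [u8; 4], flag: bool) {
    *x = set_flag_copy(*x, flag)
}

// The by-value version can be evaluated at compile time.
pub const CLEARED: [u8; 4] = set_flag_copy([0xFF; 4], false);
```

The wrapper keeps the public setter signature unchanged while the const-friendly logic lives in the by-value function, which is what seems to help the optimizer see through the copies.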
This looks cool. Is the thinking that specifying the backing would become optional, or would it be prohibited? I'd prefer optional, where |
(Really I think I'd probably avoid the bare |
I agree with @jstarks - when I proposed the bare
So, on that note: I would suggest disallowing the bare
(and of course: keeping the existing syntax would be a good move regardless, in terms of making it easy to upgrade between versions)
The only question becomes... what syntax to use for exotically-sized bitfields?
Maybe something like
Maybe something more explicit like
Anyways, this is an area ripe for bikeshedding, and ultimately, I'm not too concerned with the specific syntax - just that it is opt-in, rather than implicit (as initially suggested)
EDIT: oh, actually, I don't think the |
In general, I want to keep this library as simple as possible. Having two entirely different code generation methods, depending on whether the underlying type is an int or a byte array, is not ideal. I would instead fully commit to one of the two: either only integers or byte arrays. The decision of which backing type to use is not easy. Integers are limited but very simple, which is important for efficient code generation and proc macros, which often come with compile-time overheads. |
This is also the case for me. I primarily use this bitfield crate for x86 register abstractions (which are usually aligned accordingly) and packing data into atomic data types (like AtomicU64 with from/into). So keeping this common case as the default makes sense.
Maybe in these cases, |
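To make the "integer-backed bitfield packed into an atomic" use case concrete, here is a hand-written sketch of roughly what that pattern looks like. Everything in it (`Entry`, the 12/52-bit split, `publish`, `observe`) is hypothetical and written by hand for illustration; it is not output of this crate's macro.

```rust
use core::sync::atomic::{AtomicU64, Ordering};

// Hypothetical u64-backed bitfield: bits 0..12 hold `flags`,
// bits 12..64 hold `frame`. The layout is an assumption for this example.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Entry(u64);

impl Entry {
    const FLAGS_BITS: u32 = 12;
    const FLAGS_MASK: u64 = (1 << Self::FLAGS_BITS) - 1;

    pub const fn new(frame: u64, flags: u16) -> Self {
        Entry((frame << Self::FLAGS_BITS) | (flags as u64 & Self::FLAGS_MASK))
    }
    pub const fn flags(self) -> u16 {
        (self.0 & Self::FLAGS_MASK) as u16
    }
    pub const fn frame(self) -> u64 {
        self.0 >> Self::FLAGS_BITS
    }
}

// from/into conversions, so the value can round-trip through the atomic.
impl From<u64> for Entry {
    fn from(v: u64) -> Self {
        Entry(v)
    }
}
impl From<Entry> for u64 {
    fn from(e: Entry) -> Self {
        e.0
    }
}

// Store and load the whole bitfield as one atomic word.
pub fn publish(slot: &AtomicU64, e: Entry) {
    slot.store(e.into(), Ordering::Release);
}
pub fn observe(slot: &AtomicU64) -> Entry {
    slot.load(Ordering::Acquire).into()
}
```

This only works because the backing type is a single machine word, which is one argument for keeping integers as the default backing.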
A few thoughts:
Of course, this is your lib, and you're totally free to make whatever complexity vs. featureset vs. codegen tradeoffs you think are best, so if you think this feature falls on the wrong side of those tradeoffs, that's totally reasonable - I'll just have to consider some alternatives. That being said, as someone who's been hacking on emulators and OS code for a while now, I've found that exotically sized bitfield structures - while rare - certainly do come up, and having a single ergonomic and featureful Rust crate to reach for when working with bitfields would be incredibly helpful 😉 |
Yes, I agree that this general solution would be very cool. However, I'm willing to implement the code generation with the current bit copy, and we can decide then if we are happy with its performance. |
Const mutable references are not stable yet, thus we fall back on moving the buffers in and out directly. The downside is additional copies, even for optimized code.
453a15d to ff76a9d
After being quite occupied the last few days, I was finally able to implement the code generation for the new bit_copy. The bitfield attribute has changed a little:
It seems functional so far, but I haven't benchmarked it yet. What do you think? |
In terms of feature-set, I think you've nailed it! The only remaining questions to me are around the API itself, and how we might refine it:
Those are some of my initial thoughts after perusing the code. I'll try to find some time to clone down and play with the code in a more hands-on fashion as well, and see if I can glean any more insights there. Oh, and thanks again for taking a crack at this! It's starting to shape up really nicely, and I'm super excited to refactor some code using this new functionality 😄 |
It was easier to parse, but using
Specifying
Using
Yes, that was part of the idea for trying a different semantic other than Maybe we should support specifying either |
Indeed. Allowing bare
Well... I think there's still merit in allowing both Maybe you could consider something like the following?
// scenario 1 (most common, same as today)
#[bitfield(u32)] // with no support for additional attrs
// scenario 2 (advanced)
#[bitfield(ty = u32, align = 1)] // all key = value pairs
// ...or
#[bitfield(bytes = 5, align = 2)] |
Yes, indeed.
We also have the So, in that case, only two versions seem a lot easier.
#[bitfield(u32, align = 1, debug = false)]
#[bitfield(bytes = 3, align = 2, debug = false)]
PS: Sorry for the late answer. |
Ahh, that's right, you already support i.e: require either And no need to apologize! It'll land when it lands :) |
Thanks for working on I believe it's due to the removal of the this from
Are there plans to make it |
no_std is definitely a priority for this crate. However, this branch still tests implementations and might contain bugs/oversights in addition to this one. |
Do you have an intention to merge this at some point? Really need this one as my struct is 48 bytes. Aside from that this crate looks perfect for my case :( |
@heroin-moose, I would love to, but I'm currently not happy with the design and the (in certain cases) non-optimal code generation. There are also certain nightly features like const mut refs that would make this a lot cleaner. Unfortunately, they are not ready yet. Also, I currently don't have much time for further experimentation, which is why this PR hasn't gotten much attention lately. Edit: If you have any ideas for improvements, feel free to contribute them ;) |
Any updates? |
Not really, I'm afraid. |
I'm probably a bit late to this discussion, but we should probably use an integer type wherever possible and otherwise fall back to an array. Apart from that, we could check whether the field we're accessing is byte-aligned and its size is a power of two, and generate code depending on that. I'm afraid we won't be able to generate more optimal code. In the end, we're probably building something close to the code that calculates the layout for structs in LLVM. |
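A rough sketch of the "specialize on alignment and size" idea from the comment above, under a few assumptions that are mine rather than the PR's: LSB-first bit numbering, little-endian byte order for multi-byte fields, and hypothetical helper names (`get_bits`, `is_fast_path`, `get_u16_at_16`). A macro could evaluate the check at expansion time and emit either the direct load or the generic loop.

```rust
// Generic fallback: assemble `len` (<= 64) bits starting at bit `off`,
// one bit at a time, LSB-first.
pub const fn get_bits<const N: usize>(buf: &[u8; N], off: usize, len: usize) -> u64 {
    let mut out = 0u64;
    let mut i = 0;
    while i < len {
        let p = off + i;
        out |= (((buf[p / 8] >> (p % 8)) & 1) as u64) << i;
        i += 1;
    }
    out
}

// Hypothetical criterion: the field starts on a byte boundary and its size
// is a power of two of at least one byte (8, 16, 32 or 64 bits).
pub const fn is_fast_path(off: usize, len: usize) -> bool {
    off % 8 == 0 && len >= 8 && len.is_power_of_two()
}

// What could be emitted for a 16-bit field at bit offset 16 of a 6-byte
// struct: a plain two-byte little-endian load, no shifting or masking.
pub const fn get_u16_at_16(buf: &[u8; 6]) -> u16 {
    u16::from_le_bytes([buf[2], buf[3]])
}
```

Both paths return the same value for qualifying fields, so the specialization would be purely a codegen optimization and not a change in semantics.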
Would love to see this. I'm writing some code to parse fixed-length 80-byte chunks of data from a legacy binary data file format that will involve use of:
On and on. This package is almost perfect for me, except the constraint restricting me to only built-in integer types makes it almost totally useless, I think... (I still have to get into it in more depth, but scoping this out in advance is not looking promising at first glance.) |
Using [u8; N] as the backing store instead of integers has the following benefits.
Changing the API to something like this, as suggested in #6.
The main difficulty is the more complex (and perhaps less efficient) code generation for the accessor functions. However, this might be abstracted into a single bit_copy function used by the accessors.
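As an illustration of accessors built on shared bit-level helpers, here is a hand-written sketch for a 5-byte struct. The `Packet` type, its layout, the `with_*` naming, and the `read_bits`/`write_bits` helpers are all hypothetical stand-ins (the PR's actual `bit_copy` and generated code may look quite different), and LSB-first bit numbering is assumed.

```rust
// Hypothetical 40-bit layout, for illustration only (LSB-first numbering):
//   bits 0..4    kind
//   bits 4..36   addr (touches all five bytes)
//   bits 36..40  unused
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Packet([u8; 5]);

// Shared helper: read `len` (<= 64) bits starting at bit `off`.
const fn read_bits<const N: usize>(buf: &[u8; N], off: usize, len: usize) -> u64 {
    let mut out = 0u64;
    let mut i = 0;
    while i < len {
        let p = off + i;
        out |= (((buf[p / 8] >> (p % 8)) & 1) as u64) << i;
        i += 1;
    }
    out
}

// Shared helper: write the low `len` bits of `val` at bit `off`.
// The buffer is moved in and out, so no `&mut` is required in const code.
const fn write_bits<const N: usize>(
    mut buf: [u8; N],
    off: usize,
    len: usize,
    val: u64,
) -> [u8; N] {
    let mut i = 0;
    while i < len {
        let p = off + i;
        let bit = ((val >> i) & 1) as u8;
        buf[p / 8] = (buf[p / 8] & !(1u8 << (p % 8))) | (bit << (p % 8));
        i += 1;
    }
    buf
}

// Accessors are thin wrappers that only supply offset and width.
impl Packet {
    pub const fn new() -> Self {
        Packet([0; 5])
    }
    pub const fn kind(&self) -> u8 {
        read_bits(&self.0, 0, 4) as u8
    }
    pub const fn with_kind(self, v: u8) -> Self {
        Packet(write_bits(self.0, 0, 4, v as u64))
    }
    pub const fn addr(&self) -> u32 {
        read_bits(&self.0, 4, 32) as u32
    }
    pub const fn with_addr(self, v: u32) -> Self {
        Packet(write_bits(self.0, 4, 32, v as u64))
    }
    pub const fn into_bytes(self) -> [u8; 5] {
        self.0
    }
}
```

With this shape, the per-field code is just two constants (offset and width), and any optimization effort can be concentrated in the shared helpers.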