
Add Bytes::try_unsplit() #287

Open
abonander opened this issue Aug 23, 2019 · 20 comments

Comments

@abonander

I see similar APIs have been proposed before, but all I want is a method that tries to unsplit a Bytes. That should be possible in many cases, and if it fails the user can choose to perform a copy:

impl Bytes {
    /// If `self` and `other` are adjacent in the underlying buffer, widen `self` to encompass both slices.
    pub fn try_unsplit(&mut self, other: Bytes) -> Result<(), Bytes> { ... }
}

if let Err(other) = bytes.try_unsplit(other) {
    // fallback: copy via `BytesMut`, since `Bytes` itself is read-only
    let mut buf = BytesMut::from(&bytes[..]);
    buf.extend_from_slice(&other);
    bytes = buf.freeze();
}

I'd like it to take self by mutable reference as that's generally more versatile and matches the signature of BytesMut::unsplit().

@DoumanAsh
Contributor

DoumanAsh commented Aug 31, 2019

@abonander Shouldn't you be able to do a similar thing by converting to BytesMut if possible and unsplitting?

Bytes should be kinda immutable.

@abonander
Author

@DoumanAsh the idea is to avoid performing a copy which that path necessitates. You're not mutating the underlying bytes, just widening the slice which should be safe as long as immutable references remain to the whole buffer.

@CodesInChaos

CodesInChaos commented Sep 14, 2019

@abonander

Taking the second bytes by reference would remove the need to return it in the error case.

@DoumanAsh

Shouldn't you be able to do a similar thing by converting to BytesMut if possible and unsplitting?

In cases where you could unsplit (both Bytes being adjacent in a shared buffer) converting to BytesMut will always fail since the reference count is at least two.
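The refcount argument can be illustrated with std's Arc, which follows the same rule (a hedged analogy for the bytes internals, not the crate's actual code):

```rust
use std::sync::Arc;

fn main() {
    // Two handles into the same shared buffer, like the two `Bytes`
    // produced by splitting: the strong count is 2.
    let whole: Arc<Vec<u8>> = Arc::new(vec![1, 2, 3, 4]);
    let other = Arc::clone(&whole);

    // Claiming unique ownership (the analogue of converting to
    // `BytesMut`) fails while any other handle is alive.
    assert!(Arc::try_unwrap(whole).is_err());

    // Once only one handle remains, unwrapping succeeds.
    assert_eq!(Arc::try_unwrap(other).unwrap(), vec![1, 2, 3, 4]);
}
```

So precisely in the situation where try_unsplit could succeed, the "convert to BytesMut first" route is guaranteed to fail without copying.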

@xTachyon

xTachyon commented Nov 1, 2019

I feel like this should be easy to implement and would be pretty useful.

My proposed function prototype is:

fn try_unsplit(&self, other: &Bytes) -> Option<Bytes>;

It would check that the shared data is the same and that the two slices are adjacent, and return a Bytes that begins at self with len = self.len() + other.len().

I could take a shot at implementing it if you want.

@Matthias247
Contributor

I would have had a use for the API too. I think the API proposed in the original comment makes the most sense: if the unsplit worked, the original other buffer should no longer be accessible.

Regarding the implementation: With the new vtable based approach you would likely check whether both internal pointers are equivalent and the data regions are adjacent. Then adjust the data region in the first object and call the vtable drop function on the second buffer.

I'm unsure whether that would have any negative side effects on some buffer implementations, where the same data pointer does not actually indicate the same buffer but is more a handle to something else. If that would be the case then we would need vtable methods for split and try_unsplit.
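A hedged sketch of that check against a toy handle type (the real Bytes layout carries a data pointer, view pointer, length, and vtable; the field names here are purely illustrative):

```rust
/// Toy stand-in for `Bytes`: a shared-buffer identifier plus a view.
#[derive(Debug, PartialEq)]
struct Handle {
    data: usize, // identifies the shared buffer (stand-in for the data pointer)
    ptr: usize,  // start of this view
    len: usize,
}

impl Handle {
    /// Widen `self` to cover `other` if both views share a buffer
    /// and are adjacent; otherwise hand `other` back to the caller.
    fn try_unsplit(&mut self, other: Handle) -> Result<(), Handle> {
        if self.data == other.data && self.ptr + self.len == other.ptr {
            self.len += other.len;
            // here the real impl would call the vtable drop on `other`
            Ok(())
        } else {
            Err(other)
        }
    }
}

fn main() {
    let mut a = Handle { data: 1, ptr: 0, len: 4 };
    let b = Handle { data: 1, ptr: 4, len: 4 };
    assert!(a.try_unsplit(b).is_ok());
    assert_eq!(a.len, 8);

    // different shared buffer -> fails, `other` is returned intact
    let c = Handle { data: 2, ptr: 8, len: 4 };
    assert!(a.try_unsplit(c).is_err());
}
```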

@CodesInChaos

Taking &mut self while returning a status indicator instead of the actual result saves one ref-count increment/decrement.

@Matthias247
Contributor

Whatever API is chosen, there will be 1 refcount decrement if successful and 0 if not successful. A move of a value doesn't change the refcount - only a clone or drop would. The merge here is the equivalent of a drop of the unused half.

The thing which we merge into should be taken by &mut self since it will always stay intact but change its properties. The thing which gets merged should be taken by value, since it might no longer exist if the merge is successful. If the merge isn't successful that half gets returned in order to give it back to the user. This is exactly the API that the original post proposed.

@carllerche
Member

@seanmonstar I think this would require an addition to the vtable?

@seanmonstar
Member

Perhaps it would.

I'm glad we didn't make the vtable stuff public just yet, I really don't like that it doesn't have a way to add new optional methods like a trait does. I'm thinking of trying to adjust the design a little to not be a struct Vtable but some unsafe trait Share.

@Matthias247
Contributor

@seanmonstar I think this would require an addition to the vtable?

I am not sure if it really does. Data pointers being equivalent and byte ranges being adjacent or overlapping is a good enough check for all implementations of Bytes that I can think of at the moment. It could be documented that Bytes instances which fulfill that criteria must be joinable - in the same way as currently every Bytes object is splittable - by calling clone instead of a more specific split function.

Although I guess a vtable function would be cleaner, since it provides a way out if we discover in the future that there are some Bytes implementations which really behave differently. Issues like these tell us that it might be a good idea to keep as much as possible flexible.

I think even with the vtable approach we can add optional methods: in order to do that, do not grant applications access to the vtable directly, but force them to go through a builder:

let vtable = VTableBuilder::new().set_clone(my_clone_fn).set_drop(my_drop_fn).build();

That way you can default some functions which are not explicitly set in the same way as with traits.
One tricky thing is obviously that the builder and generated vtable must run at compile time, because we need 'static. But I think builders which just set fields might already work as const fn.

And of course even with some extension points there are only so many backward-compatible changes we can make. E.g. adding a default implementation for try_unsplit and allowing users to overwrite it is strictly speaking not backward compatible, since it might crash for users where the default impl does not do the right thing.

@seanmonstar
Member

I am not sure if it really does. Data pointers being equivalent and byte ranges being adjacent or overlapping is a good enough check for all implementations of Bytes that I can think of at the moment.

I'm not sure this is a safe assumption. If you have two separate allocations that happen to be right next to each other in the address space, and tried to un-split them, you could erroneously think they point to a single owned allocation. I tried this out with Vec and Box, and it seemed there were always between 16 and 32 addresses between them, but that may be an allocator detail that shouldn't be relied on.
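The distinction can be seen with std slices: two halves of one allocation are always adjacent, while nothing guarantees anything about two independent allocations (a std-only sketch, not the bytes internals):

```rust
fn main() {
    // Two halves of a single allocation are guaranteed adjacent:
    let buf = vec![0u8; 8];
    let (a, b) = buf.split_at(4);
    assert_eq!(a.as_ptr().wrapping_add(a.len()), b.as_ptr());

    // Two independent allocations merely *might* end up adjacent;
    // any gap (or lack of one) observed here is an allocator detail.
    let x = vec![0u8; 16];
    let y = vec![0u8; 16];
    let end_of_x = x.as_ptr().wrapping_add(x.len());
    println!("adjacent by accident: {}", end_of_x == y.as_ptr());
}
```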

VTableBuilder

Oh yea, that could work too... I just don't actually see the benefit of the raw vtable approach over some *mut dyn Share.

adding a default implementation for try_unsplit and allowing users to overwrite it is strictly speaking not backward compatible, since it might crash for users where the default impl does not do the right thing.

I'd expect a default implementation would just be pessimistic and assume unsplitting failed, so that would be backwards compatible.

@Matthias247
Contributor

I'm not sure this is a safe assumption. If you have two separate allocations that happen to be right next to each other in the address space, and tried to un-split them, you could erroneously think they point to a single owned allocation.

I think the contract is that two separate allocations also have two separate data pointers. Otherwise the underlying allocator has no chance to identify the allocation. An allocator can not use ptr to identify something, because it will not stay stable during slicing.

I think the only exceptions are "allocators" which do not use data to store a pointer. E.g. the from_static vtable obviously doesn't need data and can set it to anything - in which case two adjacent ranges could indeed be merged. For things like these the vtable approach is imho a bit more sensible than trait objects, because the pointer doesn't really refer to an object. It can refer to nothing, or can be used as an index into a global array. Trait objects always refer to object instances, and making them work on "not so real instances" is a bit hacky, as we had seen in the older trait-object-based Waker implementations.

I think if the documentation specifies that each unique byte range must have a unique data pointer it should be good as it is, and we can unsplit based on a unique pointer and adjacency.

@seanmonstar seanmonstar removed their assignment Dec 19, 2019
@dcreager

Similar in spirit to this would be a Bytes::try_slice_ref method: instead of panicking if the sub-slice isn't actually part of the buffer, it would return None. (Or maybe it should always return a Bytes, falling back on creating a copy of the sliced data?) The serde Deserialize implementation could then use that, which should allow the following to work in a zero-copy way when you happen to be deserializing from a Bytes instance:

#[derive(Deserialize)]
struct Data {
    wrapped: Bytes,
}

@carllerche carllerche added this to the v0.6 milestone Oct 16, 2020
@carllerche
Member

Assigning this to the v0.6 milestone in order to determine if an API needs to change based on this discussion.

@carllerche
Member

carllerche commented Oct 20, 2020

Ok, looking at this, because the vtable is still private, adding try_unsplit should be forwards compatible.

I'm removing the 0.6 milestone. I also opened #437 to track questions related to the vtable.

@rrichardson
Contributor

I think the safest/easiest approach would be to add a try_resize method to Bytes/Mut and to the VTable.
If the new len is less than the capacity (which is currently tracked in the Shared struct), then it succeeds.
For mutable buffers, we'd need to also verify that the refcount == 1.

Does it solve all cases? No, but a consumer should be able to use it to achieve their goals.

@rrichardson
Contributor

rrichardson commented Jan 20, 2023

To clarify: if someone says X.unsplit(Y), what they're really saying is "split_off X's original buffer, starting at X.ptr with a len of X.len + Y.len". So clone X, then resize to X.len + Y.len, as long as it's < X.capacity.

The condition that @seanmonstar mentioned might be a real scenario, but there is no need to check for it explicitly. The overall operation would simply fail and we should return an error: the clone + resize would fail because it would be impossible for X to have the capacity to support that.

As long as we do the continuity check as implemented, AND we check that the capacity exists in X, this should be a safe operation.
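In toy form, the clone + resize framing layers a capacity check on top of the adjacency check (the struct and field names here are illustrative assumptions, not the crate's real internals):

```rust
/// Toy stand-in for a `Bytes` view with a tracked capacity.
#[derive(Clone, Debug)]
struct View {
    data: usize,     // identifies the shared buffer
    ptr: usize,      // offset of this view into the buffer
    len: usize,
    capacity: usize, // total size of the shared buffer
}

impl View {
    /// "Clone X, then resize to X.len + Y.len": succeeds only if the
    /// views are adjacent in one buffer and the buffer is big enough.
    fn try_unsplit(&self, other: &View) -> Option<View> {
        let adjacent = self.data == other.data && self.ptr + self.len == other.ptr;
        let fits = self.ptr + self.len + other.len <= self.capacity;
        if adjacent && fits {
            let mut merged = self.clone();
            merged.len += other.len;
            Some(merged)
        } else {
            None
        }
    }
}

fn main() {
    let x = View { data: 7, ptr: 0, len: 4, capacity: 8 };
    let y = View { data: 7, ptr: 4, len: 4, capacity: 8 };
    assert_eq!(x.try_unsplit(&y).map(|m| m.len), Some(8));

    // A view over a foreign allocation can never pass: it fails the
    // adjacency check, and `x` cannot have the capacity for it anyway.
    let z = View { data: 9, ptr: 8, len: 4, capacity: 4 };
    assert!(x.try_unsplit(&z).is_none());
}
```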

@rrichardson
Contributor

@Matthias247 wrote:

The thing which we merge into should be taken by &mut self since it will always stay intact but change its properties. The thing which gets merged should be taken by value, since it might no longer exist if the merge is successful. If the merge isn't successful that half gets returned in order to give it back to the user. This is exactly the API that the original post proposed.

There is no guarantee that there aren't more clones of the "other" Bytes instance that is passed in. So, IMO, it doesn't matter whether we destroy it or not. My interpretation is that it's really just saying: "give me a buffer that starts at self and ends at other + other.len".

Passing other by ref leaves the choice to decrement/destroy to the caller, which seems like a nice thing to do.

To avoid the extra increment, we can do it with just a change of len at the Bytes layer, instead of cloning and resizing through the VTable. However, we'll still need to consult the VTable to ensure that the capacity is enough to support the new len. So the VTable will need a new method: get_capacity(data) -> usize

@rrichardson
Contributor

However, we'll still need to consult the VTable to ensure that the capacity is enough to support the new len. So the VTable will need a new method: get_capacity(data) -> usize

I was able to kill two birds with the try_resize method addition to the vtable (which needed to exist anyway to support the internal conversion of promotable instances).

@frankmcsherry

frankmcsherry commented Nov 5, 2024

I headed here from #702, but am dropping in to add support for a similar request. A try_unsplit method is the only missing feature that would allow me to use the crate and move off of my own, probably unsound, version. I thought I'd walk through the "reason" it's important; if nothing else, perhaps there's an obvious pattern I could/should be using instead, in case that is clear to anyone (I'm guessing it's not public yet in part because you haven't needed it yet).

So I have various worker threads, each hosting various producers and consumers (potentially thousands or more). Each worker thread has a dedicated BytesMut for each other worker (this is a thread-per-core model) that its producers each write to the tip of, peel off, and hope to send to various consumers on other workers. It is not uncommon that when the time comes to move the written bytes into a queue for some other worker, we've written a bunch of sequential bytes, and would love to boxcar the messages together ("unsplit" them) to reduce interactions with the shared queue. Moreover, it ends up being (or seeming) important to be able to unsplit with the tail of the queue (so it's more an Arc<Mutex<VecDeque<BytesMut>>>).

There are three things that can happen, from best to worst:

  1. Contiguous Bytes or BytesMut (either works; they are uniquely owned read-only) are unsplit, and each queue shared between workers maintains a relatively bounded number of items (independent of the number of small writes).
  2. The buffers are not unsplit, and each inter-worker queue has the potential to grow a good deal, and potentially needs some reallocation under the mutex. Realistically, in this case the queues would probably be reworked, because the mutex only exists to support unsplitting.
  3. We use unsplit as it currently exists and potentially allocate and memcpy in the cases that things aren't contiguous (e.g. when we finish one BytesMut and move on to the next).

There's also some potential advantage to having the sending thread unsplit when possible, so that contended operations happen all at once on a single thread (the sender), rather than being spread across the other threads as they drop the large number of split bytes.

Anyhow, that's my story! I'm curious if you just don't have this problem. It may be specific to thread-per-core architectures where you pre-allocate buffers for each thread pair, and really do expect unsplit to often succeed, but not always. And of course, if you have other takes I'm curious. Current options (beyond pleading my case here) are 1. a SPSC queue with as many split bytes as folks create, at whatever cost, or 2. a wrapper around Bytes that tracks its provenance, just to know if it is safe to call unsplit or not.


The larger goal, in case this sparks any creative juices, is for each ordered pair of workers to communicate serialized data, unidirectionally, without using the allocator in steady state. Currently, this is a pool of BytesMut that the producers write to, the worker ships, and the consumers read, though not exactly in the same order. Each pair of workers is only lightly coupled, and one might want to write 1000 times before the other has read (preventing double-/triple-buffering sorts of solutions). You'd like to publish each producer's output immediately, which gets in the way of producer-side buffering. Maybe there's already a crate for this!
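The boxcar idea above can be sketched with plain index ranges standing in for Bytes views over one shared buffer (a hypothetical wrapper written for illustration, not an API of the bytes crate):

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Queue of views into one shared buffer; adjacent pushes coalesce,
/// so the queue length tracks gaps rather than individual writes.
struct CoalescingQueue {
    segments: VecDeque<Range<usize>>,
}

impl CoalescingQueue {
    fn new() -> Self {
        Self { segments: VecDeque::new() }
    }

    fn push(&mut self, seg: Range<usize>) {
        if let Some(tail) = self.segments.back_mut() {
            if tail.end == seg.start {
                tail.end = seg.end; // "unsplit" with the queue tail
                return;
            }
        }
        self.segments.push_back(seg);
    }
}

fn main() {
    let mut q = CoalescingQueue::new();
    q.push(0..10);
    q.push(10..25); // adjacent: merged into 0..25
    q.push(40..50); // gap: starts a new segment
    assert_eq!(q.segments.len(), 2);
    assert_eq!(q.segments[0], 0..25);
}
```

With a real try_unsplit, the ranges would be Bytes and the adjacency test would be the method call itself, with the copy or push-back paths as fallbacks.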
