You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(note that this is a somewhat large feature request with significant functionality additions)
While using the bytes crate, I noticed a use case that is not currently covered under the existing Buf type. Specifically, when returning non-contiguous chunks of memory, there is no ability to "look ahead" of the current chunk without also advancing the buffer and losing access to the current chunk. This is somewhat possible using the chunks_vectored method on Buf, but only in std environments and without a guarantee that it's implemented by the buffer (i.e. the default impl only allows partial access to the buffer).
In a network programming sense, it may be useful to scan further than the first chunk without consuming the original chunk of data, i.e. when parsing an HTTP header across a ring buffer (like VecDeque).
One way to implement this using the currently available interfaces is to have a Buf type that also implements Clone. The user could then simply advance a new, cloned buffer without advancing the original. But with this, the user is unaware whether Clone would clone the underlying buffer (an expensive operation) or just a cheap reference to a buffer. It is also not clear whether mutations to the new clone would alter the original (i.e. via an Arc). Buf does not provide any way to distinguish between an owned memory region and a reference to a memory region, so there is no way for an existing Buf to be iterated past the first chunk without possible mutations or copies for all existing implementations.
Any change to Buf to make it able to seek to any arbitrary point in the buffer would likely be backward-incompatible, since current implementations of Buf only provide access to the first chunk. So I propose adding a new parent type, SeekBuf, that also implements Buf but adds two new additional methods: chunk_from to get a chunk beginning at some offset in the buffer, and chunk_to to get a chunk ending at some offset in the buffer. Any type that implements Buf can also implement SeekBuf if it supports these features.
When a buffer type implements SeekBuf, we can also use a new BufCursor type to iterate and seek within the buffer without consuming its contents. This new cursor type can implement Iterator, Buf, and SeekBuf to provide full immutable access to the underlying buffer memory, useful for parsing, lookahead, search operations, etc. It can also be used to provide highly-tuned operations across any memory buffer, such as selective SIMD operations that provide search-like behavior (i.e. in HTTP parsing).
SeekBuf will be strictly a superset of the functionality of Buf and will not be automatically implemented for any type which already implements Buf, thus requiring opt-in support from consumers of this crate.
I have already prepared a PR for this feature and will attach it to this issue.
The text was updated successfully, but these errors were encountered:
Great PR, this is very much something I was looking for in bytes. If for whatever reason this isn't accepted, I would suggest it's worthwhile packaging it as a third-party "add-on" crate.
Regarding using BufCursor for parsing and lookahead - would it be possible to implement Clone for its iterator state?
This would allow passing cheap clones of a BufCursors to sub-functions in a main parsing loop, allowing us to eagerly peek at bytes to return a successful parse or continue on to the next parsing attempt, without disturbing the main cursor's position.
I note this could be achieved at present by checking cursor_position and re-slicing the main buffer into a new cursor each time, but in terms of elegance and transparency I feel like simply cloning the BufCursor would be far superior.
Edit: Okay, I just realised this is what subcursors are for, but it's still kind of suboptimal to deal with n levels of nested BufCursor<BufCursor<...>.
That's a great idea! And a very simple change (already added to the PR), thank you, it was definitely an oversight to not include a Clone impl for BufCursor.
(note that this is a somewhat large feature request with significant functionality additions)
While using the
bytes
crate, I noticed a use case that is not currently covered under the existingBuf
type. Specifically, when returning non-contiguous chunks of memory, there is no ability to "look ahead" of the current chunk without also advancing the buffer and losing access to the current chunk. This is somewhat possible using thechunks_vectored
method onBuf
, but only instd
environments and without a guarantee that it's implemented by the buffer (i.e. the default impl only allows partial access to the buffer).In a network programming sense, it may be useful to scan further than the first chunk without consuming the original chunk of data, i.e. when parsing an HTTP header across a ring buffer (like
VecDeque
).One way to implement this using the currently available interfaces is to have a
Buf
type that also implementsClone
. The user could then simply advance a new, cloned buffer without advancing the original. But with this, the user is unaware whetherClone
would clone the underlying buffer (an expensive operation) or just a cheap reference to a buffer. It is also not clear whether mutations to the new clone would alter the original (i.e. via anArc
).Buf
does not provide any way to distinguish between an owned memory region and a reference to a memory region, so there is no way for an existingBuf
to be iterated past the first chunk without possible mutations or copies for all existing implementations.Any change to
Buf
to make it able to seek to any arbitrary point in the buffer would likely be backward-incompatible, since current implementations ofBuf
only provide access to the first chunk. So I propose adding a new parent type,SeekBuf
, that also implementsBuf
but adds two new additional methods:chunk_from
to get a chunk beginning at some offset in the buffer, andchunk_to
to get a chunk ending at some offset in the buffer. Any type that implementsBuf
can also implementSeekBuf
if it supports these features.When a buffer type implements
SeekBuf
, we can also use a newBufCursor
type to iterate and seek within the buffer without consuming its contents. This new cursor type can implementIterator
,Buf
, andSeekBuf
to provide full immutable access to the underlying buffer memory, useful for parsing, lookahead, search operations, etc. It can also be used to provide highly-tuned operations across any memory buffer, such as selective SIMD operations that provide search-like behavior (i.e. in HTTP parsing).SeekBuf
will be strictly a superset of the functionality ofBuf
and will not be automatically implemented for any type which already implementsBuf
, thus requiring opt-in support from consumers of this crate.I have already prepared a PR for this feature and will attach it to this issue.
The text was updated successfully, but these errors were encountered: