-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new deserialization API #1057
base: main
Are you sure you want to change the base?
Conversation
6038ee0
to
671bf1e
Compare
v1.0.1: clippy fixes |
See the following report for details: cargo semver-checks output
|
671bf1e
to
774e839
Compare
v1.0.2: docstrings fixes |
774e839
to
e5982ba
Compare
v1.0.3: tests fixes. |
e5982ba
to
088a9ab
Compare
v1.0.4: docstring fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting the comments that I already made so that they are not lost
scylla/src/transport/query_result.rs
Outdated
#[derive(Debug, Clone, Copy)] | ||
#[cfg_attr(test, derive(PartialEq, Eq))] | ||
pub struct TableSpecView<'res> { | ||
table_name: &'res str, | ||
ks_name: &'res str, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for only deriving PartialEq and Eq in test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't understand why this struct is introduced at all. TableSpec
already has Cow
's that can hold references, and so is parametrized by lifetime. What do we gain by introducing this struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As TableSpec
is defined in scylla-cql
, it must have public API (a public constructor and/or public fields), for scylla
crate to be able to access it. Hence, we don't want to expose TableSpec
to users at all; instead, it's better to only expose types defined in scylla
, because then we can truly control visibility using pub
and pub(crate)
.
This is a pattern that I believe we should employ commonly, because then we will be able to make code adjustments to inner types, even changing their names and/or types, without introducing breaking API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a public contructor a problem in this case? I don't see why, but probably I don't understand something.
This patterns means basically duplicating a lot of code from scylla-cql. If we have to do this then maybe scylla-cql should not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- scylla-cql was created specifically to allow code reuse for scylla-proxy. As long as scylla-proxy exists and we don't want to duplicate low-level ser/de code there, we need scylla-cql.
- You see this as duplicating code. I see this as introducing a new layer - stable abstraction. This is something that a stable driver should have, independently of its inner working. I have an impression that Rust driver has not yet developed such layer, and its API more-or-less reflects its inner implementation details. For the sake of future compatibility and more freedom to modify things after 1.0, I believe we need such layer, consisting of exposed yet extremely stable types. Even if in some cases it boils down to an identity-or-nearly-identity layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I hate the dual world of scylla-cql and scylla. I feel unsafe about those pub
s in scylla-cql.
Perhaps we could at least move some part of scylla-cql to scylla. It seems that scylla-proxy does not use the serialization nor deserialization framework anyway (even though I initially intended it to do so - to be capable of e.g. injecting arbitrary data).
scylla/src/transport/query_result.rs
Outdated
/// A view over specification of a column returned by the database. | ||
#[derive(Debug, Clone, Copy)] | ||
#[cfg_attr(test, derive(PartialEq, Eq))] | ||
pub struct ColumnSpecView<'res> { | ||
table_spec: TableSpecView<'res>, | ||
name: &'res str, | ||
typ: &'res ColumnType, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could put Cows in ColumnSpec instead of creating a new type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I answered this in the thread related to TableSpecView.
scylla/src/transport/query_result.rs
Outdated
/// A view over specification of columns returned by the database. | ||
#[derive(Debug, Clone, Copy)] | ||
pub struct ColumnSpecs<'res> { | ||
specs: &'res [ColumnSpec], | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that this type is introduced so that user can use its methods like get_by_index etc?
If so, it will also be redundant if we get rid of TableSpecView / ColumnSpecView.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type is intentionally introduced to provide a stable abstraction over possibly unstable inner types.
scylla/src/transport/iterator.rs
Outdated
/// Stream implementation for TypedRowStream. | ||
/// | ||
/// It only works with owned types! For example, &str is not supported. | ||
impl<RowT> Stream for TypedRowStream<RowT> | ||
where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a user that uses execute_iter
won't be able to deserialize to non-owned types?
Is this limitation permanent or is it possible to lift it in the future (e.g. by getting rid of fetching Tokio task as we discussed offline one day)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is a permanent limitation then it is a severe one which is quite sad...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a user that uses
execute_iter
won't be able to deserialize to non-owned types?
Fortunately, no!
The @piodul's version of the refactor indeed had that limitation. The reason is that Stream, analogously to Iterator, does not use GATs and hence does not permit lending from itself. Therefore, if we want to have Stream implemented, we can't return borrowed items.
But then I discovered that we can introduce two separate types:
TypedRowIterator
- does not implement Stream, implementsnext()
method that permits lending from itself;TypedRowStream
- implements Stream, but has a limitation for owned types only.
TypedRowIterator is convertible to TypedRowStream for free. This solves the problem.
088a9ab
to
a0e7f2f
Compare
v2.0:
|
a0e7f2f
to
aef9803
Compare
v2.1:
|
aef9803
to
93f3174
Compare
v2.1.1:
|
93f3174
to
f104336
Compare
This is a tiny refactor, which makes further commits easier on eyes.
01883f7
to
6b713c1
Compare
Introduces new structs for lazy deserialization of RESULT:Rows frames. Introduces `ResultMetadataHolder`, which is, unsurprisingly, a versatile holder for ResultMetadata. It allows 3 types of ownership: 1) borrowing it from somewhere, be it the RESULT:Rows frame or the cached metadata in PreparedStatement; 2) owning it after deserializing from RESULT:Rows; 3) sharing ownership of metadata cached in PreparedStatement. Introduces new structs representing the RESULT:Rows CQL frame body, in various phases of deserialization: - `RawRows` only deserializes (flags and) paging size in order to pass it directly to the user; keeps metadata and rows in serialized, unparsed form; - `RawRowsWithDeserializedMetadata` deserializes metadata, but keeps rows serialized; `RawRowsWithDeserializedMetadata` is lifetime-generic and can be deserialized from `RawRows` to borrowed or owned form by corresponding methods on `RawRows`. Co-authored-by: Wojciech Przytuła <[email protected]>
The iterator is analogous to RowIterator, but instead of borrowing from external frame bytes and metadata with 'frame lifetime, it owns them and lends them from itself. Thus, it cannot implement Iterator trait. It does not, however, prevent us from exposing a `next()` method on it. The iterator is going to be used in new iterator API for queries (i.e., the one connected to `{query,execute}_iter`), where borrowing is not suitable (or even possible) due to the design of that API. Tests of RowIterator are shared with the new RawRowsLendingIterator, by introducing a new LendingIterator trait using GATs. Due to a bug/limitation in the compiler, some leaking was needed in tests. This is not a problem IMO, however, because tiny structures are leaked, only constant amount of memory is leaked, and these are only tests.
Soon, a new QueryResult will be introduced with a slightly different API. The old one will be preserved as LegacyQueryResult, in order to make migration easier. This commit renames the existing QueryResult to LegacyQueryResult, as well as the query_result module to legacy_query_result. The new QueryResult will be introduced in later commits.
6b713c1
to
64da5ed
Compare
(Re)-introduces QueryResult. It is quite similar to the old (Legacy)QueryResult, but it keeps rows and metadata in an unparsed state (via RawRows) and has methods that allow parsing the contents using the new API. Helper method names are similar to what the old QueryResult had, just with the `_typed` suffix dropped - as now it is always required to provide the type of rows when parsing them, this suffix sounded redundant. There is one crucial change to the API. Motivation is as follows: 1) `QueryResult` can represent a non-Rows response, so every rows-related operation on `QueryResult` may return "NonRowsResponse" error, which is inconvenient; 2) `QueryResult` is an owned type, so it cannot deserialize metadata in the borrowed flavour (i.e., using strings from the frame bytes directly) and it must allocate metadata (mainly ColumnSpecs) on the heap. The solution for both is to extract a new struct, `RowsDeserializer`, which is parametrized by a lifetime and hence can borrow metadata from the frame. Moreover, one has to handle "NonRowsResponse" error only once, upon `RowsDeserializer` creation. All further methods (`rows(), `single_row()`, etc.) may no longer fail with that error, which provides a clean step of conversion from any Result frame to Result:Rows frame. The drawback is that now there is a new call required in the call chain to deserialize a result, namely `.row_deserializer()`. Co-authored-by: Wojciech Przytuła <[email protected]>
New QueryResult tests are going to require result metadata serialization capabilities, as RawRows keep result metadata in a serialized form.
This reverts commit 6e6e85ea60962512daf992d01872faf730958193.
After the deserialization refactor with lazy result metadata deserialization, we will no longer have access to rows serialized size and rows count in Session and in RowIteratorWorker. Therefore, traces can no longer record that information.
Now, `result::deser_rows` returns RawRows instead of Rows, postponing any actual deserialization of response contents to a later time. The RawRows are pushed down the call stack, converted to the new QueryResult at some point and only converted to LegacyQueryResult where the API requires it. Co-authored-by: Wojciech Przytuła <[email protected]>
Like in the case of LegacyQueryResult, TypedRowIterator will be replaced with a new one, and the old one preserved for compatibility reasons. As a first step, it is renamed to LegacyTypedRowIterator. Similarly, RowIterator is renamed to LegacyRowIterator to mark that it is the part of the old API. There won't be a new RowIterator, though. Co-authored-by: Wojciech Przytuła <[email protected]>
In order to make further commits easier on the eyes, contents of the LegacyRowIterator::poll_next method are moved to the new poll_next_internal method. Co-authored-by: Wojciech Przytuła <[email protected]>
The iterator module has a small number of functions which return Poll<Option<Result<T, E>>>, call other functions that return such a stacked type and are support to continue if value obtained is not of form Ready(Some(Ok(x))). This requires a multiline pattern match instruction, which is basically boilerplate. This commit introduces the ready_some_ok! macro which hides the aforementioned boilerplate. Conceptually, it is similar to the existing std::task::ready! macro. Co-authored-by: Wojciech Przytuła <[email protected]>
This commit refactors the part responsible for acquiring the next page by the LegacyRowIterator to a different function. This change is a preparation necessary to support the new deserialization interface - there will be a method that can return deserialized type that borrows from the current iterator, and - for "lifetimes reasons" - acquiring the next page must be put into a separate method. Co-authored-by: Wojciech Przytuła <[email protected]>
Now, worker-related code comes strictly before iterator/stream-related code. This locality aid readability.
This commit finishes the work related to adjusting the iterators module to the new deserialization framework. The equivalent of the old RowIterator is now RawIterator (notice the vowel change). Despite the name, it cannot actually be iterated on, as it does not have any information about the column types. After calling the `into_typed()` method it gets converted into TypedRowIterator that can actually be iterated on. Unfortunately, due to the limitations of the Stream trait, currently the TypedRowIterator cannot deserialize types that need to borrow from the serialized frame contents. Co-authored-by: Wojciech Przytuła <[email protected]>
It is no longer needed. For compatibility with LegacyQueryResult and LegacyRowIterator, higher-layer conversions suffice.
64da5ed
to
d25a3d8
Compare
Here it is, finally!
After #1004 with preparation, #970 with lower layer and #1024 with derive macros,
this PR introduces the upper layers of the new deserialization framework.
The Big Picture
Legacy
, deprecated and planned to be removed soon:QueryResult
->LegacyQueryResult
(Typed)RowIterator
->Legacy(Typed)RowIterator
RawRows
replaceRows
. They contain deserialized metadata, but rows are still in serialized form.RawIterator
,TypedRowIterator
andTypedRowStream
replaceLegacyRowIterator
andLegacyTypedRowIterator
.QueryResult
replacesLegacyQueryResult
.RawRowsLendingIterator
is added (see details below)QueryResult
's and paging iterators' API is changed wrt column specs inspection (see details below).Session
is made generic over the deserialization API kind, so that both old and new API can be used in the transition period by the users. The choice is done onSessionBuilder
's level, by choosing eitherbuild()
orbuild_legacy()
method.Connection
's API, especiallyquery_iter()
widely used intopology.rs
) are adjusted to use the new API.Collection
instead ofOption<Collection>
, even though the DB represents empty collections as nulls.Collection
instead ofOption<Collection>
.Details
RawRowsLendingIterator
It is added in tandem with the already present (the low level,
scylla_cql
one, not theiterator.rs
one!)RowIterator
. The difference is thatRawRowsLendingIterator
owns the frame and the metadata, whereasRowIterator
borrows it. The lending one is needed in session iterator interface, and the borrowing one is used withQueryResult
.Achieving old behaviour for dynamic deserialization purposes
If one still wants to deserialize arbitrary rows (as the old framework would do) and only then try to convert them dynamically to some proper types, this is doable by using the old
Row
struct as the target type, e.g.ColumnSpecs
The new
ColumnSpecs
struct is introduced that provides view over columns specification, but exposes onlyTableSpecView
andColumnSpecView
structs instead ofTableSpec
andColumnSpec
, allowing us to modify the latter in the future without introducing breaking changes. One planned change is movingtable_spec
out ofcolumn_spec
for reduced memory footprint (see #956), but as CQL protocol allows different table specs for different columns and we can't predict whether joins won't be introduced at some point in the future, we retain this flexibility by imposing a new layer of abstraction (the views).New paging iterators
Instead of
RowIterator
(returningRow
s) andTypedRowIterator
(returning instances of the target type) both implementingStream
, now we have the following:RawIterator
Stream
, because returnsColumnIterator
s that borrow from it,type_check()
andnext()
methods that can be used for low-level, manual deserialization (not recommended for ordinary users)&str
).TypedRowIterator
into_typed::<TargetType>()
onRawIterator
,&str
),Stream
in order to support borrowed types.TypedRowStream
into_stream()
onTypedRowIterator
,Stream
and hence does not support borrowed types.TODO
Fixes: #964
Fixes: #1001
Nearly_resolves: #462
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.