-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(batch): support system column _rw_timestamp for tables #19232
base: main
Are you sure you want to change the base?
Conversation
@@ -219,7 +219,8 @@ impl Binder { | |||
.iter() | |||
.enumerate() | |||
.filter_map(|(i, c)| { | |||
(!c.is_generated()).then_some(InputRef::new(i, c.data_type().clone()).into()) | |||
c.can_dml() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that we (previously) also allow DML on __row_id
column? Shall we ban this as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow DML like deletes on _row_id
column. I think no need to ban it.
// `get_row` doesn't support select `_rw_timestamp` yet. | ||
assert!(self.epoch_idx.is_none()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend the get
interface to also return the user key (thus epoch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We do have a plan to add an extended interface (get_with_epoch) to return the epoch from the storage. Let's leave it for a later PR.
if !columns.contains(&rw_timestamp_column) { | ||
// Add system column `_rw_timestamp` to every table, but notice that this column is never persisted. | ||
columns.push(rw_timestamp_column); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit hack to me. Since it's never persisted, can we use a separate fields to avoid confusion with real persisted columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to persist this column because we have many already persisted tables in prod. Users won't want to rebuild their tables to use this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with that. I'm just wondering if it's possible to not have this column in runtime TableCatalog
at all, as all other columns are persisted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit it is a bit hacky. If you take a look at how LogicalScan
constructs the schema, it uses table catalog columns directly. If we want to keep the table catalog the same as before, we need to add complexity to the table scan and change how it represents the schema, and I think TableScan
schema construction is already quite complex. So how about adding an additional field to ColumnDesc
to distinguish the system column from others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how about adding an additional field to ColumnDesc to distinguish the system column from others
+1
if let Some(timer) = timer { | ||
timer.observe_duration() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the timer should be moved to the outer scope, below the if-else block, to capture the time it takes to get a row, whether from get_row
or from the iter interface.
.batch_chunk_iter_with_pk_bounds( | ||
epoch.into(), | ||
&pk_prefix, | ||
range_bounds, | ||
false, | ||
1, | ||
PrefetchOptions::new(false, false), | ||
) | ||
.await?; | ||
pin_mut!(iter); | ||
let chunk = iter.next().await.transpose().map_err(BatchError::from)?; | ||
if let Some(chunk) = chunk { | ||
let row = chunk.row_at(0).0.to_owned_row(); | ||
Ok(Some(row)) | ||
} else { | ||
Ok(None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactoring this into a separate method like get_row_with_rw_timestamp
, and calling it here seems more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of change is caused by planner tests, the core change is about 200+ LOC.
Why _rw_timestamp
must appear in the LogicalScan. Can we prune it in the LogicalScan?
After column pruning, this column will be pruned. |
Ohhh SORRY for misreading |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
rw_timestamp
for any tables #11629_rw_timestamp
(datatype is timestamptz) for tables. We will add a hidden column_rw_timestamp
to every table catalog when loading it from a proto, but we never persist this column info._rw_timestamp
in a batch query. Using it in a streaming query will cause an error.epoch_idx
to the storage table which indicates the position where we should put the epoch into._rw_timestamp
is selected.Example:
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.
_rw_timestamp
system column for tables. Users can use a batch query to select this column to check the internal epoch/timestamp of each row which is useful when you want to know when the rows have been updated recently.