MDEV-28730 Remove internal parser usage from InnoDB fts #4443
Conversation
In addition to the CI failures needing correcting, does this mean … Great to see the parser going away.
Force-pushed 04ec1e9 to ff6a64d
dr-m left a comment
Here are some quick initial comments.
Force-pushed b672350 to 53f237a
Force-pushed 53f237a to edabb01
dr-m left a comment
Here are some more comments. The error propagation is better now, but I would like to see some more effort to reduce the number of dict_sys.latch acquisitions. This should be tested as well, in a custom benchmark.
Even though we are adding quite a bit of code, I was pleasantly surprised that the size of an x86-64 CMAKE_BUILD_TYPE=RelWithDebInfo executable would increase by only 20 KiB. I believe that removing the InnoDB SQL parser (once some more code has been refactored) would remove more code than that.
storage/innobase/handler/i_s.cc
Outdated
    if (UNIV_LIKELY(error == DB_SUCCESS ||
                    error == DB_RECORD_NOT_FOUND))
    {
      fts_sql_commit(trx);
      if (error == DB_RECORD_NOT_FOUND) error = DB_SUCCESS;
What is the reason for committing and re-starting the transaction after each iteration? Is it one transaction per fetched row?
Here, the second if had better be removed. A blind assignment error = DB_SUCCESS would be shorter and incur less overhead; it is basically just zeroing out a register.
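A minimal sketch of the suggested simplification, using toy stand-ins (the real dberr_t enum lives in db0err.h, and fts_sql_commit() is only indicated by a comment): once the outer condition holds, DB_RECORD_NOT_FOUND merely means "end of data", so a blind store replaces the inner branch.

```cpp
#include <cassert>

// Toy error codes for illustration; not the real InnoDB enum.
enum dberr_t { DB_SUCCESS = 0, DB_RECORD_NOT_FOUND, DB_ERROR };

dberr_t normalize_scan_result(dberr_t error)
{
  if (error == DB_SUCCESS || error == DB_RECORD_NOT_FOUND)
  {
    // fts_sql_commit(trx) would happen here in the real code
    error = DB_SUCCESS;  // blind assignment; no inner branch needed
  }
  return error;
}
```

The compiler can emit the unconditional store as a single register zeroing, whereas the nested if adds a compare and branch for no observable difference.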
It is not a single word. If no starting word is specified, we fetch all words; otherwise we fetch words starting from the given word. This supports pagination when memory limits are exceeded: there could be many more words across the auxiliary tables, and it does not make sense to keep a transaction open for too long.
I would like to address this issue as a separate one, because this matches the fulltext behaviour before this patch.
Yes, I agree that btr_cur_t iteration is enough to retrieve all the words from the auxiliary table.
storage/innobase/handler/i_s.cc
Outdated
    fts_sql_rollback(trx);
    if (error == DB_LOCK_WAIT_TIMEOUT)
    {
      ib::warn() << "Lock wait timeout reading FTS index. Retrying!";
      trx->error_state = DB_SUCCESS;
    }
    else
    {
      ib::error() << "Error occurred while reading FTS index: " << error;
      break;
Which index and table are we reading? Why are we not disclosing the name of the index or the table?
Please, let’s avoid using ib::logger::logger in any new code, and invoke sql_print_error or sql_print_warning directly.
Is this code reachable? How would a lock wait timeout be possible?
Can this ever be a locking read? When and why would it need to be one? After all, as the code stands now, we are committing the transaction (and releasing any locks) after every successful iteration. Hence, there will be no consistency guarantees on the data that we are reading.
"Auxiliary table" in the function comment is inaccurate. Can we be more specific? Is this always reading entries from a partition of an inverted index? Which functions can write these tables? (What are the potential conflicts?)
Do we even need a transaction object here, or would a loop around btr_cur_t suffice?
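A sketch of a bounded retry loop that names the table being read, as the review asks. Everything here is a stand-in: reader() represents one scan attempt, table_name is a hypothetical parameter, and fprintf to stderr stands in for sql_print_warning(); the real error codes come from db0err.h.

```cpp
#include <cassert>
#include <cstdio>

// Toy error codes for illustration only.
enum dberr_t { DB_SUCCESS = 0, DB_LOCK_WAIT_TIMEOUT, DB_CORRUPTION };

template <typename Reader>
dberr_t read_with_retry(const char *table_name, Reader reader,
                        int max_retries = 3)
{
  for (int attempt = 0;; attempt++)
  {
    dberr_t err = reader();
    if (err != DB_LOCK_WAIT_TIMEOUT || attempt >= max_retries)
      return err;
    // sql_print_warning() would be used in server code
    std::fprintf(stderr,
                 "Lock wait timeout reading FTS table %s; retrying\n",
                 table_name);
  }
}
```

Bounding the retries avoids spinning forever on a persistent conflict, and passing the table name into the diagnostic answers the "which index and table are we reading?" question.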
Replaced ib:: with sql_print_*. Still to address how a lock wait can happen.
Force-pushed 9f34b2c to 166235e
dr-m left a comment
I tried to review the MVCC logic, but I did not fully understand it.
It would be much more convenient to review this if the tables or indexes on which the new code is expected to be invoked were prominently documented in debug assertions.
storage/innobase/row/row0query.cc
Outdated
    mem_heap_t* version_heap= nullptr;
    mem_heap_t* offsets_heap= nullptr;
    rec_offs* offsets= nullptr;
    rec_offs* version_offsets= nullptr;
Why are we not allocating offsets and version_offsets on the stack, instead of forcing memory to be allocated from the heap every single time?
Could we cope with a single mem_heap_t object here? Or none at all? This function is supposed to be run on some tables whose schema we know in advance, right? Can this ever be run on a user-defined table? Unfortunately, in none of the callers of QueryExecutor::process_record_with_mvcc() did I find any assertion on the table names or the schema.
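The stack-first idiom the review suggests can be sketched as follows, with toy stand-ins for rec_offs and REC_OFFS_NORMAL_SIZE (the real types and the rec_offs_init() idiom live in rem0rec.h): a fixed array on the stack serves the common case, and heap storage is used only for unusually wide records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-ins, not the real InnoDB definitions.
typedef unsigned rec_offs;
enum { REC_OFFS_NORMAL_SIZE = 100 };

// Returns true when the heap fallback was needed.
bool compute_offsets(std::size_t n_fields)
{
  rec_offs stack_buf[REC_OFFS_NORMAL_SIZE];  // no allocation on hot path
  rec_offs *offsets = stack_buf;
  std::vector<rec_offs> heap;                // fallback storage
  if (n_fields > REC_OFFS_NORMAL_SIZE)
  {
    heap.resize(n_fields);
    offsets = heap.data();
  }
  for (std::size_t i = 0; i < n_fields; i++)
    offsets[i] = rec_offs(i);                // fake field offsets
  return offsets != stack_buf;
}
```

For tables whose schema is known in advance, the stack buffer should suffice on every call, eliminating the per-record heap allocation entirely.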
    dberr_t QueryExecutor::process_record_with_mvcc(
        dict_index_t *clust_index, const rec_t *rec,
        RecordCallback &callback, dict_index_t *sec_index,
        const rec_t *sec_rec) noexcept
    {
      ut_ad(m_mtr.trx);
      ut_ad(srv_read_only_mode || m_mtr.trx->read_view.is_open());
We are missing ut_ad(sec_index->is_normal_btree()) and possibly other assertions.
Is this known to be limited to some specific tables or indexes? For example, if the table is not a FTS_ internal table and not mysql.innodb_index_stats or mysql.innodb_table_stats, would we know that the sec_index is FTS_DOC_ID_INDEX(FTS_DOC_ID)? Documenting the intended usage with debug assertions would make it easier to review this and to suggest possible optimization.
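A sketch of a predicate that could back the debug assertions the review asks for. The names and string comparisons are purely illustrative; the real check would inspect dict_table_t/dict_index_t metadata, not strings.

```cpp
#include <cassert>
#include <string>

bool is_expected_fts_usage(const std::string &table_name,
                           const std::string &sec_index_name)
{
  // FTS_ internal tables and the statistics tables are allowed as-is
  const bool internal_fts = table_name.rfind("FTS_", 0) == 0;
  const bool stats_table = table_name == "mysql/innodb_index_stats" ||
                           table_name == "mysql/innodb_table_stats";
  // On any other table, the only expected secondary index would be
  // FTS_DOC_ID_INDEX(FTS_DOC_ID), per the review comment above.
  return internal_fts || stats_table ||
         sec_index_name == "FTS_DOC_ID_INDEX";
}
```

Wrapped in a ut_ad()-style assertion, such a predicate both documents the intended callers and fails fast in debug builds if the function is ever invoked on an unexpected table.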
Added an assertion in lookup_clustered_record() stating that the secondary index can only be FTS_DOC_ID_IDX.
storage/innobase/row/row0query.cc
Outdated
    if (!rec_get_deleted_flag(result_rec,
                              clust_index->table->not_redundant()))
In row_sel_sec_rec_is_for_clust_rec() this condition would be checked for every record version that row_sel_get_clust_rec() or Row_sel_get_clust_rec_for_mysql::operator()() is processing, but here we do it only after fetching the clustered index record version. I think that this should be OK. But, I would have appreciated a source code comment that refers to the other implementation of this logic.
However, there seems to be a bigger problem that here we are checking at most one earlier version, instead of traversing all versions that are visible in the current read view. This might be OK if this function is only going to be invoked on some specific FTS_ tables. But then there should be debug assertions that would document the assumptions about the tables, as well as source code comments that explain why we only check for one older version.
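The full version-chain walk the review asks about can be sketched like this. All names are toy stand-ins: the real walk builds each older version with trx_undo_prev_version_build() and tests visibility with ReadView::changes_visible(); here a single low-limit id approximates the read view.

```cpp
#include <cassert>

typedef unsigned long trx_id_t;

struct Version
{
  trx_id_t trx_id;     // transaction that wrote this version
  const Version *prev; // older version, or nullptr
};

// Keep following the previous version until one is visible in the
// view, instead of checking at most one older version.
const Version *visible_version(const Version *v, trx_id_t view_low_limit)
{
  while (v && v->trx_id >= view_low_limit)  // too new for this view
    v = v->prev;
  return v;  // nullptr: no version visible in this read view
}

// demo chain v3(trx 30) -> v2(trx 20) -> v1(trx 10); returns the id of
// the visible version, or 0 when none is visible
trx_id_t demo(trx_id_t view_low_limit)
{
  Version v1{10, nullptr};
  Version v2{20, &v1};
  Version v3{30, &v2};
  const Version *v = visible_version(&v3, view_low_limit);
  return v ? v->trx_id : 0;
}
```

If the new code stops after one step of this loop, an old read view can see a version that is still too new, which is exactly the concern raised above.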
Now using Row_sel_get_clust_rec_for_mysql::operator() directly; I thought it better to use the existing function.
Introduce QueryExecutor to perform direct InnoDB record scans with a callback interface and consistent-read handling. It also handles basic DML operations on the clustered index of the table.
Newly added files: row0query.h & row0query.cc
The QueryExecutor class provides the following APIs:
- read(): iterate the clustered index with a RecordCallback
- read_by_index(): scan a secondary index and fetch the clustered row
- lookup_clustered_record(): resolve the PK from a secondary record
- process_record_with_mvcc(): build the record version via the read view and skip deletes
- insert_record(): insert a tuple into the table's clustered index
- select_for_update(): lock the record which matches search_tuple
- update_record(): update the currently selected and X-locked clustered record
- delete_record(): delete the clustered record identified by a tuple
- delete_all(): delete all clustered records in the table
- replace_record(): tries an update via select_for_update() + update_record(); if not found, runs insert_record()
In read() and read_by_index(), offset computation happens only when there is a need to construct the previous version of the record or when there is a clustered index lookup.
Add the FTSQueryExecutor class as a thin abstraction over QueryExecutor. This class takes care of open, lock, read, insert and delete for all auxiliary tables INDEX_[1..6] and the common FTS tables (DELETED, DELETED_CACHE, BEING_DELETED, CONFIG, ..).
The FTSQueryExecutor class has the following functions:
- Auxiliary table functions: insert_aux_record(), delete_aux_record(), read_aux(), read_aux_all()
- FTS common table functions: insert_common_record(), delete_common_record(), delete_all_common_records(), read_all_common()
- FTS CONFIG table functions: insert_config_record(), update_config_record(), delete_config_record(), read_config_with_lock()
Introduce the CommonTableReader callback to collect doc_id_t values from the fulltext common tables (DELETED, BEING_DELETED, DELETED_CACHE, BEING_DELETED_CACHE). These tables share the same schema structure.
- extract_common_fields(): extracts the common table fields
Introduce the ConfigReader callback to extract key and value from the fulltext CONFIG common table, which has a <key, value> schema.
- extract_config_fields(): reads the CONFIG table fields from the record without creating offsets
Introduce AuxCompareMode and AuxRecordReader to scan FTS auxiliary indexes with compare+process callbacks.
- extract_aux_fields(): extracts the fields of the auxiliary table from the record without creating offsets
Removed the fts0sql.cc file along with commented-out and unused fts functions.
Removed fts_table_t usage from fts_query_t and fts_optimize_t.
- fts_write_node(): refined to use FTSQueryExecutor and fts_aux_data_t
- fts_select_index{,_by_range,_by_hash}: changed the return type from ulint to uint8_t and simplified the return flow
- fts_query(): the fts_query code now uses QueryExecutor::read() and read_by_index() with a RecordCallback
- fts_optimize_write_word(): deletes or inserts via FTSQueryExecutor::delete_aux_record()/insert_aux_record() using fts_aux_data_t
- fts_optimize_table(): assigns thd to the transaction even when it is called via a user thread or the fulltext optimize thread
InnoDB now tries to acquire all auxiliary and common tables during fts_sync_table(), fts_optimize_table() and fts_query() at once. This optimization avoids repeatedly acquiring dict_sys.latch.
Force-pushed 166235e to 8355e6a
Description
Remove internal parser/SQL-graph usage and migrate FTS paths to QueryExecutor:
- Introduced the QueryExecutor (row0query.{h,cc}) and FTSQueryExecutor abstractions for clustered scans, secondary scans and DML.
- Refactored the fetch/optimize code to use QueryExecutor::read() and read_by_index() with RecordCallback, replacing the SQL graph flows.
- Added CommonTableReader and ConfigReader callbacks for the common/CONFIG tables.
- Implemented fts_index_fetch_nodes(trx, index, word, user_arg, FTSRecordProcessor, compare_mode) and rewrote fts_optimize_write_word() to delete/insert via the executor with fts_aux_data_t.
- Removed fts_doc_fetch_by_doc_id() and the FTS_FETCH_DOC_BY_ID_* macros, updating callers to fts_query_fetch_document().
- Tightened the fts_select_index{,_by_range,_by_hash} return type to uint8_t.
- Removed fts0sql.cc and eliminated fts_table_t from fts_query_t/fts_optimize_t.
Release Notes
Removed the SQL parser usage from the fulltext subsystem.
How can this PR be tested?
For QA purposes, run RQG testing involving the fulltext subsystem.
Basing the PR against the correct MariaDB version: main branch.