[Refactor] (inverted index) Refactor Inverted index file writer (#41625) #43527
Closed
csun5285
wants to merge
3,330
commits into
apache:master
from
csun5285:pick_41625_to_upstream_branch-3.0
cherry pick from apache#42052 Co-authored-by: Tiewei Fang <[email protected]>
… to obtain a connection on FE apache#41735 (apache#42181) cherry pick from apache#41735 Co-authored-by: zy-kkk <[email protected]>
…ile` session var to skip checking acid version file in some hive envs. apache#42111 apache#42225 (apache#42226) bp apache#42111 apache#42225 --------- Co-authored-by: Qi Chen <[email protected]>
…te express is not slot apache#42113 (apache#42223) cherry pick from apache#42113 Co-authored-by: Socrates <[email protected]>
…pache#42188 (apache#42219) cherry pick from apache#41606 apache#42188 --------- Co-authored-by: Tiewei Fang <[email protected]> Co-authored-by: TieweiFang <[email protected]>
apache#42101) ## Proposed changes pick from master apache#41943 and apache#42093 <!--Describe your changes.-->
…e#41546) (apache#42205) pick: apache#41546 When a match does not contain a slot reference, translating it to an original planner expr throws an exception. This kind of message does not need to be recorded. ## Proposed changes Issue Number: close #xxx <!--Describe your changes.-->
…ount on index (apache#41375) (apache#41767) ## Proposed changes pick from apache#41375
## Proposed changes revert apache#41222 Issue Number: close #xxx <!--Describe your changes.-->
## Proposed changes pick apache#41464 apache#40529 apache#40349 apache#39222 Issue Number: close #xxx <!--Describe your changes.-->
…gg (apache#42236) ## Proposed changes pick from master apache#39471 <!--Describe your changes.-->
…ged profile (branch-3.0) (apache#42254) ## Proposed changes pick apache#40361 Issue Number: close #xxx <!--Describe your changes.-->
…e#42275) cherry pick from apache#42217 Co-authored-by: daidai <[email protected]>
…text insert and fix reading null string bug apache#42200 (apache#42272) cherry pick from apache#42200 Co-authored-by: Socrates <[email protected]>
…data to S3 rather than local file system apache#42211 (apache#42271) cherry pick from apache#42211 Co-authored-by: Tiewei Fang <[email protected]>
…pache#42064 (apache#42248) cherry pick from apache#42064 Co-authored-by: Sun Chenyang <[email protected]>
…ader change (apache#41677) (apache#42290) pick (apache#41677) When the master FE node restarts, multi-table load pauses and cannot resume: ``` ReasonOfStateChanged: ErrorReason{code=errCode = 2, msg='failed to get stream load plan, errCode = 2, detailMessage = the user is not granted permission to the compute group, ComputeGroupException: CURRENT_USER_NO_AUTH_TO_USE_ANY_COMPUTE_GROUP, you can contact the system administrator and request that they grant you the appropriate compute group permissions, use SQL `GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}`'} ``` This is caused by the loss of the cluster name after a restart or leader change.
…meta service (apache#42148) (apache#42294) pick (apache#42148)
… (apache#42293) pick (apache#42058) The routine load task logs too much: five million log lines were generated in 10 minutes. ``` grep 'consumer meet partition eof' be.INFO.log.20240930-164533 | wc -l 5369624 ```
…he#42048) (apache#42258) pick apache#42048 Use fragment ID to manage fragment context
…che#41951) (apache#42081) cherry-pick from master apache#41951 Load failed when no database was set in the session; the label's database should be used if the session has no database set. LOAD LABEL test_db.label_111111 ( DATA INFILE("hdfs://hdfs01:9000/user/") INTO TABLE `test_load_tb`) WITH BROKER "broker" ( "username" = "user", "password" = ""); ERROR 1105 (HY000): errCode = 2, detailMessage = Current database is not set.
…routine load task (apache#42042) (apache#42292) pick (apache#42042) Routine load task timeout is max_batch_interval * 10, but load channel timeout is max_batch_interval * 2.
…transaction failed (apache#41946) (apache#42291) pick (apache#41946)
…pache#42289) pick (apache#41529) Add copy into regression test case.
…dir when index file writer open index apache#42207 (apache#42300) cherry pick from apache#42207
…mpaction (apache#42051) (apache#42285) ## Proposed changes bp apache#42051
…2303) ## Proposed changes pr: apache#40871 commitId: da6ac0c pr: apache#41145 commitId: 5e6e4bf
…#43401) pick apache#41818 from master 1. async deletion when recycling stale rowsets 2. minimize the lock's critical section 3. add cache lock held & wait time info for debugging
…e#41607) (apache#43405) pick from master apache#41607 variable version: 000-100: doris-2.0.x 100-200: doris-2.1.x 200-300: doris-3.0.x update variables 000: nereids_timeout_second = 30 if original value is 5 100: enable_nereids_dml = true enable_nereids_dml_with_pipeline = true enable_nereids_planner = true enable_fallback_to_original_planner = true enable_pipeline_x_engine = true 200: enable_fallback_to_original_planner = false
…t to be analyzed (apache#43370) (apache#43374) pick from master apache#43370
…amed column (apache#43392) Cherry-picked from apache#43336 Co-authored-by: qiye <[email protected]>
…che#41996 (apache#43454) If users use different LOCAL_DORIS_PATH values, their clusters' networks may conflict, so let different users start the subnet search from different subnets. cherry-pick: apache#41996
…e delete bitmap catch exception (apache#43088) PR Body: Currently the mow table lock is released on ms when committing a txn. However, if calculating the delete bitmap fails before the txn is committed, the lock is not released, which causes another load task to hang while getting the mow lock until the lock held by the previous txn expires. Cherry-picked from apache#41759 Co-authored-by: huanghaibin <[email protected]>
…typo (apache#43022) PR Body: ## Proposed changes DeletePredicatePB should be DeleteSubPredicatePB. A test case is hard to add, since this bug was triggered by a huge random test and no minimal case was found. However, the fix was verified under the wild test and does work. Note that this problem may be triggered by another bug, because the schema in a delete predicate rowset should contain the columns referred to in the delete condition. Even without this fix, this error should never happen. But the error did occur under wild tests, which means the schema in the delete predicate rowset did not match the delete condition. I think that in some state the delete operation uses the BE tablet schema rather than the schema from FE, and that a preceding rename operation leads to that state. But I failed to add a test case to reproduce it, and according to the related code it should not be able to happen. ``` (1105, 'errCode = 2, detailMessage = ([172.20.50.7](http://172.20.50.7/))[INTERNAL_ERROR]failed to initialize storage reader. 
tablet=78026, res=[INTERNAL_ERROR]column not found, name=loc1, table_id=-1, schema_version=2 \t0# doris::TabletSchema::column(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:375 \t1# doris::Status doris::DeleteHandler::_parse_column_pred<doris::DeleteSubPredicatePB>(std::shared_ptr<doris::TabletSchema>, std::shared_ptr<doris::TabletSchema>, google::protobuf::RepeatedPtrField<doris::DeleteSubPredicatePB> const&, doris::DeleteConditions*) at /home/zcp/repo_center/doris_master/doris/be/src/util/expected.hpp:1986 \t2# doris::DeleteHandler::init(std::shared_ptr<doris::TabletSchema>, std::vector<std::shared_ptr<doris::RowsetMeta>, std::allocator<std::shared_ptr<doris::RowsetMeta> > > const&, long) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701 \t3# doris::TabletReader::_init_delete_condition(doris::TabletReader::ReaderParams const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701 \t4# doris::TabletReader::_init_params(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499 \t5# doris::TabletReader::init(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499 \t6# doris::vectorized::BlockReader::init(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499 \t7# doris::vectorized::NewOlapScanner::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499 \t8# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:388 \t9# std::_Function_handler<void (), 
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()apache#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701 \t10# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_master/doris/be/src/util/threadpool.cpp:0 \t11# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562 \t12# ? \t13# ? , backend=[172.20.50.7](http://172.20.50.7/)') ``` ```cpp auto tablet_schema = std::make_shared<TabletSchema>(); tablet_schema->copy_from(*tablet->tablet_schema()); if (!request.columns_desc.empty() && request.columns_desc[0].col_unique_id >= 0) { tablet_schema->clear_columns(); // TODO(lhy) handle variant for (const auto& column_desc : request.columns_desc) { tablet_schema->append_column(TabletColumn(column_desc)); } } RowsetSharedPtr rowset_to_add; // writes res = _convert_v2(tablet, &rowset_to_add, tablet_schema, push_type); if (!res.ok()) { LOG(WARNING) << "fail to convert tmp file when realtime push. res=" << res << ", failed to process realtime push." << ", tablet=" << tablet->tablet_id() << ", transaction_id=" << request.transaction_id; Status rollback_status = _engine.txn_manager()->rollback_txn(request.partition_id, *tablet, request.transaction_id); // has to check rollback status to ensure not delete a committed rowset if (rollback_status.ok()) { _engine.add_unused_rowset(rowset_to_add); } return res; } // add pending data to tablet if (push_type == PushType::PUSH_FOR_DELETE) { rowset_to_add->rowset_meta()->set_delete_predicate(std::move(del_preds.front())); del_preds.pop(); } ``` Cherry-picked from apache#42260 Co-authored-by: Siyang Tang <[email protected]>
…jobs (apache#43404) Cherry-picked from apache#43262 Signed-off-by: freemandealer <[email protected]> Co-authored-by: zhengyu <[email protected]>
…g delete bitmap fail and retry (apache#43457) PR apache#43261 doesn't clear GetDeleteBitmapResponse correctly; this PR fixes the problem. Pick PR apache#43358 Co-authored-by: huanghaibin <[email protected]>
…che#43060) (apache#43343) example ``` CREATE STORAGE VAULT IF NOT EXISTS demo_vault PROPERTIES ( "type"="S3", ... "use_path_style" = "true" ); ```
…ildcard query (apache#43399) Cherry-picked from apache#41176 Co-authored-by: qiye <[email protected]> Co-authored-by: airborne12 <[email protected]>
… to regression test (apache#43240) (apache#43417)
…gression test(apache#43123) (apache#43416) (cherry picked from commit a3e8de7)
…he#41625)
## Proposed changes
1. After a normal segment is flushed, `close_inverted_index` is called directly to write the final composite file.
2. Compaction proceeds in three steps. First, the `segment writer` writes the `bkd index` while writing normal data. Second, `index compaction` writes the `string index`. Third, `close_inverted_index` is called uniformly for all indexes to write the final files.
3. The rowset writer uses `InvertedIndexFileCollection` to store all inverted index file writers, so that they stay alive for the entire write or compaction process.
4. When the rowset writer generates the final rowset through `build(rowset)`, it can retrieve the index file sizes from the `InvertedIndexFileCollection` and record them in the rowset meta.
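The lifecycle described in the proposed changes can be sketched roughly as follows. This is an illustration only, not the code from the PR: `IndexFileWriterSketch`, `IndexFileCollectionSketch`, `RowsetMetaSketch`, and `build_rowset_meta` are hypothetical names, and file sizes are modeled as plain byte counters rather than real files. The sketch shows a collection that owns every index file writer, a single close step that finalizes all of them, and a build step that reads the sizes afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for one segment's inverted index file writer.
class IndexFileWriterSketch {
public:
    // Buffer index data (e.g. "bkd" from the segment writer, "string" from
    // index compaction). Nothing is finalized at this point.
    void add_index(const std::string& kind, int64_t bytes) {
        assert(!_closed && "no index data may be added after close()");
        _pending[kind] += bytes;
    }

    // Write the final composite file once, covering every buffered index.
    // Calling close() again is a no-op.
    void close() {
        if (_closed) return;
        for (const auto& entry : _pending) _file_size += entry.second;
        _closed = true;
    }

    bool closed() const { return _closed; }
    // Size of the final file; 0 until close() has run.
    int64_t file_size() const { return _file_size; }

private:
    std::map<std::string, int64_t> _pending;
    int64_t _file_size = 0;
    bool _closed = false;
};

// Hypothetical collection owned by the rowset writer. It keeps every index
// file writer alive for the whole write or compaction process.
class IndexFileCollectionSketch {
public:
    // Return the writer for a segment, creating it on first use.
    IndexFileWriterSketch* add(int segment_id) {
        auto& slot = _writers[segment_id];
        if (!slot) slot = std::make_unique<IndexFileWriterSketch>();
        return slot.get();
    }

    // Uniform final step: close every writer in the collection.
    void close_all() {
        for (auto& entry : _writers) entry.second->close();
    }

    int64_t total_file_size() const {
        int64_t total = 0;
        for (const auto& entry : _writers) total += entry.second->file_size();
        return total;
    }

private:
    std::map<int, std::unique_ptr<IndexFileWriterSketch>> _writers;
};

// Hypothetical rowset meta that records the total index file size.
struct RowsetMetaSketch {
    int64_t index_disk_size = 0;
};

// Analogue of the build step: read sizes from the collection into the meta.
RowsetMetaSketch build_rowset_meta(const IndexFileCollectionSketch& files) {
    RowsetMetaSketch meta;
    meta.index_disk_size = files.total_file_size();
    return meta;
}
```

In this sketch a caller would obtain a writer with `files.add(segment_id)`, add the bkd and string index data in separate steps, call `files.close_all()` once, and only then call `build_rowset_meta(files)`. Because the collection owns the writers, their sizes are still available at build time.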
csun5285
requested review from
dataroaring,
morningman,
yiguolei,
xiaokang,
CalvinKirs,
BiteTheDDDDt,
platoneko and
gavinchou
as code owners
November 8, 2024 12:24
pick from master #41625