
[Refactor] (inverted index) Refactor Inverted index file writer (#41625) #43527

Closed
wants to merge 3,330 commits into from


csun5285 commented Nov 8, 2024

Pick from master #41625

morningman and others added 30 commits October 21, 2024 21:30
…ile` session var to skip checking acid version file in some hive envs. apache#42111 apache#42225 (apache#42226)

bp apache#42111 apache#42225

---------

Co-authored-by: Qi Chen <[email protected]>
apache#42101)

## Proposed changes

pick from master apache#41943 and
apache#42093

<!--Describe your changes.-->
…e#41546) (apache#42205)

pick: apache#41546
When `match` does not contain a slot reference, an exception is thrown
while translating it to an original planner expr.
This kind of message does not need to be recorded.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
## Proposed changes
Revert apache#41222
Issue Number: close #xxx

<!--Describe your changes.-->
## Proposed changes
pick apache#41464 apache#40529 apache#40349 apache#39222
Issue Number: close #xxx

<!--Describe your changes.-->
…gg (apache#42236)

## Proposed changes

pick from master apache#39471

<!--Describe your changes.-->
…ged profile (branch-3.0) (apache#42254)

## Proposed changes
pick apache#40361 
Issue Number: close #xxx

<!--Describe your changes.-->
…text insert and fix reading null string bug apache#42200 (apache#42272)

cherry pick from apache#42200

Co-authored-by: Socrates <[email protected]>
…data to S3 rather than local file system apache#42211 (apache#42271)

cherry pick from apache#42211

Co-authored-by: Tiewei Fang <[email protected]>
…ader change (apache#41677) (apache#42290)

pick (apache#41677)

When the master FE node restarts, multi-table loads pause and cannot be resumed:
```
ReasonOfStateChanged: ErrorReason{code=errCode = 2, msg='failed to get stream load plan, errCode = 2, detailMessage = the user is not granted permission to the compute group, ComputeGroupException: CURRENT_USER_NO_AUTH_TO_USE_ANY_COMPUTE_GROUP, you can contact the system administrator and request that they grant you the appropriate compute group permissions, use SQL `GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}`'}
```

This is caused by the cluster name being lost after a restart or leader change.
… (apache#42293)

pick (apache#42058)

Routine load tasks produce too many log lines: more than five million
were generated in 10 minutes.

```
grep 'consumer meet partition eof' be.INFO.log.20240930-164533 | wc -l 
5369624
```
…che#41951) (apache#42081)

cherry-pick from master apache#41951

Loading fails when no database is set in the session. In that case, the
label's database should be used:

LOAD LABEL test_db.label_111111 ( DATA
INFILE("hdfs://hdfs01:9000/user/") INTO TABLE `test_load_tb`) WITH
BROKER "broker" ( "username" = "user", "password" = "");

ERROR 1105 (HY000): errCode = 2, detailMessage = Current database is not set.
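The fallback described above can be sketched as follows. This is a minimal illustration, not the Doris FE code: `resolve_load_db` is a hypothetical helper, and it assumes the session database takes precedence when set, with the label's qualifier (`test_db` in `test_db.label_111111`) used only as a fallback.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper (not a Doris API): choose the database for a LOAD.
// Assumption: the session database wins when it is set; otherwise fall back
// to the database qualifier of the label, e.g. "test_db" in "test_db.label_1".
std::optional<std::string> resolve_load_db(const std::string& session_db,
                                           const std::string& label) {
    if (!session_db.empty()) {
        return session_db;
    }
    auto dot = label.find('.');
    if (dot == std::string::npos || dot == 0) {
        // No qualifier in the label: still "Current database is not set."
        return std::nullopt;
    }
    return label.substr(0, dot);
}
```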
…routine load task (apache#42042) (apache#42292)

pick (apache#42042)

Routine load task timeout is max_batch_interval * 10, but load channel
timeout is max_batch_interval * 2.
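To make the mismatch concrete, here is a small sketch of the arithmetic. The struct and function names are hypothetical; only the multipliers 10 and 2 come from the commit message. With `max_batch_interval` = 60 s, the task may run for 600 s while the load channel expires after 120 s.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two timeouts, in seconds.
struct RoutineLoadTimeouts {
    int64_t task_timeout_s;
    int64_t load_channel_timeout_s;
};

// Timeouts as described in the commit message, before the fix.
RoutineLoadTimeouts timeouts_before_fix(int64_t max_batch_interval_s) {
    return {max_batch_interval_s * 10, max_batch_interval_s * 2};
}

// True if the load channel can expire while the task is still allowed to run.
bool channel_may_expire_first(const RoutineLoadTimeouts& t) {
    return t.load_channel_timeout_s < t.task_timeout_s;
}
```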
freemandealer and others added 20 commits November 7, 2024 15:53
…#43401)

pick apache#41818 from master

1. Delete stale rowsets asynchronously when recycling them
2. Minimize the lock critical section
3. Add cache lock held and wait time info for debugging
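Items 1 and 2 can be illustrated with a common pattern: swap the pending list out while holding the lock, then delete outside the lock on another thread. This is a sketch under assumptions; `StaleRowset` and `StaleRowsetRecycler` are hypothetical stand-ins, not the Doris cache or tablet classes, and clearing the vector stands in for the real file and cache cleanup.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a stale rowset.
struct StaleRowset {
    std::string id;
};

class StaleRowsetRecycler {
public:
    void add(std::shared_ptr<StaleRowset> rs) {
        std::lock_guard<std::mutex> guard(_mu);
        _stale.push_back(std::move(rs));
    }

    // The critical section only covers the swap; deletion runs asynchronously
    // without holding the lock.
    std::future<size_t> recycle_async() {
        std::vector<std::shared_ptr<StaleRowset>> to_delete;
        {
            std::lock_guard<std::mutex> guard(_mu);
            to_delete.swap(_stale);
        }
        return std::async(std::launch::async,
                          [batch = std::move(to_delete)]() mutable {
                              size_t n = batch.size();
                              batch.clear();  // stand-in for real cleanup
                              return n;
                          });
    }

    size_t pending() {
        std::lock_guard<std::mutex> guard(_mu);
        return _stale.size();
    }

private:
    std::mutex _mu;
    std::vector<std::shared_ptr<StaleRowset>> _stale;
};
```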
…e#41607) (apache#43405)

pick from master apache#41607

variable version:
000-100: doris-2.0.x
100-200: doris-2.1.x
200-300: doris-3.0.x

Variables to update:

000:
nereids_timeout_second = 30 if original value is 5

100:
enable_nereids_dml = true
enable_nereids_dml_with_pipeline = true
enable_nereids_planner = true
enable_fallback_to_original_planner = true
enable_pipeline_x_engine = true

200:
enable_fallback_to_original_planner = false
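The per-version updates above can be sketched as ordered upgrade steps. The struct, the function, and the default values are hypothetical, and the sketch assumes that a step labeled `k` runs when the persisted variable version is at most `k`; the actual comparison used in Doris may differ.

```cpp
#include <cassert>

// Hypothetical subset of global variables, with hypothetical old defaults.
struct GlobalVars {
    int nereids_timeout_second = 5;
    bool enable_nereids_dml = false;
    bool enable_nereids_dml_with_pipeline = false;
    bool enable_nereids_planner = false;
    bool enable_fallback_to_original_planner = true;
    bool enable_pipeline_x_engine = false;
};

// Assumption: step k applies when persisted_version <= k.
void upgrade_vars(GlobalVars& v, int persisted_version) {
    // Step 000 (2.0.x): raise the timeout only if it still has the old value.
    if (persisted_version <= 0 && v.nereids_timeout_second == 5) {
        v.nereids_timeout_second = 30;
    }
    // Step 100 (2.1.x): enable Nereids DML, the Nereids planner, and pipelineX.
    if (persisted_version <= 100) {
        v.enable_nereids_dml = true;
        v.enable_nereids_dml_with_pipeline = true;
        v.enable_nereids_planner = true;
        v.enable_fallback_to_original_planner = true;
        v.enable_pipeline_x_engine = true;
    }
    // Step 200 (3.0.x): stop falling back to the original planner.
    if (persisted_version <= 200) {
        v.enable_fallback_to_original_planner = false;
    }
}
```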
…che#41996 (apache#43454)

If users use different LOCAL_DORIS_PATH values, their clusters' networks may
conflict. So users with different paths now start the subnet search from
different subnets.

cherry-pick: apache#41996
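One way to give each `LOCAL_DORIS_PATH` its own starting subnet is to hash the path deterministically. This is a hypothetical sketch: the FNV-1a hash, the function name, and the octet range 16 to 215 are assumptions, not the values used by the Doris test tooling.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a: a simple string hash that gives the same result on every platform.
uint32_t fnv1a(const std::string& s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical: map LOCAL_DORIS_PATH to a starting octet in
// [first_octet, first_octet + span), so users with different paths
// begin probing for a free subnet at different points.
int start_subnet_octet(const std::string& local_doris_path,
                       int first_octet = 16, int span = 200) {
    return first_octet +
           static_cast<int>(fnv1a(local_doris_path) % static_cast<uint32_t>(span));
}
```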
…e delete bitmap catch exception (apache#43088)

PR Body: Currently, the merge-on-write (MoW) table lock is released on the
meta service (MS) when committing the txn. However, if calculating the
delete bitmap fails before the txn is committed, the lock is not released,
so another load task hangs waiting for the MoW lock until the lock held by
the last txn expires.
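The fix ensures the lock is also released on the failure path. A common way to express that in C++ is an RAII guard that releases on scope exit unless dismissed, which covers both early returns and thrown exceptions. Everything here is hypothetical: `lock_holders` stands in for the lock state on the meta service, and the real fix may use a different mechanism.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: runs `release` on scope exit unless dismissed.
class ReleaseOnFailure {
public:
    explicit ReleaseOnFailure(std::function<void()> release)
            : _release(std::move(release)) {}
    ~ReleaseOnFailure() {
        if (_armed && _release) _release();
    }
    void dismiss() { _armed = false; }
    ReleaseOnFailure(const ReleaseOnFailure&) = delete;
    ReleaseOnFailure& operator=(const ReleaseOnFailure&) = delete;

private:
    std::function<void()> _release;
    bool _armed = true;
};

// Hypothetical load flow: acquire the lock, calculate the delete bitmap
// (may throw), then commit, which releases the lock on success.
void load_with_mow_lock(const std::function<void()>& calc_delete_bitmap,
                        int& lock_holders) {
    ++lock_holders;  // acquire
    ReleaseOnFailure guard([&lock_holders] { --lock_holders; });
    calc_delete_bitmap();  // if this throws, the guard releases the lock
    --lock_holders;        // commit txn releases the lock
    guard.dismiss();
}
```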

 
Cherry-picked from apache#41759

Co-authored-by: huanghaibin <[email protected]>
…typo (apache#43022)

PR Body: ## Proposed changes

DeletePredicatePB should be DeleteSubPredicatePB.

A test case is hard to add, since this bug was triggered by a huge
random test and we failed to find a minimal case. However, this fix has
been verified under that wild test and it works.

Note that this problem may be triggered by another bug, because the schema
in a delete-predicate rowset should contain the columns referred to in the
delete condition. Even without this fix, this error should never
happen.

But this error occurred under wild tests, which means the schema in the
delete-predicate rowset was not compatible with the delete condition. I
think that, in some state, the delete operation uses the BE tablet schema
rather than the schema from FE, and a preceding rename operation leads to
that state. But I failed to add a test case to reproduce it, and according
to the related code it should never happen.
```
(1105, 'errCode = 2, detailMessage = (172.20.50.7)[INTERNAL_ERROR]failed to initialize storage reader. tablet=78026, res=[INTERNAL_ERROR]column not found, name=loc1, table_id=-1, schema_version=2

\t0#  doris::TabletSchema::column(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:375
\t1#  doris::Status doris::DeleteHandler::_parse_column_pred<doris::DeleteSubPredicatePB>(std::shared_ptr<doris::TabletSchema>, std::shared_ptr<doris::TabletSchema>, google::protobuf::RepeatedPtrField<doris::DeleteSubPredicatePB> const&, doris::DeleteConditions*) at /home/zcp/repo_center/doris_master/doris/be/src/util/expected.hpp:1986
\t2#  doris::DeleteHandler::init(std::shared_ptr<doris::TabletSchema>, std::vector<std::shared_ptr<doris::RowsetMeta>, std::allocator<std::shared_ptr<doris::RowsetMeta> > > const&, long) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
\t3#  doris::TabletReader::_init_delete_condition(doris::TabletReader::ReaderParams const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
\t4#  doris::TabletReader::_init_params(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499
\t5#  doris::TabletReader::init(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499
\t6#  doris::vectorized::BlockReader::init(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499
\t7#  doris::vectorized::NewOlapScanner::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:499
\t8#  doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_master/doris/be/src/common/status.h:388
\t9#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
\t10# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_master/doris/be/src/util/threadpool.cpp:0
\t11# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
\t12# ?
\t13# ?
, backend=172.20.50.7')
```

```cpp
auto tablet_schema = std::make_shared<TabletSchema>();
tablet_schema->copy_from(*tablet->tablet_schema());
if (!request.columns_desc.empty() && request.columns_desc[0].col_unique_id >= 0) {
    tablet_schema->clear_columns();
    // TODO(lhy) handle variant
    for (const auto& column_desc : request.columns_desc) {
        tablet_schema->append_column(TabletColumn(column_desc));
    }
}
RowsetSharedPtr rowset_to_add;
// writes
res = _convert_v2(tablet, &rowset_to_add, tablet_schema, push_type);
if (!res.ok()) {
    LOG(WARNING) << "fail to convert tmp file when realtime push. res=" << res
                 << ", failed to process realtime push."
                 << ", tablet=" << tablet->tablet_id()
                 << ", transaction_id=" << request.transaction_id;

    Status rollback_status = _engine.txn_manager()->rollback_txn(request.partition_id, *tablet,
                                                                 request.transaction_id);
    // has to check rollback status to ensure not delete a committed rowset
    if (rollback_status.ok()) {
        _engine.add_unused_rowset(rowset_to_add);
    }
    return res;
}

// add pending data to tablet

if (push_type == PushType::PUSH_FOR_DELETE) {
    rowset_to_add->rowset_meta()->set_delete_predicate(std::move(del_preds.front()));
    del_preds.pop();
}
```
Cherry-picked from apache#42260

Co-authored-by: Siyang Tang <[email protected]>
…jobs (apache#43404)

Cherry-picked from apache#43262

Signed-off-by: freemandealer <[email protected]>
Co-authored-by: zhengyu <[email protected]>
…g delete bitmap fail and retry (apache#43457)

apache#43261 did not clear GetDeleteBitmapResponse correctly before
retrying; this PR fixes that problem.
Pick apache#43358
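The underlying issue is a general one: protobuf-style responses accumulate repeated fields across attempts unless they are cleared before each retry. A minimal sketch, with a hypothetical `FakeDeleteBitmapResponse` and `fake_rpc` (generated protobuf messages do provide a real `Clear()` method; the actual Doris retry loop is not shown in this PR):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GetDeleteBitmapResponse with one repeated field.
struct FakeDeleteBitmapResponse {
    std::vector<int64_t> rowset_ids;
    void Clear() { rowset_ids.clear(); }
};

// Hypothetical RPC: always appends two entries, and fails while
// attempt < fail_times (leaving a partially filled response behind).
bool fake_rpc(FakeDeleteBitmapResponse& resp, int attempt, int fail_times) {
    resp.rowset_ids.push_back(1);
    resp.rowset_ids.push_back(2);
    return attempt >= fail_times;
}

size_t get_with_retry(int fail_times, int max_retry, bool clear_before_retry) {
    FakeDeleteBitmapResponse resp;
    for (int attempt = 0; attempt < max_retry; ++attempt) {
        if (clear_before_retry) resp.Clear();  // drop stale data from failed tries
        if (fake_rpc(resp, attempt, fail_times)) break;
    }
    return resp.rowset_ids.size();
}
```

With one failed attempt, clearing before each retry yields 2 entries; without clearing, the stale entries from the failed attempt remain and 4 are returned.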

Co-authored-by: huanghaibin <[email protected]>
…che#43060) (apache#43343)

Example:
```
  CREATE STORAGE VAULT IF NOT EXISTS demo_vault
    PROPERTIES (
      "type"="S3",
      ...
      "use_path_style" = "true"
  );
```
…ildcard query (apache#43399)

Cherry-picked from apache#41176

Co-authored-by: qiye <[email protected]>
Co-authored-by: airborne12 <[email protected]>
…he#41625)

## Proposed changes

1. After a normal segment is flushed, `close_inverted_index` is called
directly to write the final composite file.
2. During compaction, the `segment writer` first writes the `bkd index`
along with the normal data. Next, `index compaction` writes the
`string index`. Finally, `close_inverted_index` is called uniformly for
all indexes to write the final files.
3. The rowset writer uses `InvertedIndexFileCollection` to hold all
inverted index file writers, so that they stay alive throughout the
entire writing or compaction process.
4. When the rowset writer builds the final rowset through
`build(rowset)`, it retrieves the index file sizes from
`InvertedIndexFileCollection` and records them in the rowset meta.
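The ownership model in items 3 and 4 can be sketched as a collection that owns one index file writer per segment for the whole write or compaction, closes them all at the end, and reports per-segment file sizes for the rowset meta. The class and method names are hypothetical simplifications, not the real `InvertedIndexFileWriter` or `InvertedIndexFileCollection` interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer: collects index sizes, then "closes" them into one
// composite file whose size becomes known.
class FakeIndexFileWriter {
public:
    void add_index(const std::string& name, int64_t bytes) {
        _indexes.emplace_back(name, bytes);
    }
    int64_t close() {
        _closed_size = 0;
        for (const auto& idx : _indexes) {
            _closed_size += idx.second;
        }
        return _closed_size;
    }
    int64_t file_size() const { return _closed_size; }

private:
    std::vector<std::pair<std::string, int64_t>> _indexes;
    int64_t _closed_size = -1;  // -1 until close() has run
};

// Hypothetical collection owned by the rowset writer: one writer per segment.
class FakeIndexFileCollection {
public:
    FakeIndexFileWriter& get_or_create(int32_t segment_id) {
        std::lock_guard<std::mutex> guard(_mu);
        auto& slot = _writers[segment_id];
        if (!slot) slot = std::make_unique<FakeIndexFileWriter>();
        return *slot;
    }
    // Final step: close every writer to produce the final files.
    void close_all() {
        std::lock_guard<std::mutex> guard(_mu);
        for (auto& entry : _writers) entry.second->close();
    }
    // Used when building the rowset: per-segment index file sizes.
    std::map<int32_t, int64_t> file_sizes() const {
        std::lock_guard<std::mutex> guard(_mu);
        std::map<int32_t, int64_t> sizes;
        for (const auto& entry : _writers) sizes[entry.first] = entry.second->file_size();
        return sizes;
    }

private:
    mutable std::mutex _mu;
    std::map<int32_t, std::unique_ptr<FakeIndexFileWriter>> _writers;
};
```

Keeping the writers in the collection, rather than in each segment writer, is what lets the bkd index and the later string index for the same segment land in one composite file before it is closed.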
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

csun5285 closed this Nov 8, 2024