Update proxy to TiKV(2f3f32d9a0de2ebdb987c00b2419761cfbda4556) #412

Merged
merged 80 commits into from
Jan 20, 2025

Conversation

@CalvinNeo CalvinNeo commented Jan 13, 2025

What is changed and how it works?

Issue Number: Close #xxx

The commit before is 396724d.

What's Changed:


Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note


SpadeA-Tang and others added 30 commits October 31, 2024 02:12
ref tikv#16141

rearrange parts of metrics panel

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

Add a test to simulate insertion of 200MB (logical size) of TiDB unique
index and secondary index records and measure SkiplistEngine memory
usage.
Test results:
* For secondary index
  * The key-value encoding amplification is approximately 3.10
  * SkiplistEngine amplification is approximately 7.66
* For unique index
  * The key-value encoding amplification is approximately 3.38
  * SkiplistEngine amplification is approximately 8.19

Signed-off-by: Neil Shen <[email protected]>
…v#17629)

ref tikv#17459

Track the number of locks of large txns in resolver

Signed-off-by: ekexium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
)

close tikv#12587, fix tikv#16001

To fix the issue where slow region destruction can block snapshot
generation, this PR moves the snapshot generation logic out of the
region worker. A new worker is added to handle snap gen requests but it 
reuses the existing snap generator pool, so the change doesn't 
introduce any new threads.   

This is a simpler approach than the earlier attempt because it doesn't 
deal with the interactions between snapshot apply and destroy. Since 
snapshot generation has always been an independent task handled by its 
own thread pool, this change does not add significant complexity.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Add fsm schedule related metrics

Signed-off-by: Connor <[email protected]>
Signed-off-by: Connor1996 <[email protected]>

Co-authored-by: Bisheng Huang <[email protected]>
close tikv#12371

* switch kms to aws_sdk lib
* switch s3 to aws_sdk lib

Signed-off-by: Andrey Koshchiy <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17747)

ref tikv#16141

use stop-load-threshold for loading new regions

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17711

Deprecate write_global_seq, since it is by default false.

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…17730)

close tikv#17728

Use min_lock_ts-1 as the candidate of resolved-ts, to ensure resolved_ts < lock.min_commit_ts( <= commit_ts).

Signed-off-by: ekexium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: you06 <[email protected]>
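
The resolved-ts rule from tikv#17728 can be illustrated with a minimal sketch. The function name `resolved_ts_candidate`, the plain-integer timestamps, and the `fallback_ts` cap are assumptions made for illustration; this is not TiKV's resolver code.

```python
from typing import Iterable


def resolved_ts_candidate(lock_ts: Iterable[int], fallback_ts: int) -> int:
    """Return a resolved-ts candidate strictly below every tracked lock ts.

    `lock_ts` is a hypothetical list of timestamps for locks that are still
    tracked. `fallback_ts` (an assumption, e.g. a recent TSO) is used when no
    lock is tracked and also caps the result.
    """
    locks = list(lock_ts)
    if not locks:
        return fallback_ts
    # Subtracting 1 guarantees candidate < min_lock_ts, so the candidate can
    # never equal the timestamp of a lock that may still commit.
    return min(min(locks) - 1, fallback_ts)
```

With tracked lock timestamps 105 and 110 and a fallback of 200, the candidate is 104, which stays strictly below the smallest lock timestamp.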
ref tikv#16141, close tikv#17762

Derive the default values of the in_memory_engine settings `evict-threshold`
and `stop-load-threshold` from `capacity`.

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…v#17771)

close tikv#17767

IME observes all peer destroy events so that it can evict regions
promptly. When a new peer is added, the old uninitialized peer is
destroyed, and IME must not panic in this situation.

Signed-off-by: Neil Shen <[email protected]>
close tikv#17572

Signed-off-by: RidRisR <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Raft waterfall metrics track the duration of individual requests, all 
beginning from the same starting point (when the async write request is 
scheduled) but ending at various stages of the write process. Previous 
descriptions did not make that clear and may confuse the readers. This 
commit improves the grafana descriptions for clarity.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
)

close tikv#17696

* Count CDC tasks against the memory quota to prevent TiKV OOM caused by too many pending tasks

Signed-off-by: Neil Shen <[email protected]>
Signed-off-by: 3AceShowHand <[email protected]>

Co-authored-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
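
The idea behind tikv#17696, charging pending tasks against a memory quota so the queue cannot grow without bound, can be sketched as follows. `MemoryQuota`, `TaskQueue`, and the byte sizes are hypothetical names and values chosen for illustration, not the CDC implementation.

```python
class MemoryQuota:
    """Tracks bytes in use against a fixed capacity (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def alloc(self, size: int) -> bool:
        # Refuse the allocation instead of exceeding the capacity.
        if self.in_use + size > self.capacity:
            return False
        self.in_use += size
        return True

    def free(self, size: int) -> None:
        self.in_use = max(0, self.in_use - size)


class TaskQueue:
    """Queue whose pending tasks are charged to a MemoryQuota."""

    def __init__(self, quota: MemoryQuota) -> None:
        self.quota = quota
        self.pending = []

    def schedule(self, name: str, size: int) -> bool:
        # A task is only queued if its memory fits in the quota.
        if not self.quota.alloc(size):
            return False
        self.pending.append((name, size))
        return True

    def run_next(self) -> str:
        # Running a task releases the memory it was charged for.
        name, size = self.pending.pop(0)
        self.quota.free(size)
        return name
```

A caller that sees `schedule` return `False` has to back off or retry later; how the real code reacts to a full quota is not described in this commit message.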
…tikv#17643)

close tikv#17363

Allow leader transfer if conf change applied on transferee.

Signed-off-by: hhwyt <[email protected]>

Co-authored-by: Bisheng Huang <[email protected]>
…ikv#17765)

close tikv#17383, close tikv#17760

Address the corner case where a read thread panics because it reads with a stale index from the `Memtable` in raft-engine after a background thread has already purged the stale logs.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17788

Avoid calling `on_gc_finished` when a new GC task is not run because there is another unfinished task.

Signed-off-by: glorv <[email protected]>
…ikv#17515)

ref tikv#16141

This commit adjusts the following in-memory-engine defaults:

* `capacity`: Now IME uses 10% of the block cache and takes an equal
  amount of memory from the system. This is based on tests showing that
  the IME rarely fills its full capacity.
* `mvcc_amplification_threshold`: Changed from 100 to 10, which benefits
  common workloads such as TPC-C (50 warehouses), saving approximately 20%
  of unified read pool CPU usage.

Also, it addresses two security issues:

* Remove ignore of RUSTSEC-2024-0006, as vulnerable shlex 0.1.1 is
  removed by tikv#13814
* Upgrade hashbrown from yanked 0.15.0 to 0.15.1

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…d twice (tikv#17798)

close tikv#17797

If the last call to `prepare_for_region` returns `NotInCache`,
`clear_written_regions` can be called twice, in both `write_impl` and
`clear`, which causes a panic. This PR changes `clear_written_regions`
to consume `self.written_regions` to avoid this duplicate clear.

Signed-off-by: glorv <[email protected]>
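
The "consume on clear" fix from tikv#17797 can be sketched as a take-and-reset pattern. `WriteBatchSketch` and its `cleared` log are hypothetical and exist only to show that a second call becomes a no-op. In Rust, `std::mem::take` is a common way to express the same move; whether the PR uses it is not stated here.

```python
class WriteBatchSketch:
    """Illustrative stand-in for a write batch that tracks written regions."""

    def __init__(self) -> None:
        self.written_regions = []
        self.cleared = []  # log of regions that were actually cleared

    def clear_written_regions(self) -> None:
        # Take the current list and leave an empty one behind, so that a
        # second call finds nothing left to clear.
        regions, self.written_regions = self.written_regions, []
        for region_id in regions:
            self.cleared.append(region_id)
```

Calling `clear_written_regions` from two code paths in a row now clears each region exactly once.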
ref tikv#16141

handle error when getting regions info

Signed-off-by: SpadeA-Tang <[email protected]>
close tikv#17701

add write batch limit for raft command batch

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
close tikv#17631

Added a new crate named `compact-log-backup`. It can merge some of the log files generated by log backup and convert them into SSTs.

Signed-off-by: hillium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17836

This commit adds metrics to track Raft snapshots that are dropped 
during sending or receiving due to concurrency limits. These metrics 
help identify bottlenecks during scaling.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17625)

close tikv#12410

This PR makes newly split regions trigger `campaign` in time, once the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…17841)

close tikv#17840

Skip handling the remaining raft messages after the peer fsm is stopped. This avoids a potential panic when a raft message needs to read the raft log from raft engine.

Signed-off-by: glorv <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17852

expr: fix panic when using radians and degree

Signed-off-by: gengliqi <[email protected]>
)

close tikv#17830

Signed-off-by: joccau <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…ax_batch_size. (tikv#17821)

close tikv#17101

Increase the default raft_client_queue_size and raft_msg_max_batch_size.

This PR addresses an issue where too many Raft messages can delay 
sending, increasing the commit log duration and the heartbeat latency. 
The delayed heartbeats can lead to leader drops, especially during PD 
restarts that trigger a surge of hibernated regions. For more details 
on this scenario, see tikv#17101.

We increased raft_client_queue_size to prevent Raft messages from 
being dropped when the RaftClient queue fills up under heavy message 
load. Additionally, we increased raft_msg_max_batch_size to improve 
the efficiency of Raft message sending.

Signed-off-by: hhwyt <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
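
The effect of the two settings changed in tikv#17821 can be illustrated with a small bounded queue. `BoundedSendQueue` and the tiny sizes used here are assumptions for illustration; they are neither the RaftClient implementation nor the real defaults.

```python
from collections import deque


class BoundedSendQueue:
    """Toy send queue: drops messages when full, sends in bounded batches."""

    def __init__(self, queue_size: int, max_batch_size: int) -> None:
        self.queue_size = queue_size          # plays the role of raft_client_queue_size
        self.max_batch_size = max_batch_size  # plays the role of raft_msg_max_batch_size
        self.pending = deque()
        self.dropped = 0

    def push(self, msg) -> bool:
        # A full queue drops the message instead of blocking the sender.
        if len(self.pending) >= self.queue_size:
            self.dropped += 1
            return False
        self.pending.append(msg)
        return True

    def next_batch(self) -> list:
        # Drain at most max_batch_size messages for a single send.
        batch = []
        while self.pending and len(batch) < self.max_batch_size:
            batch.append(self.pending.popleft())
        return batch
```

In this model a larger queue size lowers the number of dropped messages during a burst, and a larger batch size lets each send drain more of the backlog.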
…sts (tikv#17500) (tikv#17870)

close tikv#17394

lock_manager: Skip updating lock wait info for non-fair-locking requests

This is a simpler and lower-risk fix of the OOM issue tikv#17394 for released branches, as an alternative solution to tikv#17451.
In this way, for acquire_pessimistic_lock requests without fair locking enabled, update_wait_for becomes a no-op, so if fair locking is globally disabled, the behavior is equivalent to versions before 7.0.

Signed-off-by: MyonKeminta <[email protected]>
glorv and others added 14 commits December 26, 2024 13:23
…by (tikv#18061)

close tikv#18060

Use a regular expression in the panel seriesOverrides to make it compatible with the optional "additional_groupby" alias.

Signed-off-by: glorv <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…f invalid max-ts update (tikv#18057)

close tikv#18055

concurrency_manager: double check via PD TSO before reporting error of invalid max-ts update

Signed-off-by: ekexium <[email protected]>
close tikv#17618

Fix a bug that wrongly truncates the string when the charset is gbk/gb18030

Signed-off-by: cbcwestwolf <[email protected]>
…tered (tikv#18066)

close tikv#18065

Print more information in logs when a "default not found" error is encountered.

Signed-off-by: cfzjywxk <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Export the number of currently running background jobs to help diagnose
potential compaction bottlenecks.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: Bisheng Huang <[email protected]>
…ction (tikv#18085)

close tikv#18084

`min_input_ts` and `max_input_ts` will be present in a log file compaction.

Signed-off-by: hillium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Fixed a typo: `Migartion` -> `Migration`.

Signed-off-by: hillium <[email protected]>
ref tikv#18055

When validating max-ts updates, do not report error or panic unless confirmed by PD TSO.
This reduces both false positive and false negative cases.

Signed-off-by: ekexium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
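
The double-check described for tikv#18055 can be sketched as below. The function name, the cached limit, and the convention that a failed TSO fetch returns `None` are all assumptions made for illustration, not the concurrency_manager API.

```python
from typing import Callable, Optional


def is_confirmed_invalid(
    new_max_ts: int,
    cached_limit: int,
    fetch_pd_tso: Callable[[], Optional[int]],
) -> bool:
    """Return True only if PD TSO confirms that `new_max_ts` is too large.

    `cached_limit` is a locally cached upper bound (hypothetical). It may be
    stale, so exceeding it is treated as suspicious but not yet as an error.
    """
    if new_max_ts <= cached_limit:
        return False  # clearly valid, no need to contact PD
    latest = fetch_pd_tso()
    if latest is None:
        return False  # could not confirm, so do not report an error
    return new_max_ts > latest
```

A timestamp above the stale cached limit but below the fresh TSO is accepted, which is the false-positive case the commit message refers to.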
close tikv#17894

build: update Dockerfile for build and test

Signed-off-by: wuhuizuo <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
close tikv#18026

Added a new RPC endpoint `flush_now` for the service `LogBackup`.

Signed-off-by: 山岚 <[email protected]>
Signed-off-by: hillium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18105, ref pingcap/tidb#58238

Adapt the ignore rules so that the download can skip some keys larger than the specified timestamp

Signed-off-by: 3pointer <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tial risk to affect data correctness (tikv#18092)

close tikv#18091

gc_worker: Do not run delete_files_in_range on the lock CF, which risks affecting data correctness

Signed-off-by: MyonKeminta <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17989

If tso fetch fails, skip updating last_pd_tso.

Signed-off-by: ekexium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
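
The rule from tikv#17989, keeping the previous value when a TSO fetch fails, can be sketched as follows. `PdTsoCache` and the use of an exception to signal failure are assumptions for illustration only.

```python
from typing import Callable


class PdTsoCache:
    """Remembers the last TSO successfully fetched from PD (hypothetical)."""

    def __init__(self) -> None:
        self.last_pd_tso = 0

    def refresh(self, fetch_tso: Callable[[], int]) -> bool:
        try:
            ts = fetch_tso()
        except Exception:
            # Fetch failed: leave last_pd_tso untouched rather than
            # overwriting it with a meaningless value.
            return False
        self.last_pd_tso = ts
        return True
```

After a failed refresh, readers of `last_pd_tso` still see the most recent successfully fetched value.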
@ti-chi-bot ti-chi-bot bot added the size/XXL label Jan 13, 2025
@CLAassistant

CLAassistant commented Jan 13, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
12 out of 27 committers have signed the CLA.

✅ ekexium
✅ SpadeA-Tang
✅ 3AceShowHand
✅ glorv
✅ joccau
✅ gengliqi
✅ CbcWestwolf
✅ Defined2014
✅ wshwsh12
✅ wuhuizuo
✅ CalvinNeo
✅ ti-chi-bot
❌ hbisheng
❌ overvenus
❌ Connor1996
❌ v01dstar
❌ RidRisR
❌ hhwyt
❌ Tristan1900
❌ YuJuncen
❌ MyonKeminta
❌ cfzjywxk
❌ akoshchiy
❌ hazel1225
❌ 3pointer
❌ LykxSassinator
❌ hicqu

Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo
Member Author

/retest

@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 20, 2025
ti-chi-bot bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot commented Jan 20, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-20 08:12:11.505727714 +0000 UTC m=+82058.836647119: ☑️ agreed by JaySon-Huang.

@ti-chi-bot ti-chi-bot bot added the approved label Jan 20, 2025
@ti-chi-bot ti-chi-bot bot merged commit 6b553bd into pingcap:raftstore-proxy Jan 20, 2025
2 of 5 checks passed