cluster: support IngoreInitConfigComps #1987

Open
wants to merge 15 commits into
base: master

Smityz
Contributor

@Smityz Smityz commented Jul 14, 2022

What problem does this PR solve?

With a large number of tikv-server instances, generating the configs takes a very long time (several hours).

What is changed and how it works?

./tiup-cluster scale-in Kvstore_UAT_0 --node <IP:port> -y --ignore-config-roles tikv
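
A minimal sketch of what a flag like this might do, assuming the config-generation step walks over every instance in the topology and simply skips those whose role is listed. The `Instance` type, `parseRoles`, and `filterForConfigRegen` are hypothetical names for illustration, not the code in this PR, and accepting a comma-separated list of roles is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Instance is a hypothetical stand-in for one node in the cluster topology.
type Instance struct {
	Role string // e.g. "pd", "tikv", "tidb"
	Addr string // host:port
}

// parseRoles turns a flag value such as "tikv" or "tikv,tidb" into a set of
// role names. Comma separation is an assumption made for this sketch.
func parseRoles(flagValue string) map[string]struct{} {
	roles := make(map[string]struct{})
	for _, r := range strings.Split(flagValue, ",") {
		r = strings.TrimSpace(r)
		if r != "" {
			roles[r] = struct{}{}
		}
	}
	return roles
}

// filterForConfigRegen returns the instances whose config should still be
// regenerated, i.e. every instance whose role is not in the ignored set.
func filterForConfigRegen(all []Instance, ignored map[string]struct{}) []Instance {
	var out []Instance
	for _, inst := range all {
		if _, skip := ignored[inst.Role]; skip {
			continue
		}
		out = append(out, inst)
	}
	return out
}

func main() {
	topo := []Instance{
		{"pd", "10.0.0.1:2379"},
		{"tikv", "10.0.0.2:20160"},
		{"tikv", "10.0.0.3:20160"},
		{"prometheus", "10.0.0.4:9090"},
	}
	// With "tikv" ignored, only the pd and prometheus instances remain.
	for _, inst := range filterForConfigRegen(topo, parseRoles("tikv")) {
		fmt.Println(inst.Role, inst.Addr)
	}
}
```

With hundreds of TiKV nodes, a filter of this kind would reduce the per-node config work to the handful of remaining instances.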

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

NONE

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@CLAassistant

CLAassistant commented Jul 14, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Jul 14, 2022
@codecov-commenter

codecov-commenter commented Jul 14, 2022

Codecov Report

Base: 56.31% // Head: 50.94% // Decreases project coverage by -5.37% ⚠️

Coverage data is based on head (ad17bd3) compared to base (9e2e464).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head ad17bd3 differs from pull request most recent head 286e65e. Consider uploading reports for the commit 286e65e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1987      +/-   ##
==========================================
- Coverage   56.31%   50.94%   -5.37%     
==========================================
  Files         313      312       -1     
  Lines       33492    33481      -11     
==========================================
- Hits        18858    17055    -1803     
- Misses      12415    14212    +1797     
+ Partials     2219     2214       -5     
Flag Coverage Δ
tiup 16.17% <ø> (ø)
unittest ?


Impacted Files                            Coverage Δ
pkg/cluster/operation/operation.go        80.65% <ø> (ø)
components/cluster/command/prune.go       59.09% <100.00%> (ø)
components/cluster/command/scale_in.go    75.00% <100.00%> (ø)
components/cluster/command/scale_out.go   74.29% <100.00%> (ø)
pkg/cluster/manager/builder.go            67.20% <100.00%> (ø)
components/dm/ansible/worker.go           0.00% <0.00%> (-100.00%) ⬇️
pkg/meta/err.go                           0.00% <0.00%> (-76.19%) ⬇️
pkg/cluster/api/error.go                  0.00% <0.00%> (-75.00%) ⬇️
pkg/crypto/rand/passwd.go                 0.00% <0.00%> (-75.00%) ⬇️
pkg/telemetry/node_info.go                0.00% <0.00%> (-70.73%) ⬇️
... and 53 more


@Smityz
Contributor Author

Smityz commented Jul 15, 2022

  - Download blackbox_exporter: (linux/amd64) ... Done
  - Download node_exporter: (linux/amd64) ... Done
  - Download alertmanager: (linux/amd64) ... Error
Error: read manifest from mirror(https://tiup-mirrors.pingcap.com/) failed: invalid signature for file root.json: not enough signatures (2) for threshold 3 in root.json

I can't fix this failing unit test. Can someone help me?

@nexustar
Collaborator

I think we should fix the underlying problem, that generating the configs takes so long (several hours), rather than provide a flag to skip it.

@Smityz
Contributor Author

Smityz commented Jul 15, 2022

I think we should fix the underlying problem, that generating the configs takes so long (several hours), rather than provide a flag to skip it.

I don't understand why we need to generate configs on all nodes when scaling in. The configs seem to be unchanged. Could you please explain it to me?

@Smityz
Contributor Author

Smityz commented Aug 3, 2022

PTAL

@AstroProfundis
Contributor

If any PD node is scaled in, we have to regenerate the configs for all TiKV and TiDB nodes, because the PD endpoints are written into their startup scripts. The Prometheus config is also always updated whenever a node is added to or removed from the cluster.

I agree that we don't have to regenerate configs for all nodes in some cases, but working out exactly which ones could be quite complex to implement. The current approach is a reasonable workaround.
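
To illustrate why a precise implementation gets complicated, here is a rough sketch that encodes only the two rules mentioned in this comment: a PD change affects TiKV and TiDB, and any topology change affects Prometheus. The function name `rolesNeedingRegen` and the lowercase role strings are hypothetical; the real dependency rules between components are not spelled out in this thread and are likely more involved.

```go
package main

import (
	"fmt"
	"sort"
)

// rolesNeedingRegen returns the roles whose configs would need regenerating
// after nodes of the given roles are added or removed. It covers only the
// two rules stated in the discussion and is not a complete dependency model.
func rolesNeedingRegen(changed []string) []string {
	need := map[string]bool{}
	if len(changed) > 0 {
		// Rule 1: Prometheus is updated on any topology change.
		need["prometheus"] = true
	}
	for _, r := range changed {
		if r == "pd" {
			// Rule 2: a PD change affects TiKV and TiDB.
			need["tikv"] = true
			need["tidb"] = true
		}
	}
	out := make([]string, 0, len(need))
	for r := range need {
		out = append(out, r)
	}
	sort.Strings(out) // stable output order
	return out
}

func main() {
	fmt.Println(rolesNeedingRegen([]string{"tikv"})) // [prometheus]
	fmt.Println(rolesNeedingRegen([]string{"pd"}))   // [prometheus tidb tikv]
}
```

Under these two rules alone, scaling in a TiKV node would not require touching any other TiKV config, which is the case the flag is meant to speed up.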

Could you rename the --ignore-components argument to something like --ignore-config-roles to show that it is for configs? And I think it could be better to mark it as hidden as well.

@Smityz
Contributor Author

Smityz commented Sep 1, 2022

Thanks for your explanation @AstroProfundis
Maybe we can skip regenerating configs when TiDB/TiKV nodes are scaled in/out or pruned? I think this operation is safe.

@AstroProfundis
Contributor

Sorry for the delay...

Maybe we can skip regenerating configs when TiDB/TiKV nodes are scaled in/out or pruned?

I agree, and I think TiFlash is also safe to be ignored, but I'm not 100% sure about that...

@Smityz
Contributor Author

Smityz commented Oct 15, 2022

In our production environment (a TiKV-only cluster), I have used this code many times and found nothing unusual. How about adding this feature as an option? Our cluster has hundreds of nodes, and initializing the config for every node when scaling is really slow.

@AstroProfundis
Contributor

I agree that adding it as an optional switch for users to decide what components should be ignored when updating configs could be reasonable.

Could you rename the --ignore-components argument to something like --ignore-config-roles to show that it is for configs? And I think it could be better to mark it as hidden as well.

How about like this?

Contributor Author

@Smityz Smityz left a comment


I have updated the name, @AstroProfundis.
However, I don't know what "mark it as hidden" means.
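
For context, a "hidden" flag in CLI libraries usually means one that still parses normally but is left out of the help output; spf13/pflag, for example, offers `MarkHidden` for this. The sketch below shows the idea using only Go's standard `flag` package. The flag set, the `hidden` map, and the `helpText` helper are hypothetical and are not how this project wires its commands.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// hidden lists flag names that parse normally but are omitted from help.
var hidden = map[string]bool{"ignore-config-roles": true}

// newFlagSet builds a hypothetical scale-in flag set with one visible and
// one hidden flag, and returns pointers to the parsed values.
func newFlagSet() (*flag.FlagSet, *string, *string) {
	fs := flag.NewFlagSet("scale-in", flag.ContinueOnError)
	node := fs.String("node", "", "node to remove (host:port)")
	ignore := fs.String("ignore-config-roles", "", "roles to skip when regenerating configs")
	fs.Usage = func() { fmt.Fprint(fs.Output(), helpText(fs)) }
	return fs, node, ignore
}

// helpText renders the usage message, skipping every hidden flag.
func helpText(fs *flag.FlagSet) string {
	var b strings.Builder
	b.WriteString("Usage of scale-in:\n")
	fs.VisitAll(func(f *flag.Flag) {
		if hidden[f.Name] {
			return // still defined and parsed, just not advertised
		}
		fmt.Fprintf(&b, "  --%s\t%s\n", f.Name, f.Usage)
	})
	return b.String()
}

func main() {
	fs, node, ignore := newFlagSet()
	fmt.Print(helpText(fs)) // lists --node only

	args := []string{"--node", "10.0.0.2:20160", "--ignore-config-roles", "tikv"}
	if err := fs.Parse(args); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("node:", *node, "ignored roles:", *ignore)
}
```

Hiding a flag this way keeps it available to operators who know about it while discouraging casual use, which fits an option that can leave configs stale if misapplied.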

@Smityz
Contributor Author

Smityz commented Nov 24, 2022

@AstroProfundis It's been a long time. Are you still interested in this PR?

6 participants