Skip to content

Conversation

corbantek
Copy link
Contributor

@corbantek corbantek commented Aug 20, 2025

Description

Why

  • At HubSpot we use TLS GRPC for all our communications
  • We are in the process of upgrading to Vitess 23+ and were unable to bringup vtadmin without vtctldclient TLS
  • This change adds this functionallity

What

  • Reused same utilities from vtctlclient
  • Registers vtadmin to use grpcclientcommon
  • Updates vtadmin proxy dial options to use grpc secure dial options when possible

Related Issue(s)

Feature Request: expose TLS related flags for vtadmin config

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

- Reused same utilities from vtctlclient
- Registers vtadmin to use `grpcclientcommon`
- Updates vtadmin proxy dial options to use grpc secure dial options when possible

Signed-off-by: Kyle Johnson <[email protected]>
Copy link
Contributor

vitess-bot bot commented Aug 20, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Aug 20, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Aug 20, 2025
Copy link

codecov bot commented Aug 20, 2025

Codecov Report

❌ Patch coverage is 78.57143% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.55%. Comparing base (8d90d26) to head (0e4aaad).
⚠️ Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
go/cmd/vtadmin/main.go 57.14% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18556      +/-   ##
==========================================
+ Coverage   67.49%   67.55%   +0.06%     
==========================================
  Files        1607     1607              
  Lines      263104   263337     +233     
==========================================
+ Hits       177569   177889     +320     
+ Misses      85535    85448      -87     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dbussink
Copy link
Contributor

@corbantek Can you also add a test for this?

@corbantek
Copy link
Contributor Author

corbantek commented Aug 25, 2025

@corbantek Can you also add a test for this?

@dbussink Do you have any specifics on what you'd like to be tested? I honestly didn't change anything other than what's already being used by Vitess in vtctldclient and others...

Let me poke around in the meantime, but any advice it appreciated.

go/cmd/vtadmin/main_test.go
- Tests for the vtadmin main.go changes
- Validates that TLS-related flags are properly registered when grpcclientcommon.RegisterFlags() is called
- Tests flag registration functionality and command structure

go/vt/vtadmin/vtctldclient/proxy_test.go
- Added tests for the proxy dial method changes
- Tests TLS configuration via grpcclientcommon.SecureDialOption
- Tests credential handling with the new TLS option ordering
- Tests error scenarios for SecureDialOption
- default behavior uses insecure connection: Tests that the new grpcclientcommon.SecureDialOption() approach works the same as the original insecure.NewCredentials() approach when no TLS flags are configured
- backward compatibility with existing cluster configurations: Creates a real server connection using the standard New() function to verify existing vtadmin cluster configurations continue to work
- insecure connection behavior is preserved: Tests that grpcclientcommon.SecureDialOption() returns valid dial options with default parameters
- dial works with default configuration: Verifies the dial method works exactly the same as before the changes, ensuring backward compatibility for existing deployments
- SecureDialOption maintains compatibility: Confirms that the replacement function works as expected in the default case (no TLS configuration provided)

go/vt/vtctl/grpcclientcommon/dial_option_test.go
- Tests for the grpcclientcommon package changes
- Tests flag registration functionality
- Tests SecureDialOption() function behavior
- Tests the vtadmin registration in the init function
- Tests global variable handling and flag integration
- default behavior is insecure for backward compatibility: Tests that when TLS flags are empty (default state), SecureDialOption() returns insecure credentials, maintaining backward compatibility
- vtadmin registration in init function: Documents and tests that vtadmin was properly added to the init function to enable TLS flag registration

go/vt/vtadmin/vtctldclient/grpc_integration_test.go
- Integration tests for the complete flow
- Tests the integration between vtadmin's proxy and grpcclientcommon.SecureDialOption
- Tests dial option ordering (TLS first, then credentials, then resolver)
- Tests error handling scenarios

Signed-off-by: Kyle Johnson <[email protected]>
@corbantek
Copy link
Contributor Author

@corbantek Can you also add a test for this?

@dbussink Do you have any specifics on what you'd like to be tested? I honestly didn't change anything other than what's already being used by Vitess in vtctldclient and others...

Let me poke around in the meantime, but any advice it appreciated.

Nevermind, I tried sticking an agent on this and had it create/add tests for everything I changed (even added some backwards compatible tests for non-TLS)

Signed-off-by: Kyle Johnson <[email protected]>
@dbussink
Copy link
Contributor

Nevermind, I tried sticking an agent on this and had it create/add tests for everything I changed (even added some backwards compatible tests for non-TLS)

This generally means you need to clean things up there. It has the tendency to add way too much, violate formatting etc. So it can be a useful starting point, but let's make sure to reduce it then to a minimal useful test. I now see a big wall of text that does way more than it needs to.

Do any of the other binaries have tests to verify the minimal necessary thing here?

@corbantek
Copy link
Contributor Author

Nevermind, I tried sticking an agent on this and had it create/add tests for everything I changed (even added some backwards compatible tests for non-TLS)

This generally means you need to clean things up there. It has the tendency to add way too much, violate formatting etc. So it can be a useful starting point, but let's make sure to reduce it then to a minimal useful test. I now see a big wall of text that does way more than it needs to.

Do any of the other binaries have tests to verify the minimal necessary thing here?

Actually re-reviewing some of these test and tracing them, they are crap. Let me look at this again tomorrow and see if I can figure out a way to actually test.

Some of these tests looked like they were building up mocks but when I actually traced they were just testing themselves...

I find testing my change to proxy.go the most difficult...

@corbantek corbantek force-pushed the add-tls-grpc-to-vtadmin-external-pr branch from 761d754 to d7bedd8 Compare August 25, 2025 21:39
t.Run("grpc tls flags are registered", func(t *testing.T) {
certFlag := testCmd.Flags().Lookup("vtctld-grpc-cert")
require.NotNil(t, certFlag, "vtctld-grpc-cert flag should be registered")
assert.Equal(t, "", certFlag.DefValue, "vtctld-grpc-cert should have empty default value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style wise we generally don't add all the string descriptions to testify assertions, since they are very often superfluous. Only if they really provide very explicit value, we should add them. This applies here to all the assertions we have.


// Test that grpcclient.SecureDialOption fails with these files
_, err = grpcclient.SecureDialOption(tmpCert.Name(), tmpKey.Name(), "", "", "")
assert.Error(t, err, "SecureDialOption should fail with invalid cert/key files")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this testing SecureDialOption here? That seems not really a concern for testing in this package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SecureDialOption somehow changed this would catch that here so we don't have a bogus test below that always passes...

I can remove if you feel its not needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corbantek Testing for SecureDialOption needs to be in the package where that function lives really. Not outside of it here. I'm not really sure how this test is also then related to the original issue fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added changed these lines in the original PR:

    tlsOpt, err := grpcclientcommon.SecureDialOption()
	if err != nil {
		return err
	}

You asked me to write unit tests for this PRs changes...

The current unit tests for proxy.go already covered this entire file EXCEPT

	if err != nil {
		return err
	}

This test builds up a scenario where the arguments to SecureDialOption are invalid causing it throw an error and reach return err (the only line not covered by tests)

- Remove unneeded assert text
- Remove safe-guard for grpcclient.SecureDialOption error possibly changing

Signed-off-by: Kyle Johnson <[email protected]>
@corbantek corbantek requested a review from dbussink August 27, 2025 20:32
@corbantek corbantek changed the title Add TLS support to vtadmin GRPC Add vtctldclient TLS support to vtadmin GRPC Aug 27, 2025
@corbantek
Copy link
Contributor Author

This is ready for re-review. All comments addressed and all updated lines covered by unit tests.

@corbantek
Copy link
Contributor Author

@dbussink Bumping this for review. Anything else you'd like me to address?

@dbussink dbussink added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTAdmin VTadmin interface Component: vtctldclient and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Sep 2, 2025
@corbantek
Copy link
Contributor Author

I'm not able to re-run that [onlineddl_vrepl_suite] test suite. I feel its just flakey because I didn't change anything in that area of code.

Could someone please re-run and merge on sucess?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VTAdmin VTadmin interface Component: vtctldclient Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants