-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support NATS TLS Communication #4686
Conversation
This commit add the ability to enable TLS for NATS communication in a cluster. It supports custom CA as well as regular signed certificate by known CA. To Enable TLS for the NATS Server on the orchestrator node and compute node, please check the integration tests in this commit as they have detailed coverage for multiple usecases as well as detailed configuration files. Linear: https://linear.app/expanso/issue/ENG-305/plan-and-implement-nats-tls-enabling-in-core-bacalhau Github: #4681
WalkthroughThis pull request introduces several changes, primarily focused on enhancing TLS (Transport Layer Security) capabilities within the Bacalhau project. Key modifications include the addition of new configuration fields for TLS in various structs, updates to related validation logic, and the introduction of new test suites for validating TLS functionality. Additionally, several new certificate and key files are included to support secure communications. Changes to scripts and configuration files improve usability and flexibility in testing and deployment scenarios. Changes
Assessment against linked issues
Possibly related issues
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 27
🧹 Outside diff range and nitpick comments (41)
test_integration/common_assets/nodes_configs/7_tls_enabled_compute.yaml (1)
1-2
: Document TLS configuration details.While the comment mentions CA cert being embedded in the trust store, it would be helpful to:
- Document the expected CA certificate location in the trust store
- Specify whether client certificate authentication is required
- Document any TLS version or cipher suite requirements
test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml (1)
8-9
: Security concern: Hardcoded authentication token.While this is a test configuration, using a hardcoded token that suggests it's a secret ("i_am_very_secret_token") could be misleading. Consider:
- Using a more obvious test token name (e.g., "test_token_123")
- Adding a comment indicating this is for testing only
Apply this diff:
Auth: - Token: "i_am_very_secret_token" + # Note: This is a test-only token + Token: "test_token_123"test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml (3)
1-3
: Document the port choice rationale.While using port 1234 is fine for testing, it would be helpful to add a comment explaining why this specific port was chosen and whether it has any significance for the tests.
15-17
: Document the purpose of test labels.Consider adding a comment explaining the purpose of these labels in the test configuration and how they are used in the integration tests.
1-17
: Add security requirements documentation.Consider adding a comment block at the top of the file that documents:
- Required certificate formats and key sizes
- Expected file permissions
- Validation requirements for production use
- Links to security best practices for NATS TLS configuration
This will help users understand the security implications when adapting this test configuration for production use.
test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml (3)
1-3
: Consider making the API port configurableWhile the port 1234 is valid for testing, consider making it configurable through environment variables to avoid potential conflicts during integration tests.
NameProvider: "uuid" API: - Port: 1234 + Port: ${TEST_API_PORT:-1234}
16-18
: Use more descriptive label namesConsider using label names that reflect their purpose in the test scenario (e.g.,
test_scenario
,tls_enabled
).Labels: - label1: label1Value - label2: label2Value + test_scenario: tls_custom_ca + tls_enabled: "true"
1-18
: Consider implementing a configuration validation mechanismTo ensure robust testing, consider implementing:
- A configuration validator that checks for required TLS files
- Environment variable substitution for flexible testing
- A schema definition for this YAML configuration to catch misconfigurations early
This will help prevent test failures due to misconfiguration and make the test suite more maintainable.
test_integration/common_assets/dockerfiles/Dockerfile-JumpboxNode (1)
10-12
: Optimize Dockerfile by combining RUN commands.Multiple RUN commands create unnecessary layers in the Docker image. Consider combining related commands to reduce the image size.
Here's an optimized version:
-# Install Nats CLI -RUN curl -sf https://binaries.nats.dev/nats-io/natscli/[email protected] | sh -RUN mv /app/nats /usr/local/bin/ +# Install Nats CLI +RUN curl -sf https://binaries.nats.dev/nats-io/natscli/[email protected] | sh && \ + mv /app/nats /usr/local/bin/.pre-commit-config.yaml (1)
11-11
: Document the rationale for excluding certificate files.Consider adding a comment explaining why certificate files (including private keys) are excluded from the private key detection, to help future maintainers understand the security implications.
- exclude: 'testdata/.*|test_integration/common_assets/certificates/.*|docker-compose-deployment/certificates/.*' + # Exclude test certificates and keys needed for TLS testing + exclude: 'testdata/.*|test_integration/common_assets/certificates/.*|docker-compose-deployment/certificates/.*'test_integration/README.md (1)
41-46
: Consider adding TLS-specific documentation.Since this PR introduces NATS TLS support, it would be helpful to:
- Add examples of running TLS-specific tests
- Document any environment prerequisites needed for TLS tests (certificates, keys, etc.)
- Explain test naming conventions to help users identify TLS-related tests
Example addition:
./run_locally.sh -c # Compile bacalhau binary only, do not runs tests ./run_locally.sh -s -t TestName # Skip compilation and run specific test + +# Running TLS-specific tests: +./run_locally.sh -t TestNATSTLS # Run TLS communication tests + +# Note: TLS tests require valid certificates and keys in test/integration/common_assets/certs/pkg/config/types/compute.go (1)
14-15
: Enhance field documentation for clarity.The current comment could be more specific about the TLS configuration capabilities.
- // NATS specifies the NATS related configuration on the compute node. + // NATS specifies the NATS configuration for the compute node, including TLS settings + // for secure communication with the NATS server.test_integration/utils/dynamic_json.go (1)
10-20
: Add input validation to prevent processing empty JSON strings.While the implementation is good, consider adding input validation to handle edge cases.
func ParseToDynamicJSON(jsonStr string) (*DynamicJSON, error) { + if len(strings.TrimSpace(jsonStr)) == 0 { + return nil, fmt.Errorf("empty JSON string") + } var data interface{} if err := json.Unmarshal([]byte(jsonStr), &data); err != nil { return nil, fmt.Errorf("failed to parse JSON: %w", err) } return &DynamicJSON{data: data}, nil }test_integration/common_assets/scripts/compute_node_with_start_cmd.sh (3)
3-14
: Enhance script documentation and error handling.The usage example could be more specific to NATS TLS configuration since this script is part of the TLS support PR. Additionally, consider adding signal handling for graceful shutdown.
# Function to display usage and exit usage() { echo "Usage: $0 -c \"command with args\" (Required)" - echo "Example: $0 -c \"bacalhau server --config=...\"" + echo "Example: $0 -c \"bacalhau server --config=/path/to/config.yaml --nats-tls-ca=/path/to/ca.crt\"" exit 1 } +# Handle cleanup on script exit +cleanup() { + echo "Cleaning up..." + pkill dockerd + exit "${1:-0}" +} + +# Set up signal traps +trap 'cleanup 1' SIGTERM SIGINT
52-53
: Consider making Docker daemon path configurable.The script assumes
dockerd-entrypoint.sh
is in PATH. Consider making this configurable for different environments.+# Docker daemon entrypoint script +DOCKER_ENTRYPOINT="${DOCKER_ENTRYPOINT:-dockerd-entrypoint.sh}" + # Start the first docker daemon -dockerd-entrypoint.sh & +${DOCKER_ENTRYPOINT} &
1-81
: Add security considerations for TLS testing.Since this script is part of the NATS TLS support implementation, consider adding validation for TLS-related parameters in the custom command to prevent misconfiguration during testing:
- Verify that TLS certificate paths exist when TLS is enabled
- Check certificate permissions
- Validate certificate expiration dates
Would you like me to provide an implementation for these security checks?
test_integration/run_locally.sh (4)
11-18
: Adjust exit code for help display.The usage function should exit with code 0 when help is explicitly requested via -h, as this is not an error condition. Exit code 1 should be reserved for error scenarios.
usage() { echo "Usage: ${0} [-s] [-t TEST_NAME] [-c]" echo " -s: Skip compilation" echo " -t TEST_NAME: Run specific test by name" echo " -c: Compile only (don't run tests)" echo " -h: Show this help message" - exit 1 + # Exit 0 if help was requested, 1 for invalid usage + if [ "${1:-}" = "help" ]; then + exit 0 + else + exit 1 + fi }
21-39
: Document the shellcheck disable directive.While the command-line parsing is well-implemented, please add a comment explaining why SC2249 is disabled. This helps maintainers understand the reasoning behind bypassing the shellcheck warning.
- # shellcheck disable=SC2249 + # shellcheck disable=SC2249 # Explain: This rule suggests using case instead of if/elif for pattern matching, + # but we're already using case here. This might be a false positive.
68-77
: Add error handling and improve test output.The test running function could benefit from better error handling and more informative output.
run_tests() { + local test_exit_code + if [[ -n "${SPECIFIC_TEST}" ]]; then - echo "Running specific test: ${SPECIFIC_TEST}" + echo "=== Running specific test: ${SPECIFIC_TEST} ===" go test -v -run "${SPECIFIC_TEST}" + test_exit_code=$? else - echo "Running all tests" + echo "=== Running all tests ===" go test -v -count=1 ./... + test_exit_code=$? fi + + if [ ${test_exit_code} -ne 0 ]; then + echo "=== Tests failed with exit code: ${test_exit_code} ===" + return ${test_exit_code} + fi + echo "=== Tests completed successfully ===" }
Line range hint
1-97
: Consider organizing tests into focused suites.Given that this PR introduces NATS TLS support, consider organizing the integration tests into focused suites (e.g., TLS tests, non-TLS tests) to allow for more targeted testing scenarios. This would make it easier to validate specific functionality and maintain test coverage as features evolve.
test_integration/4_orchestrator_config_override_and_flag_and_config_flag_suite_test.go (2)
76-91
: Consider adding WebUI TLS configuration validationWhile the test effectively validates basic WebUI configuration, consider adding assertions for TLS-related settings if the WebUI supports secure communication. This would align with the PR's objective of implementing TLS support.
Example additions:
// Validate TLS configuration if applicable webUITLSEnabled, err := agentConfigJSON.Query("$.WebUI.TLS.Enabled") s.Require().NoError(err) // Add appropriate assertion based on expected configuration
Line range hint
1-91
: Consider expanding test coverage for TLS configurationsWhile the test suite effectively validates configuration overrides and the changes improve code quality, consider expanding it to include comprehensive TLS configuration testing scenarios:
- Validation of TLS certificate paths and configurations
- Testing of TLS fallback behavior
- Verification of TLS-only mode
- Testing invalid certificate scenarios
This would ensure robust coverage of the new TLS functionality being introduced.
test_integration/6_jobs_basic_runs_scenarios_suite_test.go (2)
11-11
: Remove unnecessary empty line change.This change appears to be unintentional and doesn't affect functionality.
Line range hint
42-50
: Consider adding TLS-specific test scenarios.Given that this PR introduces TLS support for NATS communication, consider adding test cases that specifically verify:
- Job execution with TLS enabled
- Job execution with custom CA certificates
- Error handling for invalid certificates
Would you like me to help create additional test cases for TLS scenarios?
test_integration/5_orchestrator_no_config_suite_test.go (1)
Line range hint
54-74
: Consider adding TLS configuration assertions.Given that this PR introduces NATS TLS support, this test should verify the default TLS behavior when no configuration is provided. Consider adding assertions to verify:
- The default non-TLS NATS configuration
- The absence of TLS-related fields in the default configuration
Add these assertions after line 74:
apiPort := int(unmarshalledOutputMap["API"].(map[string]interface{})["Port"].(float64)) s.Require().Equalf(1234, apiPort, "Incorrect API.Port, got: %d", apiPort) + + // Verify NATS TLS configuration defaults + natsConfig, exists := unmarshalledOutputMap["NATS"].(map[string]interface{}) + s.Require().True(exists, "NATS configuration should exist") + + // Verify TLS is disabled by default + tlsConfig, exists := natsConfig["TLS"].(map[string]interface{}) + s.Require().False(exists, "TLS configuration should not exist in default config")test_integration/8_nats_tls_custom_ca_suite_test.go (3)
16-25
: Consider adding documentation for the test suiteWhile the implementation is correct, adding documentation comments would help explain the purpose of this test suite and its relationship to the NATS TLS feature.
Add documentation like this:
+// NatsTLSCustomCASuite tests NATS TLS functionality with custom CA certificates. +// It verifies that: +// - TLS connections with custom CA certificates work correctly +// - Non-TLS connections are rejected +// - Regular operations work with TLS enabled type NatsTLSCustomCASuite struct { BaseDockerComposeTestSuite }
27-48
: Add config file existence validationWhile the setup is well-structured, it's recommended to validate the existence of configuration files before using them.
Consider adding validation:
func (s *NatsTLSCustomCASuite) SetupSuite() { rawDockerComposeFilePath := "./common_assets/docker_compose_files/orchestrator-and-compute-custom-startup.yml" + // Validate file existence + files := []string{ + rawDockerComposeFilePath, + s.commonAssets("nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml"), + s.commonAssets("nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml"), + } + for _, file := range files { + if _, err := os.Stat(file); os.IsNotExist(err) { + s.T().Fatalf("Required config file not found: %s", file) + } + }
50-71
: Consider environment-specific timeout configurationThe 30-second timeout might be too short for slower CI environments or resource-constrained systems.
Consider making the timeout configurable:
+ // Default timeout that can be overridden via environment variable + timeout := 30 * time.Second + if envTimeout := os.Getenv("BACALHAU_TEST_TIMEOUT"); envTimeout != "" { + if d, err := time.ParseDuration(envTimeout); err == nil { + timeout = d + } + } - _, err = s.waitForJobToComplete(jobID, 30*time.Second) + _, err = s.waitForJobToComplete(jobID, timeout)pkg/config/types/orchestrator.go (3)
14-15
: Enhance NATS field documentation to mention TLS capabilities.The current comment could be more specific about the TLS capabilities being introduced.
Consider updating the comment to:
- // NATS specifies the NATS related configuration on the orchestrator. + // NATS specifies the NATS related configuration on the orchestrator, including TLS settings for secure communication.
28-31
: Enhance security documentation for TLS key and cert.The comments for TLS key and certificate should include security recommendations.
Consider updating the comments:
- // ServerTLSKey specifies the private key file path given to NATS server to serve TLS connections. + // ServerTLSKey specifies the private key file path for TLS connections. + // The file should be in PEM format and readable only by the orchestrator process. ServerTLSKey string `yaml:"ServerTLSKey,omitempty" json:"ServerTLSKey,omitempty"` - // ServerTLSCert specifies the certificate file path given to NATS server to serve TLS connections. + // ServerTLSCert specifies the certificate file path for TLS connections. + // The certificate should be in PEM format and signed by a trusted CA. ServerTLSCert string `yaml:"ServerTLSCert,omitempty" json:"ServerTLSCert,omitempty"`
35-36
: Clarify CA certificate requirements and chain of trust.The CACert field comment should provide more details about the chain of trust and certificate requirements.
Consider updating the comment:
- // CACert specifies the CA file path that the orchestrator node trusts when connecting to NATS server. + // CACert specifies the Certificate Authority (CA) file path for verifying client certificates. + // If not specified, the system's default trust store will be used. + // The CA certificate should be in PEM format and contain the complete chain of trust.pkg/publicapi/client/v2/options.go (3)
Line range hint
70-76
: LGTM! Consider adding more documentation.The updated comment better clarifies that this is specifically for server TLS CA certificates. This aligns well with the PR's objective of implementing NATS TLS support.
Consider adding examples in the comment to show typical usage patterns, such as:
// Example: // client.WithCACertificate("/path/to/ca.crt")
Line range hint
134-164
: Consider enhancing error handling and security measures.While the TLS implementation is solid, there are a few areas for improvement:
- The panic in
getTLSTransport
when reading the CA certificate could be handled more gracefully.- Consider adding validation for the CACert file path before attempting to read it.
- The implementation could be extended to support client certificates for mutual TLS authentication.
Here's a suggested improvement for the error handling:
func getTLSTransport(config *Config) *http.Transport { tr := &http.Transport{} if !config.TLS.UseTLS { return tr } if config.TLS.CACert != "" { + // Validate file exists and is readable + if _, err := os.Stat(config.TLS.CACert); err != nil { + // Log error and fall back to system certificates + return tr + } + caCert, err := os.ReadFile(config.TLS.CACert) if err != nil { - // unreachable: we already checked that the file exists at CLI startup - // if it has gone missing in the meantime then something is very wrong - newErr := errors.Wrap(err, fmt.Sprintf("Error: unable to read CA certificate: %s", config.TLS.CACert)) - panic(newErr.Error()) + // Log error and fall back to system certificates + return tr } caCertPool := x509.NewCertPool() - caCertPool.AppendCertsFromPEM(caCert) + if !caCertPool.AppendCertsFromPEM(caCert) { + // Log error: invalid certificate + return tr + }
Line range hint
25-39
: Consider adding TLS configuration validation.The
Config
struct could benefit from a validation method to ensure TLS settings are consistent and secure.Consider adding a validation method:
// Validate checks if the configuration is valid func (c *Config) Validate() error { if c.TLS.UseTLS && c.TLS.CACert != "" { if _, err := os.Stat(c.TLS.CACert); err != nil { return fmt.Errorf("CA certificate file not accessible: %w", err) } } return nil }pkg/config_legacy/types/node.go (1)
68-68
: LGTM! Consider adding documentation.The rename from
CACert
toServerTLSCACert
improves clarity by explicitly indicating its purpose for server TLS CA certificates.Consider adding a comment to document that this field specifies the CA certificate used to verify the server's TLS certificate:
+ // ServerTLSCACert specifies the path to the CA certificate used to verify + // the server's TLS certificate. If not specified, the system's default + // trust store will be used. CACert string `yaml:"ServerTLSCACert"`pkg/nats/transport/nats_config_test.go (1)
168-181
: Document the default TLS timeout value.While the test correctly validates the default timeout behavior, consider adding a comment explaining why 10 seconds was chosen as the default timeout value. This helps future maintainers understand the rationale behind this specific duration.
func (suite *NATSTransportConfigSuite) TestNATSTransportDefaultTLSTimeoutConfig() { + // When ServerTLSTimeout is set to 0, it should default to 10 seconds + // which is a reasonable timeout for most TLS handshakes while preventing + // indefinite waiting in case of connection issues natsConfig := NATSTransportConfig{test_integration/utils/test_utils.go (1)
54-60
: LGTM! Good addition of image verification.The added verification step improves test reliability by ensuring the Docker image was successfully created. The error message is clear and includes all necessary details.
Consider wrapping the error in a more specific error type to help with error handling in test cases:
if err != nil { - return fmt.Errorf("unable to find image %s:%s. Error: %v", imageName, imageTag, err) + return fmt.Errorf("image build verification failed for %s:%s: %w", imageName, imageTag, err) }This change:
- Uses
%w
to wrap the original error for better error handling- Improves the error message clarity
- Makes it easier to handle specific error cases in tests
pkg/nats/transport/nats.go (4)
62-62
: Typo in comment: 'Use' should be 'Used'The comment on line 62 has a typo. It should read 'Used by the Nats Client when node acts as compute'.
Apply this diff to fix the typo:
- // Use by the Nats Client when node acts as compute + // Used by the Nats Client when node acts as compute
87-88
: Use idiomatic string comparison for empty checksIn Go, it's more conventional to check if a string is empty using
== ""
rather thanlen(s) > 0
. Consider updating the checks on lines 87 and 88 for clarity and readability.Apply this diff to make the change:
- serverCertProvided := len(c.ServerTLSCert) > 0 - serverKeyProvided := len(c.ServerTLSKey) > 0 + serverCertProvided := c.ServerTLSCert != "" + serverKeyProvided := c.ServerTLSKey != ""
104-106
: Set default TLS timeout for non-positive valuesCurrently, the default
ServerTLSTimeout
is set when the value is zero, but an error is raised if it's negative. To enhance robustness, consider setting the default timeout for any non-positive values.Apply this diff to adjust the validation:
if serverCertProvided && serverKeyProvided && c.ServerTLSTimeout <= 0 { - c.ServerTLSTimeout = NATSServerDefaultTLSTimeout + c.ServerTLSTimeout = NATSServerDefaultTLSTimeout }
269-278
: Review conditional logic for TLS client configurationThe conditions in
CreateClient
(lines 271-278) might not cover all TLS configuration scenarios, especially when some TLS fields are missing. To ensure all possible configurations are handled correctly, consider refactoring the logic for clarity and completeness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
test_integration/go.sum
is excluded by!**/*.sum
📒 Files selected for processing (40)
.cspell/custom-dictionary.txt
(1 hunks).gitignore
(1 hunks).pre-commit-config.yaml
(1 hunks)pkg/config/types/compute.go
(1 hunks)pkg/config/types/generated_constants.go
(2 hunks)pkg/config/types/generated_descriptions.go
(2 hunks)pkg/config/types/orchestrator.go
(2 hunks)pkg/config_legacy/types/generated_descriptions.go
(2 hunks)pkg/config_legacy/types/node.go
(1 hunks)pkg/nats/transport/nats.go
(6 hunks)pkg/nats/transport/nats_config_test.go
(2 hunks)pkg/node/node.go
(1 hunks)pkg/publicapi/client/v2/options.go
(1 hunks)test_integration/1_orchestrator_basic_config_suite_test.go
(1 hunks)test_integration/4_orchestrator_config_override_and_flag_and_config_flag_suite_test.go
(1 hunks)test_integration/5_orchestrator_no_config_suite_test.go
(1 hunks)test_integration/6_jobs_basic_runs_scenarios_suite_test.go
(1 hunks)test_integration/7_nats_tls_default_truststore_suite_test.go
(1 hunks)test_integration/8_nats_tls_custom_ca_suite_test.go
(1 hunks)test_integration/README.md
(1 hunks)test_integration/base_suite.go
(2 hunks)test_integration/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.crt
(1 hunks)test_integration/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.key
(1 hunks)test_integration/common_assets/certificates/nats_custom/nats-server.crt
(1 hunks)test_integration/common_assets/certificates/nats_custom/nats-server.key
(1 hunks)test_integration/common_assets/certificates/nats_custom/nats_root_ca.crt
(1 hunks)test_integration/common_assets/certificates/nats_custom/nats_root_ca.key
(1 hunks)test_integration/common_assets/docker_compose_files/orchestrator-and-compute-custom-startup.yml
(1 hunks)test_integration/common_assets/dockerfiles/Dockerfile-JumpboxNode
(1 hunks)test_integration/common_assets/nodes_configs/7_tls_enabled_compute.yaml
(1 hunks)test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml
(1 hunks)test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml
(1 hunks)test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml
(1 hunks)test_integration/common_assets/scripts/compute_node_with_start_cmd.sh
(1 hunks)test_integration/go.mod
(2 hunks)test_integration/init_test.go
(1 hunks)test_integration/run_locally.sh
(1 hunks)test_integration/utils/dynamic_json.go
(1 hunks)test_integration/utils/test_utils.go
(2 hunks)test_integration/utils/yaml_template_engine.go
(2 hunks)
✅ Files skipped from review due to trivial changes (7)
- .cspell/custom-dictionary.txt
- .gitignore
- test_integration/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.crt
- test_integration/common_assets/certificates/nats_custom/nats-server.crt
- test_integration/common_assets/certificates/nats_custom/nats_root_ca.crt
- test_integration/init_test.go
- test_integration/utils/yaml_template_engine.go
🧰 Additional context used
🪛 Gitleaks
test_integration/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.key
1-52: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
test_integration/common_assets/certificates/nats_custom/nats-server.key
1-52: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
test_integration/common_assets/certificates/nats_custom/nats_root_ca.key
1-52: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
🔇 Additional comments (38)
test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml (2)
1-3
: LGTM: Basic configuration looks appropriate.
The UUID name provider and API port configuration are suitable for integration testing purposes.
10-13
: Verify the CA certificate path and add validation checks.
The CA certificate path is hardcoded to a specific location within the container. Consider:
- Adding a comment about the expected certificate format
- Verifying the certificate exists during test setup
Run the following script to verify the certificate path and format:
Consider adding a comment about the expected certificate format:
Nats:
# The paths for these certs are within the container after
# the assets has been copied during image creation
+ # Expected format: PEM-encoded X.509 certificate
CACert: "/bacalhau_integration_tests/common_assets/certificates/nats_custom/nats_root_ca.crt"
✅ Verification successful
Certificate path and format have been verified successfully
The verification script confirmed that:
- The certificate file exists at the expected location in test assets
- The file contains a valid PEM-encoded certificate (starts with "BEGIN CERTIFICATE")
The suggested comment about the certificate format would still be valuable for documentation purposes, but the core concern about certificate validation has been addressed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify CA certificate existence and format
# Test 1: Check if the certificate file exists in the test assets
CERT_PATH="common_assets/certificates/nats_custom/nats_root_ca.crt"
if ! fd -t f "nats_root_ca.crt" test_integration/; then
echo "Error: Certificate file not found in test assets"
exit 1
fi
# Test 2: Verify it's a valid certificate file (should start with BEGIN CERTIFICATE)
if ! head -n 1 "test_integration/$CERT_PATH" | grep -q "BEGIN CERTIFICATE"; then
echo "Error: Invalid certificate format"
exit 1
fi
Length of output: 300
test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml (2)
13-13
: Review the TLS timeout value.
The ServerTLSTimeout of 15 seconds seems high for a local test environment. Consider:
- Documenting why 15 seconds was chosen
- Making this value configurable through environment variables
- Adding a comment about recommended values for different environments
#!/bin/bash
# Description: Check for other timeout configurations to ensure consistency
# Expect: Similar timeout values in related configurations
rg "TLSTimeout:"
4-7
: Add a warning comment about the test-only token.
The hardcoded token "i_am_very_secret_token" should be clearly marked as a test-only value to prevent accidental use in production.
test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml (1)
9-14
: Verify TLS certificate paths and consider timeout configuration
The TLS configuration looks good, but consider:
- Adding runtime validation to ensure certificate files exist
- Making the timeout configurable for different test environments
✅ Verification successful
Let me gather more information about the certificate validation and test setup.
TLS configuration is properly set up and validated
The initial concerns have been addressed by the evidence:
- All required certificates (crt, key, and root CA) are present in the assets directory
- The certificates are used consistently across test configurations
- The 15-second timeout is standardized across TLS-enabled configurations and suitable for test environments
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that the certificate files exist in other test configurations
rg -l "nats_custom/nats-server\.(crt|key)" test_integration/
rg -l "nats_custom/nats_root_ca\.crt" test_integration/
Length of output: 443
Script:
#!/bin/bash
# Check if there's any certificate validation logic in the test files
rg -A 5 "nats_root_ca\.crt" test_integration/8_nats_tls_custom_ca_suite_test.go
# Check if certificates are actually present in the assets directory
fd . test_integration/common_assets/certificates/nats_custom/
# Look for any timeout configuration patterns
rg "ServerTLSTimeout" test_integration/
Length of output: 866
.pre-commit-config.yaml (1)
11-11
: LGTM! Verify the broader exclusion pattern.
The updated exclusion pattern for the detect-private-key
hook makes sense for TLS testing purposes.
Let's verify what additional files would be excluded with this change:
✅ Verification successful
The broader certificate exclusion pattern is safe and appropriate
The verification confirms this is a safe change because:
- The certificates are strictly for testing purposes, located in the test directory
- The README confirms these are test certificate generation scripts
- No usage of these certificates was found outside of test directories
- The broader exclusion pattern is necessary to cover both generated and custom NATS certificates used in TLS testing
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check what new files would be excluded by the broader pattern
# that weren't covered by the original pattern
# List all files under the certificates directory
fd . "test_integration/common_assets/certificates" \
--exclude "generated_assets" \
--type f
Length of output: 561
Script:
#!/bin/bash
# Let's check the contents of the README to understand the purpose and usage of these certificates
cat test_integration/common_assets/certificates/README.md
# Also check if these certificates are used only in tests
rg -l "nats-server\.(crt|key)" --type-add 'yaml:*.{yaml,yml}' -g '!test*' -g '!testdata'
rg -l "nats_root_ca\.(crt|key)" --type-add 'yaml:*.{yaml,yml}' -g '!test*' -g '!testdata'
Length of output: 595
test_integration/README.md (1)
41-46
: LGTM! Clear documentation of command-line options.
The new command-line options are well documented with clear examples showing each use case.
test_integration/common_assets/docker_compose_files/orchestrator-and-compute-custom-startup.yml (1)
17-17
: Justify the use of privileged mode.
Running containers in privileged mode grants extensive permissions and poses security risks. Please document why privileged mode is necessary for the orchestrator node.
pkg/config/types/compute.go (2)
14-26
: Implementation aligns well with TLS support objectives.
The addition of NATS TLS configuration capabilities is well-structured and follows existing patterns in the codebase. The implementation provides a clean way for compute nodes to specify their TLS settings while maintaining backward compatibility through the use of omitempty
tags.
23-26
: Consider adding path validation and documentation.
While the implementation is correct, consider these security and usability improvements:
- Add validation for the CA certificate file path to ensure it exists and is readable
- Document the expected format and requirements for the CA certificate file
Let's check how certificate paths are handled elsewhere in the codebase:
✅ Verification successful
Let me gather more information about certificate validation patterns in the codebase.
Certificate path validation is already handled appropriately in the codebase
Based on the findings:
- The codebase has a
validateFile
function incmd/util/flags/cliflags/config.go
that handles file validation - The actual CA certificate validation is performed at runtime in
pkg/publicapi/client/v2/options.go
where the certificate is read and verified - Similar CACert fields are consistently used across the codebase (in both compute and orchestrator configurations) without additional validation at the struct level
The current implementation aligns with the codebase's patterns for handling certificate paths, where validation occurs during the operational phase rather than at the configuration level. This is a reasonable approach as certificate paths may not exist during configuration parsing but only need to be valid when establishing connections.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for similar certificate path handling patterns
rg -g '!{vendor,third_party}/**' -A 3 'CACert.*string.*`yaml:"'
# Look for certificate validation patterns
ast-grep --pattern 'func $_(path string) error {
$$$
os.Open($path)
$$$
}'
Length of output: 883
Script:
#!/bin/bash
# Search for certificate validation functions
rg -g '!{vendor,third_party}/**' -B 2 -A 5 'func.*[Vv]alidat.*[Cc]ertificate'
# Look for file validation patterns
rg -g '!{vendor,third_party}/**' -B 2 -A 5 'func.*[Vv]alidat.*[Ff]ile'
# Search for actual usage of CACert fields
rg -g '!{vendor,third_party}/**' -B 2 -A 5 '\.[Cc][Aa][Cc]ert'
Length of output: 11367
test_integration/utils/dynamic_json.go (1)
1-9
: LGTM! Clean package structure and imports.
The package name and imports are well-organized and appropriate for the utility's purpose.
test_integration/run_locally.sh (1)
5-10
: LGTM! Clear and well-structured variable initialization.
The default values are properly initialized and follow shell scripting best practices.
test_integration/7_nats_tls_default_truststore_suite_test.go (5)
1-15
: LGTM! Clean imports organization.
The imports are well-organized and all are being used in the code.
45-48
: LGTM! Clean teardown implementation.
The teardown properly delegates to the base class after logging.
73-75
: LGTM! Standard test suite runner.
The test runner follows the standard testify suite pattern.
27-43
: Consider making configuration paths more flexible.
While the implementation is solid, consider making the paths to configuration files and Docker compose files configurable through environment variables or test parameters for better flexibility.
#!/bin/bash
# Verify the existence of referenced configuration files
echo "Checking for required configuration files..."
ls -l test_integration/common_assets/docker_compose_files/orchestrator-and-compute-custom-startup.yml \
test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml \
test_integration/common_assets/nodes_configs/7_tls_enabled_compute.yaml
16-25
: Verify the GlobalRunIdentifier variable.
The code references globalTestExecutionId
which should be defined elsewhere in the package.
✅ Verification successful
Let me try a different search pattern to locate the variable definition.
globalTestExecutionId
is properly defined and used across test suites
The variable is correctly defined in init_test.go
and is used consistently across all test suites for test run identification. The initialization happens at the package level, making it available to all test suites in the package.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for globalTestExecutionId definition
rg "globalTestExecutionId.*=|var globalTestExecutionId|const globalTestExecutionId" test_integration/
Length of output: 743
Script:
#!/bin/bash
# Search for globalTestExecutionId with broader pattern
rg "globalTestExecutionId" test_integration/
# Also search for test suite initialization patterns
rg "func init|var global" test_integration/
Length of output: 1499
test_integration/4_orchestrator_config_override_and_flag_and_config_flag_suite_test.go (2)
57-73
: LGTM! Improved JSON handling and validation
The changes enhance the code by:
- Using a more robust JSON parsing approach
- Implementing proper error handling
- Using clean JSON query syntax for field access
- Validating configuration override behavior
47-54
: Verify TLS configuration validation
Given that this PR implements NATS TLS support, consider adding test cases to verify that TLS-related flags and configurations are properly handled by the node list
command.
test_integration/6_jobs_basic_runs_scenarios_suite_test.go (1)
Line range hint 1-108
: Verify TLS configuration in orchestrator config.
The test suite references an orchestrator config file at nodes_configs/6_orchestrator_config.yaml
. Since this PR adds TLS support for NATS, we should verify that this configuration properly tests both TLS and non-TLS scenarios.
test_integration/5_orchestrator_no_config_suite_test.go (1)
54-60
: LGTM! Command execution improvement.
The change from using curl
to directly using the bacalhau
CLI is a good improvement that:
- Reduces external dependencies
- Provides more consistent and reliable test execution
- Aligns with the command usage in other test suites
test_integration/8_nats_tls_custom_ca_suite_test.go (3)
1-15
: LGTM: Clean imports structure
The imports are well-organized and all dependencies are properly utilized in the code.
73-84
: LGTM: Good negative test case
The test properly verifies that non-TLS connections are rejected with the expected error message.
100-102
: LGTM: Standard suite runner
The suite runner follows the standard testify/suite pattern.
pkg/config_legacy/types/node.go (1)
Line range hint 1-1
: Consider migration strategy for legacy package.
The file resides in pkg/config_legacy
, but implements core TLS functionality. Consider creating a migration plan to move this configuration to the main config package, ensuring a clean separation between legacy and current code.
Let's check if there are any new TLS configurations in the main config package:
test_integration/1_orchestrator_basic_config_suite_test.go (1)
43-66
: LGTM on the refactoring, but consider adding TLS-specific validations.
The refactoring to use convertStringToDynamicJSON
and specific JSON path queries improves the test's robustness. However, given this PR's objective to add NATS TLS support, consider adding validations to ensure nodes are communicating over a secure channel when TLS is enabled.
Let's verify if there are any TLS-specific tests elsewhere:
pkg/nats/transport/nats_config_test.go (2)
116-150
: LGTM! Comprehensive TLS configuration validation tests.
The new test cases effectively validate critical TLS configuration requirements:
- Ensuring both certificate and key are provided together
- Preventing negative timeout values
These tests align well with security best practices for TLS implementation.
183-202
: Consider adding path validation tests.
The happy path test verifies field preservation but doesn't validate path existence or accessibility. Consider adding test cases that:
- Use temporary test certificates/keys
- Test with both valid and invalid file paths
- Verify file permission requirements
This would ensure robust handling of TLS file configurations in production environments.
pkg/config/types/generated_constants.go (2)
27-27
: LGTM: Compute node CA certificate configuration.
The constant follows the established naming convention and is appropriately placed among other compute-related configurations.
88-91
: LGTM: Comprehensive NATS TLS configuration for orchestrator.
The constants provide a complete set of TLS configuration options including CA certificate, server certificate, private key, and timeout settings. They follow the established naming convention and are properly placed among other orchestrator-related configurations.
Let's verify these constants are used consistently in the configuration structs and their corresponding YAML/env var bindings:
✅ Verification successful
NATS TLS configuration constants are properly integrated and consistent
The verification confirms that:
- The constants are properly mapped to struct fields in
OrchestratorNats
- Each constant has corresponding field with matching YAML tags and proper documentation
- The fields cover all TLS aspects: CA cert, server cert, private key, and timeout
- The descriptions in
generated_descriptions.go
match the struct field documentation
The configuration structure is complete and consistent with the established patterns in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the new constants are properly used in configuration structs and bindings
# Test 1: Check for corresponding struct fields in compute.go and orchestrator.go
echo "Checking for corresponding struct fields..."
rg -A 2 "type (Compute|Orchestrator)Nats struct" pkg/config/types/
# Test 2: Check for YAML tag bindings
echo "Checking for YAML tag bindings..."
rg "yaml:.*NATS\.(CACert|ServerTLSCert|ServerTLSKey|ServerTLSTimeout)" pkg/config/types/
# Test 3: Check for environment variable bindings
echo "Checking for environment variable bindings..."
rg "env:.*NATS_(CA_CERT|SERVER_TLS)" pkg/config/types/
Length of output: 1081
Script:
#!/bin/bash
# Let's check the complete struct definitions and their fields
echo "Checking OrchestratorNats struct definition..."
rg -A 10 "type OrchestratorNats struct" pkg/config/types/
echo -e "\nChecking usage of the new constants..."
rg "OrchestratorNATSServerTLS|OrchestratorNATSCACert" pkg/config/types/
echo -e "\nChecking YAML/env bindings with broader pattern..."
rg "yaml:\".*TLS.*\".*env:" pkg/config/types/
Length of output: 2815
test_integration/base_suite.go (1)
8-11
: LGTM: Required imports reinstated.
The reinstated imports are necessary for both existing functionality and the new convertStringToDynamicJSON
method.
test_integration/go.mod (1)
6-6
: LGTM on the new dependencies.
The addition of github.com/PaesslerAG/jsonpath
and its indirect dependency github.com/PaesslerAG/gval
are appropriate for the integration tests. These libraries will help in validating JSON responses and configurations in the new NATS TLS integration tests.
Also applies to: 24-24
pkg/config_legacy/types/generated_descriptions.go (3)
Line range hint 1-150
: Add descriptions for NATS TLS configuration entries.
Based on the PR objectives which introduce NATS TLS support, this configuration map appears to be missing entries for the new NATS TLS configuration options (e.g., ComputeNATSCACertKey, OrchestratorNATSCACertKey, etc.) that were mentioned in the AI summary. These descriptions are essential for users to understand how to configure TLS for NATS communication.
#!/bin/bash
# Search for NATS-related configuration keys in the codebase
echo "Searching for NATS configuration keys..."
rg "ComputeNATS|OrchestratorNATS"
# Search for NATS-related constants
echo "Searching for NATS-related constants..."
ast-grep --pattern 'const $_ = "ComputeNATS" | const $_ = "OrchestratorNATS"'
Line range hint 1-3
: Update the source generator instead of this file.
This is a generated file (indicated by "Code generated by go generate; DO NOT EDIT"). Any changes to descriptions should be made in the source generator rather than directly in this file.
#!/bin/bash
# Find the generator file
echo "Searching for go:generate directives related to descriptions..."
rg "//go:generate.*description"
# Find potential source files
echo "Searching for potential source files..."
fd -e go -e gotmpl "description|config.*gen"
11-11
: Verify TLS auto-enabling conditions.
The descriptions state that TLS is "Auto enabled if Insecure or ServerTLSCACert are specified". This seems to differ from the AI summary which mentioned "CACert" was changed to "ServerTLSCACert". Let's verify this change is consistent with the actual implementation.
Also applies to: 21-21
✅ Verification successful
The TLS auto-enabling description is accurate and consistent with implementation
Based on the code evidence:
- The struct definition in
pkg/config_legacy/types/node.go
confirms that TLS is "Auto enabled if Insecure or CACert are specified" - The implementation in
pkg/publicapi/client/v2/options.go
shows bothWithCACertificate
andWithInsecureTLS
options that can trigger TLS usage - The description in
generated_descriptions.go
correctly states "Auto enabled if Insecure or ServerTLSCACert are specified", which matches the actual field nameServerTLSCACert
used in the codebase
The description accurately reflects the implementation and the name change from CACert
to ServerTLSCACert
is consistently applied throughout the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for TLS auto-enabling logic in the codebase
echo "Searching for TLS auto-enabling conditions..."
rg -A 5 "func.*UseTLS|TLS.*enabled"
# Search for CACert/ServerTLSCACert usage
echo "Searching for CACert/ServerTLSCACert usage..."
rg "CACert|ServerTLSCACert"
Length of output: 596
Script:
#!/bin/bash
# Let's try a different approach to find TLS-related code
# Search for TLS configuration in the codebase
echo "Searching for TLS configuration..."
rg -l "TLS.*Config|UseTLS"
# Look for the specific files that might contain the implementation
echo -e "\nSearching for specific TLS implementation..."
ast-grep --pattern 'if $_ || $_ {
$$.UseTLS = true
$$$
}'
# Search for ServerTLSCACert in configuration
echo -e "\nSearching for ServerTLSCACert in configuration..."
rg "ServerTLSCACert" -A 3 -B 3
# Look for any TLS auto-enabling logic
echo -e "\nSearching for TLS auto-enabling logic..."
ast-grep --pattern 'func $_($$) bool {
$$$
UseTLS $$$
$$$
}'
Length of output: 8536
Script:
#!/bin/bash
# Let's check the actual implementation of TLS auto-enabling logic
# Search for TLS configuration struct and its usage
echo "Searching for TLS configuration implementation..."
rg -A 10 "type.*TLS.*Config struct"
# Look for places where UseTLS is set
echo -e "\nSearching for UseTLS assignments..."
ast-grep --pattern '$$.UseTLS = $_'
# Check if there's any logic in client initialization
echo -e "\nSearching for client initialization logic..."
rg -A 10 "func.*WithCACertificate|func.*WithInsecure"
Length of output: 3694
pkg/config/types/generated_descriptions.go (3)
29-29
: LGTM: Clear description for compute node's NATS CA certificate configuration.
The description accurately explains the purpose of the CA certificate file path for compute nodes connecting to the NATS server.
32-32
: LGTM: Improved clarity in analytics configuration description.
The description now clearly explains the purpose and impact of disabling analytics.
90-93
: LGTM: Comprehensive NATS TLS configuration descriptions for orchestrator.
The descriptions for orchestrator's NATS TLS configuration are clear and cover all necessary aspects:
- CA certificate for trust verification
- Server certificate for TLS connections
- Private key for TLS connections
- TLS timeout configuration
Compute: | ||
Enabled: true | ||
Orchestrators: | ||
- nats://bacalhau-orchestrator-node:4222 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update NATS URL scheme for TLS.
The URL scheme nats://
typically indicates non-TLS communication. For TLS-enabled connections, consider using the tls://
scheme to explicitly indicate secure communication.
- - nats://bacalhau-orchestrator-node:4222
+ - tls://bacalhau-orchestrator-node:4222
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- nats://bacalhau-orchestrator-node:4222 | |
- tls://bacalhau-orchestrator-node:4222 |
Auth: | ||
Token: "i_am_very_secret_token" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using environment variables for sensitive values.
Storing authentication tokens directly in configuration files poses a security risk. Consider using environment variables or secure secrets management.
Auth:
- Token: "i_am_very_secret_token"
+ Token: ${COMPUTE_AUTH_TOKEN}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Auth: | |
Token: "i_am_very_secret_token" | |
Auth: | |
Token: ${COMPUTE_AUTH_TOKEN} |
Compute: | ||
Enabled: true | ||
Orchestrators: | ||
- nats://bacalhau-orchestrator-node:4222 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
TLS configuration needs to be added for NATS connection
The compute node configuration in 7_tls_enabled_compute.yaml
is missing the required TLS configuration for connecting to the orchestrator. Based on the evidence:
- The orchestrator node (in other config files) uses TLS with server certificates and CA certificates
- The compute node with custom CA (
8_tls_enabled_compute_with_custom_ca.yaml
) includes NATS TLS configuration - The current config only has a plain NATS URL without TLS settings
Add the following TLS configuration under the Compute
section:
Nats:
CACert: "/bacalhau_integration_tests/common_assets/certificates/nats_custom/nats_root_ca.crt"
🔗 Analysis chain
Verify TLS configuration with orchestrator.
The compute node configuration needs to match the orchestrator's TLS settings. Let's verify the compatibility:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check orchestrator's TLS configuration to ensure compatibility
# Look for orchestrator TLS configuration
echo "Searching for orchestrator TLS settings..."
rg -l "tls:" --type yaml
# Check for any TLS-related environment variables in deployment files
echo "Checking for TLS environment variables..."
rg -l "TLS_" --type yaml
# Look for certificate paths in deployment configurations
echo "Checking for certificate configurations..."
rg -l "cert.*path|ca.*path" --type yaml
Length of output: 611
Script:
#!/bin/bash
# Let's examine the TLS configurations in the found files and related orchestrator settings
# Check the content of deployment files for TLS settings
echo "=== Checking docker-compose deployment TLS settings ==="
rg -A 5 -B 5 "TLS_" test_integration/common_assets/docker_compose_files/deployment-with-minio-and-registry.yml docker-compose-deployment/docker-compose.yml
# Check orchestrator configurations in test files
echo "=== Checking orchestrator configurations in test files ==="
fd -e yaml -e yml . test_integration/common_assets/nodes_configs/ -x cat {}
# Look for any NATS configuration with TLS
echo "=== Checking NATS configurations ==="
rg -A 5 -B 5 "nats://" --type yaml
Length of output: 10121
Compute: | ||
Enabled: true | ||
Orchestrators: | ||
- nats://bacalhau-orchestrator-node:4222 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify NATS URL scheme for TLS.
The NATS URL should use the tls://
scheme instead of nats://
when TLS is enabled to explicitly indicate secure communication.
Apply this diff:
Orchestrators:
- - nats://bacalhau-orchestrator-node:4222
+ - tls://bacalhau-orchestrator-node:4222
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Compute: | |
Enabled: true | |
Orchestrators: | |
- nats://bacalhau-orchestrator-node:4222 | |
Compute: | |
Enabled: true | |
Orchestrators: | |
- tls://bacalhau-orchestrator-node:4222 |
ServerTLSCert: "/bacalhau_integration_tests/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.crt" | ||
ServerTLSKey: "/bacalhau_integration_tests/common_assets/certificates/generated_assets/bacalhau-orchestrator-node.key" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Certificate files need stricter permissions in test environment
The verification revealed that the TLS certificate and key files in the test environment have overly permissive permissions:
- Key files (.key) are readable by others (644) instead of required 600
- All files are owned by jailuser:jailuser
While these are test certificates, it's recommended to:
- Set key files to 600 permissions (owner read/write only)
- Set certificate files to 644 permissions (world readable)
This maintains good security practices even in test environments and prevents accidental key exposure.
🔗 Analysis chain
Verify certificate path permissions in container.
The TLS certificate and key paths are within the container. Ensure that:
- The files have appropriate permissions (600 for key, 644 for cert)
- Only necessary users have access to these directories
The previous search might have failed due to special characters. Let's try a more focused search to understand the certificate usage and verify it's only in test contexts.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any other references to these certificate paths
# to ensure they're only used in test contexts
rg "/bacalhau_integration_tests/common_assets/certificates/generated_assets/bacalhau-orchestrator-node" | grep -v "test"
Length of output: 618
Script:
#!/bin/bash
# Let's check for certificate paths in test contexts and their usage
rg -l "/bacalhau_integration_tests/common_assets/certificates/generated_assets"
# Check file permissions in the certificate directory
fd . "test_integration/common_assets/certificates/generated_assets" -t f -x ls -l {}
Length of output: 1099
// TODO: Unfortunately the "bacalhau agent config" command is | ||
// TODO: unpredictable and returns weird first characters. | ||
// TODO: For now, Trim the first character if it is not "{" | ||
cleanedJsonString := strings.TrimLeftFunc(agentConfigOutput, func(r rune) bool { | ||
return r != '{' | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Unpredictable command output needs proper handling.
The TODO comments indicate that the bacalhau agent config
command produces unpredictable output with "weird first characters". This is concerning because:
- It could affect parsing of TLS configurations
- The current workaround of trimming characters is fragile
- The issue needs proper fixing in the command implementation
Consider:
- Opening an issue to track fixing the command output
- Adding proper error handling for malformed output
- Adding test cases for various malformed outputs
output, err := s.convertStringToDynamicJSON(cleanedJsonString) | ||
s.Require().NoError(err) | ||
|
||
orchestratorEnabled := marshalledOutput.(map[string]interface{})["config"].(map[string]interface{})["Orchestrator"].(map[string]interface{})["Enabled"].(bool) | ||
s.Require().Truef(orchestratorEnabled, "Expected orchestrator to be enabled, got: %t", orchestratorEnabled) | ||
orchestratorEnabled, err := output.Query("$.Orchestrator.Enabled") | ||
s.Require().NoError(err) | ||
s.Require().True(orchestratorEnabled.Bool()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add TLS configuration validation.
Since this PR adds TLS support, this test should verify that the agent configuration includes the expected TLS settings when enabled.
Add TLS-specific checks:
output, err := s.convertStringToDynamicJSON(cleanedJsonString)
s.Require().NoError(err)
orchestratorEnabled, err := output.Query("$.Orchestrator.Enabled")
s.Require().NoError(err)
s.Require().True(orchestratorEnabled.Bool())
+
+// Verify TLS configuration when enabled
+tlsEnabled, err := output.Query("$.NATS.TLS.Enabled")
+s.Require().NoError(err)
+if tlsEnabled.Bool() {
+ // Verify required TLS fields are present
+ certPath, err := output.Query("$.NATS.TLS.CertificatePath")
+ s.Require().NoError(err)
+ s.Require().NotEmpty(certPath.String())
+
+ keyPath, err := output.Query("$.NATS.TLS.KeyPath")
+ s.Require().NoError(err)
+ s.Require().NotEmpty(keyPath.String())
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
output, err := s.convertStringToDynamicJSON(cleanedJsonString) | |
s.Require().NoError(err) | |
orchestratorEnabled := marshalledOutput.(map[string]interface{})["config"].(map[string]interface{})["Orchestrator"].(map[string]interface{})["Enabled"].(bool) | |
s.Require().Truef(orchestratorEnabled, "Expected orchestrator to be enabled, got: %t", orchestratorEnabled) | |
orchestratorEnabled, err := output.Query("$.Orchestrator.Enabled") | |
s.Require().NoError(err) | |
s.Require().True(orchestratorEnabled.Bool()) | |
output, err := s.convertStringToDynamicJSON(cleanedJsonString) | |
s.Require().NoError(err) | |
orchestratorEnabled, err := output.Query("$.Orchestrator.Enabled") | |
s.Require().NoError(err) | |
s.Require().True(orchestratorEnabled.Bool()) | |
// Verify TLS configuration when enabled | |
tlsEnabled, err := output.Query("$.NATS.TLS.Enabled") | |
s.Require().NoError(err) | |
if tlsEnabled.Bool() { | |
// Verify required TLS fields are present | |
certPath, err := output.Query("$.NATS.TLS.CertificatePath") | |
s.Require().NoError(err) | |
s.Require().NotEmpty(certPath.String()) | |
keyPath, err := output.Query("$.NATS.TLS.KeyPath") | |
s.Require().NoError(err) | |
s.Require().NotEmpty(keyPath.String()) | |
} |
func (s *BaseDockerComposeTestSuite) convertStringToDynamicJSON(jsonString string) (*utils.DynamicJSON, error) { | ||
// Cleanup the Json output. Unfortunate that the CLI prints extra | ||
// characters at the beginning and at the end | ||
cleanedJsonString := strings.TrimLeftFunc(jsonString, func(r rune) bool { | ||
return r != '{' && r != '[' | ||
}) | ||
cleanedJsonString = strings.TrimRightFunc(cleanedJsonString, func(r rune) bool { | ||
return r != '}' && r != ']' | ||
}) | ||
|
||
parsedDynamicJSON, err := utils.ParseToDynamicJSON(cleanedJsonString) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
return parsedDynamicJSON, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve code organization and documentation.
The implementation is solid but could benefit from the following improvements:
- Extract the duplicated JSON cleaning logic (shared with
unmarshalJSONString
) into a private helper method:
+// cleanJSONString removes non-JSON characters from the beginning and end of the string
+func (s *BaseDockerComposeTestSuite) cleanJSONString(jsonString string) string {
+ cleaned := strings.TrimLeftFunc(jsonString, func(r rune) bool {
+ return r != '{' && r != '['
+ })
+ return strings.TrimRightFunc(cleaned, func(r rune) bool {
+ return r != '}' && r != ']'
+ })
+}
+// convertStringToDynamicJSON converts a JSON string to a DynamicJSON object.
+// It handles both object and array JSON types and cleans any non-JSON characters
+// from the input string before parsing.
func (s *BaseDockerComposeTestSuite) convertStringToDynamicJSON(jsonString string) (*utils.DynamicJSON, error) {
+ if strings.TrimSpace(jsonString) == "" {
+ return nil, fmt.Errorf("empty JSON string provided")
+ }
- cleanedJsonString := strings.TrimLeftFunc(jsonString, func(r rune) bool {
- return r != '{' && r != '['
- })
- cleanedJsonString = strings.TrimRightFunc(cleanedJsonString, func(r rune) bool {
- return r != '}' && r != ']'
- })
+ cleanedJsonString := s.cleanJSONString(jsonString)
parsedDynamicJSON, err := utils.ParseToDynamicJSON(cleanedJsonString)
if err != nil {
return nil, err
}
return parsedDynamicJSON, nil
}
- Update
unmarshalJSONString
to use the new helper:
func (s *BaseDockerComposeTestSuite) unmarshalJSONString(jsonString string, expectedType JSONResponseType) (interface{}, error) {
- cleanedJsonString := strings.TrimLeftFunc(jsonString, func(r rune) bool {
- return r != '{' && r != '['
- })
- cleanedJsonString = strings.TrimRightFunc(cleanedJsonString, func(r rune) bool {
- return r != '}' && r != ']'
- })
+ cleanedJsonString := s.cleanJSONString(jsonString)
Committable suggestion skipped: line range outside the PR's diff.
@@ -3,6 +3,7 @@ module bacalhau/integration_tests | |||
go 1.23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect Go version.
The specified Go version 1.23
is invalid as it does not exist yet. The latest stable version is 1.22
.
Apply this diff to fix the version:
-go 1.23
+go 1.22
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
go 1.23 | |
go 1.22 |
pkg/node/node.go
Outdated
ServerTLSCACert: cfg.BacalhauConfig.Orchestrator.NATS.CACert, | ||
ServerTLSCert: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSCert, | ||
ServerTLSKey: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSKey, | ||
ServerTLSTimeout: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSTimeout, | ||
ClientTLSCACert: cfg.BacalhauConfig.Compute.NATS.CACert, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add validation and documentation for TLS configurations.
While the TLS field additions look good, consider the following improvements:
-
Add validation to ensure:
- If
ServerTLSCert
is provided,ServerTLSKey
must also be provided ServerTLSTimeout
has a reasonable value (not too short or too long)- Certificates are valid and not expired
- If
-
Add documentation comments explaining:
- The purpose of each TLS field
- Required format and paths for certificates
- Expected timeout duration units
Here's a suggested implementation:
config := &nats_transport.NATSTransportConfig{
NodeID: cfg.NodeID,
Host: cfg.BacalhauConfig.Orchestrator.Host,
Port: cfg.BacalhauConfig.Orchestrator.Port,
AdvertisedAddress: cfg.BacalhauConfig.Orchestrator.Advertise,
AuthSecret: cfg.BacalhauConfig.Orchestrator.Auth.Token,
Orchestrators: cfg.BacalhauConfig.Compute.Orchestrators,
StoreDir: storeDir,
ClusterName: cfg.BacalhauConfig.Orchestrator.Cluster.Name,
ClusterPort: cfg.BacalhauConfig.Orchestrator.Cluster.Port,
ClusterPeers: cfg.BacalhauConfig.Orchestrator.Cluster.Peers,
ClusterAdvertisedAddress: cfg.BacalhauConfig.Orchestrator.Cluster.Advertise,
IsRequesterNode: cfg.BacalhauConfig.Orchestrator.Enabled,
+ // Path to the CA certificate used to verify the server's certificate
ServerTLSCACert: cfg.BacalhauConfig.Orchestrator.NATS.CACert,
+ // Path to the server's TLS certificate (required if ServerTLSKey is provided)
ServerTLSCert: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSCert,
+ // Path to the server's TLS private key (required if ServerTLSCert is provided)
ServerTLSKey: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSKey,
+ // Timeout duration for TLS handshake (e.g. "30s", "1m")
ServerTLSTimeout: cfg.BacalhauConfig.Orchestrator.NATS.ServerTLSTimeout,
+ // Path to the CA certificate used to verify the client's certificate
ClientTLSCACert: cfg.BacalhauConfig.Compute.NATS.CACert,
}
+// Validate TLS configuration
+if config.ServerTLSCert != "" || config.ServerTLSKey != "" {
+ if config.ServerTLSCert == "" || config.ServerTLSKey == "" {
+ return nil, fmt.Errorf("both ServerTLSCert and ServerTLSKey must be provided")
+ }
+}
+
+// Validate TLS timeout
+if config.ServerTLSTimeout != "" {
+ timeout, err := time.ParseDuration(config.ServerTLSTimeout)
+ if err != nil {
+ return nil, fmt.Errorf("invalid ServerTLSTimeout format: %w", err)
+ }
+ if timeout < 5*time.Second || timeout > 5*time.Minute {
+ return nil, fmt.Errorf("ServerTLSTimeout must be between 5s and 5m")
+ }
+}
Committable suggestion skipped: line range outside the PR's diff.
💡 Codebase verification
Based on the search results, I can now provide a final response. The code shows that certificates are generated with a hardcoded 1-year expiration (NotAfter: time.Now().AddDate(1, 0, 0)) but there's no active validation of certificate expiration during runtime.
Add runtime certificate expiration validation
The current implementation only sets certificate expiration during creation but doesn't validate it during usage. Consider:
- Adding a certificate validation helper in
pkg/lib/crypto/certificate.go
to check expiration - Implementing periodic certificate expiration checks in the NATS transport layer
- Adding expiration validation when loading certificates in
pkg/nats/transport/nats.go
Example locations needing validation:
pkg/lib/crypto/certificate.go
: Add expiration check when certificates are loaded/usedpkg/nats/transport/nats.go
: Add validation when setting up TLS configpkg/node/node.go
: Add validation when loading certificates from config
🔗 Analysis chain
Verify TLS certificate expiration.
Consider adding certificate expiration checks to prevent using expired certificates.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the TLS certificate expiration check is implemented elsewhere in the codebase.
# Test: Search for certificate expiration checks
rg -A 5 "certificate.*expir|expir.*certificate|x509.*Not.*After|cert.*valid"
# Test: Search for certificate validation functions
ast-grep --pattern 'func $_(cert $_) {
$$$
expir
$$$
}'
Length of output: 142
Script:
#!/bin/bash
# Let's try a more comprehensive search for TLS certificate validation
# Search for TLS certificate related code
rg -i "tls.*cert|cert.*valid|cert.*check" -A 5
# Search for x509 certificate handling
rg "x509\." -A 5
# Search for specific NATS TLS configuration and validation
rg "nats.*tls|tls.*nats" -A 5
# Look for any certificate loading or verification functions
ast-grep --pattern 'func $_($_) {
$$$
LoadCert
$$$
}'
ast-grep --pattern 'func $_($_) {
$$$
VerifyCert
$$$
}'
Length of output: 66564
Script:
#!/bin/bash
# Let's search for certificate expiration validation in the crypto package
# and any certificate validation functions
# Search for certificate validation and expiration checks in crypto package
rg -A 10 "NotAfter|NotBefore|Verify|Valid" pkg/lib/crypto/
# Look for any certificate validation in NATS transport
rg -A 10 "cert.*valid|valid.*cert" pkg/nats/transport/
# Search for any certificate loading and validation patterns
ast-grep --pattern 'func $_($_) {
$$$
cert := &x509.Certificate{
$$$
NotAfter: $_
$$$
}
$$$
}'
# Look for any TLS config validation
ast-grep --pattern 'func $_($_) {
$$$
tls.Config{
$$$
}
$$$
}'
Length of output: 3496
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Just minor comments
// Used by the Nats Client when node acts as orchestrator | ||
ServerTLSCACert string | ||
|
||
// Use by the Nats Client when node acts as compute | ||
ClientTLSCACert string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if you are running a hybrid node? do you think this differentiating is helpful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally separated these 2 to make it explicit separation, with configuration coming from 2 different places: Orchestrator and Compute. For hybrid, I would keep it as is, even if we defined it is 2 places, since technically we have a "compute node" and an "orchestrator node" concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (10)
test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml (1)
2-3
: Consider using a less common port number.While port 1234 works for testing, it's a commonly probed port. For production deployments, consider using a less predictable port number.
test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml (3)
6-7
: Use a more secure token format even in test configurations.While this is a test configuration, using obvious test tokens can lead to bad practices. Consider using a more secure format that mirrors production requirements (e.g., a UUID or a sufficiently long random string).
Auth: - Token: "i_am_very_secret_token" + Token: "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12" # Example secure format
15-17
: Document the purpose of labels.The current labels appear generic. Consider:
- Adding comments to explain their purpose in testing
- Using more descriptive names that reflect their testing context
Labels: + # Label used for node identification in integration tests label1: label1Value + # Label used for test scenario classification label2: label2Value
2-3
: Document the significance of port 1234.If this port number is significant for testing purposes, please add a comment explaining why this specific port was chosen.
API: + # Port 1234 is used for integration testing to avoid conflicts with default ports Port: 1234
test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml (2)
2-3
: Consider using a less common port number for testsUsing well-known port numbers like 1234 could lead to port conflicts in test environments. Consider using a port number in the ephemeral range (49152-65535) to minimize potential conflicts.
API: - Port: 1234 + Port: 54321
16-18
: Use more descriptive test labelsThe current labels are generic and don't indicate their purpose in tests. Consider using labels that reflect their test context.
Labels: - label1: label1Value - label2: label2Value + test_environment: integration + tls_mode: custom_capkg/nats/transport/nats.go (4)
54-63
: LGTM! Well-structured TLS configuration fields.The separation of TLS configuration fields into server, orchestrator client, and compute client contexts provides clear boundaries for different security roles. This design allows for flexible TLS configurations while maintaining clear separation of concerns.
Consider documenting these fields with comments explaining their specific usage contexts and requirements, especially for hybrid node scenarios where both orchestrator and compute roles are present.
147-166
: Improve TLS configuration error handling.The server TLS configuration is secure using NATS's built-in
GenTLSConfig
. However, the error handling could be more specific to help with troubleshooting.serverTLSConfig, err := server.GenTLSConfig(serverTLSConfigOpts) if err != nil { - log.Error().Msgf("failed to configure NATS server TLS: %v", err) - return nil, err + return nil, fmt.Errorf("failed to configure NATS server TLS - ensure valid certificate and key files: %w", err) }
265-274
: Consider refactoring client TLS configuration logic.While the current implementation works and handles both orchestrator and compute roles, the logic could be simplified for better maintainability.
Consider introducing a role-based configuration approach:
-// We need to do this logic since the Nats Transport Layer does not differentiate -// between orchestrator mode and compute mode -if config.ServerTLSCert == "" && config.ClientTLSCACert != "" { - // this client is for a compute node - clientOptions = append(clientOptions, nats.RootCAs(config.ClientTLSCACert)) -} else if config.ServerTLSCert != "" && config.ServerTLSCACert != "" { - // this client is for a orchestrator node - clientOptions = append(clientOptions, nats.RootCAs(config.ServerTLSCACert)) -} +// Determine the appropriate CA certificate based on node role +var caCert string +switch { +case config.ServerTLSCert != "": + // Orchestrator role + caCert = config.ServerTLSCACert +case config.ClientTLSCACert != "": + // Compute role + caCert = config.ClientTLSCACert +} + +if caCert != "" { + clientOptions = append(clientOptions, nats.RootCAs(caCert)) +}This refactor:
- Makes the role-based logic more explicit
- Reduces nesting
- Simplifies the conditions
- Maintains the same functionality
25-25
: Document TLS timeout security implications.The default TLS timeout of 10 seconds should be documented with its security implications and recommendations for different deployment scenarios.
Add a comment explaining the timeout's purpose and recommendations:
-const NATSServerDefaultTLSTimeout = 10 +// NATSServerDefaultTLSTimeout is the default timeout in seconds for TLS handshake. +// A shorter timeout (e.g., 5s) may be preferred in low-latency environments, +// while a longer timeout might be needed in high-latency or resource-constrained environments. +const NATSServerDefaultTLSTimeout = 10
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
pkg/config/types/compute.go
(1 hunks)pkg/config/types/generated_constants.go
(2 hunks)pkg/config/types/generated_descriptions.go
(2 hunks)pkg/config/types/orchestrator.go
(2 hunks)pkg/nats/transport/nats.go
(6 hunks)pkg/nats/transport/nats_config_test.go
(1 hunks)pkg/node/node.go
(1 hunks)test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml
(1 hunks)test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml
(1 hunks)test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- pkg/config/types/compute.go
- pkg/config/types/generated_descriptions.go
- pkg/config/types/orchestrator.go
- pkg/nats/transport/nats_config_test.go
- pkg/node/node.go
🔇 Additional comments (6)
test_integration/common_assets/nodes_configs/8_tls_enabled_compute_with_custom_ca.yaml (2)
6-7
: NATS URL scheme needs updating for TLS.
A previous review has already identified this issue.
10-13
: Verify CA certificate availability.
Let's ensure the CA certificate exists in the specified path and has appropriate permissions.
✅ Verification successful
CA certificate path is valid and accessible
The certificate exists at the correct relative path test_integration/common_assets/certificates/nats_custom/nats_root_ca.crt
and is properly referenced in the configuration. The path structure aligns with the integration test setup, and the certificate is part of a complete TLS setup including server certificates and keys.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify CA certificate existence and permissions
# Expected: File should exist and be readable
# Check if the certificate file exists
fd -t f "nats_root_ca.crt" | grep -q "certificates/nats_custom" || echo "CA certificate not found"
# Check file permissions if found
fd -t f "nats_root_ca.crt" -x stat -c "%a %n" {} \; | grep "certificates/nats_custom"
Length of output: 184
Script:
#!/bin/bash
# Let's try a different approach using ripgrep to locate the certificate
# and check its existence in the repository
# Search for references to the certificate file
rg -l "nats_root_ca.crt"
# Look for the actual certificate file
find . -name "nats_root_ca.crt" -type f
# Check the directory structure
ls -R test_integration/common_assets/certificates 2>/dev/null || echo "Directory not found"
Length of output: 932
test_integration/common_assets/nodes_configs/7_tls_enabled_orchestrator.yaml (2)
11-12
: Verify certificate path consistency across configurations.
Ensure these certificate paths are consistently used across all test configurations and documentation.
#!/bin/bash
# Check for consistency in certificate path usage
rg "/bacalhau_integration_tests/common_assets/certificates/generated_assets/bacalhau-orchestrator-node" --type yaml
13-13
: Verify if the TLS timeout value is sufficient.
The 15-second timeout might need adjustment based on your deployment environment and NATS connection patterns. Consider:
- Network latency in your target environment
- TLS handshake overhead
- Connection pool behavior
✅ Verification successful
Let me gather more information about timeout configurations across the codebase.
Let me search for NATS server configuration and any documentation about timeout settings.
Based on the search results, I can see that the 15-second ServerTimeout is used consistently in test configurations and is properly defined in the codebase. Let me verify if there are any default timeout values or documentation about recommended timeout settings.
Let me check specifically for TLS server timeout configurations and any related documentation.
Based on the search results, I can now provide a definitive response about the TLS timeout value:
The 15-second ServerTimeout setting is appropriate for NATS TLS connections.
The codebase shows that:
- NATS has a default TLS timeout of 10 seconds (
NATSServerDefaultTLSTimeout = 10
) - The 15-second value provides a reasonable buffer above the default
- The timeout is only used for the TLS handshake phase, not for ongoing connections
- The value is properly validated to be positive (
ServerTLSTimeout must be a positive number
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other timeout configurations in the codebase for consistency
rg -A 2 "ServerTimeout.*15" --type yaml
Length of output: 510
Script:
#!/bin/bash
# Search for any ServerTimeout configurations in yaml files
rg "ServerTimeout:" --type yaml -A 2
# Search for timeout related configurations in Go files that might set defaults
rg "ServerTimeout|DefaultTimeout" --type go -A 2
Length of output: 2494
Script:
#!/bin/bash
# Search for NATS server configuration and documentation
rg -g "*.md" -g "*.go" "NATS.*config|TLS.*timeout" -A 3
# Look for any comments or documentation about timeouts in the codebase
fd README.md | xargs cat
Length of output: 86057
Script:
#!/bin/bash
# Search for any default timeout values in Go files
rg "DefaultTimeout|default.*timeout" --type go -A 2
# Search for any timeout related documentation
rg -g "*.md" "timeout|TLS" -A 2
Length of output: 8161
Script:
#!/bin/bash
# Search for TLS server timeout configurations and documentation
rg "ServerTimeout|TLS.*timeout" -g "*.go" -g "*.md" -A 3
# Look for any NATS TLS configuration documentation
rg "NATS.*TLS|TLS.*config" -g "*.md" -g "*.go" -A 3
Length of output: 7952
test_integration/common_assets/nodes_configs/8_tls_enabled_orchestrator_with_custom_ca.yaml (1)
8-14
: 🛠️ Refactor suggestion
Enhance TLS configuration robustness
Several improvements can be made to the TLS configuration:
- The comment about container paths should be more explicit about the required directory structure
- The ServerTimeout lacks units specification (seconds/minutes)
- Consider adding validation for certificate existence
Let's verify if the certificate paths exist in the repository:
Suggested improvements:
TLS:
- # The paths for these certs are within the container after
- # the assets has been copied during image creation
+ # These paths are relative to the container's root directory.
+ # Required directory structure:
+ # /bacalhau_integration_tests/
+ # common_assets/
+ # certificates/
+ # nats_custom/
+ # - nats-server.crt
+ # - nats-server.key
+ # - nats_root_ca.crt
ServerCert: "/bacalhau_integration_tests/common_assets/certificates/nats_custom/nats-server.crt"
ServerKey: "/bacalhau_integration_tests/common_assets/certificates/nats_custom/nats-server.key"
CACert: "/bacalhau_integration_tests/common_assets/certificates/nats_custom/nats_root_ca.crt"
- ServerTimeout: 15
+ # Timeout in seconds for TLS handshake
+ ServerTimeout: 15
pkg/config/types/generated_constants.go (1)
28-28
: LGTM! Constants follow consistent naming patterns.
The new TLS-related constants are well-structured and complete, providing configuration keys for both compute nodes (client-side) and orchestrator (server-side) TLS setup. Note that this is a generated file (from pkg/config/types/gen/generate.go
), so any changes should be made to the generator rather than directly to this file.
Let's verify the generator file exists and contains the corresponding entries:
Also applies to: 95-98
✅ Verification successful
Constants correctly reflect the TLS configuration structure
The generated constants accurately map to the TLS-related fields defined in the codebase:
ComputeTLSCACertKey
maps toComputeTLS.CACert
OrchestratorTLSServerCertKey
maps toOrchestratorTLS.ServerCert
OrchestratorTLSServerKeyKey
maps toOrchestratorTLS.ServerKey
OrchestratorTLSServerTimeoutKey
maps toOrchestratorTLS.ServerTimeout
OrchestratorTLSCACertKey
maps toOrchestratorTLS.CACert
The generator is working as expected, creating constants from the struct field definitions while maintaining the proper naming patterns.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the generator file exists and contains TLS-related entries
rg -A 2 "TLS" pkg/config/types/gen/generate.go
Length of output: 46
Script:
#!/bin/bash
# First, let's check if the generator file exists and its content
fd "generate.go" pkg/config/types/
# If the file exists, let's look for any TLS-related content with case-insensitive search
fd "generate.go" pkg/config/types/ --exec cat {} | rg -i "tls"
# Let's also check where these constants are actually used
rg "ComputeTLSCACertKey|OrchestratorTLSCACertKey|OrchestratorTLSServerCertKey|OrchestratorTLSServerKeyKey|OrchestratorTLSServerTimeoutKey" -A 2
Length of output: 2827
Script:
#!/bin/bash
# Let's check the generator file content to see how these constants are defined
cat pkg/config/types/gen/generate.go
# Also check for any config schema or definition files that might be the source
fd -e yaml -e json . pkg/config/types/gen/ --exec cat {}
Length of output: 8267
Script:
#!/bin/bash
# Let's check the Bacalhau config struct to see where these TLS fields are defined
rg -A 5 "type Bacalhau struct" pkg/config/types/
Length of output: 653
Script:
#!/bin/bash
# Let's check for TLS-related structs and fields in the config types
rg -A 5 "type (Compute|Orchestrator) struct" pkg/config/types/
# Also check for any TLS-related struct definitions
rg -B 2 -A 5 "TLS.*struct" pkg/config/types/
Length of output: 3128
This PR adds the ability to enable TLS for NATS communication in a cluster.
It supports custom CA as well as regular signed certificate by known CA. If the TLS configuration was not set in the config, it will default back to non-tls. The NATS server within the orchestrator node dictates if NATS TLS is enabled or not. When enabled, only TLS communication is allowed.
Sample Orchestrator Config:
Sample Compute Node Config:
For details on how to enable TLS for the NATS Server on the orchestrator node and compute node, please check the integration tests in the commits within this PR as they have detailed coverage for multiple use-cases as well as detailed configuration files.
Linear: https://linear.app/expanso/issue/ENG-305/plan-and-implement-nats-tls-enabling-in-core-bacalhau
Github: #4681
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
run_locally.sh
script with new command-line options for improved usability.Chores
go.mod
to include new libraries for JSON path evaluation.