Separate StrictModeConfig output structure#6114
Conversation
📝 WalkthroughWalkthroughThe changes update the strict mode configuration handling across the API and related code. In the OpenAPI specification, several schema names have been updated to use an "Output" suffix, with adjustments made to properties such as rate limits and maximum counts. In the gRPC conversion module, new implementations for converting between the original strict mode configuration types and their newly named output types have been added, including handling related to sparse and multivector settings. These modifications are also propagated to the collection module, where the strict mode configuration field in the collection configuration struct is updated to use the new output type, and conversion logic is adjusted accordingly. New structures for handling strict mode configurations, specifically for sparse and multivector data, have also been introduced, with conversion traits implemented to ensure proper data mapping. Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
⏰ Context from checks skipped due to timeout of 90000ms (12)
🔇 Additional comments (16)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/api/src/grpc/conversions.rs (1)
2004-2047: ConvertingStrictModeConfigOutputinto RPC-levelStrictModeConfig.Downcasting from
usizetou32or fromf64tof32may lead to minor precision or range constraints. If there is a possibility of very large numbers or high precision, consider documenting or validating the permissible ranges.lib/segment/src/types.rs (3)
736-742: Consider reintroducing validation
If you want to ensure no invalid values are exposed (e.g., minimum size constraints), you could add a validation attribute here (similar to the originalStrictModeSparse).
810-816: Revisit optional validation
If you want to enforce a minimum number of vectors for consistency, consider adding a validation check in this output struct as well.
953-1078: Enhance consistency with other output structs
You might consider adding#[schemars(deny_unknown_fields)]here, as done in the sparse and multivector output structs, to maintain uniform schema rules for strict output objects. Otherwise, everything else looks fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/redoc/master/openapi.json(9 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/collection/src/operations/conversions.rs(2 hunks)lib/collection/src/operations/types.rs(3 hunks)lib/segment/src/types.rs(4 hunks)
🔇 Additional comments (25)
lib/collection/src/operations/conversions.rs (2)
24-24: Import update to support the new StrictModeConfigOutput typeThe import statement has been updated to include
StrictModeConfigOutputinstead of the previousStrictModeConfig, supporting the restructuring needed for Python REST client compatibility.
2096-2096: Updated strict mode config conversion to use new output typeThis change correctly implements the PR objective of restructuring the
StrictModeConfigused withinCollectionInfo. By replacing the earlierStrictModeConfig::fromwithStrictModeConfigOutput::from, the code now uses the new output structure that allows the Python REST client to recognize these types as capable of accepting additional keys during deserialization.This implementation follows the same pattern that was previously used for
OptimizerConfigandOptimizerConfigDiffto maintain backward compatibility between Python client versions.docs/redoc/master/openapi.json (8)
6707-6707: Schema reference updated to use the new output type.The reference in
CollectionConfighas been updated to use the newStrictModeConfigOutputschema instead of the originalStrictModeConfig. This is in line with the PR objective to restructure theStrictModeConfigfor better handling by the Python REST client.
7256-7385: New schemaStrictModeConfigOutputcreated for backward compatibility.This new schema structure mirrors the original
StrictModeConfigbut with adjusted minimum values to provide more flexibility for client deserialization. The changes to minimum values (from 1 to 0) in several properties allow the fields to accept a wider range of values, which helps maintain backward compatibility between different client versions.Key changes:
read_rate_limit: minimum changed from 1 to 0write_rate_limit: minimum changed from 1 to 0max_points_count: minimum changed from 1 to 0These changes align with similar approaches previously implemented for
OptimizerConfigandOptimizerConfigDiffas mentioned in the PR objectives.
7386-7392: Renamed schema toStrictModeMultivectorConfigOutputwith updated reference.This change is part of the overall restructuring pattern, creating dedicated output schemas with references to their corresponding output components, in this case
StrictModeMultivectorOutput.
7392-7403: New schemaStrictModeMultivectorOutputreplacesStrictModeMultivector.Similar to the main
StrictModeConfigOutputchange, this specialized multivector configuration output schema has been created with a relaxed minimum value formax_vectors(changed from 1 to 0), providing more flexibility for client deserialization.
7404-7410: Renamed schema toStrictModeSparseConfigOutputwith updated reference.This follows the consistent pattern of creating dedicated output schemas with references to their corresponding output components, in this case
StrictModeSparseOutput.
7410-7421: New schemaStrictModeSparseOutputreplacesStrictModeSparse.This specialized sparse configuration output schema has been created with a relaxed minimum value for
max_length(changed from 1 to 0), maintaining the consistent pattern of providing more flexibility in the output schemas.
9601-9766: OriginalStrictModeConfigschema maintained for backward compatibility.The original
StrictModeConfigschema is retained alongside the new output schemas. This dual schema approach is a good practice for maintaining backward compatibility:
- Input validation can still use the stricter requirements of the original schema
- Output serialization can use the more flexible Output version
- Existing client code can continue to work with the original schema
The original schema maintains the stricter minimum values (1 instead of 0) for:
read_rate_limitwrite_rate_limitmax_points_countmax_vectors(inStrictModeMultivector)max_length(inStrictModeSparse)
9839-9847:UpdateCollectioncorrectly references the originalStrictModeConfig.For update operations, the reference to the original
StrictModeConfigschema is maintained. This is appropriate since updates should still adhere to the stricter validation rules.lib/collection/src/operations/types.rs (3)
34-35: Import usage looks consistent.The addition of
StrictModeConfigOutputin these import lines aligns with the subsequent changes in this file where the new type is leveraged. No issues found.
201-201: Renaming the field to useStrictModeConfigOutput.Switching to
Option<StrictModeConfigOutput>instead ofOption<StrictModeConfig>is logically consistent with the updated conversion logic. Ensure callers properly handle any backward compatibility concerns if needed.
223-223: Correct mapping forstrict_mode_config.Referencing the
StrictModeConfigOutputthrough.map(StrictModeConfigOutput::from)properly aligns with the updated struct. This ensures the new schema is consistently applied.lib/api/src/grpc/conversions.rs (6)
1-1: Additional imports introduced.The inclusion of
BTreeMapandHashSetindicates usage in the newly added conversion functions. No immediate issues identified.
1985-2003: Conversion fromStrictModeSparseConfigOutputtoStrictModeSparseConfig.This mapping is straightforward, translating
max_lengthinto au64field. Confirm that the final type accommodates all use cases, especially regarding potential overflow or architecture differences forusize.
2049-2095: Reverse conversion fromStrictModeConfigtoStrictModeConfigOutput.The code neatly mirrors the prior conversion. The approach is consistent, ensuring that the output aligns well with the new strict mode schema.
2096-2111: ConvertingStrictModeMultivectorConfigtoStrictModeMultivectorConfigOutput.The mapping from
u64tousizeformax_vectorsis logically consistent across the codebase. No further issues noted.
2131-2148: TranslatingStrictModeMultivectorConfigOutputback intoStrictModeMultivectorConfig.This is the symmetrical conversion for multivector fields. The iteration over the config looks good, and key-value pairs are mapped correctly.
2150-2164: GeneratingStrictModeSparseConfigOutputfromStrictModeSparseConfig.The conversion loop and assignment to the
BTreeMapare clear. Everything appears consistent with the rest of the strict mode transformations.lib/segment/src/types.rs (6)
729-735: Looks good for flattened output structure
No concerns here; the overall approach of flattening looks correct.
743-755: Straightforward conversion logic
Implementation correctly iterates over each entry and converts to the output type without side effects.
756-762: Clear and concise transformation
TheFromimplementation neatly copies the relevant field. No issues identified.
790-796: Same pattern, no concerns
Mirrors the sparse config output structure exactly, providing a consistent approach.
797-809: Conversion logic appears correct
Each key-value pair is mapped nicely to the new output.
817-823: Simple and correct
The direct conversion fromStrictModeMultivectortoStrictModeMultivectorOutputis properly handled.
c24575c to
237a972
Compare
| wal_config: Some(wal_config), | ||
| quantization_config, | ||
| strict_mode_config, | ||
| strict_mode_config: strict_mode_config.map(StrictModeConfigOutput::from), |
There was a problem hiding this comment.
CollectionConfigInternal is used in collection telemetry, and, therefore, should either be splitted into Output structure, or contain StrictModeConfigOutput
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (9)
lib/api/src/grpc/conversions.rs (5)
2004-2047: Watch out for potential truncation when castingusizetou32orf32.Several fields (e.g.,
max_query_limit,max_timeout,search_max_oversampling) are converted fromOption<usize>into narrower types. If these fields can exceedu32::MAXor require precise float handling, consider clamping or adding logic to avoid unexpected overflow or truncation.
2049-2093: Be cautious with cross-architecture differences amid integer widening.Here, the code converts
Option<u32>and others intoOption<usize>. On typical platforms, this is fine, but keep in mind that ifusizeis smaller or larger than expected (rare architectures), values could wrap. Adding boundary checks or documentation might help future maintainers.
2096-2110: Check for safe bounds when convertingu64tousize.The loop inserts
max_vectorsby casting it fromOption<u64>toOption<usize>. While standard 64-bit systems typically handle this seamlessly, a robust check or clamp for extremely large values can provide safer failover in edge cases.
2150-2164: Ensure consistency handlingmax_lengthacross expansions/conversions.This final block casts from
Option<u64>toOption<usize>, mirroring the earlier pattern. Consider a consistent approach across all conversions—like clamps, error returns, or thorough documentation—to handle unusual or invalid data safely.
1985-2002:Details
❓ Verification inconclusive
Consider potential integer conversion edge cases when casting
max_lengthfromusizetou64.Although this conversion is valid on common 64-bit architectures, be mindful that extremely large
usizevalues or a hypothetical 128-bit platform scenario could cause overflow. A quick check or clamp might be prudent if untrusted or large inputs are expected.Below is a script you can use to locate all references to
StrictModeSparseConfigOutputacross the codebase to confirm usage and ensure safe handling of large values:
🏁 Script executed:
#!/bin/bash # Description: Verify usage of StrictModeSparseConfigOutput to check for potential large or untrusted inputs rg -A 10 'StrictModeSparseConfigOutput'Length of output: 7132
Action Required: Revisit the casting of
max_lengthfor potential overflow risks.The current code performs a direct cast from
usizetou64while mapping the configuration:max_length: config.max_length.map(|i| i as u64)This is safe on standard 64-bit architectures. However, as noted in the OpenAPI schema (e.g., in
docs/redoc/master/openapi.json), if an unexpectedly large value is fed—or if running on a hypothetical 128-bit system—the conversion might overflow. To mitigate any risks when dealing with potentially untrusted or large inputs, consider inserting a check or clamp that ensuresmax_lengthdoesn’t exceed the maximum value representable byu64.
- File:
lib/api/src/grpc/conversions.rs(lines 1985–2002)- Suggestion: Add bounds-checking or clamping before casting, and ensure that the reverse conversion (casting back to
usize) remains safe and consistent with API expectations.lib/segment/src/types.rs (4)
730-736: Validate the no-op anonymization logic.This implementation copies the
configfield verbatim instead of actually removing any identifying data. If the intent is to truly anonymize fields, consider masking or removing sensitive values.
745-751: Check intended anonymization policy.This method returns the same value for
max_length, making the anonymization logic effectively a pass-through. Verify if this is the expected functionality when “anonymizing.”
790-795: No-op anonymization concern.Similar to the sparse output, this leaves
max_vectorsas-is. Confirm whether retaining the original value meets your anonymization requirements.
986-1009: Superficial anonymization pass.All fields (besides nested configs) are directly copied instead of being masked—confirm if you need deeper anonymization for sensitive fields.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/redoc/master/openapi.json(9 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/collection/src/operations/conversions.rs(2 hunks)lib/collection/src/operations/types.rs(3 hunks)lib/segment/src/types.rs(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/collection/src/operations/conversions.rs
- lib/collection/src/operations/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
docs/redoc/master/openapi.json (8)
6707-6708: Schema reference update aligns with PR objective.The change from
StrictModeConfigtoStrictModeConfigOutputin the CollectionConfig schema is consistent with the PR objective of restructuring the StrictModeConfig to allow Python clients to accept additional keys during deserialization.
7256-7385: New StrictModeConfigOutput schema appropriately supports the backward compatibility goal.This new schema definition replaces the previous StrictModeConfig in the public API. The structure remains the same but with more lenient minimum values (changed from 1 to 0) for rate limits and counts, which provides better flexibility for client applications, particularly the Python REST client mentioned in the PR objectives.
7324-7325: Appropriate adjustment of minimum values to improve compatibility.Changing the minimum values from 1 to 0 for
read_rate_limit,write_rate_limit, andmax_points_countallows for more flexible configuration options. This modification is particularly important for maintaining backward compatibility with Python client version 1.13.2 as mentioned in the PR objectives.Also applies to: 7331-7332, 7345-7346
7386-7421: Well-structured nested output schemas for multivector and sparse configurations.The new output schemas for multivector and sparse configurations maintain the same structure as their non-output counterparts but with adjusted minimum values. This approach is consistent with similar changes made for
OptimizerConfigandOptimizerConfigDiffas mentioned in the PR objectives, ensuring a consistent pattern throughout the API.
7399-7400: Appropriate relaxation of nested schema constraints.Changing the minimum values from 1 to 0 for
max_vectorsandmax_lengthin the nested schemas provides more flexibility while maintaining type safety. This approach allows for configurations where these values might not be explicitly set or could be set to 0 to indicate "no limit" or "use default value".Also applies to: 7417-7418
9601-9766: Original StrictModeConfig schema retained for compatibility.Maintaining the original
StrictModeConfigschema alongside the newStrictModeConfigOutputis a good practice for ensuring backward compatibility. This dual-schema approach allows the system to handle both incoming (client to server) and outgoing (server to client) data structures appropriately, which is crucial when evolving APIs that need to maintain compatibility with existing clients.
7279-7280: Updated field descriptions improve clarity.The descriptions for
unindexed_filtering_retrieveandunindexed_filtering_updatehave been refined to be more precise and informative, which enhances the documentation quality of the API.Also applies to: 7284-7285
7366-7383: Properly updated references to nested schemas.The schema references have been consistently updated to point to the new
...Outputschema variants, ensuring coherence throughout the API structure. This attention to detail prevents reference errors and maintains the logical structure of the related schemas.Also applies to: 7377-7383
lib/api/src/grpc/conversions.rs (1)
2131-2148: Add tests for largemax_vectorsvalues.This conversion shifts
Option<usize>toOption<u64>. Although functionally correct on common systems, coverage for extreme or invalid values helps ensure no hidden bugs arise if inputs exceed typical 64-bit ranges or if negative values accidentally appear.Would you like a sample unit test added or an extended verification script?
lib/segment/src/types.rs (13)
738-743: Confirm handling of additional fields.Using
#[schemars(deny_unknown_fields)]together with#[serde(flatten)]may restrict rather than permit unknown top-level fields. Ensure this behavior aligns with the PR's goal of allowing extra keys in the JSON.
753-758: Struct usability check.
StrictModeSparseOutputmirrorsStrictModeSparsewith an optionalmax_length. This is consistent with the new output design.
760-771: Iterative conversion logic looks good.The loop properly transforms each sparse config entry into the new output form, preserving keys without errors.
773-778: Straightforward field mapping.The
Fromconversion forStrictModeSparseintoStrictModeSparseOutputis concise and correct.
815-820: Consistent pattern with sparse logic.The multivector config’s
anonymize()follows the same approach, recursing into inner fields.
823-829: Validate unknown fields approach.Check if
#[schemars(deny_unknown_fields)]is truly what you need here, given the PR’s stated aim of handling extra keys.
830-841: Mapping logic confirmed.The loop-based copy from the old
StrictModeMultivectorConfigto the new output struct looks correct.
843-849: Straightforward new output struct.
StrictModeMultivectorOutputis aligned withStrictModeMultivector, preserving backward compatibility as intended.
850-856: Correct field forwarding.The
Fromimplementation forStrictModeMultivectorcopies themax_vectorsfield directly, maintaining consistency across conversions.
874-874: Minor documentation improvement.Changing “eg.” to “e.g.” clarifies the comment for retrieval. No further issues here.
878-878: Minor documentation improvement.Similarly clarifies doc comments for updates. Looks fine.
1011-1089: Extensive new output struct.
StrictModeConfigOutputis thorough and matches the fields ofStrictModeConfig. Ensure regular updates of doc comments to reflect the new usage.
1091-1135: ComprehensiveFromconversion.This mapping cleanly handles nested structures, providing a solid boundary between the input config and the new output struct.
timvisee
left a comment
There was a problem hiding this comment.
Extra inputs are not permitted
I'd argue that pydantic should ignore unknown field instead.
|
Working on fixing the associated Python client PR before merging this one to make sure it does not break anything. I am currently chasing down an infinite recursion in the |
|
I am merging this PR because I have been able to show that:
|
* Separate StrictModeConfig output structure * happy regen * add strict sparse config * introduce CollectionConfigTelemetry * upd openapi --------- Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
Separate the whole struct tree of
StrictModeConfigused in theCollectionInfoin order to enable the Python REST client to to tag those types as accepting extra keys during deserialization.This is required to keep the backward compatibility between the Python client 1.13.2 and the upcoming 1.14.0 version.
We have used a similar pattern in the past for
OptimizerConfigandOptimizerConfigDifffor the same issue.As a final test I validated that the new output classes are tagged properly in the client generation