Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update TimeKeyEncoder to Preserve Lexicographical Order #21

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

matthiasmatt
Copy link
Contributor

@matthiasmatt matthiasmatt commented Oct 28, 2024

Update TimeKeyEncoder to Preserve Lexicographical Order

Summary

This pull request updates the TimeKeyEncoder in the collections module to ensure that time.Time keys are encoded in a lexicographically sortable manner that aligns with chronological order. This change fixes issues with iterators that rely on time-based keys, particularly when retrieving data in descending order.

Background

Previously, the TimeKeyEncoder used sdk.FormatTimeBytes(t) and sdk.ParseTimeBytes(b) for encoding and decoding time.Time values. This approach encodes times into RFC3339 string formats, which do not guarantee that the lexicographical order of the encoded bytes matches the chronological order of the times. As a result, iterators operating over these keys could return data in an incorrect order.

This issue became evident when attempting to retrieve the latest snapshots in modules like the Oracle module, where iterators did not return the most recent data as expected.

4. Enhanced Testing

  • Unit Tests: Updated and added unit tests to verify that encoding and decoding of time.Time values work as expected and that iterators return data in the correct order.

Implications

Data Migration

  • Breaking Change: This update changes the way time.Time keys are encoded, which means existing data stored with the previous encoding will not be accessible using the new encoder.
  • Migration Required: A data migration is necessary to re-encode existing keys using the new TimeKeyEncoder. Without migration, data retrieval for time-based keys will fail or return incorrect results.

Backward Compatibility

  • This change is not backward compatible with previous versions that used the old TimeKeyEncoder. Users must be aware of this when upgrading and plan for data migration.

Summary by CodeRabbit

  • New Features

    • Introduced a new direct dependency for improved functionality.
    • Enhanced encoding and decoding methods for time keys, improving efficiency.
  • Tests

    • Added multiple test functions to validate the behavior of KeySet with time.Time keys, ensuring proper insertion, deletion, iteration, and encoding/decoding.

Copy link

coderabbitai bot commented Oct 28, 2024

Walkthrough

The changes involve updates to the go.mod, keys.go, and keyset_test.go files. The go.mod file now includes a new direct dependency on github.com/cosmos/gogoproto v1.4.12, while removing it from the indirect dependencies. In keys.go, the timeKey struct's Encode and Decode methods have been modified for more efficient handling of time.Time objects. The keyset_test.go file has been expanded with multiple new test functions to enhance testing of the KeySet functionality, particularly for operations involving time.Time keys.

Changes

File Change Summary
go.mod Added direct dependency: github.com/cosmos/gogoproto v1.4.12; removed indirect dependency.
keys.go Updated Encode method to directly convert time.Time to byte slice; modified Decode method for error handling and decoding.
keyset_test.go Added multiple test functions for KeySet with time.Time keys, including tests for insertion, deletion, iteration, and encoding/decoding.

Poem

🐇 In the meadow where time keys play,
New dependencies hop in, bright as day.
Encoding, decoding, all in a line,
Tests now flourish, oh how they shine!
With every change, we leap and bound,
In our code garden, joy is found! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
keys.go (2)

71-75: Consider enhancing error message for debugging.

The implementation correctly decodes the binary format and ensures UTC timezone consistency. However, the error message could be more helpful for debugging.

Consider enhancing the panic message:

-		panic("invalid time key")
+		panic(fmt.Errorf("invalid time key: expected 8 bytes, got %d bytes %s", len(b), HumanizeBytes(b)))

63-75: Document breaking changes and provide migration guidance.

This change in time key encoding is a breaking change that affects data persistence. To ensure a smooth upgrade path:

  1. Document the breaking change in CHANGELOG.md
  2. Provide migration scripts or guidance for users to re-encode existing time-based keys
  3. Consider adding version bytes to the key encoding to support future changes

Would you like assistance in drafting the migration documentation or implementing a migration helper function?

keyset_test.go (4)

108-108: Simplify time comparisons using time.Equal

When comparing time.Time values, you can use the built-in Equal method instead of stripping monotonic clock readings and comparing UnixNano timestamps. This simplifies the code and improves readability.

Apply this refactor to simplify time comparisons:

-expectedTime := times[i].Round(0)
-actualTime := k.Round(0)
-require.Equal(t, expectedTime.UnixNano(), actualTime.UnixNano())
+require.True(t, times[i].Equal(k))

Repeat this change in all test functions where time comparisons are performed.

Also applies to: 144-144, 154-154, 192-192, 213-213, 251-251


220-227: Extend tests with sub-second time differences

The current tests use time differences in whole seconds. To thoroughly verify the lexicographical order preservation, consider adding times with sub-second differences. This ensures the encoding handles times with finer granularity.

Extend the times slice with sub-second time values:

 times := []time.Time{
 	now.Add(1 * time.Second),
 	now.Add(2 * time.Second),
 	now.Add(3 * time.Second),
 	now.Add(4 * time.Second),
 	now.Add(5 * time.Second),
+	now.Add(5*time.Second + 500*time.Millisecond),
+	now.Add(5*time.Second + 750*time.Millisecond),
 }

255-275: Test insertion of times with identical wall times but different monotonic clocks

Currently, the test inserts the exact same time.Time value multiple times. Consider testing with times that have the same wall clock time but different monotonic clock readings to ensure that they are treated as equal keys in the KeySet.

Modify the test to create times with identical wall times:

 now := time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)

+nowWithDifferentMonotonic := now.Add(0 * time.Nanosecond)
+
 // Insert the same time multiple times (should only be stored once in a set)
 keyset.Insert(ctx, now)
+keyset.Insert(ctx, nowWithDifferentMonotonic)
 keyset.Insert(ctx, now)
 keyset.Insert(ctx, now)

Verify that only one key exists in the set:

 require.Equal(t, 1, len(keys))
+require.True(t, keys[0].Equal(now))
+require.True(t, keys[0].Equal(nowWithDifferentMonotonic))

149-155: Ensure encoding and decoding handles time zones correctly

In TestTimeKeyEncoder_EncodeDecode, consider adding checks to verify that the encoding and decoding process correctly handles time.Time values with different locations (time zones).

Extend the test to include times with different locations:

 now := time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)
+loc, _ := time.LoadLocation("America/New_York")
+nowNY := now.In(loc)

 encoded := TimeKeyEncoder.Encode(now)
 _, decoded := TimeKeyEncoder.Decode(encoded)

+encodedNY := TimeKeyEncoder.Encode(nowNY)
+_, decodedNY := TimeKeyEncoder.Decode(encodedNY)

 // Compare UnixNano timestamps
 require.Equal(t, now.UnixNano(), decoded.UnixNano())
+require.Equal(t, nowNY.UnixNano(), decodedNY.UnixNano())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7689949 and f912b74.

📒 Files selected for processing (3)
  • go.mod (1 hunks)
  • keys.go (2 hunks)
  • keyset_test.go (2 hunks)
🔇 Additional comments (3)
keys.go (1)

4-4: LGTM: Required import for binary encoding.

The addition of "encoding/binary" import is necessary for the new BigEndian encoding implementation.

go.mod (1)

13-13: LGTM! Verify gogoproto usage.

The addition of gogoproto as a direct dependency is appropriate given its role in the new time encoding implementation.

Let's verify the new dependency usage:

✅ Verification successful

Let me check the usage of gogoproto in value_encoder.go to understand how it's being utilized.


Let me check the specific context where cosmosproto.Message is being used to understand its role in the implementation.


Gogoproto dependency is correctly used and implemented

The direct dependency on gogoproto is justified and properly implemented:

  • It's used in value_encoder.go for protobuf message handling through cosmosproto.Message interface
  • The implementation is type-safe using Go generics in ProtoValueEncoder function
  • The usage aligns with the PR's objectives for efficient serialization
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that gogoproto is being used in the codebase
# Expected: Find imports and usages of gogoproto, particularly in time-related encoding

# Search for gogoproto imports
echo "Searching for gogoproto imports..."
rg "github.com/cosmos/gogoproto"

# Search for potential proto definitions using gogoproto
echo "Searching for proto definitions using gogoproto..."
fd -e .proto -x rg -l "gogoproto"

Length of output: 652


Script:

#!/bin/bash
# Check the context of gogoproto usage in value_encoder.go
echo "Checking value_encoder.go for gogoproto usage..."
rg -B 3 -A 3 "github.com/cosmos/gogoproto/proto" value_encoder.go

# Look for actual usage of the cosmosproto package
echo "Checking for cosmosproto usage..."
rg "cosmosproto\." value_encoder.go

Length of output: 516


Script:

#!/bin/bash
# Get the full context of cosmosproto.Message usage
echo "Checking context of cosmosproto.Message usage..."
rg -B 5 -A 5 "cosmosproto\.Message" value_encoder.go

# Also check if there are any proto files that might be related to this implementation
echo "Looking for relevant proto files..."
fd -e proto

Length of output: 870

keyset_test.go (1)

59-275: Overall excellent test coverage

The added tests comprehensively cover the functionality of KeySet with time.Time keys, including insertion, deletion, iteration in both ascending and descending orders, range queries, and handling of duplicate keys.

Comment on lines +64 to +68
func (timeKey) Encode(t time.Time) []byte {
b := make([]byte, 8)
binary.BigEndian.PutUint64(b, uint64(t.UnixNano()))
return b
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Time range limitation needs documentation and testing

The implementation's use of UnixNano() introduces time range constraints (years 1678 to 2262), but this limitation is neither documented nor tested. Consider:

  • Add documentation in keys.go explaining the valid time range
  • Add test cases with edge timestamps to verify behavior near the bounds
  • Consider using a different encoding scheme if wider time range support is required
🔗 Analysis chain

Implementation ensures lexicographical ordering but has time range limitations.

The new implementation correctly uses binary encoding of Unix nanoseconds to ensure lexicographical ordering matches chronological ordering. However, there are some important considerations:

  1. This is a breaking change as existing encoded keys will not be decodable.
  2. The use of UnixNano() limits the valid time range to years 1678 to 2262 due to the int64 constraints.

Let's verify the time range limitation:


The search results show that all tests use timestamps from 2021, with relatively small time offsets (hours/seconds). There's no evidence of testing the time range limitations. Let's search for any documentation or comments about time range constraints.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any time values or tests that might be affected by the year 2262 limitation
rg -i "time\.(date|parse|unix)" -A 5

Length of output: 3537


Script:

#!/bin/bash
# Search for any documentation or comments about time constraints
rg -i "(unix|time).*(limit|constraint|range|bound)" 

Length of output: 117

Comment on lines +91 to +94
sort.Slice(times, func(i, j int) bool {
return times[i].Before(times[j])
})

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor duplicate sorting logic into helper functions

The sorting logic for time slices is repeated across multiple test functions. Refactoring this code into helper functions will reduce duplication and enhance maintainability.

Apply this refactor by adding helper functions:

// Helper function to sort times in ascending order
func sortTimesAsc(times []time.Time) []time.Time {
	sortedTimes := make([]time.Time, len(times))
	copy(sortedTimes, times)
	sort.Slice(sortedTimes, func(i, j int) bool {
		return sortedTimes[i].Before(sortedTimes[j])
	})
	return sortedTimes
}

// Helper function to sort times in descending order
func sortTimesDesc(times []time.Time) []time.Time {
	sortedTimes := make([]time.Time, len(times))
	copy(sortedTimes, times)
	sort.Slice(sortedTimes, func(i, j int) bool {
		return sortedTimes[i].After(sortedTimes[j])
	})
	return sortedTimes
}

Then, replace the sorting code in your test functions with these helper functions:

-// Sort times in ascending order
-sortedTimesAsc := make([]time.Time, len(times))
-copy(sortedTimesAsc, times)
-sort.Slice(sortedTimesAsc, func(i, j int) bool {
-	return sortedTimesAsc[i].Before(sortedTimesAsc[j])
-})
+sortedTimesAsc := sortTimesAsc(times)
-// Sort times in descending order
-sortedTimesDesc := make([]time.Time, len(times))
-copy(sortedTimesDesc, times)
-sort.Slice(sortedTimesDesc, func(i, j int) bool {
-	return sortedTimesDesc[i].After(sortedTimesDesc[j])
-})
+sortedTimesDesc := sortTimesDesc(times)

Also applies to: 130-133, 176-181, 196-201

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant