-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support search logs by timestamp for structured and unstructured logs. #42
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # src/clp_ffi_js/ir/StreamReader.cpp # src/clp_ffi_js/ir/StreamReader.hpp
WalkthroughThis pull request introduces new methods across several log stream reader classes to retrieve the nearest log event index based on a specified timestamp. The method Changes
Sequence Diagram(s)sequenceDiagram
participant JS as JavaScript Interface
participant SR as StreamReader/StructuredIrStreamReader/UnstructuredIrStreamReader
participant GF as Generic Finder
JS->>SR: findNearestLogEventByTimestamp(timestamp)
SR->>GF: generic_find_nearest_log_event_by_timestamp(log_events, timestamp)
GF-->>SR: NullableLogEventIdx
SR-->>JS: NullableLogEventIdx
Possibly related PRs
Suggested reviewers
Tip 🌐 Web search-backed reviews and chat
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future commits, you may follow https://github.com/y-scope/clp-ffi-js?tab=readme-ov-file#linting to run the linter
This PR is blocking y-scope/yscope-log-viewer#152 |
…x_by_timestamp, use std::ranges::upper_bound instead of std::upper_bound
…s null when log events are empty, and return index with "best effort"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/clp_ffi_js/ir/StreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(6 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
(2 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🪛 GitHub Actions: lint
src/clp_ffi_js/ir/StreamReader.hpp
[error] 65-66: Code formatting violation: Static create method declaration is not properly formatted
[error] 126-126: Code formatting violation: DecodedResultsTsType return type declaration is not properly formatted
[error] 138-139: Code formatting violation: get_log_event_index_by_timestamp method declaration is not properly formatted
[error] 201-203: Code formatting violation: requires clause and its parameters are not properly formatted
[error] 301-303: Code formatting violation: requires clause and its parameters are not properly formatted
🔇 Additional comments (10)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
74-75
: LGTM! Method declaration is well-defined.The method signature is consistent with the base class and follows proper C++ practices with the [[nodiscard]] attribute.
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)
11-11
: LGTM! Include directive is appropriately placed.The addition of
<clp/ir/types.hpp>
provides the necessary type definitions.
78-79
: LGTM! Method declaration is consistent.The method signature matches the base class and follows the same pattern as UnstructuredIrStreamReader.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
14-14
: LGTM! Include directive is appropriately placed.The addition of
<clp/ir/types.hpp>
provides the necessary type definitions.
151-158
: LGTM! Implementation is clean and efficient.The implementation properly delegates to the generic function, promoting code reuse between different reader types.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
161-168
: LGTM! Implementation is consistent with StructuredIrStreamReader.The implementation properly uses the generic function, ensuring consistent behaviour across different reader types.
src/clp_ffi_js/ir/StreamReader.cpp (1)
132-132
: LGTM! Type registration and function binding are properly implemented.The new type registration and function binding follow the established patterns in the codebase.
Also applies to: 149-153
src/clp_ffi_js/ir/StreamReader.hpp (3)
14-14
: LGTM! Include and type declaration are properly placed.The additions follow the established patterns in the codebase.
Also applies to: 33-33
128-137
: LGTM! Documentation is clear and comprehensive.The documentation clearly explains the method's purpose, parameters, and return values.
313-331
: LGTM! Implementation is efficient and handles edge cases properly.The implementation:
- Correctly handles empty log events
- Uses std::upper_bound for efficient binary search
- Properly handles the case when all timestamps are larger than the target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
312-312
: Use Yoda condition for null check.According to the coding guidelines, prefer
false == <expression>
over!<expression>
.Apply this diff:
- if (log_events.empty()) { + if (true == log_events.empty()) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (5)
src/clp_ffi_js/ir/StreamReader.hpp (5)
33-33
: LGTM! Type declaration follows established pattern.The declaration of
LogEventIdxTsType
aligns with other type declarations in the file and properly integrates with Emscripten's type system.
128-137
: LGTM! Well-documented method declaration.The documentation thoroughly explains all possible return scenarios and the method signature follows C++ best practices with appropriate use of [[nodiscard]].
138-139
: Fix method declaration formatting.The method declaration needs to be reformatted according to the linting rules.
Apply this diff:
- [[nodiscard]] virtual auto get_log_event_index_by_timestamp(clp::ir::epoch_time_ms_t timestamp - ) -> LogEventIdxTsType = 0; + [[nodiscard]] virtual auto get_log_event_index_by_timestamp( + clp::ir::epoch_time_ms_t timestamp + ) -> LogEventIdxTsType = 0;
312-314
: LGTM! Efficient implementation with proper edge case handling.The implementation efficiently uses binary search via std::upper_bound and properly handles all edge cases:
- Empty log events return null
- No exact match returns the last smaller index
- All larger timestamps return the first index
Also applies to: 315-322, 323-329
300-307
: Fix requires clause formatting.The requires clause needs to be reformatted according to the linting rules.
Apply this diff:
-requires requires( - LogEventWithFilterData<LogEvent> const& event, - clp::ir::epoch_time_ms_t timestamp - ) { + requires requires(LogEventWithFilterData<LogEvent> const& event, + clp::ir::epoch_time_ms_t timestamp) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)
49-57
: Add documentation for the GetLogEventIdxInterface concept.The concept is well-defined, but it would benefit from documentation explaining its purpose and requirements.
Add a documentation block before the concept:
+/** + * Concept defining the requirements for log event types that can be searched by timestamp. + * Types satisfying this concept must provide a get_timestamp method that returns a value + * convertible to epoch_time_ms_t. + */ template <typename LogEvent> concept GetLogEventIdxInterface = requires(
299-325
: Consider optimizing and documenting the binary search implementation.While the implementation is correct, consider these improvements:
- Add a comment explaining the binary search strategy
- Use
std::prev
for cleaner iterator manipulationApply this diff to improve the implementation:
template <GetLogEventIdxInterface LogEvent> auto StreamReader::generic_get_log_event_idx_by_timestamp( LogEvents<LogEvent> const& log_events, clp::ir::epoch_time_ms_t timestamp ) -> LogEventIdxTsType { if (log_events.empty()) { return LogEventIdxTsType{emscripten::val::null()}; } + // Use binary search to find the first element greater than the target timestamp auto upper{std::upper_bound( log_events.begin(), log_events.end(), timestamp, [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { return ts < log_event.get_timestamp(); } )}; if (upper == log_events.begin()) { return LogEventIdxTsType{emscripten::val(0)}; } - auto const upper_index{std::distance(log_events.begin(), upper)}; - auto const index{upper_index - 1}; + // Return the index of the previous element (last element <= timestamp) + auto const index{std::distance(log_events.begin(), std::prev(upper))}; return LogEventIdxTsType{emscripten::val(index)}; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/clp_ffi_js/ir/StreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/clp_ffi_js/ir/StreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)
14-14
: LGTM! Type declarations and includes are well-organized.The new include and type declaration follow the existing patterns in the codebase.
Also applies to: 33-33
138-149
: LGTM! Well-documented virtual method declaration.The method signature and documentation are clear and comprehensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
298-305
: Consider optimizing the binary search predicate.The current implementation creates a lambda for each comparison. Consider using a member function pointer or functor for potentially better performance:
- auto upper{std::upper_bound( - log_events.begin(), - log_events.end(), - timestamp, - [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { - return ts < log_event.get_timestamp(); - } - )}; + struct TimestampCompare { + bool operator()(clp::ir::epoch_time_ms_t ts, + LogEventWithFilterData<LogEvent> const& log_event) const { + return ts < log_event.get_timestamp(); + } + }; + auto upper{std::upper_bound( + log_events.begin(), + log_events.end(), + timestamp, + TimestampCompare{} + )};
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (3)
src/clp_ffi_js/ir/StreamReader.hpp (3)
33-33
: LGTM! Type declaration for LogEventIdxTsType.The declaration is necessary for exposing the timestamp-based search result type to JavaScript.
128-139
: LGTM! Virtual method declaration is well-documented.The pure virtual method declaration follows the class's convention, and the documentation clearly explains all possible return values and edge cases.
311-314
: LGTM! Index calculation is correct and follows best practices.The implementation properly calculates the index using std::distance and maintains const correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I added a second review and it didn't get posted. Here is it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/clp_ffi_js/ir/StreamReader.hpp (3)
128-139
: Consider renaming the parameter for consistency.The parameter name
timestamp
could be renamed totarget_ts
to maintain consistency with the documentation style used throughout the codebase.- [[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp + [[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t target_ts ) -> LogEventIdxTsType = 0;
190-201
: Enhance documentation for the templated implementation.Consider referencing the virtual method's documentation to avoid duplication and potential inconsistencies.
/** * Templated implementation of `get_log_event_idx_by_timestamp`. * * @tparam LogEvent * @param timestamp - * @return the best matched log event index. + * @return See `get_log_event_idx_by_timestamp`. */
289-314
: Improve variable naming for better readability.The variable names could be more descriptive to better convey their purpose:
first_greater_it
could beupper_bound_it
first_greater_idx
could beupper_bound_idx
These names would better align with the standard library's terminology.
- auto first_greater_it{std::upper_bound( + auto upper_bound_it{std::upper_bound( log_events.begin(), log_events.end(), timestamp, [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { return ts < log_event.get_timestamp(); } )}; - if (first_greater_it == log_events.begin()) { + if (upper_bound_it == log_events.begin()) { return LogEventIdxTsType{emscripten::val(0)}; } - auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)}; + auto const upper_bound_idx{std::distance(log_events.begin(), upper_bound_it)}; - return LogEventIdxTsType{emscripten::val(first_greater_idx - 1)}; + return LogEventIdxTsType{emscripten::val(upper_bound_idx - 1)};
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
33-33
: LGTM! Type declaration follows established patterns.The new type declaration for
LogEventIdxTsType
follows the existing pattern and naming convention used for other JavaScript interop types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments on last review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
298-305
: Add comments explaining the binary search strategy.While the implementation using
std::upper_bound
is efficient, it would benefit from a brief comment explaining why this approach was chosen and how it achieves the desired result.Add a comment before the
upper_bound
call:+ // Use binary search to find the first element greater than the target timestamp. + // This allows us to step back one position to find the last element less than or equal to the target. auto first_greater_it{std::upper_bound(
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
- src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)
33-33
: LGTM! Type declaration follows established patterns.The declaration of
LogEventIdxTsType
is appropriately placed with other output types and follows the existing pattern for TypeScript type declarations.
128-138
: LGTM! Well-documented virtual method with clear behaviour specification.The method signature and documentation follow best practices. The use of
epoch_time_ms_t
for timestamp handling is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CMakeLists.txt
(1 hunks)src/clp_ffi_js/ir/StreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
(2 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
- src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
- src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (7)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
74-75
: LGTM!The method signature is correct and properly marked with
[[nodiscard]]
andoverride
attributes.src/clp_ffi_js/ir/StreamReader.cpp (2)
132-132
: Consider using std::optional with register_optional().For better type safety and consistency, consider using
std::optional<T>
with Embind'sregister_optional()
function instead of custom nullable type definitions.This has been tracked in issue #51 for implementation in a future PR.
150-153
: LGTM!The function binding is correctly added and properly formatted.
src/clp_ffi_js/ir/StreamReader.hpp (4)
14-14
: LGTM!The include statement is correctly added.
33-33
: LGTM!The type declaration is correctly added.
128-138
: LGTM!The virtual method is well-documented with clear descriptions of parameters and return values.
289-314
: LGTM!The template implementation:
- Uses
std::upper_bound
for efficient searching.- Properly handles edge cases (empty container and all timestamps greater than target).
- Uses clear and descriptive variable names.
src/clp_ffi_js/ir/StreamReader.hpp
Outdated
@@ -29,6 +30,7 @@ EMSCRIPTEN_DECLARE_VAL_TYPE(ReaderOptions); | |||
// JS types used as outputs | |||
EMSCRIPTEN_DECLARE_VAL_TYPE(DecodedResultsTsType); | |||
EMSCRIPTEN_DECLARE_VAL_TYPE(FilteredLogEventMapTsType); | |||
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType); | |
// How about `NullableLogEventIdx`? | |
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirkrodrigues Should we also change the name of FilteredLogEventMapTsType
later? It is also nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer that, but what do @davemarco / @junhaoliao think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally even prefer making the change here, as it is just a internal type name change and won't affect any front end code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, isn't FilteredLogEventIdx
also defined in the Log Viewer? It's true that the name in clp-ffi-js doesn't influence (from a code execution perspective) the name in the log viewer, but it's probably better for developers if we stay consistent. So if we rename it here, we should probably rename it in the log viewer as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't change the filteredEventMap. Whether nullable in the name is better I think is a question of style. Like in general i think its better not to put type info in the name, but here I guess maybe exception since we are trying to make more clear than returns null.
return LogEventIdxTsType{emscripten::val::null()}; | ||
} | ||
|
||
auto first_greater_it{std::upper_bound( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::upper_bound assumes the range is sorted, right? What would happen if the log events aren't in ascending timestamp order? It's technically not impossible and we have seen log files where that's true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junhaoliao iirc, I was told to assume the log events are in accending order. If that's not the case, this whole search function won't work.
If that's not the case, we have to sort all of the log events, cuz' that's the only way to fully garantee all log events are sorted. And we (might?) also need a map to remember their original location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the case with out of order timestamps is rare (as far as we know), we could maybe do something like:
- When decoding the file, keep track of whether the timestamp of two consecutive log events are out of order and if so, set a flag and display a warning to the user that the file contains out of order timestamps and so timestamp-related searches may not function correctly (since two log events with the same timestamp may exist at completely different positions in the log file).
- If that flag is set, we use a brute-force implementation of
find_nearest_log_event_by_timestamp
. Otherwise, we use the implementation you have here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When decoding the file, keep track of whether the timestamp of two consecutive log events are out of order
"keep track of" means we are basically iterating the log events during the decode right?
if that's the case, can we divide the log events up to chunks using these timestamp outliers, perform searches in the proper chunk, and finally iterate those outliers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that be straightforward to do given that the log events collection changes based on the active filter? If not, then given how rare the case is (as far as we know), I'm not sure the extra code complexity is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an offline discussion with @junhaoliao, we reach to an agreement that we can doc the assumption, and neglact this corner case.
By neglacting this corner case, we can save plenty of resource from verifying log events are in chronological order.
Sorry for the delay in reviewing this. Let's keep further discussions in the PR so that we can all come to a consensus more quickly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/clp_ffi_js/ir/StreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
(2 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
- src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
- src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (7)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
14-14
: LGTM!The include is necessary for the
epoch_time_ms_t
type used in the new method.
151-155
: LGTM!The implementation correctly delegates to the generic template function, maintaining consistency with other similar methods in the class.
src/clp_ffi_js/ir/StreamReader.cpp (2)
132-132
: LGTM!The type registration follows the established pattern for nullable types.
150-153
: LGTM!The function binding is properly placed and follows the naming convention used in other bindings.
src/clp_ffi_js/ir/StreamReader.hpp (3)
14-14
: LGTM!The include is necessary for the
epoch_time_ms_t
type used throughout the file.
33-33
: LGTM!The type declaration follows the established pattern for nullable types.
129-145
: LGTM!The method declaration is well documented with clear examples and edge cases.
src/clp_ffi_js/ir/StreamReader.hpp
Outdated
template <typename LogEvent> | ||
auto StreamReader::generic_find_nearest_log_event_idx_by_timestamp( | ||
LogEvents<LogEvent> const& log_events, | ||
clp::ir::epoch_time_ms_t target_ts | ||
) -> NullableLogEventIdx { | ||
if (log_events.empty()) { | ||
return NullableLogEventIdx{emscripten::val::null()}; | ||
} | ||
|
||
// Find the log event whose timestamp is just after `target_ts` | ||
auto first_greater_it{std::upper_bound( | ||
log_events.begin(), | ||
log_events.end(), | ||
target_ts, | ||
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { | ||
return ts < log_event.get_timestamp(); | ||
} | ||
)}; | ||
|
||
if (first_greater_it == log_events.begin()) { | ||
return NullableLogEventIdx{emscripten::val(0)}; | ||
} | ||
|
||
auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)}; | ||
|
||
return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Timestamp Ordering Not Verified
- The function in
StreamReader.hpp
employsstd::upper_bound
with a lambda that compares timestamps, which only works correctly when the log events are sorted in ascending order. - The various
get_timestamp
implementations simply return their internal timestamp (e.g. fromm_timestamp
inLogEventWithFilterData.hpp
), and there is no evidence of any explicit sort operation in the codebase. - Instead, the log events appear to be appended (for example, via
emplace_back
inStructuredIrUnitHandler.cpp
), meaning the natural order is assumed to be chronologically correct rather than being enforced programmatically.
🔗 Analysis chain
Verify timestamp ordering assumption.
The implementation assumes log events are sorted by timestamp in ascending order. This assumption should be verified as log files might not always maintain this order.
Run this script to check if timestamps are always in ascending order:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if log events are always sorted by timestamp
# Expected: No output means timestamps are in order
ast-grep --pattern $'class $_ {
$$$
get_timestamp() {
$$$
}
$$$
}'
rg -A 5 "get_timestamp"
Length of output: 4283
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/clp_ffi_js/ir/StreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
(2 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
- src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
- src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
- src/clp_ffi_js/ir/StreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (2)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
161-165
: LGTM!The implementation correctly delegates to the generic implementation and follows the single responsibility principle.
src/clp_ffi_js/ir/StreamReader.hpp (3)
33-33
: LGTM!The type name clearly indicates its purpose and nullability, following the pattern used for other type declarations.
129-146
: LGTM!The method declaration is well-documented with clear explanation of its behaviour, correctly marked as virtual and nodiscard, and uses consistent parameter naming.
196-209
: LGTM!The templated method declaration is well-structured with appropriate documentation and parameter types.
template <typename LogEvent> | ||
auto StreamReader::generic_find_nearest_log_event_by_timestamp( | ||
LogEvents<LogEvent> const& log_events, | ||
clp::ir::epoch_time_ms_t target_ts | ||
) -> NullableLogEventIdx { | ||
if (log_events.empty()) { | ||
return NullableLogEventIdx{emscripten::val::null()}; | ||
} | ||
|
||
// Find the log event whose timestamp is just after `target_ts` | ||
auto first_greater_it{std::upper_bound( | ||
log_events.begin(), | ||
log_events.end(), | ||
target_ts, | ||
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { | ||
return ts < log_event.get_timestamp(); | ||
} | ||
)}; | ||
|
||
if (first_greater_it == log_events.begin()) { | ||
return NullableLogEventIdx{emscripten::val(0)}; | ||
} | ||
|
||
auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)}; | ||
|
||
return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Attention: Log events ordering may not be enforced.
Our investigation shows that the implementation of generic_find_nearest_log_event_by_timestamp
relies on log events being in ascending timestamp order, yet searches for sorting or ordering logic (e.g. explicit calls to std::sort
or other ordering mechanisms) produced no evidence to enforce this order during deserialization or insertion. This could potentially lead to incorrect log event lookups if the input sequence isn’t already sorted.
- The log events appear to be added via methods like
emplace_back
without explicit ordering. - No ordering logic (e.g. sorting function calls) was identified to confirm that the events are sorted as required.
🔗 Analysis chain
Verify timestamp ordering assumption.
The implementation assumes log events are sorted by timestamp in ascending order. This assumption should be verified as log files might not always maintain this order.
Run this script to check if timestamps are always in ascending order:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if timestamps are in ascending order during log event deserialization
# Expected: Find all places where timestamps are added to log events
# Search for timestamp assignments during deserialization
ast-grep --pattern $'deserialize_log_event() {
$$$
get_timestamp()
$$$
}'
# Search for places where log events are added to collections
rg -A 5 "emplace_back.*LogEvent"
Length of output: 115
Script:
#!/bin/bash
# Investigate ordering logic for log events in the codebase
# 1. Search for std::sort invocations that might enforce timestamp ordering
rg -n "std::sort\s*\(.*get_timestamp" --color=always
# 2. Locate deserialization functions potentially responsible for creating or ordering log events
rg -n "deserialize_log_event\s*\(" --color=always
# 3. Find where log events are inserted into collections (e.g. using emplace_back) to inspect their ordering
rg -n "emplace_back\s*\(.*LogEvent" --color=always
# 4. Additionally, search for any comparison operators or functions comparing timestamps
rg -n "get_timestamp\s*\(" --color=always
Length of output: 1721
…nction ONLY works with timestamps in chronological order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
196-322
: Consider improving variable names for better readability.The implementation is efficient and handles edge cases well. However, consider these variable name improvements:
first_greater_it
could beupper_bound_it
to match STL terminologyfirst_greater_idx
could beupper_bound_idx
for consistencyApply this diff to improve readability:
- auto first_greater_it{std::upper_bound( + auto upper_bound_it{std::upper_bound( - if (first_greater_it == log_events.begin()) { + if (upper_bound_it == log_events.begin()) { - auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)}; + auto const upper_bound_idx{std::distance(log_events.begin(), upper_bound_it)}; - return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)}; + return NullableLogEventIdx{emscripten::val(upper_bound_idx - 1)};
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/clp_ffi_js/ir/StreamReader.hpp
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)
33-33
: LGTM!The type declaration is well-placed and follows the established naming pattern for output types.
129-145
: LGTM with a verification note.The method declaration is well-documented and consistent with the class's style. The documentation clearly states the chronological order assumption.
Let's verify if the chronological order assumption is documented elsewhere in the codebase:
✅ Verification successful
I'm going to extend our search beyond markdown files to check for any references to a "chronological order" documentation anywhere in the repository. Please run the script below:
This will help verify whether the assumption is documented elsewhere. Thank you.
LGTM – Chronological order assumption verified.
The search confirms that the chronological ordering is documented within the method’s comment in
src/clp_ffi_js/ir/StreamReader.hpp
. No other contradictory or additional documentation was found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for documentation about log event ordering # Expected: Find documentation about chronological order assumption rg -i "chronological|timestamp.*order|sorted.*timestamp" --type mdLength of output: 68
Script:
#!/bin/bash # Search for any reference to "chronological order" in all files across the repository. rg -i "chronological order" --max-columns 200Length of output: 180
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat: Add support for finding the log event that's closest to a target timestamp.
* - the collection of log events is sorted in chronological order, or the search won't work; | ||
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log | ||
* events with timestamp `target_ts` already exist in the collection, M should be inserted | ||
* after them). | ||
* | ||
* L is the event just before M, if M is not the first event in the collection; otherwise L is | ||
* the event just after M. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* - the collection of log events is sorted in chronological order, or the search won't work; | |
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log | |
* events with timestamp `target_ts` already exist in the collection, M should be inserted | |
* after them). | |
* | |
* L is the event just before M, if M is not the first event in the collection; otherwise L is | |
* the event just after M. | |
* - the collection of log events is sorted in chronological order; | |
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log | |
* events with timestamp `target_ts` already exist in the collection, M should be inserted | |
* after them). | |
* | |
* L is the event just before M, if M is not the first event in the collection; otherwise L is | |
* the event just after M. | |
* | |
* NOTE: If the collection of log events isn't in chronological order, this method has undefined | |
* behaviour. |
Description
Adds
getLogEventIndexByTimestamp
API inclp_ffi_js/ir/StreamReader.cpp
for ClpStreamReader. It returnsnull
only if there are no log events. Otherwise, it would search logs in "best effort", which means it returns the last index smaller than the timestamp if no exact timestamp match, unless all log event timestamps are larger than the target. In that case, return the first log event index.In
clp_ffi_js/ir/StreamReader.hpp
, implementsgeneric_get_log_event_index_by_timestamp()
, the core logic (basically just binary search) for finding log events by timestamp.This function is called both by
StructuredIrStreamReader
andUnstructuredIrStreamReader
'sget_log_event_index_by_timestamp
.Validation performed
Use the following javascript for testing structured and unstructured logs:
irv2 logs generating scripts:
Summary by CodeRabbit