Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support search logs by timestamp for structured and unstructured logs. #42

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

Henry8192
Copy link
Collaborator

@Henry8192 Henry8192 commented Dec 19, 2024

Description

Adds getLogEventIndexByTimestamp API in clp_ffi_js/ir/StreamReader.cpp for ClpStreamReader. It returns null only if there are no log events. Otherwise, it would search logs in "best effort", which means it returns the last index smaller than the timestamp if no exact timestamp match, unless all log event timestamps are larger than the target. In that case, return the first log event index.

In clp_ffi_js/ir/StreamReader.hpp, implements generic_get_log_event_index_by_timestamp(), the core logic (basically just binary search) for finding log events by timestamp.
This function is called both by StructuredIrStreamReader and UnstructuredIrStreamReader's get_log_event_index_by_timestamp.

Validation performed

Use the following javascript for testing structured and unstructured logs:

import ModuleInit from "./cmake-build-debug/ClpFfiJs-node.js"
import fs from "node:fs"

const main = async () => {
    const file = fs.readFileSync("./test-1K.clp.zst")

    console.time("perf")
    const Module = await ModuleInit()
    try {
        const decoder = new Module.ClpStreamReader(new Uint8Array(file), {logLevelKey: "$log_level", timestampKey: "$timestamp"})
        console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED ? "structured" : "unstructured")
        const numEvents = decoder.deserializeStream()
        console.log(numEvents)
        const results = decoder.decodeRange(0, numEvents, false)
        console.log(results)
        const logEventIdx = decoder.getLogEventIndexByTimestamp(1736183181284)
        console.log(logEventIdx)
    } catch (e) {
        console.error("Exception caught:", e.stack)
    }
    console.timeEnd("perf")
}

void main()

irv2 logs generating scripts:

"""
To install all dependencies:
    pip install clp-ffi-py==0.1.0b1 msgpack nabg zstandard
TODO: set random seed based on `i` so that all logs generated are deterministic.
"""
import random
import time

import msgpack
import nabg
import zstandard as zstd
from clp_ffi_py.ir.native import Serializer

NUM_EVENTS_TO_GENERATE = 1_000
FILE_NAME = "test-1K.clp.zst"

LOG_LEVELS = ["TRACE", 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL', "SOME_UNKNOWN_LEVEL"]

EMOJI_RANGES = [
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x26FF),  # Miscellaneous Symbols
    (0x1F1E6, 0x1F1FF),  # Flags
]
EMOJI_POOL = [chr(i) for r in EMOJI_RANGES for i in range(r[0], r[1] + 1)]

NUM_MILLIS_IN_SECOND = 1000


def generate_random_emoji():
    return random.choice(EMOJI_POOL)


def generate_random_message():
    return (
            generate_random_emoji() +
            nabg.ionize() +
            generate_random_emoji()
    )


cctx = zstd.ZstdCompressor(threads=2)

with open(FILE_NAME, "wb") as f:
    with cctx.stream_writer(f) as w:
        with Serializer(w) as s:
            for i in range(NUM_EVENTS_TO_GENERATE):
                random.seed(i)
                log_event = msgpack.dumps({
                    "$log_level": LOG_LEVELS[i % len(LOG_LEVELS)],
                    "$timestamp": int(time.time() * NUM_MILLIS_IN_SECOND),
                    "message": generate_random_message()
                })
                s.serialize_log_event_from_msgpack_map(log_event)

Summary by CodeRabbit

  • New Features
    • Introduced a capability to locate the nearest log event index based on a specified timestamp across key streaming components.
    • Enhanced log event indexing with a new flexible data type that supports both numerical values and the absence of a value, ensuring more robust and efficient data handling.

Copy link

coderabbitai bot commented Dec 19, 2024

Walkthrough

This pull request introduces new methods across several log stream reader classes to retrieve the nearest log event index based on a specified timestamp. The method findNearestLogEventByTimestamp is added to the bindings for StreamReader, StructuredIrStreamReader, and UnstructuredIrStreamReader. A new type alias, NullableLogEventIdx (defined as number | null in JavaScript), is also registered. Additionally, a templated helper function for efficient searching is introduced, while existing functionalities remain unchanged.

Changes

File(s) Change Summary
src/clp_ffi_js/ir/StreamReader.cpp Added binding for findNearestLogEventByTimestamp and registered the new type NullableLogEventIdx (`number
src/clp_ffi_js/ir/StreamReader.hpp Declared the new virtual method find_nearest_log_event_by_timestamp, introduced a templated generic_find_nearest_log_event_by_timestamp, and defined the type alias NullableLogEventIdx.
src/clp_ffi_js/ir/StructuredIrStreamReader.{cpp, hpp} Added the new method find_nearest_log_event_by_timestamp in both source and header files; updated include directives to support new type definitions.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.{cpp, hpp} Added the new method find_nearest_log_event_by_timestamp that calls generic_find_nearest_log_event_by_timestamp using m_encoded_log_events.

Sequence Diagram(s)

sequenceDiagram
    participant JS as JavaScript Interface
    participant SR as StreamReader/StructuredIrStreamReader/UnstructuredIrStreamReader
    participant GF as Generic Finder
    JS->>SR: findNearestLogEventByTimestamp(timestamp)
    SR->>GF: generic_find_nearest_log_event_by_timestamp(log_events, timestamp)
    GF-->>SR: NullableLogEventIdx
    SR-->>JS: NullableLogEventIdx
Loading

Possibly related PRs

Suggested reviewers

  • junhaoliao
  • davemarco
  • kirkrodrigues

Tip

🌐 Web search-backed reviews and chat
  • We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
  • You can disable this feature by setting web_search: false in the knowledge_base settings.
  • Please share any feedback in the Discord discussion.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future commits, you may follow https://github.com/y-scope/clp-ffi-js?tab=readme-ov-file#linting to run the linter

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
@junhaoliao
Copy link
Member

This PR is blocking y-scope/yscope-log-viewer#152

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
@Henry8192 Henry8192 marked this pull request as ready for review January 6, 2025 21:42
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32b1220 and 5221588.

📒 Files selected for processing (6)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (6 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 GitHub Actions: lint
src/clp_ffi_js/ir/StreamReader.hpp

[error] 65-66: Code formatting violation: Static create method declaration is not properly formatted


[error] 126-126: Code formatting violation: DecodedResultsTsType return type declaration is not properly formatted


[error] 138-139: Code formatting violation: get_log_event_index_by_timestamp method declaration is not properly formatted


[error] 201-203: Code formatting violation: requires clause and its parameters are not properly formatted


[error] 301-303: Code formatting violation: requires clause and its parameters are not properly formatted

🔇 Additional comments (10)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

74-75: LGTM! Method declaration is well-defined.

The method signature is consistent with the base class and follows proper C++ practices with the [[nodiscard]] attribute.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)

11-11: LGTM! Include directive is appropriately placed.

The addition of <clp/ir/types.hpp> provides the necessary type definitions.


78-79: LGTM! Method declaration is consistent.

The method signature matches the base class and follows the same pattern as UnstructuredIrStreamReader.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)

14-14: LGTM! Include directive is appropriately placed.

The addition of <clp/ir/types.hpp> provides the necessary type definitions.


151-158: LGTM! Implementation is clean and efficient.

The implementation properly delegates to the generic function, promoting code reuse between different reader types.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

161-168: LGTM! Implementation is consistent with StructuredIrStreamReader.

The implementation properly uses the generic function, ensuring consistent behaviour across different reader types.

src/clp_ffi_js/ir/StreamReader.cpp (1)

132-132: LGTM! Type registration and function binding are properly implemented.

The new type registration and function binding follow the established patterns in the codebase.

Also applies to: 149-153

src/clp_ffi_js/ir/StreamReader.hpp (3)

14-14: LGTM! Include and type declaration are properly placed.

The additions follow the established patterns in the codebase.

Also applies to: 33-33


128-137: LGTM! Documentation is clear and comprehensive.

The documentation clearly explains the method's purpose, parameters, and return values.


313-331: LGTM! Implementation is efficient and handles edge cases properly.

The implementation:

  • Correctly handles empty log events
  • Uses std::upper_bound for efficient binary search
  • Properly handles the case when all timestamps are larger than the target

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

312-312: Use Yoda condition for null check.

According to the coding guidelines, prefer false == <expression> over !<expression>.

Apply this diff:

-    if (log_events.empty()) {
+    if (true == log_events.empty()) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5221588 and abeb4f8.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (5)
src/clp_ffi_js/ir/StreamReader.hpp (5)

33-33: LGTM! Type declaration follows established pattern.

The declaration of LogEventIdxTsType aligns with other type declarations in the file and properly integrates with Emscripten's type system.


128-137: LGTM! Well-documented method declaration.

The documentation thoroughly explains all possible return scenarios and the method signature follows C++ best practices with appropriate use of [[nodiscard]].


138-139: Fix method declaration formatting.

The method declaration needs to be reformatted according to the linting rules.

Apply this diff:

-    [[nodiscard]] virtual auto get_log_event_index_by_timestamp(clp::ir::epoch_time_ms_t timestamp
-    ) -> LogEventIdxTsType = 0;
+    [[nodiscard]] virtual auto get_log_event_index_by_timestamp(
+            clp::ir::epoch_time_ms_t timestamp
+    ) -> LogEventIdxTsType = 0;

312-314: LGTM! Efficient implementation with proper edge case handling.

The implementation efficiently uses binary search via std::upper_bound and properly handles all edge cases:

  • Empty log events return null
  • No exact match returns the last smaller index
  • All larger timestamps return the first index

Also applies to: 315-322, 323-329


300-307: Fix requires clause formatting.

The requires clause needs to be reformatted according to the linting rules.

Apply this diff:

-requires requires(
-                 LogEventWithFilterData<LogEvent> const& event,
-                 clp::ir::epoch_time_ms_t timestamp
-         ) {
+    requires requires(LogEventWithFilterData<LogEvent> const& event,
+                     clp::ir::epoch_time_ms_t timestamp) {

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

49-57: Add documentation for the GetLogEventIdxInterface concept.

The concept is well-defined, but it would benefit from documentation explaining its purpose and requirements.

Add a documentation block before the concept:

+/**
+ * Concept defining the requirements for log event types that can be searched by timestamp.
+ * Types satisfying this concept must provide a get_timestamp method that returns a value
+ * convertible to epoch_time_ms_t.
+ */
 template <typename LogEvent>
 concept GetLogEventIdxInterface = requires(

299-325: Consider optimizing and documenting the binary search implementation.

While the implementation is correct, consider these improvements:

  1. Add a comment explaining the binary search strategy
  2. Use std::prev for cleaner iterator manipulation

Apply this diff to improve the implementation:

 template <GetLogEventIdxInterface LogEvent>
 auto StreamReader::generic_get_log_event_idx_by_timestamp(
         LogEvents<LogEvent> const& log_events,
         clp::ir::epoch_time_ms_t timestamp
 ) -> LogEventIdxTsType {
     if (log_events.empty()) {
         return LogEventIdxTsType{emscripten::val::null()};
     }

+    // Use binary search to find the first element greater than the target timestamp
     auto upper{std::upper_bound(
             log_events.begin(),
             log_events.end(),
             timestamp,
             [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
                 return ts < log_event.get_timestamp();
             }
     )};

     if (upper == log_events.begin()) {
         return LogEventIdxTsType{emscripten::val(0)};
     }

-    auto const upper_index{std::distance(log_events.begin(), upper)};
-    auto const index{upper_index - 1};
+    // Return the index of the previous element (last element <= timestamp)
+    auto const index{std::distance(log_events.begin(), std::prev(upper))};

     return LogEventIdxTsType{emscripten::val(index)};
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e3e21b and 412b96e.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/StreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

14-14: LGTM! Type declarations and includes are well-organized.

The new include and type declaration follow the existing patterns in the codebase.

Also applies to: 33-33


138-149: LGTM! Well-documented virtual method declaration.

The method signature and documentation are clear and comprehensive.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

298-305: Consider optimizing the binary search predicate.

The current implementation creates a lambda for each comparison. Consider using a member function pointer or functor for potentially better performance:

-    auto upper{std::upper_bound(
-            log_events.begin(),
-            log_events.end(),
-            timestamp,
-            [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
-                return ts < log_event.get_timestamp();
-            }
-    )};
+    struct TimestampCompare {
+        bool operator()(clp::ir::epoch_time_ms_t ts, 
+                       LogEventWithFilterData<LogEvent> const& log_event) const {
+            return ts < log_event.get_timestamp();
+        }
+    };
+    auto upper{std::upper_bound(
+            log_events.begin(),
+            log_events.end(),
+            timestamp,
+            TimestampCompare{}
+    )};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 412b96e and 54c7df1.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (3)
src/clp_ffi_js/ir/StreamReader.hpp (3)

33-33: LGTM! Type declaration for LogEventIdxTsType.

The declaration is necessary for exposing the timestamp-based search result type to JavaScript.


128-139: LGTM! Virtual method declaration is well-documented.

The pure virtual method declaration follows the class's convention, and the documentation clearly explains all possible return values and edge cases.


311-314: LGTM! Index calculation is correct and follows best practices.

The implementation properly calculates the index using std::distance and maintains const correctness.

@Henry8192 Henry8192 requested a review from davemarco January 13, 2025 17:55
Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I added a second review and it didn't get posted. Here is it

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
@Henry8192 Henry8192 requested a review from davemarco January 13, 2025 21:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/clp_ffi_js/ir/StreamReader.hpp (3)

128-139: Consider renaming the parameter for consistency.

The parameter name timestamp could be renamed to target_ts to maintain consistency with the documentation style used throughout the codebase.

-    [[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
+    [[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t target_ts
     ) -> LogEventIdxTsType = 0;

190-201: Enhance documentation for the templated implementation.

Consider referencing the virtual method's documentation to avoid duplication and potential inconsistencies.

     /**
      * Templated implementation of `get_log_event_idx_by_timestamp`.
      *
      * @tparam LogEvent
      * @param timestamp
-     * @return the best matched log event index.
+     * @return See `get_log_event_idx_by_timestamp`.
      */

289-314: Improve variable naming for better readability.

The variable names could be more descriptive to better convey their purpose:

  • first_greater_it could be upper_bound_it
  • first_greater_idx could be upper_bound_idx

These names would better align with the standard library's terminology.

-    auto first_greater_it{std::upper_bound(
+    auto upper_bound_it{std::upper_bound(
             log_events.begin(),
             log_events.end(),
             timestamp,
             [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
                 return ts < log_event.get_timestamp();
             }
     )};

-    if (first_greater_it == log_events.begin()) {
+    if (upper_bound_it == log_events.begin()) {
         return LogEventIdxTsType{emscripten::val(0)};
     }

-    auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)};
+    auto const upper_bound_idx{std::distance(log_events.begin(), upper_bound_it)};

-    return LogEventIdxTsType{emscripten::val(first_greater_idx - 1)};
+    return LogEventIdxTsType{emscripten::val(upper_bound_idx - 1)};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54c7df1 and eff1849.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

33-33: LGTM! Type declaration follows established patterns.

The new type declaration for LogEventIdxTsType follows the existing pattern and naming convention used for other JavaScript interop types.

Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments on last review

@Henry8192 Henry8192 requested a review from davemarco January 14, 2025 19:12
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

298-305: Add comments explaining the binary search strategy.

While the implementation using std::upper_bound is efficient, it would benefit from a brief comment explaining why this approach was chosen and how it achieves the desired result.

Add a comment before the upper_bound call:

+    // Use binary search to find the first element greater than the target timestamp.
+    // This allows us to step back one position to find the last element less than or equal to the target.
     auto first_greater_it{std::upper_bound(
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff1849 and 00f89d4.

📒 Files selected for processing (3)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

33-33: LGTM! Type declaration follows established patterns.

The declaration of LogEventIdxTsType is appropriately placed with other output types and follows the existing pattern for TypeScript type declarations.


128-138: LGTM! Well-documented virtual method with clear behaviour specification.

The method signature and documentation follow best practices. The use of epoch_time_ms_t for timestamp handling is appropriate.

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
@LinZhihao-723 LinZhihao-723 self-requested a review January 21, 2025 19:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 122f1cf and f4a0207.

📒 Files selected for processing (7)
  • CMakeLists.txt (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (7)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

74-75: LGTM!

The method signature is correct and properly marked with [[nodiscard]] and override attributes.

src/clp_ffi_js/ir/StreamReader.cpp (2)

132-132: Consider using std::optional with register_optional().

For better type safety and consistency, consider using std::optional<T> with Embind's register_optional() function instead of custom nullable type definitions.

This has been tracked in issue #51 for implementation in a future PR.


150-153: LGTM!

The function binding is correctly added and properly formatted.

src/clp_ffi_js/ir/StreamReader.hpp (4)

14-14: LGTM!

The include statement is correctly added.


33-33: LGTM!

The type declaration is correctly added.


128-138: LGTM!

The virtual method is well-documented with clear descriptions of parameters and return values.


289-314: LGTM!

The template implementation:

  • Uses std::upper_bound for efficient searching.
  • Properly handles edge cases (empty container and all timestamps greater than target).
  • Uses clear and descriptive variable names.

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
@@ -29,6 +30,7 @@ EMSCRIPTEN_DECLARE_VAL_TYPE(ReaderOptions);
// JS types used as outputs
EMSCRIPTEN_DECLARE_VAL_TYPE(DecodedResultsTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(FilteredLogEventMapTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType);
// How about `NullableLogEventIdx`?
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType);

Copy link
Collaborator Author

@Henry8192 Henry8192 Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkrodrigues Should we also change the name of FilteredLogEventMapTsType later? It is also nullable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer that, but what do @davemarco / @junhaoliao think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally even prefer making the change here, as it is just a internal type name change and won't affect any front end code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, isn't FilteredLogEventIdx also defined in the Log Viewer? It's true that the name in clp-ffi-js doesn't influence (from a code execution perspective) the name in the log viewer, but it's probably better for developers if we stay consistent. So if we rename it here, we should probably rename it in the log viewer as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't change the filteredEventMap. Whether nullable in the name is better I think is a question of style. Like in general i think its better not to put type info in the name, but here I guess maybe exception since we are trying to make more clear than returns null.

src/clp_ffi_js/ir/StreamReader.hpp Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
return LogEventIdxTsType{emscripten::val::null()};
}

auto first_greater_it{std::upper_bound(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::upper_bound assumes the range is sorted, right? What would happen if the log events aren't in ascending timestamp order? It's technically not impossible and we have seen log files where that's true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao iirc, I was told to assume the log events are in accending order. If that's not the case, this whole search function won't work.

If that's not the case, we have to sort all of the log events, cuz' that's the only way to fully garantee all log events are sorted. And we (might?) also need a map to remember their original location.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the case with out of order timestamps is rare (as far as we know), we could maybe do something like:

  • When decoding the file, keep track of whether the timestamp of two consecutive log events are out of order and if so, set a flag and display a warning to the user that the file contains out of order timestamps and so timestamp-related searches may not function correctly (since two log events with the same timestamp may exist at completely different positions in the log file).
  • If that flag is set, we use a brute-force implementation of find_nearest_log_event_by_timestamp. Otherwise, we use the implementation you have here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When decoding the file, keep track of whether the timestamp of two consecutive log events are out of order
    "keep track of" means we are basically iterating the log events during the decode right?

if that's the case, can we divide the log events up to chunks using these timestamp outliers, perform searches in the proper chunk, and finally iterate those outliers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that be straightforward to do given that the log events collection changes based on the active filter? If not, then given how rare the case is (as far as we know), I'm not sure the extra code complexity is worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an offline discussion with @junhaoliao, we reach to an agreement that we can doc the assumption, and neglact this corner case.

By neglacting this corner case, we can save plenty of resource from verifying log events are in chronological order.

@kirkrodrigues
Copy link
Member

Sorry for the delay in reviewing this. Let's keep further discussions in the PR so that we can all come to a consensus more quickly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4a0207 and 8c8c42e.

📒 Files selected for processing (6)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (7)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)

14-14: LGTM!

The include is necessary for the epoch_time_ms_t type used in the new method.


151-155: LGTM!

The implementation correctly delegates to the generic template function, maintaining consistency with other similar methods in the class.

src/clp_ffi_js/ir/StreamReader.cpp (2)

132-132: LGTM!

The type registration follows the established pattern for nullable types.


150-153: LGTM!

The function binding is properly placed and follows the naming convention used in other bindings.

src/clp_ffi_js/ir/StreamReader.hpp (3)

14-14: LGTM!

The include is necessary for the epoch_time_ms_t type used throughout the file.


33-33: LGTM!

The type declaration follows the established pattern for nullable types.


129-145: LGTM!

The method declaration is well documented with clear examples and edge cases.

Comment on lines 296 to 322
template <typename LogEvent>
auto StreamReader::generic_find_nearest_log_event_idx_by_timestamp(
LogEvents<LogEvent> const& log_events,
clp::ir::epoch_time_ms_t target_ts
) -> NullableLogEventIdx {
if (log_events.empty()) {
return NullableLogEventIdx{emscripten::val::null()};
}

// Find the log event whose timestamp is just after `target_ts`
auto first_greater_it{std::upper_bound(
log_events.begin(),
log_events.end(),
target_ts,
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
return ts < log_event.get_timestamp();
}
)};

if (first_greater_it == log_events.begin()) {
return NullableLogEventIdx{emscripten::val(0)};
}

auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)};

return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Timestamp Ordering Not Verified

  • The function in StreamReader.hpp employs std::upper_bound with a lambda that compares timestamps, which only works correctly when the log events are sorted in ascending order.
  • The various get_timestamp implementations simply return their internal timestamp (e.g. from m_timestamp in LogEventWithFilterData.hpp), and there is no evidence of any explicit sort operation in the codebase.
  • Instead, the log events appear to be appended (for example, via emplace_back in StructuredIrUnitHandler.cpp), meaning the natural order is assumed to be chronologically correct rather than being enforced programmatically.
🔗 Analysis chain

Verify timestamp ordering assumption.

The implementation assumes log events are sorted by timestamp in ascending order. This assumption should be verified as log files might not always maintain this order.

Run this script to check if timestamps are always in ascending order:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if log events are always sorted by timestamp
# Expected: No output means timestamps are in order

ast-grep --pattern $'class $_ {
  $$$
  get_timestamp() {
    $$$
  }
  $$$
}'

rg -A 5 "get_timestamp"

Length of output: 4283

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c8c42e and a984d7b.

📒 Files selected for processing (6)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/StreamReader.cpp
🧰 Additional context used
📓 Path-based instructions (2)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

161-165: LGTM!

The implementation correctly delegates to the generic implementation and follows the single responsibility principle.

src/clp_ffi_js/ir/StreamReader.hpp (3)

33-33: LGTM!

The type name clearly indicates its purpose and nullability, following the pattern used for other type declarations.


129-146: LGTM!

The method declaration is well-documented with clear explanation of its behaviour, correctly marked as virtual and nodiscard, and uses consistent parameter naming.


196-209: LGTM!

The templated method declaration is well-structured with appropriate documentation and parameter types.

Comment on lines +296 to +322
template <typename LogEvent>
auto StreamReader::generic_find_nearest_log_event_by_timestamp(
LogEvents<LogEvent> const& log_events,
clp::ir::epoch_time_ms_t target_ts
) -> NullableLogEventIdx {
if (log_events.empty()) {
return NullableLogEventIdx{emscripten::val::null()};
}

// Find the log event whose timestamp is just after `target_ts`
auto first_greater_it{std::upper_bound(
log_events.begin(),
log_events.end(),
target_ts,
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
return ts < log_event.get_timestamp();
}
)};

if (first_greater_it == log_events.begin()) {
return NullableLogEventIdx{emscripten::val(0)};
}

auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)};

return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Attention: Log events ordering may not be enforced.

Our investigation shows that the implementation of generic_find_nearest_log_event_by_timestamp relies on log events being in ascending timestamp order, yet searches for sorting or ordering logic (e.g. explicit calls to std::sort or other ordering mechanisms) produced no evidence to enforce this order during deserialization or insertion. This could potentially lead to incorrect log event lookups if the input sequence isn’t already sorted.

  • The log events appear to be added via methods like emplace_back without explicit ordering.
  • No ordering logic (e.g. sorting function calls) was identified to confirm that the events are sorted as required.
🔗 Analysis chain

Verify timestamp ordering assumption.

The implementation assumes log events are sorted by timestamp in ascending order. This assumption should be verified as log files might not always maintain this order.

Run this script to check if timestamps are always in ascending order:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if timestamps are in ascending order during log event deserialization
# Expected: Find all places where timestamps are added to log events

# Search for timestamp assignments during deserialization
ast-grep --pattern $'deserialize_log_event() {
  $$$
  get_timestamp()
  $$$
}'

# Search for places where log events are added to collections
rg -A 5 "emplace_back.*LogEvent"

Length of output: 115


Script:

#!/bin/bash
# Investigate ordering logic for log events in the codebase

# 1. Search for std::sort invocations that might enforce timestamp ordering
rg -n "std::sort\s*\(.*get_timestamp" --color=always

# 2. Locate deserialization functions potentially responsible for creating or ordering log events
rg -n "deserialize_log_event\s*\(" --color=always

# 3. Find where log events are inserted into collections (e.g. using emplace_back) to inspect their ordering
rg -n "emplace_back\s*\(.*LogEvent" --color=always

# 4. Additionally, search for any comparison operators or functions comparing timestamps
rg -n "get_timestamp\s*\(" --color=always

Length of output: 1721

…nction ONLY works with timestamps in chronological order.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

196-322: Consider improving variable names for better readability.

The implementation is efficient and handles edge cases well. However, consider these variable name improvements:

  • first_greater_it could be upper_bound_it to match STL terminology
  • first_greater_idx could be upper_bound_idx for consistency

Apply this diff to improve readability:

-    auto first_greater_it{std::upper_bound(
+    auto upper_bound_it{std::upper_bound(

-    if (first_greater_it == log_events.begin()) {
+    if (upper_bound_it == log_events.begin()) {

-    auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)};
+    auto const upper_bound_idx{std::distance(log_events.begin(), upper_bound_it)};

-    return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)};
+    return NullableLogEventIdx{emscripten::val(upper_bound_idx - 1)};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a984d7b and 4788157.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

33-33: LGTM!

The type declaration is well-placed and follows the established naming pattern for output types.


129-145: LGTM with a verification note.

The method declaration is well-documented and consistent with the class's style. The documentation clearly states the chronological order assumption.

Let's verify if the chronological order assumption is documented elsewhere in the codebase:

✅ Verification successful

I'm going to extend our search beyond markdown files to check for any references to a "chronological order" documentation anywhere in the repository. Please run the script below:

This will help verify whether the assumption is documented elsewhere. Thank you.


LGTM – Chronological order assumption verified.

The search confirms that the chronological ordering is documented within the method’s comment in src/clp_ffi_js/ir/StreamReader.hpp. No other contradictory or additional documentation was found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for documentation about log event ordering
# Expected: Find documentation about chronological order assumption

rg -i "chronological|timestamp.*order|sorted.*timestamp" --type md

Length of output: 68


Script:

#!/bin/bash
# Search for any reference to "chronological order" in all files across the repository.
rg -i "chronological order" --max-columns 200

Length of output: 180

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat: Add support for finding the log event that's closest to a target timestamp.

Comment on lines +132 to +138
* - the collection of log events is sorted in chronological order, or the search won't work;
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log
* events with timestamp `target_ts` already exist in the collection, M should be inserted
* after them).
*
* L is the event just before M, if M is not the first event in the collection; otherwise L is
* the event just after M.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* - the collection of log events is sorted in chronological order, or the search won't work;
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log
* events with timestamp `target_ts` already exist in the collection, M should be inserted
* after them).
*
* L is the event just before M, if M is not the first event in the collection; otherwise L is
* the event just after M.
* - the collection of log events is sorted in chronological order;
* - and we insert a marker log event, M, with timestamp `target_ts` into the collection (if log
* events with timestamp `target_ts` already exist in the collection, M should be inserted
* after them).
*
* L is the event just before M, if M is not the first event in the collection; otherwise L is
* the event just after M.
*
* NOTE: If the collection of log events isn't in chronological order, this method has undefined
* behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants