Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(indexer/server): implement a task buffer to improve efficiency in managing backpressure issues #519

Conversation

FrankLi123
Copy link
Contributor

@FrankLi123 FrankLi123 commented Sep 3, 2024

Summary

Implement an adaptive "Task Buffer" that efficiently manages task flow between the producer (data-source) and the consumer (worker).

  • Task Buffer

    • Uses a Go slice to effectively manage tasks.
    • Uses a mutex and conditional variables to coordinate interactions with the task buffer between goroutines ; producer goroutines pause when the buffer is full, while consumer goroutines wait when the buffer is empty.
    • The buffer is configured to handle up to 1,000 tasks by default.
  • Performance

    • the Task Buffer Increases performance by allowing producers to add tasks into the buffer queue while the buffer is not full, reducing the producer blocking time.
    • use of mutex and conditional variable may increase complexity and lead to more time spent in context switching of Task Buffer.

Checklist

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

@FrankLi123 FrankLi123 changed the title feat(indexer/server): implement task buffer to improve efficency during Backpressure issues feat(indexer/server): implement a task buffer to improve efficiency in managing backpressure issues Sep 3, 2024
@FrankLi123 FrankLi123 linked an issue Sep 3, 2024 that may be closed by this pull request
1 task
@FrankLi123 FrankLi123 self-assigned this Sep 3, 2024
)

// TaskBuffer represents a fixed-size buffer, it operates as a FIFO buffer
type TaskBuffer struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type TaskBuffer struct {
This is a new data structure, please add sufficient test cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive test cases have been added in task_buffer_test.go under directory/engine

}

task := sw.tasks[0]
sw.tasks = sw.tasks[1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sw.tasks = sw.tasks[1:]
The slice operation sw.tasks = sw.tasks[1:] changes the starting position of the slice, but the underlying array remains unchanged. If this operation is performed frequently (e.g., frequent task retrieval), the underlying array of the slice may grow indefinitely without releasing the used memory, potentially causing a memory leak.
So i think adding sw.tasks[0] = nil would be safer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your point, and I have made the change to address it.

return fmt.Errorf("an error occurred in the source: %w", err)
}

return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil
The return nil statement immediately after the if err != nil check is unnecessary because if err != nil is true, the function will return early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

if err := s.handleTasks(ctx, tasks); err != nil {
return fmt.Errorf("handle tasks: %w", err)
if err := s.handleTasks(ctx, task); err != nil {
errorChan <- fmt.Errorf("handle tasks error: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
errorChan <- fmt.Errorf("handle tasks error: %w", err)
I think the function retryableFunc should return the error instead of sending it to the errorChan. This approach can make the retry mechanism clearer and more straightforward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

@FrankLi123 FrankLi123 marked this pull request as draft October 18, 2024 10:14
@FrankLi123 FrankLi123 closed this Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resolve Backpressure Issues about the Task Channel in Node
2 participants