
Significant Performance Regression in ts-proto v2 Compared to v1 #1098

Open
HofmannZ opened this issue Aug 23, 2024 · 6 comments

@HofmannZ

Hello ts-proto maintainers,

First, I want to thank you for your work on ts-proto—it's been an invaluable tool for our projects. We recently benchmarked three libraries (ts-proto, google-protobuf, and protobuf-ts) as part of our ongoing efforts to optimize our performance. Based on those benchmarks, we initially selected ts-proto (v1) as it outperformed the other libraries.

Recently, we decided to include ts-proto v2 in our benchmark tests to evaluate any improvements or changes in performance. However, we were surprised to observe a significant performance regression in ts-proto v2 compared to v1.

Here are the benchmark results:

google-protobuf v3.21.4:

  • readInputs: 7,184 ops/sec (139,186 ns average time)
  • updateInputs: 6,109 ops/sec (163,671 ns average time)

ts-proto v1.181.2:

  • readInputs: 7,505 ops/sec (133,227 ns average time)
  • updateInputs: 6,353 ops/sec (157,404 ns average time)

ts-proto v2.0.3:

  • readInputs: 6,372 ops/sec (156,918 ns average time)
  • updateInputs: 4,624 ops/sec (216,218 ns average time)

protobuf-ts v2.9.4:

  • readInputs: 6,252 ops/sec (159,943 ns average time)
  • updateInputs: 4,435 ops/sec (225,466 ns average time)

As you can see, the performance of ts-proto v2 is significantly lower than v1, particularly in the updateInputs task, where operations per second dropped by about 27% and the average time increased by about 37%.

Steps to Reproduce:

  1. Benchmark ts-proto v1 and v2 using our test suite (a minimal sketch of the harness follows this list).
  2. Compare the performance metrics (ops/sec and average time) for tasks like readInputs and updateInputs.
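
For illustration, a stripped-down version of the harness looks roughly like this (the task bodies and message names are placeholders for the actual suite):

```ts
// Minimal benchmark harness sketch; timing via Node's perf_hooks.
import { performance } from "node:perf_hooks";

function bench(name: string, task: () => void, iterations = 10_000): void {
  for (let i = 0; i < 1_000; i++) task(); // warm-up rounds, excluded from timing

  const start = performance.now();
  for (let i = 0; i < iterations; i++) task();
  const elapsedMs = performance.now() - start;

  const opsPerSec = iterations / (elapsedMs / 1_000);
  const avgNs = (elapsedMs * 1e6) / iterations;
  console.log(`${name}: ${opsPerSec.toFixed(0)} ops/sec (${avgNs.toFixed(0)} ns average time)`);
}

// With ts-proto generated code, the two tasks reduce to something like
// (Inputs, encodedBytes, and inputsMessage are placeholders):
//   bench("readInputs", () => Inputs.decode(encodedBytes));
//   bench("updateInputs", () => Inputs.encode(inputsMessage).finish());
```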

Expected Behavior:

We expected ts-proto v2 to at least maintain the performance levels of v1, if not improve upon them.

Actual Behavior:

The performance of ts-proto v2 is significantly lower, particularly in the updateInputs task.

Environment:

  • Node.js version: v20.16.0
  • ts-proto versions: v1.181.2 and v2.0.3
  • Operating System: macOS 14.6.1

Additional Context:

We would greatly appreciate any insights into what might be causing this regression. If there's anything specific in the new version that we should be aware of, or if there are configuration changes we can make to mitigate this, please let us know.

Thank you for your time and for all the work you put into maintaining this library!

@stephenh
Owner

Hi @HofmannZ ! Thanks for filing the issue!

That is an interesting but unfortunate result...

The biggest change in ts-proto v2 is that we migrated to a different binary serialization library, @bufbuild/protobuf, as the previous library we sat on top of, protobuf.js, has been stagnant (albeit technically maintained) for quite a while, with long-standing issues around ESM, BigInt, and custom type support that, over the years, have been painful to deal with.

So, in theory this move was a big win in terms of getting a nice/needed refresh to our overall stack -- but I definitely hear you about the performance regression.

@timostamm do you have thoughts/comments on ^?

Fwiw @HofmannZ the benchmark repo you linked to might still be private? The link 404s for me...

@HofmannZ
Author

@stephenh - My apologies! I had mistakenly linked to a private repo, but it's now public.

@timostamm
Contributor

We've been waiting for resizable ArrayBuffers - there's potential for a significant boost in write performance. They aren't widely enough available yet, though.

I think that conditional imports like the ones used in protobuf.js should be avoided because of issues with bundlers and ESM, but it's possible that perf could be improved without them.
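
For context, the pattern in question looks roughly like this (an illustration of the general technique, not protobuf.js's exact code):

```ts
// A dynamic require like this is opaque to static analysis: bundlers can't
// tell whether "buffer" will be needed, and strict ESM has no equivalent.
function inquire(moduleName: string): any {
  try {
    return require(moduleName); // resolved at runtime only
  } catch {
    return null; // silently fall back when the module isn't available
  }
}

// Pick Node's Buffer when present, otherwise fall back to plain Uint8Array paths.
const NodeBuffer = inquire("buffer")?.Buffer ?? null;
```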

It would be interesting to take a look at flame graphs to see where the time is spent compared to the protobuf.js reader and writer. I'm happy to help take a look and make improvements. ts-proto, protobuf-ts, and protobuf-es would all benefit from that.

@stephenh
Owner

Selfishly that would be really great & appreciated @timostamm ! :-D

Fwiw @HofmannZ since this is important to you, it would also be great ofc if you wanted to analyze/dive deeper into the protobufjs/bufbuild flamegraphs, and I assume you can stay on ts-proto 1.x until the performance is back on par.

Thanks!

@HofmannZ
Author

Hi @stephenh and @timostamm,

Thanks for the insights—much appreciated!

I've done some preliminary research on resizable ArrayBuffers, and they seem particularly promising for improving performance in protocol buffer processing. Here’s why they might be beneficial in this context:

  1. Dynamic Memory Management for Protobuf Messages: Protocol buffers often deal with messages of varying sizes. With traditional fixed-size ArrayBuffers, we either need to over-allocate memory or handle complex resizing logic. Resizable ArrayBuffers would allow the buffer to grow dynamically as more data is encoded or decoded, reducing the need for costly reallocations and memory copying during serialization and deserialization.

  2. Performance Gains in Write Operations: When encoding large or complex protocol buffer messages, resizable ArrayBuffers can optimize memory usage by expanding only when necessary. This can lead to faster write operations, as the overhead of managing buffer sizes manually is minimized, directly impacting performance where it matters most in encoding tasks.

  3. Simplified Buffer Management in Protobuf Implementations: Managing memory efficiently is crucial in high-performance applications. Resizable ArrayBuffers can help simplify the implementation of protocol buffers by removing the need for pre-allocating large buffers or implementing custom resizing logic, making the codebase cleaner and less error-prone.

  4. Alignment with Future Protocol Buffer Features: As protocol buffers evolve to support more complex data types and larger messages, leveraging resizable ArrayBuffers can ensure that the library remains capable of handling these future demands efficiently, keeping pace with modern JavaScript capabilities.

However, I noticed that resizable ArrayBuffers are only supported starting from Node.js version 20. Adopting them would mean dropping support for Node.js version 18, which might be a significant decision for the project. It’s definitely something to weigh as we consider optimization strategies.
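
To make this concrete, a write path built on a resizable ArrayBuffer could look roughly like the following (a hypothetical sketch, not ts-proto or @bufbuild/protobuf code; requires Node.js 20+):

```ts
// Hypothetical growable byte writer on a resizable ArrayBuffer (ES2024).
class GrowableWriter {
  // Start small; allow in-place growth up to 16 MiB without reallocating.
  private buf = new ArrayBuffer(1024, { maxByteLength: 16 * 1024 * 1024 });
  // A view created without an explicit length is length-tracking: it stays
  // valid and follows the buffer's size across resize() calls.
  private view = new Uint8Array(this.buf);
  private length = 0;

  writeBytes(bytes: Uint8Array): void {
    const needed = this.length + bytes.length;
    if (needed > this.buf.byteLength) {
      if (needed > this.buf.maxByteLength) {
        throw new RangeError("message exceeds maxByteLength");
      }
      // Grow in place: no new allocation, no copy of already-written data.
      this.buf.resize(
        Math.min(this.buf.maxByteLength, Math.max(needed, this.buf.byteLength * 2)),
      );
    }
    this.view.set(bytes, this.length);
    this.length = needed;
  }

  finish(): Uint8Array {
    return this.view.subarray(0, this.length);
  }
}
```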

Regarding the flame graph analysis, I'm keen to contribute and would love your guidance on a few points:

  1. Excluding Warm-up Rounds: My benchmark code includes warm-up rounds that shouldn't influence the analysis. I plan to exclude these from the flame graph profiling by starting data capture after a set number of iterations (see the sketch after this list). If you have preferred methods or tools for this, I'd appreciate any recommendations.

  2. Focus on Key Iterations: Given that the benchmarks run 10,000 times, I’m considering focusing the flame graph analysis on a representative subset of iterations to avoid excessive data and make it easier to identify performance bottlenecks. What would you suggest as the best approach here?

  3. Comparative Flame Graphs: Creating flame graphs for both protobuf.js and @bufbuild/protobuf could provide valuable insights into where performance differences arise. If this sounds useful, I can generate these profiles and share them for further discussion.
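
For point 1, the sketch I had in mind uses Node's built-in inspector so that only the measured iterations land in the profile (node:inspector/promises requires Node.js 19+; the task and iteration counts are placeholders):

```ts
// Sketch: capture a CPU profile that excludes warm-up rounds.
import { Session } from "node:inspector/promises";
import { writeFileSync } from "node:fs";

async function profileTask(name: string, task: () => void): Promise<void> {
  for (let i = 0; i < 1_000; i++) task(); // warm-up rounds, not profiled

  const session = new Session();
  session.connect();
  await session.post("Profiler.enable");
  await session.post("Profiler.start");

  for (let i = 0; i < 10_000; i++) task(); // only these iterations are captured

  const { profile } = await session.post("Profiler.stop");
  session.disconnect();

  // Open the .cpuprofile in Chrome DevTools or speedscope to view the flame graph.
  writeFileSync(`${name}.cpuprofile`, JSON.stringify(profile));
}
```

Running each library's tasks through the same wrapper should also give directly comparable graphs for point 3.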

I'd appreciate any suggestions or best practices you might have for generating and interpreting these flame graphs effectively. I'm committed to helping find a solution that brings the performance of ts-proto v2 back on par with v1.

@stephenh
Owner

Hey @HofmannZ ! Thanks for the questions/context on the flamegraphs...

Honestly when I've done flamegraphs in the past, the issue is either pretty obvious ... or it's not, so I'm not sure some of the fancier "focus on key iterations"/etc approaches are necessary.

Hopefully just seeing protobufjs and @bufbuild/protobuf flamegraphs side-by-side will make the issue easy to see; granted, sometimes it won't, if it's an issue that is "death by a thousand cuts" rather than an obvious/single hotspot.

But side-by-side flamegraphs do seem like the best place to start. Thanks!
