
[RFC] Switch to pure JVM HttpUrlConnection Implementation #369

Open · wants to merge 12 commits into base: main
Conversation

@driverpt (Contributor) commented Oct 8, 2022

No description provided.

@msailes (Collaborator) commented Oct 9, 2022

Hi Luís,

Do you have any performance data on this change?

Thanks,

@driverpt (Contributor, Author) commented Oct 9, 2022

Not yet, @msailes, but the purpose is to make this client OS/arch agnostic.

@driverpt driverpt changed the title [RFC] Switch to pure HttpUrlConnection Implementation [RFC] Switch to pure JVM HttpUrlConnection Implementation Oct 9, 2022
@driverpt (Contributor, Author) commented Oct 9, 2022

@msailes, comparing smoke test outputs, the invocations look slightly faster with HttpURLConnection. Don't forget that JNI calls are typically expensive because they are not JIT-able or optimizable by the JVM. It's possible to optimize the current code even further (keep-alive, etc.).

How do you suggest benchmarking this properly, with nice reports?
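For reference, a minimal sketch of the keep-alive behaviour mentioned above (illustrative only, not code from this PR; the endpoint is a made-up placeholder). HttpURLConnection reuses connections as long as the response body is fully read and the stream is closed, and the pool is controlled by the standard http.keepAlive / http.maxConnections system properties:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAliveSketch {
    public static void main(String[] args) throws Exception {
        System.setProperty("http.keepAlive", "true");    // default, shown for clarity
        System.setProperty("http.maxConnections", "5");  // per-destination pool size

        URL url = new URL("http://localhost:8080/ping"); // placeholder endpoint
        for (int i = 0; i < 3; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes(); // drain the body (Java 9+) so the socket can be reused
            }
            // no conn.disconnect(): disconnect() closes the pooled connection
        }
    }
}
```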

@marksailes

To start with I would test a few thousand requests with a minimal handler (hello world) on the Java 11 managed runtime. Then do the same thing again with your 'custom' Java runtime.
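For context, a minimal handler of the sort being suggested, using the standard aws-lambda-java-core interface (a sketch, not part of this PR):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Minimal "hello world" handler for comparing the Java 11 managed runtime
// against a custom runtime built from this branch.
public class HelloHandler implements RequestHandler<String, String> {
    @Override
    public String handleRequest(String input, Context context) {
        return "hello world";
    }
}
```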

@driverpt (Contributor, Author) commented Oct 13, 2022

Benchmark for Pure Java HTTP Client

# JMH version: 1.35
# VM version: JDK 1.8.0_152, Java HotSpot(TM) 64-Bit Server VM, 25.152-b16
# VM invoker: **Redacted**
# VM options: **Redacted**
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: <none>
# Measurement: 15 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.amazonaws.services.lambda.runtime.api.client.runtimeapi.LambdaRuntimeClientBenchmark.benchmarkPureJavaClient

# Run progress: 0.00% complete, ETA 00:00:15
# Fork: 1 of 1
Iteration   1: 2.239 ms/op
Iteration   2: 0.979 ms/op
Iteration   3: 0.506 ms/op
Iteration   4: 0.471 ms/op
Iteration   5: 0.387 ms/op
Iteration   6: 0.320 ms/op
Iteration   7: 0.300 ms/op
Iteration   8: 0.224 ms/op
Iteration   9: 0.216 ms/op
Iteration  10: 0.211 ms/op
Iteration  11: 0.208 ms/op
Iteration  12: 0.287 ms/op
Iteration  13: 0.210 ms/op
Iteration  14: 0.210 ms/op
Iteration  15: 0.210 ms/op


Result "com.amazonaws.services.lambda.runtime.api.client.runtimeapi.LambdaRuntimeClientBenchmark.benchmarkPureJavaClient":
  0.465 ±(99.9%) 0.567 ms/op [Average]
  (min, avg, max) = (0.208, 0.465, 2.239), stdev = 0.531
  CI (99.9%): [≈ 0, 1.033] (assumes normal distribution)


# Run complete. Total time: 00:00:16

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                             Mode  Cnt  Score   Error  Units
LambdaRuntimeClientBenchmark.benchmarkPureJavaClient  avgt   15  0.465 ± 0.567  ms/op
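The benchmark source isn't included in this thread; a rough JMH skeleton matching the run parameters above might look like the following. The endpoint and the measured operation (one GET against a local Runtime API stub) are assumptions, and readAllBytes() would need a plain read loop on the Java 8 VM used in the run:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 0)
@Measurement(iterations = 15, time = 1)
@Fork(1)
@State(Scope.Benchmark)
public class PureJavaClientBenchmark {

    // Assumed local stub standing in for the Lambda Runtime API.
    private final String nextUrl = "http://localhost:9001/2018-06-01/runtime/invocation/next";

    @Benchmark
    public byte[] benchmarkPureJavaClient() throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(nextUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes(); // Java 9+
        }
    }
}
```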

@richarddd (Contributor)

> @msailes, comparing smoke test outputs, the invocations look slightly faster with HttpURLConnection. Don't forget that JNI calls are typically expensive because they are not JIT-able or optimizable by the JVM. It's possible to optimize the current code even further (keep-alive, etc.).
>
> How do you suggest benchmarking this properly, with nice reports?

Hey @driverpt, Richard from AWS here. Thanks for the effort in creating this PR! Much appreciated 🙌

The reason why we have a native implementation is purely performance (and, more importantly, startup performance). We have seen that using a native C++ client greatly reduces cold starts because there is no code for the C1 or C2 compilers to JIT. HttpURLConnection is a rather old implementation with support for a lot of deprecated protocols, and it could not simply be replaced by HttpClient either; even HttpClient will add some latency to first invokes compared to the native implementation.

The ideal way to go about this is to let users opt in and choose their own HTTP client via a ServiceLoader interface (roughly sketched below).

I would be happy to take a look at such a PR if you're interested :)

Again thanks for your effort!

(BTW, HttpURLConnection uses a connection pool by default unless you opt out.)
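A minimal sketch of the ServiceLoader-based opt-in suggested above. The interface and class names here are hypothetical stand-ins, not the actual Runtime Interface Client API, and findFirst()/readAllBytes() require Java 9+:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ServiceLoader;

// Hypothetical SPI: users register their own implementation via
// META-INF/services/RuntimeHttpClient; otherwise the default is used.
public interface RuntimeHttpClient {
    byte[] get(String url) throws IOException;

    static RuntimeHttpClient load() {
        return ServiceLoader.load(RuntimeHttpClient.class)
                .findFirst()
                .orElseGet(HttpUrlConnectionClient::new);
    }
}

// Default fallback implementation based on HttpURLConnection.
final class HttpUrlConnectionClient implements RuntimeHttpClient {
    @Override
    public byte[] get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        }
    }
}
```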

@driverpt (Contributor, Author) commented Nov 1, 2022

Hello @richarddd, thanks for your reply.

I think JNI calls should only be used as a last resort, since they add a lot of complexity to the code base. I do believe you gain a couple of milliseconds of cold start, but it's not optimal because you lose the Java heap management.

I like to tinker with Java in terms of performance, and I strongly believe it's possible to optimize this even further, for example with those opt-outs you mentioned, and by using threads to initialize the client while performing the class/method lookups for the handler (sketched below).

Java is a fast language; it's just a matter of tinkering with the JIT to make the code faster.
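A sketch of the "initialize the client on another thread while doing the handler lookup" idea above; RuntimeApiClient and the overall bootstrap shape are hypothetical stand-ins, not the real RIC code:

```java
import java.util.concurrent.CompletableFuture;

public final class BootstrapSketch {

    // Hypothetical placeholder for the runtime-api HTTP client.
    static final class RuntimeApiClient {
        RuntimeApiClient(String hostAndPort) { /* open the connection, etc. */ }
    }

    public static void main(String[] args) throws Exception {
        // Kick off client construction (connection setup) on a background thread...
        CompletableFuture<RuntimeApiClient> clientFuture = CompletableFuture.supplyAsync(
                () -> new RuntimeApiClient(System.getenv("AWS_LAMBDA_RUNTIME_API")));

        // ...while the reflective class/method lookup for the handler runs on the main thread.
        Object handler = Class.forName(args[0]).getDeclaredConstructor().newInstance();

        RuntimeApiClient client = clientFuture.join();
        // ...hand client and handler over to the usual next-invocation / respond loop.
    }
}
```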

@richarddd (Contributor) commented Nov 2, 2022

> Hello @richarddd, thanks for your reply.
>
> I think JNI calls should only be used as a last resort, since they add a lot of complexity to the code base. I do believe you gain a couple of milliseconds of cold start, but it's not optimal because you lose the Java heap management.

From what we've seen, it's more than 100 milliseconds of difference inside Lambda :(

> I like to tinker with Java in terms of performance, and I strongly believe it's possible to optimize this even further, for example with those opt-outs you mentioned, and by using threads to initialize the client while performing the class/method lookups for the handler.
>
> Java is a fast language; it's just a matter of tinkering with the JIT to make the code faster.

Yes, Java is fast; it's just not fast at startup. In a serverless environment such as Lambda, most customers will benefit more from an HTTP client that starts quickly, even with lower peak performance, than from one that's only fast once warmed up (i.e. once the code has been JITed).

If your measurements are different, I would be very interested to see your numbers for this deployed inside Lambda :)

@smirnoal (Contributor) commented Jan 5, 2023

Thanks @driverpt for the contribution. I was also trying out different HTTP client implementations in my personal fork some time ago: https://github.com/smirnoal/aws-lambda-java-libs/branches

Please do not close this PR for now. I think we can benefit from this implementation for Java 8. I will be separating the HTTP client interfaces and implementations out into a dedicated package, which would be consumed by the RIC (Runtime Interface Client); I have some ideas around this.

@msailes (Collaborator) commented Sep 6, 2023

I think that the HTTP client from https://github.com/awslabs/aws-crt-java would be a good option. I believe it would have the same performance characteristics as curl, but with the ease of maintenance of a regular Java client.
