Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd REQ->recv Socket Performance #53

Closed
binarybana opened this issue Apr 20, 2014 · 5 comments
Closed

Odd REQ->recv Socket Performance #53

binarybana opened this issue Apr 20, 2014 · 5 comments

Comments

@binarybana
Copy link

I've run a few simple ping/pong benchmarks and noticed some weird performance behavior of the REQ socket. Comparing performance to Python's pyzmq, the server (REP socket) performance is about the same, but for the Julia client, the request of the REQ socket reliably takes >100 milliseconds, whereas the first takes about 10ms, and all others are <1ms:

$ julia jobmon/pingcl.jl
elapsed time: 0.014093492 seconds
elapsed time: 0.12233187 seconds
elapsed time: 0.000415424 seconds
elapsed time: 0.000442195 seconds
elapsed time: 0.000316441 seconds
elapsed time: 0.000349937 seconds
elapsed time: 0.000312942 seconds
elapsed time: 0.000283473 seconds
elapsed time: 0.000310993 seconds
...

I've accounted here for garbage collection and whether the server is written in Julia or Python. The Python client has all requests handled in <1ms:

$ python jobmon/pingcl.py
0.000629901885986
0.000393152236938
0.000520944595337
0.000505924224854
0.000555038452148
0.000494956970215
0.000560998916626
0.000461101531982
0.000371932983398
...

I further narrowed this extra time to the recv of the REQ socket by more narrowly bracketing my tic()/toc(), but things look remarkably similar in the code (both libraries are thin wrappings over zmq_msg_recv). And I didn't know where else to look.

@stevengj
Copy link
Contributor

Isn't this just the usual Julia thing where the first time you call a function it takes some time to be compiled?

@binarybana
Copy link
Author

That's what my first thought was, but while the first request is slow (I'd imagine from compilation) it's the second call here that is even slower: 10ms, 100ms, <1ms, <1ms ....

@stevengj
Copy link
Contributor

@JeffBezanson, is there something in the memory subsystem that could cause a large latency like this in mid-computation, even though gc_disable() was called?

@binarybana
Copy link
Author

Looking at memory allocation along with the times:

$ julia pingcl.jl
elapsed time: 0.011553368 seconds (34280 bytes allocated)
elapsed time: 0.118611181 seconds (3361300 bytes allocated)
elapsed time: 0.000641436 seconds (432 bytes allocated)
elapsed time: 0.000623514 seconds (432 bytes allocated)
elapsed time: 0.000627998 seconds (432 bytes allocated)

$ julia pingcl.jl # <- unrolled loop
elapsed time: 0.107864638 seconds (3244916 bytes allocated)
elapsed time: 0.123715298 seconds (3361348 bytes allocated)
elapsed time: 0.00052556 seconds (480 bytes allocated)
elapsed time: 0.000545088 seconds (480 bytes allocated)

So memory allocation seems to be the problem, and the fact that it is showing up in the second time through recv, and not the first makes me wonder if this is a ZMQ.jl problem rather than normal Julia compilation. And I also don't understand why unrolling the loop makes things worse for the first time through and while leaving the second slow.

But other than being odd, I don't know if this is sufficiently worse than normal Julia initialization time to warrant further investigation, so I'd be okay if the maintainers decided to close this issue.

@JamesWrigley
Copy link
Member

Closing this because I cannot reproduce it on master and Julia 1.11:

$  julia --project pongcl.jl
  0.030095 seconds (30.87 k allocations: 1.598 MiB, 97.99% compilation time)
  0.000221 seconds (8 allocations: 256 bytes)
  0.000344 seconds (8 allocations: 256 bytes)
  0.000254 seconds (8 allocations: 256 bytes)
  0.000083 seconds (8 allocations: 256 bytes)
  0.000071 seconds (8 allocations: 256 bytes)
  0.000069 seconds (8 allocations: 256 bytes)
  0.000116 seconds (8 allocations: 256 bytes)
  0.000077 seconds (8 allocations: 256 bytes)
  0.000069 seconds (8 allocations: 256 bytes)
  0.000068 seconds (8 allocations: 256 bytes)
  0.000067 seconds (8 allocations: 256 bytes)
  0.000070 seconds (8 allocations: 256 bytes)
  0.000067 seconds (8 allocations: 256 bytes)
  0.000067 seconds (8 allocations: 256 bytes)
  0.000067 seconds (8 allocations: 256 bytes)
  0.000068 seconds (8 allocations: 256 bytes)
  0.000067 seconds (8 allocations: 256 bytes)
  0.000075 seconds (8 allocations: 256 bytes)
  0.000070 seconds (8 allocations: 256 bytes)
Code

Server:

using ZMQ

s = Socket(ZMQ.REP)
ZMQ.bind(s, "tcp://*:7001")

while true
    msg = ZMQ.recv(s)
    ZMQ.send(s, "pong")
end

Client:

using ZMQ

s = Socket(ZMQ.REQ)
ZMQ.connect(s, "tcp://localhost:7001")

GC.enable(false)
for _ in 1:20
    @time begin
        ZMQ.send(s, "ping")
        msg = ZMQ.recv(s)
    end
end

For me the first call takes much longer because of precompilation, then after a few more iterations it settles to something in the 10's of us range. I suppose after 10 years something was bound to have changed :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants