Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows support #14

Merged
merged 12 commits into from
Feb 6, 2020
Merged

Conversation

JonathonH-PIA
Copy link
Contributor

Support native Windows builds with Msys2/MinGW, using the MinGW libunbound package.

The major issue was that libunbound can't be hooked up to the libuv event loop with a file descriptor like on Unix (libunbound does not use a pipe on Windows, and libuv on Windows can only poll sockets anyway).
Instead, create a worker thread to operate libunbound, using libuv's threading primitives. Results are queued back to the libuv event loop using a libuv async.
hnsd now shuts down cleanly on SIGINT and SIGTERM too, realized it was not doing this when I could not test the thread shutdown code. This uncovered minor issues in the pool shutdown code, which are fixed.
The libunbound path can also be set when configuring, we're building libunbound from source in order to stay on the same version on all platforms.

@JonathonH-PIA
Copy link
Contributor Author

This snowballed a little bit, so let me know if you'd rather I break this up and make separate PRs.

Thanks!

@tynes
Copy link

tynes commented May 10, 2019

Do you think you could add sections to the README regarding Windows Installation and Dependencies? I'm in the process of testing this and I'm figuring out how to build it on Windows, it would make it way easier for me to run it

@JonathonH-PIA
Copy link
Contributor Author

Absolutely, I'll add that and push it up.

@JonathonH-PIA
Copy link
Contributor Author

Let me know if there are any dependency issues after installing per those notes, I kept notes as I was setting it up but it's possible that I could have missed something.

@tynes
Copy link

tynes commented May 10, 2019

From a fresh Windows development environment, I was able to build this PR on Windows following the instructions. It was able to sync to the tip and resolve DNS queries using dig. I then built this PR on Linux just to make sure that still worked, and I was able to sync to the tip and resolve DNS queries using dig again.

@chjj
Copy link
Contributor

chjj commented May 13, 2019

Whoa, great work. That's interesting that windows was having some event loop issues. I was considering dropping the async resolution altogether and just doing synchronous resolutions on the libuv thread pool (this sounds stupid, I know, but I have a lingering suspicion that uv+unbound were causing a heap corruption when used together).

I will play around with this tomorrow. I only wish I had a windows machine to test this on.

@JonathonH-PIA
Copy link
Contributor Author

Thanks Chris, looking forward to hearing how this works for you after you try it. I've been running it on Linux and Mac too.

I'd hesitate to block the libuv event loop with any synchronous recursive resolutions. We've found that on connections with relatively high latency (starting around >150 ms or so), uncached resolution times can be very long, often several seconds. AFAIK this is inherent to running a recursive resolver locally, latency costs you a lot for all the round trips. The cache helps mitigate this, but many domains we've tested are only cached for ~5 minutes.

I'll keep an eye out for the possible heap corruption. I can't think of a way libuv and libunbound would interact poorly to cause that, they don't really directly interact very much (even less with this PR), though anything's possible of course 😛

I did notice in Valgrind that we get some invalid reads/writes on hnsd shutdown. It looks like these are mostly due to freeing libuv resources improperly - uv_close() is async, so the object can't be freed until the callback is invoked. Some uv objects seem to be freed right away, like hsk_rs_t.socket; I used after_close_free() in this PR to free resources after they're closed. These only occur on shutdown though in hnsd, so it probably doesn't explain any heap corruption.

@JonathonH-PIA
Copy link
Contributor Author

I noticed that libunbound is causing the worker thread to busy-wait at 100% CPU - apparently ub_wait() doesn't actually block if there are no requests pending. I'm working on this, will probably have the fix up Monday.

@JonathonH-PIA
Copy link
Contributor Author

Busy-waiting is fixed, shutdown of the worker thread is a bit cleaner now too.
Also ended up fixing the libuv resources in rs/ns/pool that weren't beeing freed properly, was doing some shutdown validation with Valgrind and they were causing a lot of errors.

@JonathonH-PIA
Copy link
Contributor Author

@chjj Have you had a chance to check this out?

@tynes
Copy link

tynes commented May 28, 2019

I vote to make @JonathonH-LTM a maintainer of this repo with commit access, if he wants it that is. Its important to not let the ecosystem fragment too much with many stale PRs and various forks. Newcomers may end up using a vulnerable fork which would lead to a less secure network.

@JonathonH-PIA
Copy link
Contributor Author

Thanks for the vote of confidence Mark, but for now I think I'd rather not take on that additional responsibility. Even with commit access it'd still mean a lot to have JJ's seal of approval 😁. I don't anticipate many more changes from our side for the foreseeable future though.

I appreciate you guys taking the time to review these changes 👍. I want to avoid making this a long-term fork too if possible, and I might revisit commit access if we did start doing more work on this down the road.

@tynes tynes mentioned this pull request Sep 11, 2019
@JonathonH-PIA
Copy link
Contributor Author

@boymanjor Can we merge this? Would be a big help if we don't have to carry these patches downstream, and Windows support is on the TODO list for hnsd anyway.

@boymanjor
Copy link
Contributor

@JonathonH-LTM I'm going to merge this now, but it is worth noting that hnsd does not currently work with the hsd network protocol. @chjj has mentioned a possible, complete rewrite of hnsd. I've halted work on hnsd until we hear more from him, but we might as well get this in now.

@boymanjor boymanjor merged commit 7a87553 into handshake-org:master Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants