Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for exasock timeout logic #83

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jnmcmanus
Copy link

@jnmcmanus jnmcmanus commented Jan 29, 2024

There's a bug in libs/exasock/socket/common.h that corrupts the timeout parameters set on exasock sockets. The following line occurs twice in this file:

const struct timespec *to = (const struct timespec *)&to_val

This is a static cast from a timeval pointer to a timespec pointer.

Here's how timeval is defined in the GNU C library:

struct timeval {
    time_t   tv_sec;   /* Seconds */
    long int tv_usec;  /* Microseconds */
};

And here's the definition of timespec:

struct timespec {
    time_t   tv_sec;   /* Seconds */
    long int tv_nsec;  /* Nanoseconds */
};

The two problems with this are:

  1. Technically, the types of tv_usec and tv_nsec are implementation defined. In the GNU C library, they're the same (long int), but these could be different types altogether in another implementation.
  2. More gravely, one struct represents the time as a (seconds, microseconds) pair, while the other uses (seconds, nanoseconds)! Obviously, you can't simply cast a pointer from one struct to the other and be done with it. But that's what exasock does, today. Socket options like the send timeout (SO_SENDTIMEO) are specified with a timeval instance. Then, to compute the timeout time, exasock "converts" this to a timespec, so a timeout given in microseconds is interpreted as a time in nanoseconds.

The consequence of this bug is that timeouts set to any value less than a second are essentially ignored. For instance, a timeout of 900,000 microseconds (900 milliseconds) is treated like a timeout of 900,000 nanoseconds (900 microseconds). Since the resolution of the CLOCK_MONOTONIC_COARSE system timer is on the order of milliseconds on Linux (e.g., see https://upvoid.com/devblog/2014/05/linux-timers/), a timeout parameter < 1 millisecond has little meaning/predictability.

This pull request suggests a minimal changeset to fix the bug. It has been tested on a Linux system and confirmed correct.

Resolves Issue #84

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant