Timers
Issue #3003 and related issues/PRs contain a lot of discussion about the timers used in Open MPI. This page collects that information and explains the current implementation, as of March 2019, for Open MPI developers.
On this page, the word "timer" means a function that gives the current time. The time is expressed as the amount of time elapsed since some point in the past. In the MPI world, such a function can be used to implement the `MPI_WTIME` routine.
Several timers are available, depending on the system, and each has its own characteristics:
- Monotonicity: Time should increase monotonically. In other words, time should never go back into the past. If a timer is implemented using a CPU cycle counter on a system with multiple cores, time may not increase monotonically when a process migrates to another core, especially to a core on another socket.
- Constant rate: Time should increase at a constant rate relative to real time. If a timer is implemented using a CPU cycle counter, time may not increase at a constant rate when the CPU frequency changes.
- Resolution: How small a tick is. If a timer is used for the `MPI_WTIME` routine, its resolution is reflected in the `MPI_WTICK` routine.
- Overhead: How much time is needed to get the current time.
- Time correction: Whether the timer is affected by system time corrections, such as those made by an NTP daemon. If it is affected, time may go back into the past and may jump discontinuously.
- Global synchronization: Whether the timer is synchronized among compute nodes. This is reflected in the `MPI_WTIME_IS_GLOBAL` attribute key.
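Two of these characteristics, monotonicity and overhead, can be probed empirically for a POSIX clock. The sketch below is a minimal, hypothetical helper (`probe_clock` is not part of Open MPI) that reads a clock many times in a row, reports whether any reading went backwards, and estimates the average cost of one read. A sampling check like this only catches a backward step that happens during the probe; it cannot prove that a clock is monotonic.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to nanoseconds. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Read clock `id` `n` times in a row (n >= 2).
 * Returns 1 if no reading was earlier than the previous one, 0 if time went
 * backwards at least once, and -1 if the clock could not be read.
 * On success, *avg_ns receives the average time per read in nanoseconds. */
static int probe_clock(clockid_t id, int n, double *avg_ns)
{
    struct timespec ts;
    if (n < 2 || clock_gettime(id, &ts) != 0)
        return -1;
    int64_t first = timespec_to_ns(&ts), prev = first, cur = first;
    int monotonic = 1;
    for (int i = 1; i < n; i++) {
        if (clock_gettime(id, &ts) != 0)
            return -1;
        cur = timespec_to_ns(&ts);
        if (cur < prev)
            monotonic = 0;   /* observed a step back into the past */
        prev = cur;
    }
    /* n reads span n - 1 intervals; each interval is roughly one read. */
    *avg_ns = (double)(cur - first) / (n - 1);
    return monotonic;
}
```

For example, `probe_clock(CLOCK_MONOTONIC, 100000, &avg)` gives a rough per-call overhead for `CLOCK_MONOTONIC` on the current machine.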
Many hardware architectures provide high-resolution, low-overhead timers.
If a hardware-native timer is based on a CPU cycle counter, we should pay attention to process migration between cores and to CPU frequency changes.
The x86-64 architecture provides the `RDTSCP` and `RDTSC` instructions, which read the TSC (time stamp counter). The TSC is complex: it is implemented differently across CPU models.
| TSC type | Constant-rate tick? | Monotonic time? |
|---|---|---|
| (original) TSC | no | per core (?) |
| constant TSC | yes | per core (?) |
| invariant TSC | yes | per socket (?) |
One problem with the invariant TSC is that the instruction needed to determine its frequency is privileged.
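Because the TSC frequency cannot be read directly from user space, one workaround is to calibrate it: count TSC ticks across an interval measured with another clock. The sketch below does this against `CLOCK_MONOTONIC` using the GCC/Clang `__rdtsc()` intrinsic. It is a hypothetical illustration (`estimate_tsc_hz` is not an Open MPI function), it only makes sense on CPUs with a constant or invariant TSC, and the estimate can be off if the process migrates between cores during the measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() in GCC and Clang */
#endif

/* Estimate the TSC frequency in Hz by counting TSC ticks across a short
 * CLOCK_MONOTONIC interval. sleep_ns must be in [0, 999999999].
 * Returns 0.0 on non-x86 builds or on error. */
static double estimate_tsc_hz(long sleep_ns)
{
#if defined(__x86_64__) || defined(__i386__)
    struct timespec t0, t1, nap = { 0, sleep_ns };
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return 0.0;
    uint64_t c0 = __rdtsc();
    nanosleep(&nap, NULL);
    uint64_t c1 = __rdtsc();
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return 0.0;
    /* Reference interval in nanoseconds, measured by the OS clock. */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns > 0.0 ? (double)(c1 - c0) * 1e9 / ns : 0.0;
#else
    (void)sleep_ns;
    return 0.0;   /* no TSC on this architecture */
#endif
}
```

A longer `sleep_ns` reduces the relative error from the clock reads at each end, at the cost of a slower start-up.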
The Armv8-A architecture provides the Generic Timer, which includes a system counter.
The system counter in the Generic Timer:
- Is system-wide. Therefore, all CPU cores in a compute node see the same counter.
- Measures the passing of time in real time.
- Increments at a fixed frequency, typically in the range 1-50 MHz, except in low-power operating modes. The `CNTFRQ_EL0` register holds a copy of the current clock frequency.
- Starts operating from zero.
- Can be obtained by reading the `CNTVCT_EL0` register.
See the ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile.
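A minimal sketch of reading the system counter from user space, assuming an AArch64 build with GCC/Clang inline assembly and an OS that permits EL0 access to `CNTVCT_EL0` (Linux normally does). The helper names are hypothetical. Dividing a counter difference by the `CNTFRQ_EL0` value converts it to seconds; on other architectures the helpers return 0 so the sketch still compiles.

```c
#include <stdint.h>

/* Read the virtual count register CNTVCT_EL0. The ISB keeps the read from
 * being executed speculatively ahead of earlier instructions. */
static uint64_t arm_read_cntvct(void)
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ __volatile__("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#else
    return 0;   /* not an Armv8-A target */
#endif
}

/* Read the counter frequency (Hz) from CNTFRQ_EL0. */
static uint64_t arm_read_cntfrq(void)
{
#if defined(__aarch64__)
    uint64_t f;
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
#else
    return 0;
#endif
}

/* Seconds elapsed between two counter readings, or -1.0 if the
 * frequency is unknown. */
static double arm_counter_seconds(uint64_t start, uint64_t end)
{
    uint64_t freq = arm_read_cntfrq();
    return freq ? (double)(end - start) / (double)freq : -1.0;
}
```

Because the counter is system-wide and runs at a fixed frequency, this conversion does not depend on which core the process runs on or on the CPU clock speed.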
...
Keyword: time base facility
The SPARC-V9 architecture provides the TICK register.
The counter field of the TICK register:
- Is a 63-bit counter that counts CPU clock cycles.
- Can be read by the `RDTICK` instruction.
See The SPARC Architecture Manual, Version 9.
Usually, an OS provides library functions to get the current time.
Software-managed timers may be affected by system time corrections, such as those made by an NTP daemon.
The `clock_gettime` function returns the current time, and the `clock_getres` function returns its resolution (precision). The first argument, `clock_id`, selects the type of clock. For example, `CLOCK_REALTIME` represents the clock measuring real time for the system since the Epoch. This clock is affected by discontinuous jumps in the system time. `CLOCK_MONOTONIC` represents the monotonic clock for the system since an unspecified point in the past. This clock is not affected by discontinuous jumps in the system time.
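The two POSIX calls can be exercised together to compare clocks. This hypothetical helper, `show_clock`, prints the current value and the resolution of a given `clock_id`; calling it for `CLOCK_REALTIME` and `CLOCK_MONOTONIC` shows that the former counts from the Epoch while the latter starts from an unspecified point.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Print the current value and the resolution of one clock.
 * Returns 0 on success, -1 if the clock is not supported. */
static int show_clock(const char *name, clockid_t id)
{
    struct timespec now, res;
    if (clock_gettime(id, &now) != 0 || clock_getres(id, &res) != 0)
        return -1;
    printf("%-16s now = %lld.%09ld s, resolution = %lld ns\n",
           name, (long long)now.tv_sec, now.tv_nsec,
           (long long)res.tv_sec * 1000000000LL + res.tv_nsec);
    return 0;
}
```

Usage: `show_clock("CLOCK_REALTIME", CLOCK_REALTIME); show_clock("CLOCK_MONOTONIC", CLOCK_MONOTONIC);`. The reported resolution is what an `MPI_WTICK` built on that clock would naturally return.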
The `clock_gettime` and `clock_getres` functions are defined in POSIX.1-2001 and later. Does OS X have a problem with `clock_gettime`?
Amended in Dec 2021: on macOS, `CLOCK_MONOTONIC` can actually go backwards; see https://github.com/mobile-shell/mosh/pull/1124, for example (and the corresponding Darwin implementation of `clock_gettime()`, which confirms that `CLOCK_MONOTONIC` can go backwards on macOS, but `CLOCK_MONOTONIC_RAW` will not).
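Given the macOS behavior above, code that needs a never-decreasing interval clock might prefer `CLOCK_MONOTONIC_RAW` when the system headers define it, and fall back to `CLOCK_MONOTONIC` otherwise. The sketch below is a hypothetical selection helper, not Open MPI code. One trade-off to keep in mind: on Linux, `CLOCK_MONOTONIC_RAW` is not subject to NTP adjustments, so its rate is not corrected and it can drift relative to real time.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Pick a clock id for interval timing. Prefer CLOCK_MONOTONIC_RAW if the
 * headers define it and it works at run time; otherwise use the POSIX
 * CLOCK_MONOTONIC. */
static clockid_t interval_clock(void)
{
#ifdef CLOCK_MONOTONIC_RAW
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) == 0)
        return CLOCK_MONOTONIC_RAW;
#endif
    return CLOCK_MONOTONIC;
}
```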
Some OSes implement these functions as system calls, so they have high overhead. GNU/Linux implements these functions using the vDSO on some architectures to avoid that overhead.
The `gettimeofday` function returns the current time, expressed as seconds and microseconds since the Epoch. The clock may be affected by discontinuous jumps in the system time.
The `gettimeofday` function is defined in POSIX.1-2001 but is marked as obsolete in POSIX.1-2008.
Some OSes implement this function as a system call, so it has high overhead. GNU/Linux implements this function using the vDSO on some architectures to avoid that overhead.
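For comparison with `clock_gettime`, a minimal sketch of turning a `gettimeofday` reading into floating-point seconds (the helper name `wall_seconds` is hypothetical). The result has microsecond resolution and, being wall-clock time, can jump when the system time is corrected.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock seconds since the Epoch from gettimeofday().
 * Returns -1.0 on error. */
static double wall_seconds(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1.0;
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}
```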
The `times` function stores, in a `struct tms`, the CPU time spent executing instructions of the calling process and the CPU time spent in the system while executing tasks on behalf of the calling process. Its return value is the elapsed real time, in clock ticks, since an arbitrary point in the past.
The `times` function is defined in POSIX.1-2001 and later.
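The CPU times reported by `times` are in clock ticks, and `sysconf(_SC_CLK_TCK)` gives the number of ticks per second. The hypothetical helper below converts them to seconds. Note that this measures CPU time, not wall-clock time: a process blocked in a receive accrues almost none, which is why `times` is not a substitute for an `MPI_WTIME` timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/times.h>
#include <unistd.h>

/* CPU seconds (user + system) consumed so far by this process.
 * Returns -1.0 on error. */
static double cpu_seconds(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```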
Timers are used in several places in the Open MPI code.
The `MPI_WTIME` routine returns the elapsed wall-clock time since some point in the past. The `MPI_WTICK` routine returns the resolution of the `MPI_WTIME` routine.
- Accuracy is important.
- High resolution and low overhead are better.
- The values should not be affected by a system time correction.
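These requirements point toward a monotonic OS clock. The following is a minimal sketch of a wtime/wtick pair built on `CLOCK_MONOTONIC`; `my_wtime` and `my_wtick` are hypothetical names and this is not Open MPI's implementation. Error returns are ignored because, like `MPI_WTIME`, the functions have no way to report them.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds since an unspecified point in the past. CLOCK_MONOTONIC is not
 * affected by discontinuous jumps in the system time. */
static double my_wtime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Resolution of my_wtime() in seconds. */
static double my_wtick(void)
{
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);
    return (double)res.tv_sec + (double)res.tv_nsec * 1e-9;
}
```

A side benefit of a monotonic clock here: its starting point is typically system boot (on Linux), so `tv_sec` stays small and a `double` keeps sub-microsecond precision. With `CLOCK_REALTIME`, `tv_sec` is around 1.7e9, where the spacing between adjacent doubles is about 2.4e-7 s.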
We need to trip the event library at some interval in the `opal_progress` function.
- Accuracy and high resolution are not important.
- Low overhead is important.
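Since accuracy matters little here but overhead does, a common pattern is to avoid reading the clock on every call. The sketch below is a generic, hypothetical throttle (it is not the code in `opal_progress`): a cheap counter skips the clock read on most calls, and an expensive callback runs only when at least `interval_ns` has passed since its last run.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

typedef struct {
    uint64_t interval_ns;  /* minimum time between callback runs */
    uint64_t last_ns;      /* time of the last callback run */
    unsigned calls;        /* call counter */
    unsigned check_mask;   /* read the clock only when (calls & mask) == 0 */
} throttle_t;

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Called from a hot loop. Runs cb (if non-NULL) and returns 1 when the
 * interval has elapsed; otherwise returns 0 without running it. */
static int throttle_poll(throttle_t *t, void (*cb)(void))
{
    if ((t->calls++ & t->check_mask) != 0)
        return 0;                       /* skip even the clock read */
    uint64_t now = now_ns();
    if (now - t->last_ns < t->interval_ns)
        return 0;                       /* too soon since the last run */
    t->last_ns = now;
    if (cb)
        cb();
    return 1;
}
```

With `check_mask` set to 2^k - 1, the clock is read on only one call in 2^k, so the timer's overhead is amortized and its resolution barely matters.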
...
...
As of Apr 2019 (HEAD 9bb8fd509b970d31232a430db73aa204b8a9b40d).
- `OPAL_HAVE_CLOCK_GETTIME`: If the `clock_gettime` function is provided by the OS, the value is 1. Otherwise, the value is 0. This macro is defined in `$build_dir/opal/include/opal_config.h`.
- `OPAL_TIMER_MONOTONIC`: If the `opal_sys_timer_get_cycles` function always returns monotonically increasing values within a node, the value is 1. Otherwise, the value is 0. This macro is first defined as 1 in `opal/include/opal/sys/timer.h` and is redefined as 0 in `opal/include/opal/sys/*/timer.h` for some architectures.
- `OPAL_HAVE_SYS_TIMER_GET_CYCLES`: If the `opal_sys_timer_get_cycles` function is implemented for the architecture, the value is 1. Otherwise, the value is 0. This macro is defined in `opal/include/opal/sys/*/timer.h`.
- `OPAL_HAVE_SYS_TIMER_IS_MONOTONIC`: If the `opal_sys_timer_is_monotonic` function is implemented for the architecture, the value is 1. For some architectures, this macro is defined as 1 (together with an architecture-dependent `opal_sys_timer_is_monotonic` function) in `opal/include/opal/sys/*/timer.h`. For other architectures, this macro is defined as 1 (together with an `opal_sys_timer_is_monotonic` function that returns the value of `OPAL_TIMER_MONOTONIC`) in `opal/include/opal/sys/timer.h`.
These macros are currently used in `opal/mca/timer/linux/`.
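To illustrate how a configure-time macro like `OPAL_HAVE_CLOCK_GETTIME` is typically consumed, here is a hypothetical compile-time selection between `clock_gettime` and `gettimeofday`. `example_get_usec` is not an Open MPI function, and the macro is given a default here only so the sketch compiles on its own; in a real build its value comes from the generated `opal_config.h`.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Stand-in for the configure result; normally from opal_config.h. */
#ifndef OPAL_HAVE_CLOCK_GETTIME
#define OPAL_HAVE_CLOCK_GETTIME 1
#endif

/* Microseconds from the best OS timer selected at compile time. */
static long long example_get_usec(void)
{
#if OPAL_HAVE_CLOCK_GETTIME
    /* Monotonic: not affected by jumps in the system time. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
#else
    /* Fallback: wall-clock time, which may jump on time corrections. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
#endif
}
```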
- `opal_sys_timer_get_cycles`: This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms). This function is defined only if the value of the `OPAL_HAVE_SYS_TIMER_GET_CYCLES` macro is 1.
- `opal_sys_timer_freq`: This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU. This function is currently defined only for arm64.
- `opal_sys_timer_is_monotonic`: This function returns whether the `opal_sys_timer_get_cycles` function returns monotonic time. This function is always defined because a default implementation is defined in `opal/include/opal/sys/timer.h`.
These functions are defined in `opal/include/opal/sys/*/timer.h`, if available.
These functions are currently used in `opal/mca/timer/linux/`.
- `OPAL_TIMER_CYCLE_NATIVE`: If the `opal_timer_base_get_cycles` function is implemented directly using an architecture-dependent cycle counter, or computed from some other data (such as a high-resolution timer), the value is 1. Otherwise, the value is 0.
- `OPAL_TIMER_CYCLE_SUPPORTED`: If the `opal_timer_base_get_cycles` function is implemented for the OS, the value is 1. Otherwise, the value is 0.
- `OPAL_TIMER_USEC_NATIVE`: ...
- `OPAL_TIMER_USEC_SUPPORTED`: If the `opal_timer_base_get_usec` function is implemented for the OS, the value is 1. Otherwise, the value is 0.
These macros are defined in `opal/mca/timer/*/timer_*.h`.
- `opal_timer_base_get_cycles`: This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms).
- `opal_timer_base_get_usec`: This function returns the current time in microseconds.
- `opal_timer_base_get_freq`: This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU.
These functions are defined in `opal/mca/timer/*/timer_*.h` (as inline functions) or in `opal/mca/timer/*/timer_*_component.c` (as non-inline functions), if available.
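A cycle count only becomes a time once it is divided by the counter frequency, such as the value `opal_timer_base_get_freq` is described as returning. The naive `cycles * 1000000 / freq` overflows 64 bits for large counts, so the hypothetical helper below splits the count into whole seconds and a remainder first. It is a sketch, not the conversion Open MPI performs.

```c
#include <stdint.h>

/* Convert a cycle-count difference to microseconds, given the counter
 * frequency in Hz. Splitting into whole seconds and a remainder avoids
 * overflowing uint64_t; the remainder product stays below 2^64 as long
 * as freq_hz is below about 1.8e13 Hz. Returns 0 if freq_hz is 0. */
static uint64_t cycles_to_usec(uint64_t cycles, uint64_t freq_hz)
{
    if (freq_hz == 0)
        return 0;   /* unknown frequency */
    uint64_t whole_sec = cycles / freq_hz;
    uint64_t rem = cycles % freq_hz;
    return whole_sec * 1000000ULL + rem * 1000000ULL / freq_hz;
}
```

For example, with a 25 MHz counter, 50,000,000 cycles converts to 2,000,000 microseconds.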
- `opal_timer_linux_get_cycles_clock_gettime`: ...
- `opal_timer_linux_get_usec_clock_gettime`: ...
- `opal_timer_linux_get_cycles_sys_timer`: ...
- `opal_timer_linux_get_usec_sys_timer`: ...

- `mca_timer_base_monotonic`: ...
- Originally, `MPI_WTIME` was implemented using `gettimeofday`.
- In commit ee75c45ec5, it was changed to use `opal_timer_base_get_usec` if `OPAL_TIMER_USEC_NATIVE` is 1. At that time, `OPAL_TIMER_USEC_NATIVE` for Linux was 0.
- In PR #285, `OPAL_TIMER_USEC_NATIVE` for Linux was changed to `OPAL_HAVE_SYS_TIMER_GET_CYCLES`, and `MPI_WTIME` was changed to use `opal_timer_base_get_cycles` if `OPAL_TIMER_CYCLE_NATIVE` is 1. This commit broke `MPI_WTIME` whenever the CPU frequency changes during the execution of an MPI program.
- In issue #3003, the problem was reported.
- In PR #3184, `MPI_WTIME` was changed to use `gettimeofday` as a workaround.
- In PR #3201, `MPI_WTIME` was changed to use `clock_gettime` on Linux.