Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(newlib): usleep returning early (IDFGH-14342) #15132

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stevenoonan
Copy link

@stevenoonan stevenoonan commented Jan 3, 2025

Description

The current implementation of usleep() can return in the less the specified time, which breaks assumptions callers of usleep() have. This proposed change ensures usleep() never returns in less than the specified time.

From man 3 usleep:

The usleep() function suspends execution of the calling thread for (at least) usec microseconds.

Returning before the specified time can cause great inefficiencies when using C++'s this::thread::sleep_for().

I stumbled upon this when I upgraded a project from IDF 4.4 to IDF 5.3.1 (GCC got updated to 13.2). After updating, my project was running sluggish. I profiled the code and found a thread was using way more processing time than before. It was a simple loop with a this:thread::sleep_for(10ms) call.

Specifically, the changes of this commit gcc-mirror/gcc@cfef4c3 in libstdc++'s __sleep_for() is what causes behavior to change. But the problem is in usleep(). In the older version of libstdc++ __sleep_for(), usleep() is always called once. The newer version added a while loop and a check to the monotonic clock to verify the sleep did not come back early, and if it does, it keeps calling usleep().

The only documented reason a sleep should return early is due to a signal, but of course that is not the case on the ESP32.

The reason usleep() returns early is because it does not compensate for a worst case systick ISR timing. Let's play out an example. If we wanted to sleep for 15ms where the tick period is 10ms, the calculations would give us vTaskDelay(2):

  1. portTICK_PERIOD_MS is defined as 10. So us_per_tick is 10,000.
  2. us is 15000
  3. (15000 + 10000 - 1) / 10000) is 2.4999, so 2 ticks

Two ticks has a minimum of 10ms, because the first tick ISR can happen immediately, so you can only count on n-1 tick periods of time. So of course 10ms is less than 15ms. The current (us + us_per_tick - 1) / us_per_tick compensation attempt is broken. If you do the math for a 10ms wait you'll get 1 tick, which has a minimum of 0 tick periods. But even if it does get half (or 90%) of a period of time slept, we still haven't slept the minimum time we should have.

This means usleep() will return back to libstdc++ __sleep_for(), the monotonic clock check fails, and usleep() is now called again, with the remainder of time. This will always been less than a tick period, so esp_rom_delay_us() is now called. I don't have the source code for esp_rom_delay_us() but I assume it is a blocking busy wait.

Any time spent in esp_rom_delay_us() is quite inefficient for scheduling. If the thread blocking here is the highest priority thread on the system, it will prevent FreeRTOS from time slicing to any other ready state thread during that time. That was exactly what was happening in my project.

My fix is to double check the monotonic clock in usleep(), and it will vTaskDelay() an additional time when needed. This guarantees usleep() will never return early, but sometimes later. Which is to spec, and also, it means libstdc++ won't call usleep() in a way that will cause esp_rom_delay_us() to be called unexpectedly.

For 15ms sleep with a 10ms systick period, the absolute minimal tick count is 2. My change does a simple division of equivalent of (15000/10000)+1 which gives 2.5 or int 2. In the worst case, another tick is needed because as stated above, a tick count of 2 means a worst case of only 10ms waited, so the loop will check for that and call vTaskDelay() again.

Related

Testing

On my own project, as stated above.

@CLAassistant
Copy link

CLAassistant commented Jan 3, 2025

CLA assistant check
All committers have signed the CLA.

Copy link

github-actions bot commented Jan 3, 2025

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello stevenoonan, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against c060a85

@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 3, 2025
@github-actions github-actions bot changed the title fix(newlib): usleep returning early fix(newlib): usleep returning early (IDFGH-14342) Jan 3, 2025
@safocl
Copy link

safocl commented Jan 4, 2025

Aren't these changes the same as the described changes to C++'s this::thread::sleep_for() ?

@stevenoonan
Copy link
Author

stevenoonan commented Jan 4, 2025

Aren't these changes the same as the described changes to C++'s this::thread::sleep_for() ?

The change I'm proposing with this PR is the same method this::thread::sleep_for() (__sleep_for()) is already using. But this is the best option we have, I believe.

Let me play out some examples to clarify what is going on.

With a system tick period of 10ms, calling this::thread::sleep_for(10ms) is causes to a single call to usleep(10000), which calls vTaskDelay(1) in IDF 4. The app thread will yield up until the next system tick. Therefore the thread will sleep will be between 0 and 10ms.

In IDF 5, calling this::thread::sleep_for(10ms) (almost always) causes two calls to usleep(). First will be usleep(10000), which calls vTaskDelay(1). vTaskDelay(1) will sleep between 0 and 10ms. The newer __sleep_for() function will almost always determine that less than 10ms was actually slept, so it will call usleep() with the remainder of time. This remainder is always less than the system tick period. This means esp_rom_delay_us() will be called, for up to 99,000+ microseconds. That will prevent any thread of the same priority to be switched to during that time, which I think is a huge problem for anyone using C++ with IDF, and definitely was the problem in my project when I upgraded to IDF5.

So what options do we have? If we make usleep() guarantee it will always sleep at least the time specified, __sleep_for() will not call usleep() a second time, and esp_rom_delay_us() will not be called. I suppose we could make a custom build of libstdc++ to modify __sleep_for() but I doubt that would be viewed as worth it.

So if we must modify usleep(), what is the best way to do it? These are the options I came up with:

  1. Don't call esp_rom_delay_us() in usleep() for lengths less than a system tick period.
  2. Always sleep the extra tick necessary to ensure the minimum time is slept.
  3. Check an independent clock to see if an extra is needed, and only sleep the extra tick if needed.
  4. Use a hardware timer ISR to sleep the precise amount of time.
  5. Implement a portable API to check when the next system tick ISR will happen.

Option (1) is a non-starter for me. We need a supported way of sleeping accurately for small time periods. Plus, it would break a lot of code currently in the field. Option (2) works, but will often sleep longer than we needed to. With a 10ms system tick, a 15ms sleep sometimes only needs 2 ticks, sometimes three. So why not use option (3) and just check to see if we really need the extra tick?

Option (4) is a non-starter because we don't have an arbitrary number hardware timers, but we do have an arbitrary number of threads that might sleep.

Option (5) is interesting, and may be possible. However it requires an API to be developed that will read the hardware timer registers being employed for the system tick, which I am sure is different across all the different architectures supported. If we know exactly when the next system tick is about to occur, we can correctly calculate how many ticks we need to sleep without subsequently double checking the monotonic clock. This would be awesome, but a lot more work. It might not even be possible for some reason I am unaware of.

The fact that is the same approach used in __sleep_for() is what it is. I don't see a better option. And even if we were using the old version of IDF when the monotonic clock is not checked in __sleep_for(), we still have the reality that calling std::this_thread::sleep_for(10ms) in IDF4 does not follow spec:

Blocks the execution of the current thread for at least the specified sleep_duration.

As I already mentioned, calling std::this_thread::sleep_for(10ms) in IDF4 will almost always sleep less than 10ms. It was always broken, because usleep() was and still is not following spec. In IDF5 std::this_thread::sleep_for(10ms) does sleep exactly 10ms, at the cost of a lot of wasted cycles.

EDIT:

One additional caveat/issue which is not fixed by this PR is calling std::this_thread::sleep_for() for a period less than a system tick. In that case esp_rom_delay_us() will be called for the full duration. This is not what would be naively expected. It would be much more friendly to the C++ app to employ vTaskDelay() to allow a context switch. No one really wants to call esp_rom_delay_us(9000) when std::this_thread::sleep_for(9ms) is called. Perhaps a menuconfig option that would specify the threshold to stop using esp_rom_delay_us() and start using vTaskDelay()? Realistically, I would would think anything beyond 1 millisecond should use vTaskDelay().

@safocl
Copy link

safocl commented Jan 4, 2025

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/pthread.html?highlight=usleep#rtos-integration

When calling a standard libc or C++ sleep function, such as usleep defined in unistd.h, the task will only block and yield the core if the sleep time is longer than one FreeRTOS tick period. If the time is shorter, the thread will busy-wait instead of yielding to another RTOS task.

behavior varies for different sleep durations

@stevenoonan
Copy link
Author

stevenoonan commented Jan 4, 2025

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/pthread.html?highlight=usleep#rtos-integration

When calling a standard libc or C++ sleep function, such as usleep defined in unistd.h, the task will only block and yield the core if the sleep time is longer than one FreeRTOS tick period. If the time is shorter, the thread will busy-wait instead of yielding to another RTOS task.

behavior varies for different sleep durations

Thanks, that covers my addendum edit. The original issues remain where calling sleep_for() with a time larger than the system tick will cause a portion of the time to block wait in IDF5, and usleep() almost always sleeping for less than the specified time when given a time more than a tick period.

@safocl
Copy link

safocl commented Jan 4, 2025

@stevenoonan but the guaranty is not broken with std::this_thread::sleep_for()...

@stevenoonan
Copy link
Author

@stevenoonan but the guaranty is not broken with std::this_thread::sleep_for()...

@safocl The main point is that the guaranty of usleep() is broken. This results in unnecessary blocking whenever std::this_thread::sleep_for() is called with a time more than a tick period in IDF5. With a 10ms tick period, calling std::this_thread::sleep_for(15ms) can result in 5 milliseconds of blocking busy-wait (with the other 10ms yielded). There is no good reason for this blocking to occur - it appears to be an unintended consequence of the interaction between the newer __sleep_for() and usleep() returning before the specified time. It can cause a huge amount of wasted cycles, and that is the problem. This PR ensures all 15ms will be yielded. It also brings usleep() into conformance with the spec.

@stevenoonan stevenoonan force-pushed the usleep-returns-too-fast-fix branch from 416f11d to c060a85 Compare January 20, 2025 21:15
@stevenoonan
Copy link
Author

c060a85 Improves the implementation and comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Opened Issue is new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants