Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iox-#2330 add notify systemd #2334

Open
wants to merge 55 commits into
base: main
Choose a base branch
from

Conversation

khromenokroman
Copy link
Contributor

@khromenokroman khromenokroman commented Aug 22, 2024

Notes for Reviewer

Pre-Review Checklist for the PR Author

  1. Code follows the coding style of CONTRIBUTING.md
  2. Tests follow the best practice for testing
  3. Changelog updated in the unreleased section including API breaking changes
  4. Branch follows the naming format (iox-123-this-is-a-branch)
  5. Commits messages are according to this guideline
  6. Update the PR title
    • Follow the same conventions as for commit messages
    • Link to the relevant issue
  7. Relevant issues are linked
  8. Add sensible notes for the reviewer
  9. All checks have passed (except task-list-completed)
  10. Assign PR to reviewer

Checklist for the PR Reviewer

  • Consider a second reviewer for complex new features or larger refactorings
  • Commits are properly organized and messages are according to the guideline
  • Code according to our coding style and naming conventions
  • Unit tests have been written for new behavior
  • Public API changes are documented via doxygen
  • Copyright owner are updated in the changed files
  • All touched (C/C++) source code files from iceoryx_hoofs have been added to ./clang-tidy-diff-scans.txt
  • PR title describes the changes

Post-review Checklist for the PR Author

  1. All open points are addressed and tracked via issues

References

  • Closes TBD

@elBoberido sorry if it turned out very badly, I still don't quite understand how your application works, but there is a suggestion to add a check, if the application was launched without using systemd, then everything will remain that way if not, then I will send a ready signal when I am ready to receive clients

this is a continuation based on your answer #2330

are you interested in this functionality? I would really like it to be added, I need it :)

@elBoberido
Copy link
Member

@khromenokroman yes, the functionality would be appreciated.

Can you add a cmake switch which is off by default to not add a hard dependency to systemd? You can check how it is done with IOX_EXPERIMENTAL_POSH_FLAG and replicate it.

Furthermore, if you install the git commit hooks, it takes care of setting the issue number for commits.
https://github.com/eclipse-iceoryx/iceoryx/blob/main/tools/git-hooks/Readme.md
Currently there are commits without the issue number.

Another issue will be to integrate the systemd support so that it does not affect the build of our platforms. In general we try to abstract this away and then have a specific implementations for platforms which do not support a specific feature. This is done for the IpcChannel which is using NamedPipe on Windows and UnixDomainSocket on the other platforms. You can have a look at iceoryx_posh/include/iceoryx_posh/internal/runtime/ipc_interface_base.hpp. Ideally this would not be done with and ifdef but in a similar manner like in iceoryx_platform but we did not yet implement a similar structure for iceoryx_posh yet.

So there is quite a lot to do in order to get this upstream :)

This change updates the GitHub Action to include the libsystemd-dev package in the list of dependencies. Adding this package ensures that all necessary development libraries are available for builds.
Introduce a configurable option to enable systemd support in the build process. This helps in integrating systemd features if needed while keeping it optional by default.
@@ -207,7 +207,7 @@ Macro(iox_add_executable)
if ( QNX )
target_link_libraries(${IOX_TARGET} ${IOX_LIBS_QNX})
elseif ( LINUX )
target_link_libraries(${IOX_TARGET} ${IOX_LIBS_LINUX})
target_link_libraries(${IOX_TARGET} ${IOX_LIBS_LINUX} systemd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this only to the iox-roudi target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido fixed

IOX_LOG(INFO, "Start watchdog");
while (m_runHandleRuntimeMessageThread.load())
{
if (auto wdres = sd_notify(0, "WATCHDOG=1") < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the IOX_POSIX_CALL. It takes care of the error handling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the IOX_POSIX_CALL. It takes care of the error handling

@elBoberido fixed

@@ -129,6 +132,7 @@ class RouDi
private:
std::thread m_monitoringAndDiscoveryThread;
std::thread m_handleRuntimeMessageThread;
std::thread listen_thread_watchdog; // 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please prefix members with m_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido I'm sorry for my mistake, it's been fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Problem. But do not use ifdefs in the code. Please write an abstraction, similar to what we did IpcChannelType and use that abstraction. For platforms which do not support systemd, write an alternative which basically does nothing when the methods are called. Having ifdefs all over the place makes it hard to maintain and therefore we choose the other approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido I'm not sure if I understood you correctly, please look at the roudi.hpp file did you talk about this? The announcement and the definition will share this is a test for now, thank you :)

* We get information about how they are running. If as a unit, then we launch
* watchdog and send a notification about the launch, otherwise we do nothing
*/
const char* invocation_id = std::getenv("INVOCATION_ID");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use iox_getenv_s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use iox_getenv_s

@elBoberido fixed

Comment on lines 297 to 302
if (pthread_setname_np(listen_thread_watchdog.native_handle(), "watchdog") != 0)
{
std::array<char, SIZE_ERROR_MESSAGE> buf{};
strerror_r(errno, buf.data(), buf.size());
IOX_LOG(ERROR, "Can not set name for thread watchdog: " << std::string(buf.data()));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use setThreadName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use setThreadName

@elBoberido fixed

IOX_LOG(ERROR, "WatchDogError: " << std::string(buf.data()));
return;
}
std::this_thread::sleep_for(std::chrono::seconds(1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be replaced with a blocking timed_wait on a semaphore or something else which can be interupted, else the test time will be massively increased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido I'm not very good at multithreaded programming, but in my opinion it's not a very elegant solution :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido fixed

Introduce preprocessor directives to conditionally enable systemd-related code and linking. This change allows for builds without systemd on Linux by checking for the USE_SYSTEMD flag, offering greater flexibility in different environments.
…read_watchdog

Standardize the naming convention for the listen thread watchdog variable to match the member variable naming pattern. This change ensures consistency and improves code readability.
Refactored thread naming and error handling for the watchdog thread. Replaced raw `sd_notify` usage with `IOX_POSIX_CALL` to better handle errors. Enhanced log messages for clearer error reporting.
In the `roudi.cpp` file, added early returns to handle errors more effectively when setting the thread watchdog name and sending READY=1 or WATCHDOG=1 signals. This ensures the system will not proceed if critical setup steps fail, enhancing reliability and maintainability.
…INVOCATION_ID'

Replaced getenv with iox_getenv_s to fetch 'INVOCATION_ID' securely and handle errors. Added necessary includes and modified related variable and constant for better clarity.
…h_roudi

Update the CMake script to enable systemd support only for iceoryx_posh_roudi. This ensures that systemd is correctly utilized where needed and prevents potential misconfigurations for other targets.
…tion

The 'libsystemd-dev' package was removed from the install script in the 'install-iceoryx-deps-and-clang' GitHub action. This change reduces the set of dependencies to only those required for successful execution.
…lass

Extracted systemd-related functionality into a new Systemd_service_handler class. This refactor improves modularity and simplifies the main RouDi code by encapsulating systemd-specific logic, making it easier to maintain and extend in the future.
Implemented a shutdown method in Systemd_service_handler to allow graceful termination of the watchdog thread. This enhancement ensures proper resource cleanup and improves system robustness by avoiding potential hang-ups during the shutdown process.
…mpilation

Moved the Systemd include directive under a conditional compilation block using the USE_SYSTEMD flag. This change ensures that the dependency on Systemd is only included when necessary, reducing potential compilation issues and improving portability.
…nsistency

Renamed class names and method names to follow camel case conventions, improving code readability and consistency. Updated related variables and comments to match the new naming scheme.
@khromenokroman khromenokroman changed the title Iox #2330 add notify systemd Iox-#2330 add notify systemd Aug 24, 2024
@khromenokroman khromenokroman changed the title Iox-#2330 add notify systemd iox-#2330 add notify systemd Aug 24, 2024
Updated `roudi_systemd` to `roudiSystemd` and `process_notify` to `processNotify` for consistency with naming conventions. This enhances code readability and maintains consistent style throughout the file.
…guide

Include an example unit file for enabling systemd support with the `-DUSE_SYSTEMD=ON` build option. This addition clarifies how to configure and manage the service using systemd.
Introduce std::condition_variable and std::mutex to replace sleep logic in the watchdog loop. This change aims to improve resource management and responsiveness to shutdown signals, making the watchdog more efficient.
Extract environment variable retrieval logic into a static method. This refactoring improves code readability and reuse, ensuring cleaner and more maintainable code.
…adability

Improve SystemdServiceHandler readability by adding detailed comments and helper functions. This refactor helps in managing the watchdog loop and sending notifications more clearly, ensuring proper thread name setting and signal handling.
…re handling

Enhanced the build_systemd.yaml to include steps for better error handling and visibility. Added a 30-second wait and refined the logic to check the status of the roudi service, displaying journal logs if it fails to start. This improves debugging capability in the CI pipeline.
@khromenokroman
Copy link
Contributor Author

Another issue is testing. We need to ensure that the suystemd support does not break. This could be done by having a CI target with systemd and maybe also an integration test. I'm not sure how difficult it would be to create such an integration test, though.

@elBoberido I did a test in ci, but I can't figure out where I need to add it to make it work for you, I added it here build-test.yml but your ci didn't run my ci :( , I've temporarily done it here build_systemd.yaml

https://github.com/khromenokroman/iceoryx/actions/runs/10604085974/job/29389869104

Introduced a new job in build-test.yml to build and run roudi with systemd on Ubuntu. This includes error handling steps such as service status checks, logging, and a wait period to enhance reliability and debugging for the CI pipeline.
@elBoberido
Copy link
Member

@khromenokroman for new contributors, the github CI needs to be triggered by a committer for their first PR. This is the policy for Eclipse projects so we cannot really do something here, except if you create a small PR which is merged first. Then the github CI would also be triggered with each push.

If you like, there is #2209. It should be done if a few minutes but the creator of the issue did not respond anymore. You could create a small PR and move using iox_ssize_t = int; from iceoryx_platform/win/include/iceoryx_platform/unistd.hpp to iceoryx_platform/win/include/iceoryx_platform/types.hpp and replace the using ssize_t = size_t; from the types.hpp. There are then a few other file in iceoryx_platform/win where ssize_t needs to be replaced with iox_ssize_t but that's minimal. What do you think? It would make the handling of this PR simpler for you.

…ace typo

Deleted the build_systemd.yaml GitHub Actions workflow file. Corrected a namespace typo from `systemd` to `service_management` in the roudi.hpp file.
@khromenokroman
Copy link
Contributor Author

for new contributors, the github CI needs to be triggered by a committer for their first PR. This is the policy for Eclipse projects so we cannot really do something here, except if you create a small PR which is merged first. Then the github CI would also be triggered with each push.

@elBoberido no problem, add ci in build-test.yml

@khromenokroman
Copy link
Contributor Author

If you like, there is #2209. It should be done if a few minutes but the creator of the issue did not respond anymore. You could create a small PR and move using iox_ssize_t = int; from iceoryx_platform/win/include/iceoryx_platform/unistd.hpp to iceoryx_platform/win/include/iceoryx_platform/types.hpp and replace the using ssize_t = size_t; from the types.hpp. There are then a few other file in iceoryx_platform/win where ssize_t needs to be replaced with iox_ssize_t but that's minimal. What do you think? It would make the handling of this PR simpler for you.

@elBoberido I'm still new to your project, and I didn't understand anything :) let's do this PR, and I'll try to change what you want with another PR, or do you need to do it in this PR?

Cast `info.total_size` to unsigned int in the `iox_getenv_s` call to ensure type matching and eliminate potential issues with type mismatches. This change improves code robustness and ensures adherence to function signature requirements.
…y (rev.2)

Cast `info.total_size` to unsigned int in the `iox_getenv_s` call to ensure type matching and eliminate potential issues with type mismatches. This change improves code robustness and ensures adherence to function signature requirements.
@khromenokroman
Copy link
Contributor Author

khromenokroman commented Aug 30, 2024

@elBoberido I think that's the problem here
int iox_getenv_s(size_t* actual_size_with_null, char* buffer, size_t buffer_capacity, const char* name)
size_t buffer_capacity but

struct BufferInfo
{
    uint64_t used_size{0};
    uint64_t total_size{0};
};

uint64_t total_size{0}; ??

in build 32bit ubuntu, get error:

/home/runner/work/iceoryx/iceoryx/iceoryx_posh/source/roudi/roudi.cpp:608:26:   required from here
/home/runner/work/iceoryx/iceoryx/iceoryx_posh/source/roudi/roudi.cpp:610:86: error: conversion from ‘uint64_t’ {aka ‘long long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
  610 |         auto result = IOX_POSIX_CALL(iox_getenv_s)(&actualSizeWithNull, buffer, info.total_size, env_var)
      |                                                                                 ~~~~~^~~~~~~~~~
[ 34%] Building CXX object hoofs/test/CMakeFiles/hoofs_moduletests.dir/moduletests/test_container_uninitialized_array.cpp.o
cc1plus: all warnings being treated as errors
gmake[2]: *** [posh/CMakeFiles/iceoryx_posh_roudi.dir/build.make:258: posh/CMakeFiles/iceoryx_posh_roudi.dir/source/roudi/roudi.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1300: posh/CMakeFiles/iceoryx_posh_roudi.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....

@khromenokroman
Copy link
Contributor Author

@elBoberido Is everything okay, are we doing a merge request?

@khromenokroman
Copy link
Contributor Author

@elBoberido added some tests

Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not yet look at the test but there are already a few changes you could implement.

Oh, is it possible to squash some of the commits?

Comment on lines 214 to 215
- name: Update
run: sudo apt update
- name: Install depends
run: sudo apt install -y gcc g++ cmake libacl1-dev libncurses5-dev pkg-config libsystemd-dev
- name: Checkout
uses: actions/checkout@v4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use this like in other tasks

      - name: Install iceoryx dependencies and clang-tidy
        uses: ./.github/actions/install-iceoryx-deps-and-clang

Only libsystemd-dev would be installed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido fixed

Comment on lines 220 to 225
- name: Cmake cache
run: cmake -Bbuild -Hiceoryx_meta -DBUILD_SHARED_LIBS=ON -DUSE_SYSTEMD=ON -DBUILD_TEST=ON
- name: build
run: cmake --build build -j 16
- name: Install
run: sudo cmake --build build --target install
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend the tools/iceoryx_build_test.sh script and call it here with the appropriate arguments, e.g.

./tools/iceoryx_build_test.sh build-strict build-all out-of-tree build-shared test-add-user use-systemd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido fixed

Comment on lines 226 to 250
- name: Ldconfig run
run: sudo ldconfig
- name: Create unit file
run: echo -e "[Unit]\nDescription=Test application roudi\n\n[Service]\nType=notify\nRestartSec=10\nRestart=always\nExecStart=/usr/local/bin/iox-roudi\nTimeoutStartSec=10\nWatchdogSec=5\n\n[Install]\nWantedBy=multi-user.target" | sudo tee /usr/lib/systemd/system/test_iox.service > /dev/null
- name: Show unit
run: cat /usr/lib/systemd/system/test_iox.service
- name: Daemon reload
run: sudo systemctl daemon-reload
- name: Check status
run: sudo systemctl status test_iox || true
- name: Start roudi
run: |
sudo systemctl start test_iox || (echo "Failed to start service"; sudo journalctl -u test_iox -n 50; exit 1)
- name: Wait for 30 seconds
run: sleep 30
- name: Check roudi
run: sudo systemctl status test_iox || (echo "Failed to start service"; sudo journalctl -u test_iox -n 50; exit 1)
- name: Stop roudi
run: sudo systemctl stop test_iox
- name: Show journal
run: sudo journalctl -u test_iox -n 100
- name: Start test (integration) posh with systemd
run: cd build/posh/test && ./posh_integrationtests
- name: Start test (module) posh with systemd
run: cd build/posh/test && ./posh_moduletests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to a bash script in tools/ci so that it can easily be tested locally. The tools/iceoryx_build_test.sh call and installation of libsystemd-dev could also be moved there. The script could be named build-test-ubuntu-systemd.sh

Please also add this to the beginning of the script

if [ "$USER" != "root" ]; then
    echo "Please run this as root or with sudo"
    exit 1
fi

and remove the sudo calls so that developer are aware that their system will be altered when they run the script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido fixed

Comment on lines 348 to 354
if(USE_SYSTEMD AND ("${IOX_TARGET}" STREQUAL "iceoryx_posh_roudi"))
message(STATUS "[i] Configuring ${IOX_TARGET} with systemd support.")
target_compile_definitions(${IOX_TARGET} PRIVATE USE_SYSTEMD=1)
target_link_libraries(${IOX_TARGET} PUBLIC ${IOX_PUBLIC_LIBS_LINUX} PRIVATE ${IOX_PRIVATE_LIBS_LINUX} systemd)
else()
target_link_libraries(${IOX_TARGET} PUBLIC ${IOX_PUBLIC_LIBS_LINUX} PRIVATE ${IOX_PRIVATE_LIBS_LINUX})
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong place. It needs to be done in iceoryx_posh/CMakeLists.txt. If possible added to PRIVATE_LIBS_LINUX with a cmake generator expression. If that is too cumbersome, it can be done with target_link_libraries(iceoryx_posh_roudi ...). I'm also wondering why target_link_libraries is called here, since it does not add systemd or something similar.

Furthermore, instead of the define, please have a look how it is done with IOX_EXPERIMENTAL_POSH_FLAG and replicate that. Else there might be issues when the roudi lib is used in out-of-tree builds in other binaries.

You also do not need to print the status since it should already be done via build_options.cmake

Comment on lines +240 to +243
if(USE_SYSTEMD AND ("${IOX_TARGET}" STREQUAL "posh_moduletests"))
target_compile_definitions(${IOX_TARGET} PRIVATE USE_SYSTEMD_TEST=1)
message(STATUS "[i] Configuring ${IOX_TARGET} with systemd support.")
endif ()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Please look how it is done with IOX_EXPERIMENTAL_POSH_FLAG

Comment on lines +257 to +258
SendMessageServiceManagement roudiSendMessage;
roudiSendMessage.processNotify();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since processNotify needs to be called anyway, it should be done in the constructor.

m_shutdown.store(true);
}

std::string ServiceManagementSystemd::getEnvironmentVariable(const char* const env_var)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for anything else than checking whether the INVOCATION_ID is set?

If not, then this should not be a virtual method but just an implementation detail of ServiceManagementSystemd and be more in line with e.g.

bool runUnderServiceManagement();

Comment on lines +633 to +659
bool ServiceManagementSystemd::setThreadNameHelper(iox::string<SIZE_THREAD_NAME>& threadName)
{
bool status_change_name = iox::setThreadName(threadName);
if (!status_change_name)
{
IOX_LOG(ERROR, "Cannot set name for thread " << threadName);
return false;
}
return true;
}

void ServiceManagementSystemd::watchdogLoopHelper()
{
IOX_LOG(INFO, "Start watchdog");
while (!m_shutdown.load())
{
std::unique_lock<std::mutex> lock(watchdogMutex);
if (watchdogNotifyCondition.wait_for(lock, std::chrono::seconds(1), [this] { return m_shutdown.load(); }))
{
break;
}
if (!sendSDNotifySignalHelper("WATCHDOG=1"))
{
return;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and also the shutdown, you could use a nice abstraction and do not need to take care of all the thread handling.

You can add this to the base class

PeriodicTask m_watchdog{ PeriodicTaskManualStart, "SvcMgmtWatchdog", *this, &ServiceManagement::notifyWatchdog};

In the ServiceManagement constructor you can than just do a

if (runUnderServiceManagement) {
    notifyProcessReady();
    m_watchdog.start(units::Duration::from_secs(1)); // or just m_watchdog.start(1_s); if the string literals are included
}

The task is automatically stopped when the dtor is called. You basically do not have to take care or anything else.

}
}

void ServiceManagementSystemd::processNotify()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just nitpicking. For someone not that familiar with systemd, having this method called notifyProcessReady gives a better understanding what it does.

Comment on lines +667 to +676
IOX_LOG(WARN, "Run application in system management");

m_listenThreadWatchdog = std::thread([this] {
iox::string<SIZE_THREAD_NAME> nameThread = "Watchdog";
if (!setThreadNameHelper(nameThread) || !sendSDNotifySignalHelper("READY=1"))
{
return;
}

IOX_LOG(DEBUG, "Notify READY=1 successful");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the proposals from above, this function would be reduced to just these lines and the the PeriodicTask would take care of the rest.

@khromenokroman khromenokroman force-pushed the iox-#2330-add_notify_systemd branch 10 times, most recently from 510c6c4 to c9236aa Compare September 5, 2024 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants