@VannTen VannTen commented Jul 21, 2025

What type of PR is this?
/kind flake

What this PR does / why we need it:
If we only check the port, the playbook executed next may try to connect
even though the managed node has not yet completed its boot-up sequence
("System is booting up. Unprivileged users are not permitted to log in
yet. Please come back later. For technical details, see
pam_nologin(8).")

wait_for_connection works, but we need to take care to exclude errors
which are not 'unreachable' (in particular, python-less hosts would
fail that task, but only after ssh succeeded).
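
For illustration, the difference between the two probes could look roughly like the play below. This is a minimal sketch, not the change made in this PR: the host pattern, port, and timing values are assumptions, and the filtering of errors that are not 'unreachable' (needed for python-less hosts) is deliberately left out because the description does not say how it is done.

```yaml
# Illustrative play only; host pattern, port and timings are assumptions.
- name: Wait until SSH is available
  hosts: all
  gather_facts: false  # fact gathering would itself need a working login
  tasks:
    # Port-only probe: succeeds as soon as sshd accepts TCP connections,
    # which can happen while pam_nologin still rejects logins.
    - name: Wait for the SSH port to open
      ansible.builtin.wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        timeout: 300
      delegate_to: localhost

    # Connection-level probe: retries until Ansible can actually use the
    # connection, so it keeps waiting while logins are still refused.
    # It finishes with a ping module test, which requires Python on the
    # managed node.
    - name: Wait for a usable SSH connection
      ansible.builtin.wait_for_connection:
        sleep: 5
        timeout: 300
```

The first task alone reproduces the flake described above; the second is the stricter check, at the cost of failing on hosts without Python unless those errors are handled separately.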

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
See an example of the problem: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/10754918165 (look for 'System is booting up')

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 21, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 21, 2025
@VannTen
Contributor Author

VannTen commented Jul 21, 2025

/retest

@VannTen VannTen force-pushed the ci/fix_ssh_flakey branch 2 times, most recently from 8eaf3e9 to 11b32d1 Compare July 21, 2025 13:39
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 21, 2025
@VannTen VannTen marked this pull request as draft July 21, 2025 13:39
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2025
@tico88612
Member

If this works, #12359 can be fixed.

@VannTen
Contributor Author

VannTen commented Aug 29, 2025

This does not work yet because, for some reason, wait_for_connection uses a local connection:

PLAY [Wait until SSH is available] *********************************************
Friday 25 July 2025  08:56:28 +0000 (0:00:00.024)       0:00:16.860 *********** 
[WARNING]: Reset is not implemented for this connection
wait_for_connection: attempting ping module test
<10.11.207.170> ESTABLISH LOCAL CONNECTION FOR USER: root
<10.11.207.170> EXEC /bin/sh -c 'echo ~root && sleep 0'
<10.11.207.170> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1753433788.737992-180-50952141315317 `" && echo ansible-tmp-1753433788.737992-180-50952141315317="` echo /root/.ansible/tmp/ansible-tmp-1753433788.737992-180-50952141315317 `" ) && sleep 0'
wait_for_connection: attempting ping module test
<10.11.18.230> ESTABLISH LOCAL CONNECTION FOR USER: root

@VannTen VannTen force-pushed the ci/fix_ssh_flakey branch from e9b6bcb to a228ffa Compare August 29, 2025 09:44