Issue Description
I'm moving some servers/services to podman kube play and ran into a problem. Several (not all) servers died after a few minutes, seemingly consistent with the configured probe limits, despite it being clear that the services were actually reachable and usable from clients. Disabling the health checks also meant that the service would stay up. After some digging I found the issue.
Container health check probes (startupProbe, readinessProbe, livenessProbe) with checks of kind tcpSocket or httpGet are effectively equivalent to exec checks. This is because they get converted to exec commands by podman in kube.go.
The exec conversion means executing nc to check for open TCP ports or curl to GET an HTTP URL, from inside the container. Containers which only have the bare minimum of software installed (as is best practice) may not have these "external dependencies", in which case the probes will always fail.
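For illustration, probes declared like this in the pod YAML (the port and path here are placeholders):
livenessProbe:
  tcpSocket:
    port: 8080
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
end up running something roughly like nc -z localhost 8080 and a curl GET against localhost:8080/healthz inside the container. The exact command strings are whatever kube.go generates, but either way the extra binaries must be present in the image being probed.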
It is my understanding that both tcpSocket and httpGet should probe from within the pod, but not from within the particular container being probed. This places the nc/curl (or equivalent) dependency requirements on the pod manager.
Should these TCP/HTTP probe connection attempts be implemented in podman instead?
Idea: probe dependencies do not have to be direct dependencies of podman. Podman may use minimal "probe images", and delegate checks to ephemeral health check containers. This may increase flexibility and potentially allow for broader probe kind support.
Steps to reproduce the issue
- Start a well-configured server/service in a pod using podman kube play, where at least one container has well-configured health checks of kinds tcpSocket or httpGet.
- Monitor the pod to see if the container gets to the healthy state.
- If it does not reach the healthy state, inspect if the container/image has nc/curl (with sufficient feature support) installed in the $PATH.
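A minimal, hypothetical pod spec along these lines should reproduce it (the image name and port are placeholders; any image that listens on a TCP port but ships neither nc nor curl will do):
apiVersion: v1
kind: Pod
metadata:
  name: probe-repro
spec:
  containers:
    - name: server
      image: registry.example.com/minimal-server:latest  # placeholder: bare image without nc/curl
      ports:
        - containerPort: 8080
      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 1
        periodSeconds: 5
        failureThreshold: 3
With podman kube play, such a container stays in the starting state and is eventually restarted, even though the port is reachable from clients (see the test cases below).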
Describe the results you received
Health check results depend not only on the containerized server/service itself, but also on other software included in the container/image.
Describe the results you expected
I was under the impression that "outside" health checks, such as tcpSocket and httpGet, should not rely on health check software (which is not usually a part of the actual server/service software) within the container itself.
podman info output
host:
arch: amd64
buildahVersion: 1.30.0
cgroupControllers:
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon_2:2.1.7-0debian12+obs15.22_amd64
path: /usr/bin/conmon
version: 'conmon version 2.1.7, commit: '
cpuUtilization:
idlePercent: 97.66
systemPercent: 0.92
userPercent: 1.43
cpus: 1
databaseBackend: boltdb
distribution:
codename: bookworm
distribution: debian
version: "12"
eventLogger: journald
hostname: server
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 6.1.0-7-amd64
linkmode: dynamic
logDriver: journald
memFree: 97869824
memTotal: 1004994560
networkBackend: netavark
ociRuntime:
name: crun
package: crun_101:1.8.4-0debian12+obs55.7_amd64
path: /usr/bin/crun
version: |-
crun version 1.8.4
commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
remoteSocket:
path: /run/user/1000/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns_1.2.0-1_amd64
version: |-
slirp4netns version 1.2.0
commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 3581431808
swapTotal: 3779063808
uptime: 116h 22m 6.00s (Approximately 4.83 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /home/username/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: btrfs
graphOptions: {}
graphRoot: /home/username/.local/share/containers/storage
graphRootAllocated: 31138512896
graphRootUsed: 6853885952
graphStatus:
Build Version: Btrfs v6.2
Library Version: "102"
imageCopyTmpDir: /var/tmp
imageStore:
number: 0
runRoot: /run/user/1000/containers
transientStore: false
volumePath: /home/username/.local/share/containers/storage/volumes
version:
APIVersion: 4.5.0
Built: 0
BuiltTime: Thu Jan 1 00:00:00 1970
GitCommit: ""
GoVersion: go1.19.8
Os: linux
OsArch: linux/amd64
Version: 4.5.0
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Tested on:
- Debian Testing (bookworm), running in a Proxmox VPS.
- Ubuntu 22.10, on bare metal.
Additional information
Test cases
The Kubernetes documentation provides probe examples (referenced in kube.go) which can be executed directly with podman kube play. While monitoring podman container statuses, kube play each yaml file for at least a minute before taking it --down.
exec-liveness.yaml
The exec probe works as expected, entering the healthy state immediately and later restarting when the health check deliberately fails. The failure command output is cat: can't open '/tmp/healthy': No such file or directory.
podman kube play 'https://k8s.io/examples/pods/probe/exec-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/exec-liveness.yaml'
tcp-liveness-readiness.yaml
The tcpSocket probe never leaves the starting state, and gets restarted after several failures. There is no command output.
podman kube play 'https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml'
http-liveness.yaml
The httpGet probe never leaves the starting state, and gets restarted after several failures. There is no command output.
podman kube play 'https://k8s.io/examples/pods/probe/http-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/http-liveness.yaml'
grpc-liveness.yaml
The grpc probe is not supported by podman, but is included here for completeness with the health check examples from Kubernetes.io.
It seems the grpc probe is ignored, and the container keeps running without a health state (starting, healthy, unhealthy, ...) in the podman ps output.
This may be used as an example of additional "outside" health check kinds, which may be separately containerized without imposing these dependencies on the podman binary itself. See gRPC health checks.
podman kube play 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
podman kube play --down 'https://k8s.io/examples/pods/probe/grpc-liveness.yaml'
Test monitoring
Monitor the container states separately, for example either by watching podman ps "live" or by logging the podman inspect output.
# NOTE: watch status live.
watch --differences --interval 1 podman ps
# NOTE: keep a status log.
( while true; do date ; podman inspect --latest | jq '.[] | { Name, Health: .State.Health }' ; sleep 5 ; done ; )
Workarounds
- Install nc/curl directly in the container in an extra build step.
- Find a different (but equivalent) health check method which may exec directly in the container. One example may be to test for sockets created when the container/server has initialized fully: test -S /path/to/server/socket.
- Utilize existing container software for workarounds, perhaps script interpreters such as perl or python.
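For the socket-file variant above, the probe could look something like this (the socket path is just an example):
livenessProbe:
  exec:
    command:
      - sh
      - "-c"
      - "test -S /path/to/server/socket"
  failureThreshold: 3
  initialDelaySeconds: 1
  periodSeconds: 5
This only relies on a shell being present in the image, which most (but not all) base images provide.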
Here's an example of using bash redirections to simulate a nc -z check on localhost:8080 (TCP). Note that this workaround will send (empty) data to the server port, which may cause side-effects if the server acts on the incoming connection.
On failure the output is bash: connect: Connection refused\nbash: line 1: /dev/tcp/localhost/8080: Connection refused and the exit code is non-zero.
livenessProbe:
  # TODO: replace with tcpSocket healthcheck.
  exec:
    command:
      - bash
      - "-c"
      - ": > /dev/tcp/localhost/8080"
  failureThreshold: 3
  initialDelaySeconds: 1
  periodSeconds: 5
Executing nc in common base images
The same issue arises for "simplified" command versions, such as nc in busybox, which doesn't always support the -z or -v options/features (depending on compile flags and busybox version).
podman run --rm busybox nc
BusyBox v1.22.1 (2014-05-22 23:22:11 UTC) multi-call binary.
Usage: nc [-iN] [-wN] [-l] [-p PORT] [-f FILE|IPADDR PORT] [-e PROG]
Open a pipe to IP:PORT or FILE
-l Listen mode, for inbound connects
(use -ll with -e for persistent server)
-p PORT Local port
-w SEC Connect timeout
-i SEC Delay interval for lines sent
-f FILE Use file (ala /dev/ttyS0) instead of network
-e PROG Run PROG after connect
podman run --rm alpine nc
BusyBox v1.35.0 (2022-11-19 10:13:10 UTC) multi-call binary.
Usage: nc [OPTIONS] HOST PORT - connect
nc [OPTIONS] -l -p PORT [HOST] [PORT] - listen
-e PROG Run PROG after connect (must be last)
-l Listen mode, for inbound connects
-lk With -e, provides persistent server
-p PORT Local port
-s ADDR Local address
-w SEC Timeout for connects and final net reads
-i SEC Delay interval for lines sent
-n Don't do DNS resolution
-u UDP mode
-b Allow broadcasts
-v Verbose
-o FILE Hex dump traffic
-z Zero-I/O mode (scanning)
podman run --rm centos nc
Could not find nc in $PATH.
podman run --rm fedora nc
Could not find nc in $PATH.
podman run --rm debian nc
Could not find nc in $PATH.
podman run --rm ubuntu nc
Could not find nc in $PATH.
Executing curl in common base images
It's less common to find curl installed.
podman run --rm busybox curl
Could not find curl in $PATH.
podman run --rm alpine curl
Could not find curl in $PATH.
podman run --rm centos curl
curl: try 'curl --help' or 'curl --manual' for more information
podman run --rm fedora curl
curl: try 'curl --help' or 'curl --manual' for more information
podman run --rm debian curl
Could not find curl in $PATH.
podman run --rm ubuntu curl
Could not find curl in $PATH.
I'm running a personal Open Build Service (OBS) branch of podman v4.5.0 (as suggested in another issue), with a build dependency fix and added BTRFS support. I'm just starting out with OBS, but it should not affect this issue.
apt show podman
Package: podman
Version: 4:4.5.0-debian12joelpurra1+obs82.1
Priority: optional
Maintainer: Podman Debbuild Maintainers <https://github.com/orgs/containers/teams/podman-debbuild-maintainers>
Installed-Size: 73.2 MB
Provides: podman-manpages (= 4:4.5.0-debian12joelpurra1+obs82.1)
Depends: catatonit,iptables,nftables,conmon (>= 2:2.0.30),containers-common (>= 4:1),uidmap,netavark (>= 1.0.3-1),libc6,libgpg-error0
Recommends: podman-gvproxy (= 4:4.5.0-debian12joelpurra1+obs82.1)
Suggests: qemu-user-static
Homepage: https://podman.io/
Download-Size: 29.3 MB
APT-Manual-Installed: yes
APT-Sources: https://download.opensuse.org/repositories/home:/joelpurra:/branches:/devel:/kubic:/libcontainers:/unstable/Debian_Testing Packages
Description: Manage Pods, Containers and Container Images
podman (Pod Manager) is a fully featured container engine that is a simple
daemonless tool. podman provides a Docker-CLI comparable command line that
eases the transition from other container engines and allows the management of
pods, containers and images. Simply put: alias docker=podman.
Most podman commands can be run as a regular user, without requiring
additional privileges.
.
podman uses Buildah(1) internally to create container images.
Both tools share image (not container) storage, hence each can use or
manipulate images (but not containers) created by the other.
.
Manage Pods, Containers and Container Images
podman Simple management tool for pods, containers and images
N: There are 2 additional records. Please use the '-a' switch to see them.