
UPF pod stuck at Waiting Idle state with the message: Waiting for pfcp agent service to run #381

Open
gatici opened this issue Sep 26, 2024 · 0 comments
Labels
bug Something isn't working

Comments


gatici commented Sep 26, 2024

Bug Description

The UPF Kubernetes operator is installed and initially reaches the active status. A few hours later, the UPF unit status changes to waiting with the message "Waiting for pfcp agent service to run". It stays in that state for hours unless the pod is restarted (an example restart command is shown below).
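
The only workaround found so far is restarting the pod. A minimal sketch of that restart step, assuming the sdcore namespace and pod name shown in the logs below (the exact command used may differ):

# Deleting the pod makes the StatefulSet recreate it; after the restart the unit returns to active.
kubectl delete pod upf-0 -n sdcore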

To Reproduce

1. Follow the Getting Started guide to install the environment: https://canonical-charmed-aether-sd-core.readthedocs-hosted.com/en/latest/tutorials/getting_started/
2. Leave the environment running for a few hours.
3. The UPF unit ends up in the waiting state described above (see the commands after this list to observe it).
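
The stuck state can be observed with the same commands used to collect the output below; the sdcore model/namespace comes from the Getting Started guide:

# Unit status as reported by Juju
juju status upf
# Pod status and the pfcp-agent sidecar logs
kubectl get pods -n sdcore
kubectl logs -f upf-0 -n sdcore -c pfcp-agent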

Environment

Juju: 3.4
MicroK8s: 1.27-strict/stable

Relevant log output

upf/0*                       waiting   idle   10.1.146.45         Waiting for pfcp agent service to run


UPF logs:

2024-09-26T10:56:27.120Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/sdcore/services/upf-external "HTTP/1.1 200 OK"
2024-09-26T10:56:27.196Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/k8s.cni.cncf.io/v1/namespaces/sdcore/network-attachment-definitions "HTTP/1.1 200 OK"
2024-09-26T10:56:27.252Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/k8s.cni.cncf.io/v1/namespaces/sdcore/network-attachment-definitions "HTTP/1.1 200 OK"
2024-09-26T10:56:27.308Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/sdcore/statefulsets/upf "HTTP/1.1 200 OK"
2024-09-26T10:56:27.394Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/sdcore/statefulsets/upf "HTTP/1.1 200 OK"
2024-09-26T10:56:27.460Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/sdcore/statefulsets/upf "HTTP/1.1 200 OK"
2024-09-26T10:56:27.564Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/k8s.cni.cncf.io/v1/namespaces/sdcore/network-attachment-definitions "HTTP/1.1 200 OK"
2024-09-26T10:56:27.617Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/sdcore/statefulsets/upf "HTTP/1.1 200 OK"
2024-09-26T10:56:27.669Z [container-agent] 2024-09-26 10:56:27 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/sdcore/pods/upf-0 "HTTP/1.1 200 OK"
2024-09-26T10:56:31.990Z [container-agent] 2024-09-26 10:56:31 INFO juju-log No `fiveg_n3` relations found.
2024-09-26T10:56:32.026Z [container-agent] 2024-09-26 10:56:32 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/sdcore/services/upf-external "HTTP/1.1 200 OK"
2024-09-26T10:56:32.113Z [container-agent] 2024-09-26 10:56:32 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/k8s.cni.cncf.io/v1/namespaces/sdcore/network-attachment-definitions "HTTP/1.1 200 OK"
2024-09-26T10:56:32.176Z [container-agent] 2024-09-26 10:56:32 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/sdcore/statefulsets/upf "HTTP/1.1 200 OK"
2024-09-26T10:56:32.220Z [container-agent] 2024-09-26 10:56:32 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/sdcore/pods/upf-0 "HTTP/1.1 200 OK"
2024-09-26T10:56:35.109Z [container-agent] 2024-09-26 10:56:35 INFO juju-log Waiting for pfcp agent service to run
2024-09-26T10:56:35.430Z [container-agent] 2024-09-26 10:56:35 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)

PFCP agent container logs:

gatici@gaticipc:~$ kubectl logs -f upf-0 -n sdcore -c pfcp-agent
2024-09-26T10:28:43.766Z [pebble] HTTP API server listening on ":38814".
2024-09-26T10:28:43.766Z [pebble] Started daemon.
2024-09-26T10:29:10.696Z [pebble] GET /v1/services?names=pfcp-agent 291.746µs 200
2024-09-26T10:29:23.368Z [pebble] GET /v1/notices?timeout=30s 30.001642379s 200
2024-09-26T10:29:53.369Z [pebble] GET /v1/notices?timeout=30s 30.000740132s 200
2024-09-26T10:30:23.372Z [pebble] GET /v1/notices?timeout=30s 30.002088329s 200
2024-09-26T10:30:53.373Z [pebble] GET /v1/notices?timeout=30s 30.000886942s 200
2024-09-26T10:31:23.374Z [pebble] GET /v1/notices?timeout=30s 30.00096819s 200
2024-09-26T10:31:53.376Z [pebble] GET /v1/notices?timeout=30s 30.001307489s 200
2024-09-26T10:32:23.379Z [pebble] GET /v1/notices?timeout=30s 30.002782342s 200
2024-09-26T10:32:53.381Z [pebble] GET /v1/notices?timeout=30s 30.00082632s 200
2024-09-26T10:33:23.383Z [pebble] GET /v1/notices?timeout=30s 30.002053917s 200
2024-09-26T10:33:53.384Z [pebble] GET /v1/notices?timeout=30s 30.000341421s 200
2024-09-26T10:34:23.385Z [pebble] GET /v1/notices?timeout=30s 30.000745471s 200
2024-09-26T10:34:51.868Z [pebble] GET /v1/services?names=pfcp-agent 41.156µs 200
2024-09-26T10:34:53.386Z [pebble] GET /v1/notices?timeout=30s 30.001218508s 200
2024-09-26T10:35:23.388Z [pebble] GET /v1/notices?timeout=30s 30.001140441s 200
2024-09-26T10:35:53.389Z [pebble] GET /v1/notices?timeout=30s 30.000326246s 200
2024-09-26T10:36:23.389Z [pebble] GET /v1/notices?timeout=30s 30.000187329s 200
2024-09-26T10:36:53.390Z [pebble] GET /v1/notices?timeout=30s 30.000603325s 200
2024-09-26T10:37:23.392Z [pebble] GET /v1/notices?timeout=30s 30.002134346s 200
2024-09-26T10:37:53.394Z [pebble] GET /v1/notices?timeout=30s 30.000829317s 200
2024-09-26T10:38:23.396Z [pebble] GET /v1/notices?timeout=30s 30.001970297s 200
2024-09-26T10:38:53.398Z [pebble] GET /v1/notices?timeout=30s 30.001439948s 200
2024-09-26T10:39:23.400Z [pebble] GET /v1/notices?timeout=30s 30.001691226s 200
2024-09-26T10:39:49.073Z [pebble] GET /v1/services?names=pfcp-agent 49.99µs 200
2024-09-26T10:39:53.401Z [pebble] GET /v1/notices?timeout=30s 30.000669967s 200
2024-09-26T10:40:23.404Z [pebble] GET /v1/notices?timeout=30s 30.002590075s 200
2024-09-26T10:40:53.405Z [pebble] GET /v1/notices?timeout=30s 30.000722979s 200
2024-09-26T10:41:23.407Z [pebble] GET /v1/notices?timeout=30s 30.002170453s 200
2024-09-26T10:41:53.408Z [pebble] GET /v1/notices?timeout=30s 30.000144889s 200
2024-09-26T10:42:23.409Z [pebble] GET /v1/notices?timeout=30s 30.00055205s 200
2024-09-26T10:42:53.410Z [pebble] GET /v1/notices?timeout=30s 30.000574064s 200
2024-09-26T10:43:23.412Z [pebble] GET /v1/notices?timeout=30s 30.001917554s 200
2024-09-26T10:43:53.415Z [pebble] GET /v1/notices?timeout=30s 30.002598666s 200
2024-09-26T10:44:23.416Z [pebble] GET /v1/notices?timeout=30s 30.0007684s 200
2024-09-26T10:44:53.419Z [pebble] GET /v1/notices?timeout=30s 30.002622405s 200
2024-09-26T10:45:23.420Z [pebble] GET /v1/notices?timeout=30s 30.000716245s 200
2024-09-26T10:45:48.280Z [pebble] GET /v1/services?names=pfcp-agent 36.505µs 200
2024-09-26T10:45:53.422Z [pebble] GET /v1/notices?timeout=30s 30.001224619s 200
2024-09-26T10:46:23.424Z [pebble] GET /v1/notices?timeout=30s 30.001446551s 200
2024-09-26T10:46:53.425Z [pebble] GET /v1/notices?timeout=30s 30.001387528s 200
2024-09-26T10:47:23.428Z [pebble] GET /v1/notices?timeout=30s 30.001964595s 200
2024-09-26T10:47:53.429Z [pebble] GET /v1/notices?timeout=30s 30.000658179s 200
2024-09-26T10:48:23.431Z [pebble] GET /v1/notices?timeout=30s 30.001981279s 200
2024-09-26T10:48:53.433Z [pebble] GET /v1/notices?timeout=30s 30.001347816s 200
2024-09-26T10:49:23.436Z [pebble] GET /v1/notices?timeout=30s 30.002627427s 200
2024-09-26T10:49:53.437Z [pebble] GET /v1/notices?timeout=30s 30.000780755s 200
2024-09-26T10:50:23.440Z [pebble] GET /v1/notices?timeout=30s 30.002602742s 200
2024-09-26T10:50:53.443Z [pebble] GET /v1/notices?timeout=30s 30.002588093s 200
2024-09-26T10:51:23.444Z [pebble] GET /v1/notices?timeout=30s 30.000424493s 200
2024-09-26T10:51:24.612Z [pebble] GET /v1/services?names=pfcp-agent 57.234µs 200
2024-09-26T10:51:53.447Z [pebble] GET /v1/notices?timeout=30s 30.002392851s 200
2024-09-26T10:52:23.450Z [pebble] GET /v1/notices?timeout=30s 30.002155506s 200
2024-09-26T10:52:53.452Z [pebble] GET /v1/notices?timeout=30s 30.001886982s 200
2024-09-26T10:53:23.454Z [pebble] GET /v1/notices?timeout=30s 30.001324014s 200
2024-09-26T10:53:53.456Z [pebble] GET /v1/notices?timeout=30s 30.002065723s 200
2024-09-26T10:54:23.459Z [pebble] GET /v1/notices?timeout=30s 30.002459952s 200
2024-09-26T10:54:53.460Z [pebble] GET /v1/notices?timeout=30s 30.00076308s 200
2024-09-26T10:55:23.463Z [pebble] GET /v1/notices?timeout=30s 30.002785155s 200
2024-09-26T10:55:53.464Z [pebble] GET /v1/notices?timeout=30s 30.000961792s 200
2024-09-26T10:56:23.468Z [pebble] GET /v1/notices?timeout=30s 30.002320279s 200
2024-09-26T10:56:35.103Z [pebble] GET /v1/services?names=pfcp-agent 23.95µs 200
2024-09-26T10:56:53.469Z [pebble] GET /v1/notices?timeout=30s 30.00023671s 200
2024-09-26T10:57:23.470Z [pebble] GET /v1/notices?timeout=30s 30.001129876s 200
2024-09-26T10:57:53.472Z [pebble] GET /v1/notices?timeout=30s 30.001570924s 200
2024-09-26T10:58:23.473Z [pebble] GET /v1/notices?timeout=30s 30.00077927s 200
2024-09-26T10:58:53.476Z [pebble] GET /v1/notices?timeout=30s 30.002541495s 200
2024-09-26T10:59:23.479Z [pebble] GET /v1/notices?timeout=30s 30.002052277s 200
2024-09-26T10:59:53.480Z [pebble] GET /v1/notices?timeout=30s 30.000717119s 200
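
The Pebble daemon in the pfcp-agent container keeps answering the charm's /v1/services queries, but the pfcp-agent service itself apparently never reaches the running state. For reference, the service state can also be checked directly from inside the workload container; the pebble binary and socket paths below are the usual ones for Juju sidecar charms and are an assumption here:

# List the services Pebble manages in the pfcp-agent container and their current state
# (assumed paths: /charm/bin/pebble and /charm/container/pebble.socket).
kubectl exec -it upf-0 -n sdcore -c pfcp-agent -- \
  env PEBBLE_SOCKET=/charm/container/pebble.socket /charm/bin/pebble services
# If the service shows as inactive, it can be started manually for debugging:
kubectl exec -it upf-0 -n sdcore -c pfcp-agent -- \
  env PEBBLE_SOCKET=/charm/container/pebble.socket /charm/bin/pebble start pfcp-agent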

BESSD container logs:

kubectl logs -f upf-0 -n sdcore -c bessd | more
2024-09-26T10:29:12.568Z [routectl] 2024-09-26 10:29:12,568 INFO Finished pinging missing ARP entries. Sleeping...
2024-09-26T10:29:17.417Z [routectl] 2024-09-26 10:29:17,416 INFO RTM_NEWNEIGH netlink event received.
2024-09-26T10:29:17.417Z [bessd] I0926 10:29:17.417641   631 bessctl.cc:489] *** All workers have been paused ***
2024-09-26T10:29:17.418Z [routectl] 2024-09-26 10:29:17,418 INFO Route entry 0.0.0.0/0 added to coreRoutes
2024-09-26T10:29:17.419Z [bessd] I0926 10:29:17.419003   634 bessctl.cc:693] Checking scheduling constraints
2024-09-26T10:29:17.419Z [bessd] E0926 10:29:17.419067   634 module.cc:226] Mismatch in number of workers for module coreMerge min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.419Z [bessd] E0926 10:29:17.419073   634 module.cc:226] Mismatch in number of workers for module coreSrcEther min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.420Z [bessd] W0926 10:29:17.420058   635 metadata.cc:79] Metadata attr timestamp/8 of module access_measure has no upstream module that sets the value!
2024-09-26T10:29:17.420Z [bessd] W0926 10:29:17.420070   635 metadata.cc:79] Metadata attr timestamp/8 of module core_measure has no upstream module that sets the value!
2024-09-26T10:29:17.420Z [bessd] I0926 10:29:17.420092   635 bessctl.cc:518] *** Resuming ***
2024-09-26T10:29:17.424Z [routectl] 2024-09-26 10:29:17,424 INFO Neighbor entry does not exist, creating modules.
2024-09-26T10:29:17.424Z [bessd] I0926 10:29:17.424710   636 bessctl.cc:489] *** All workers have been paused ***
2024-09-26T10:29:17.425Z [routectl] 2024-09-26 10:29:17,425 INFO Add Update module coreDstMAC06DFADBB2D87 successfully
2024-09-26T10:29:17.425Z [bessd] I0926 10:29:17.425832   638 bessctl.cc:693] Checking scheduling constraints
2024-09-26T10:29:17.425Z [bessd] E0926 10:29:17.425889   638 module.cc:226] Mismatch in number of workers for module coreDstMAC06DFADBB2D87 min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.425Z [bessd] E0926 10:29:17.425891   638 module.cc:226] Mismatch in number of workers for module coreMerge min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.425Z [bessd] E0926 10:29:17.425895   638 module.cc:226] Mismatch in number of workers for module coreSrcEther min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.426Z [bessd] W0926 10:29:17.426789   639 metadata.cc:79] Metadata attr timestamp/8 of module access_measure has no upstream module that sets the value!
2024-09-26T10:29:17.426Z [bessd] W0926 10:29:17.426802   639 metadata.cc:79] Metadata attr timestamp/8 of module core_measure has no upstream module that sets the value!
2024-09-26T10:29:17.426Z [bessd] I0926 10:29:17.426823   639 bessctl.cc:518] *** Resuming ***
2024-09-26T10:29:17.428Z [bessd] I0926 10:29:17.428735   640 bessctl.cc:489] *** All workers have been paused ***
2024-09-26T10:29:17.429Z [routectl] 2024-09-26 10:29:17,429 INFO Module coreRoutes:0->0/coreDstMAC06DFADBB2D87 linked
2024-09-26T10:29:17.429Z [bessd] I0926 10:29:17.429600   642 bessctl.cc:693] Checking scheduling constraints
2024-09-26T10:29:17.429Z [bessd] E0926 10:29:17.429658   642 module.cc:226] Mismatch in number of workers for module coreMerge min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.429Z [bessd] E0926 10:29:17.429663   642 module.cc:226] Mismatch in number of workers for module coreSrcEther min required 1 max allowed 64 attached workers 0
2024-09-26T10:29:17.430Z [bessd] W0926 10:29:17.430320   643 metadata.cc:79] Metadata attr timestamp/8 of module access_measure has no upstream module that sets the value!
2024-09-26T10:29:17.430Z [bessd] W0926 10:29:17.430326   643 metadata.cc:79] Metadata attr timestamp/8 of module core_measure has no upstream module that sets the value!
2024-09-26T10:29:17.430Z [bessd] I0926 10:29:17.430438   643 bessctl.cc:518] *** Resuming ***
2024-09-26T10:29:17.432Z [bessd] I0926 10:29:17.432813   644 bessctl.cc:489] *** All workers have been paused ***
2024-09-26T10:29:17.434Z [routectl] 2024-09-26 10:29:17,434 INFO Module coreDstMAC06DFADBB2D87:0->0/coreMerge linked
2024-09-26T10:29:17.434Z [bessd] I0926 10:29:17.434401   646 bessctl.cc:693] Checking scheduling constraints
2024-09-26T10:29:17.435Z [bessd] I0926 10:29:17.435587   647 bessctl.cc:518] *** Resuming ***
2024-09-26T10:29:22.576Z [routectl] 2024-09-26 10:29:22,576 INFO Missing ARP entries: []
2024-09-26T10:29:22.576Z [routectl] 2024-09-26 10:29:22,576 INFO Finished pinging missing ARP entries. Sleeping...
2024-09-26T10:29:23.367Z [pebble] GET /v1/notices?timeout=30s 30.000472566s 200
2024-09-26T10:29:32.584Z [routectl] 2024-09-26 10:29:32,584 INFO Missing ARP entries: []
2024-09-26T10:29:32.584Z [routectl] 2024-09-26 10:29:32,584 INFO Finished pinging missing ARP entries. Sleeping...
2024-09-26T10:29:38.153Z [routectl] 2024-09-26 10:29:38,152 INFO RTM_NEWNEIGH netlink event received.
2024-09-26T10:29:42.592Z [routectl] 2024-09-26 10:29:42,592 INFO Missing ARP entries: []
2024-09-26T10:29:42.592Z [routectl] 2024-09-26 10:29:42,592 INFO Finished pinging missing ARP entries. Sleeping...
2024-09-26T10:29:51.465Z [routectl] 2024-09-26 10:29:51,465 INFO RTM_DELNEIGH netlink event received.
2024-09-26T10:29:51.466Z [routectl] 2024-09-26 10:29:51,466 INFO RTM_DELNEIGH netlink event received.
2024-09-26T10:29:51.466Z [routectl] 2024-09-26 10:29:51,466 INFO RTM_DELNEIGH netlink event received.
2024-09-26T10:29:51.466Z [routectl] 2024-09-26 10:29:51,466 INFO RTM_DELNEIGH netlink event received.
2024-09-26T10:29:52.600Z [routectl] 2024-09-26 10:29:52,600 INFO Missing ARP entries: []
2024-09-26T10:29:52.600Z [routectl] 2024-09-26 10:29:52,600 INFO Finished pinging missing ARP entries. Sleeping...
2024-09-26T10:29:53.368Z [pebble] GET /v1/notices?timeout=30s 30.000389503s 200


Additional context


gatici@gaticipc:~$ juju status
Model   Controller          Cloud/Region        Version  SLA          Timestamp
sdcore  microk8s-localhost  microk8s/localhost  3.4.5    unsupported  14:06:25+03:00

App                       Version  Status   Scale  Charm                     Channel        Rev  Address         Exposed  Message
amf                       1.4.4    active       1  sdcore-amf-k8s            1.5/edge       707  10.152.183.100  no       
ausf                      1.4.2    active       1  sdcore-ausf-k8s           1.5/edge       520  10.152.183.115  no       
cu                                 active       1  oai-ran-cu-k8s                             2  10.152.183.59   no       
du                                 active       1  oai-ran-du-k8s                             1  10.152.183.216  no       
gnbsim                    1.4.3    active       1  sdcore-gnbsim-k8s         1.5/edge       501  10.152.183.252  no       
grafana-agent             0.32.1   waiting      1  grafana-agent-k8s         latest/stable   45  10.152.183.116  no       installing agent
mongodb                            active       1  mongodb-k8s               6/beta          38  10.152.183.37   no       Primary
nms                       1.0.0    active       1  sdcore-nms-k8s            1.5/edge       580  10.152.183.204  no       
nrf                       1.4.1    active       1  sdcore-nrf-k8s            1.5/edge       580  10.152.183.244  no       
nssf                      1.4.1    active       1  sdcore-nssf-k8s           1.5/edge       462  10.152.183.190  no       
pcf                       1.4.3    active       1  sdcore-pcf-k8s            1.5/edge       512  10.152.183.39   no       
router                             active       1  sdcore-router-k8s                          0  10.152.183.43   no       
self-signed-certificates           active       1  self-signed-certificates  latest/stable  155  10.152.183.119  no       
smf                       1.5.2    active       1  sdcore-smf-k8s            1.5/edge       590  10.152.183.130  no       
traefik                   v2.11.0  active       1  traefik-k8s               latest/stable  194  10.152.183.205  no       Serving at 10.0.0.6.nip.io
udm                       1.4.3    active       1  sdcore-udm-k8s            1.5/edge       489  10.152.183.232  no       
udr                       1.4.1    active       1  sdcore-udr-k8s            1.5/edge       486  10.152.183.239  no       
ue                                 active       1  oai-ran-ue-k8s                             3  10.152.183.83   no       
upf                       1.4.0    waiting      1  sdcore-upf-k8s            1.5/edge       591  10.152.183.70   no       waiting for units to settle down

Unit                         Workload  Agent  Address      Ports  Message
amf/0*                       active    idle   10.1.146.22         
ausf/0*                      active    idle   10.1.146.50         
cu/0*                        active    idle   10.1.146.40         
du/0*                        active    idle   10.1.146.25         
gnbsim/0*                    active    idle   10.1.146.6          
grafana-agent/0*             blocked   idle   10.1.146.43         grafana-cloud-config: off, logging-consumer: off
mongodb/0*                   active    idle   10.1.146.63         Primary
nms/0*                       active    idle   10.1.146.41         
nrf/0*                       active    idle   10.1.146.28         
nssf/0*                      active    idle   10.1.146.54         
pcf/0*                       active    idle   10.1.146.16         
router/0*                    active    idle   10.1.146.17         
self-signed-certificates/0*  active    idle   10.1.146.4          
smf/0*                       active    idle   10.1.146.29         
traefik/0*                   active    idle   10.1.146.61         Serving at 10.0.0.6.nip.io
udm/0*                       active    idle   10.1.146.46         
udr/0*                       active    idle   10.1.146.35         
ue/0*                        active    idle   10.1.146.42         
upf/0*                       waiting   idle   10.1.146.45         Waiting for pfcp agent service to run
gatici@gaticipc:~$ 