Dropped Interrupts #266

Open
Vincer239 opened this issue Jan 23, 2025 · 3 comments

@Vincer239

I have found another interesting behavior on my zcu102 journey: lost/dropped interrupts.
The backstory is the same as last time: using the PL on the SoC for acceleration.
Anyway, while testing I noticed that interrupts sent by the PL only sometimes actually trigger an interrupt in Microkit.
So I reduced the interrupt pulse to roughly the minimum length Microkit needs to recognize it, and tried again.

I got a consistent pattern: interrupts are dropped if the PD that receives the interrupt is busy. The interrupt is also not queued (which would mean notified() runs at the next possible scheduling point); it is simply dropped. This happens when either 1) the PD has been called via a ppcall and is running, or 2) the PD is already running in notified() (even when microkit_irq_ack() has already been called).

This contradicts what I can find in the manual:

It should be noted that once a hardware interrupt has been received, it will not be received again until microkit_irq_ack is called.

In case 1) (running in a ppcall), the interrupt should still be caught and queued for the PD to handle at its next available slot, and in case 2) (already in notified()) the interrupt should also be caught once microkit_irq_ack() has been called.
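The behaviour the manual describes, and that both cases above expect, can be sketched as a toy model. All names here (`toy_pd`, `toy_irq_fire`, `toy_wait`, `toy_irq_ack`) are hypothetical and this is not the Microkit or seL4 API; the assumption is simply that the kernel records the IRQ as a bit in the PD's notification word and masks the IRQ until the PD acknowledges it, so a PD that is busy in protected() or notified() should find the bit set the next time it returns to its event loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not the Microkit or seL4 API. */
typedef struct {
    uint64_t pending; /* notification word: one bit per channel */
    uint64_t masked;  /* channels whose IRQ stays masked until acked */
} toy_pd;

/* Kernel side: the IRQ for channel `ch` fires. Returns true if forwarded. */
static bool toy_irq_fire(toy_pd *pd, unsigned ch) {
    assert(ch < 64);
    uint64_t bit = (uint64_t)1 << ch;
    if (pd->masked & bit) {
        /* Masked: not forwarded. This model simply drops it; what real
         * hardware does here depends on the trigger type. */
        return false;
    }
    pd->pending |= bit; /* recorded whether or not the PD is busy */
    pd->masked |= bit;  /* masked until toy_irq_ack() */
    return true;
}

/* PD side: back in the event loop, collect and clear all pending channels. */
static uint64_t toy_wait(toy_pd *pd) {
    uint64_t bits = pd->pending;
    pd->pending = 0;
    return bits;
}

/* PD side: stands in for microkit_irq_ack(); unmasks the channel's IRQ. */
static void toy_irq_ack(toy_pd *pd, unsigned ch) {
    assert(ch < 64);
    pd->masked &= ~((uint64_t)1 << ch);
}
```

In this model an IRQ that fires while the PD is busy is returned by the next toy_wait(), and a second IRQ on the same channel is only forwarded after toy_irq_ack(). What the model deliberately leaves out is what the interrupt controller does with a pulse that arrives while the IRQ is masked.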

I have some screenshots from my tests:

Lost IRQ with synchronous debug output
A quick explanation of the picture: our system consists of pd1, pd2 and h1. h1 has the highest priority in the system, and pd1 and pd2 do ppcalls to h1. Within h1.protected(), data is written to shared memory. This triggers a hardware circuit on the PL, signaling new data (this is fifo_new_job). Some time later, the PL returns a result by sending an interrupt to the PS/Microkit. The violet delay between the spikes is the time Microkit needs to finish pd1.ppcall() and start pd2.ppcall() until it has written to memory.
As you can see, the first return interrupt spike falls into pd2.ppcall() and is completely lost. Only the interrupt from pd2's job makes it through.

I was told that the serial printout might be interfering with the system (note the delay, in ms), so instead of printing synchronously, I sent notifications to a debug-printing PD with the lowest possible priority. I then get this picture:

Lost IRQ with asynchronous debug output
Same setup as before. Here you can see that we lose both interrupts.

@Vincer239
Author

Addition:
If you are lucky and move the first IRQ somewhere into the middle of the violet spikes, the scheduling catches it and the IRQ is recognized. But that is pure luck.

@Ivan-Velickovic
Collaborator

Ivan-Velickovic commented Jan 24, 2025

Is the IRQ level-triggered or edge-triggered? Have you set the trigger on the IRQ explicitly in the SDF?

If there's no issue with the IRQ trigger, can you apply the following patch to the kernel itself, re-run, and let me know whether you see any KERNEL: prints? This will tell us whether seL4 is receiving the IRQs but not delivering them to the PDs for some reason, or not receiving them at all.

diff --git a/src/object/interrupt.c b/src/object/interrupt.c
index bf5604a03..e9ce1a4f8 100644
--- a/src/object/interrupt.c
+++ b/src/object/interrupt.c
@@ -206,6 +206,7 @@ void handleInterrupt(irq_t irq)
         /* Merging the variable declaration and initialization into one line
          * requires an update in the proofs first. Might be a c89 legacy.
          */
+        printf("KERNEL: Received IRQ %d\n", (int)IRQT_TO_IRQ(irq));
         cap_t cap;
         cap = intStateIRQNode[IRQT_TO_IDX(irq)].cap;
         if (cap_get_capType(cap) == cap_notification_cap &&

@Vincer239
Author

Vincer239 commented Jan 24, 2025

Is the IRQ level-triggered or edge-triggered?

My understanding is that it is level-triggered. The Xilinx documentation says the signal needs to be held high for at least 40 ns to trigger an interrupt.
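One possible reason the trigger type matters here, stated as a simplified rule of thumb rather than a diagnosis: for an edge-triggered interrupt the controller latches the rising edge, so a short pulse stays pending even if the IRQ is masked at that moment, whereas a level-triggered interrupt is, roughly, pending only while the line is held high. A short level pulse that falls entirely into a window where the IRQ is masked (for example between delivery and the ack) can therefore vanish. The toy model below (hypothetical names, not the GIC specification) counts the interrupts forwarded for a sampled line trace with an optional masked window. In Microkit, as far as I know, the trigger type is set per IRQ with the optional `trigger` attribute of the SDF `<irq>` element, with level as the default; check the manual for the exact values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified toy model (hypothetical names, not the GIC specification). */
typedef enum { TRIG_LEVEL, TRIG_EDGE } trigger_t;

typedef struct {
    trigger_t trigger;
    bool prev_line; /* line level at the previous sample */
    bool latched;   /* edge mode: rising edge seen but not yet taken */
} toy_irq;

/* Sample the line once; returns whether the IRQ is pending right now. */
static bool toy_pending(toy_irq *irq, bool line) {
    bool rising = line && !irq->prev_line;
    irq->prev_line = line;
    if (irq->trigger == TRIG_EDGE) {
        if (rising) {
            irq->latched = true; /* the edge is remembered */
        }
        return irq->latched;
    }
    return line; /* level: pending only while the line is high */
}

/* Count interrupts forwarded to the CPU for a sampled line trace, with the
 * IRQ masked for samples in [mask_from, mask_to). */
static int toy_run(trigger_t trig, const bool *line, size_t n,
                   size_t mask_from, size_t mask_to) {
    toy_irq irq = { trig, false, false };
    bool was_forwarding = false;
    int taken = 0;
    for (size_t t = 0; t < n; t++) {
        bool pending = toy_pending(&irq, line[t]);
        bool masked = t >= mask_from && t < mask_to;
        bool forward = pending && !masked;
        if (forward && !was_forwarding) {
            taken++;
            irq.latched = false; /* taking an edge IRQ consumes the latch */
        }
        /* Level mode: a line that stays high counts once, not per sample. */
        was_forwarding = forward && trig == TRIG_LEVEL;
    }
    return taken;
}
```

With a pulse at samples 2 and 3 and the IRQ masked for samples 0 to 4, the level-triggered run forwards nothing, while the edge-triggered run forwards one interrupt as soon as the mask is lifted at sample 5. Without a mask, both forward exactly one.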

Have you set the trigger on the IRQ explicitly in the SDF?

I am a little unsure about the SDF. If you are asking whether I put the correct interrupt ID in the system description file, then yes; I tested that several times in various configurations.

I will test the patch next week in the office.
