Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4534: Remove multi irq support for mmio devices #4601

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

andr3wy
Copy link
Contributor

@andr3wy andr3wy commented May 7, 2024

Changes

-Utilized Option instead of Vector to store IRQ lines for MMIO devices.
-Add test to test this.
-Update existing tests
...

Reason

Currently, Firecracker have an ability to create multiple irqs for MMIO devices; however, no more than one IRQ is ever used. Therefore, it was suggested to change this to use an Option instead of a vector of length 0 or 1.
...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@andr3wy andr3wy force-pushed the 4534 branch 2 times, most recently from fec52fa to a1d2133 Compare May 7, 2024 03:08
Copy link

codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.12%. Comparing base (d28ef60) to head (5d36dd7).

Files with missing lines Patch % Lines
src/vmm/src/device_manager/mmio.rs 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4601      +/-   ##
==========================================
- Coverage   83.13%   83.12%   -0.01%     
==========================================
  Files         245      245              
  Lines       26697    26692       -5     
==========================================
- Hits        22194    22189       -5     
  Misses       4503     4503              
Flag Coverage Δ
5.10-c5n.metal 83.59% <100.00%> (-0.01%) ⬇️
5.10-m5n.metal 83.58% <100.00%> (-0.01%) ⬇️
5.10-m6a.metal 82.79% <100.00%> (-0.01%) ⬇️
5.10-m6g.metal 79.55% <90.00%> (-0.01%) ⬇️
5.10-m6i.metal 83.57% <100.00%> (-0.01%) ⬇️
5.10-m7g.metal 79.55% <90.00%> (-0.01%) ⬇️
6.1-c5n.metal 83.59% <100.00%> (+<0.01%) ⬆️
6.1-m5n.metal 83.57% <100.00%> (-0.02%) ⬇️
6.1-m6a.metal 82.79% <100.00%> (-0.01%) ⬇️
6.1-m6g.metal 79.55% <90.00%> (-0.01%) ⬇️
6.1-m6i.metal 83.57% <100.00%> (-0.01%) ⬇️
6.1-m7g.metal 79.55% <90.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@roypat roypat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andr3wy,
Thanks for your PR! There's a couple formatting and build issues on ARM that'll need to be fixed (see https://buildkite.com/firecracker/firecracker-pr/builds/9851 for what exactly is failing). For the formatting, you can run ./tools/devtool fmt to fix it up (and to check that the style checks will all pass, you can run ./tools/devtool checkstyle). For the build error on ARM, you can do cargo check --target aarch64-unknown-linux-gnu. That should allow you to validate the build without needing a cross-compilation toolchain. Also, please squash all the commits into a single one, and give it a meaningful description. Thanks!

src/vmm/src/device_manager/mmio.rs Outdated Show resolved Hide resolved
src/vmm/src/device_manager/mmio.rs Outdated Show resolved Hide resolved
src/vmm/src/device_manager/mmio.rs Outdated Show resolved Hide resolved
src/vmm/src/device_manager/mmio.rs Outdated Show resolved Hide resolved
src/vmm/src/device_manager/mmio.rs Outdated Show resolved Hide resolved
@andr3wy andr3wy force-pushed the 4534 branch 3 times, most recently from 2bf0526 to c9fe5ad Compare May 7, 2024 16:25
@andr3wy
Copy link
Contributor Author

andr3wy commented May 7, 2024

Issues should be resolved.

@andr3wy andr3wy force-pushed the 4534 branch 2 times, most recently from 79d925a to 4d1e052 Compare May 7, 2024 17:13
@roypat
Copy link
Contributor

roypat commented May 8, 2024

Thanks for squashing the commits! It seems that a rebase went a bit awry though, your branch now has some commits that are already on main :( Could you drop those, so that only your commit remains part of this PR? Also, please add a commit body message - it's required for the style CI step to pass - just rephrasing the what you put under "Reason" in the PR description will be fine :).

Also I think there's still a compilation issue on ARM, could you have a look at that, too?

@andr3wy
Copy link
Contributor Author

andr3wy commented May 8, 2024

Wanted to write that I won't have access to a computer for at least a month (out of town), so this will have to be on the back burner for a while.

@JonathanWoollett-Light
Copy link
Contributor

Related to #4534

@roypat roypat linked an issue May 16, 2024 that may be closed by this pull request
3 tasks
@roypat
Copy link
Contributor

roypat commented Jul 1, 2024

Wanted to write that I won't have access to a computer for at least a month (out of town), so this will have to be on the back burner for a while.

Hi @andr3wy,
are you still interested in working on this?

@roypat roypat force-pushed the 4534 branch 2 times, most recently from 5779211 to 810402e Compare July 15, 2024 12:25
@roypat
Copy link
Contributor

roypat commented Jul 15, 2024

Hi @andr3wy,
I've gone ahead and rebased the PR, and fixed the remaining outstanding compiler warnings. Thanks again for your contribution here!

@roypat roypat force-pushed the 4534 branch 2 times, most recently from 69b8c2c to d7c0a7d Compare July 16, 2024 10:27
@roypat roypat force-pushed the 4534 branch 2 times, most recently from 838e581 to 64264d4 Compare September 13, 2024 09:09
tools/devtool Outdated Show resolved Hide resolved
roypat added a commit to roypat/firecracker that referenced this pull request Jan 10, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 10, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 10, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 14, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 16, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 16, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 16, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit to roypat/firecracker that referenced this pull request Jan 16, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in firecracker-microvm#4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
roypat added a commit that referenced this pull request Jan 16, 2025
If the UFFD handler exits abnormaly for some reason, have it take down
Firecracker as well by SIGKILL-ing it from a panic hook. For this,
reintroduce the "get peer creds" logic. We have to use SIGKILL because
Firecracker could be inside the handler for a KVM-originated page fault
that is not marked as interruptible, in which case all signals but
SIGKILL are ignored (happens for example during KVM_SET_MSRS when it
triggers the initialization of a gfn_to_pfn_cache for the kvm-clock
page, which uses GUP without FOLL_INTERRUPTIBLE).

While we're at it, add a hint to the generic "process not found" error
message to indicate that potentially Firecracker died, and that the
cause of this could be the UFFD handler crashing (for example, in #4601
the cause of the mystery hang is the UFFD handler crashing, but we were
stumped by what's going on for over half a year. Let's avoid that going
forward).

We can't enable this by default because it interferes with unittests,
and also the "malicious_handler", so expose a function on `Runtime` to
enable it only in valid_handler and fault_all_handler.

Signed-off-by: Patrick Roy <[email protected]>
Each MMIO device in Firecracker only utilizes at most one irq line, so
capture this property at the type level.

Signed-off-by: Andrew Yao <[email protected]>
Signed-off-by: Patrick Roy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove multi irq support for mmio devices
4 participants