
@AndrewQuijano (Collaborator) commented Nov 28, 2025

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This PR completes two changes:

  1. It adds ARM support for Linux; OSI Linux already supports multiple architectures, including ARM.

For 32-bit ARM, the system calls are identical to i386

https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1377-L1384

https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm.h#L1589-L1596

For 64-bit ARM, the system calls are identical to x86-64

https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1277-L1284

https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1373-L1380

I'm also assuming that, on both ARM architectures, I need to check register 0 to get the number of bytes read.
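The register assumption can be sketched as follows. This is a minimal, self-contained illustration and not the plugin's code: `GuestArch`, `GuestRegs`, and `syscall_return_value` are hypothetical names, and the register file is a toy model rather than PANDA's CPU state. It relies on the Linux convention that a syscall's return value arrives in EAX/RAX on x86 and in R0/X0 on ARM, with failures reported as negative errno values.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for guest CPU state; PANDA's real types differ.
enum class GuestArch { I386, X86_64, Arm32, Arm64 };

struct GuestRegs {
    // Toy register file: index 0 models EAX/RAX on x86 and R0/X0 on ARM.
    std::array<uint64_t, 32> gpr{};
};

// Returns the syscall return value as a signed 64-bit quantity.
// 32-bit guests are sign-extended so negative errno values survive.
int64_t syscall_return_value(GuestArch arch, const GuestRegs& regs) {
    switch (arch) {
    case GuestArch::I386:
    case GuestArch::Arm32:
        return static_cast<int32_t>(static_cast<uint32_t>(regs.gpr[0]));
    case GuestArch::X86_64:
    case GuestArch::Arm64:
        return static_cast<int64_t>(regs.gpr[0]);
    }
    return -1;  // unreachable for the enumerators above
}

// A read() only produced bytes worth labeling when it returned > 0.
bool read_returned_data(int64_t ret) { return ret > 0; }
```

Sign-extending the 32-bit value matters because a failed `read` would otherwise look like a very large byte count.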

  2. For LAVA, I intend to have a single recording in which one binary runs on multiple files in one shot. This would massively improve performance, but file_taint only works with one file at a time. I'm updating the matching rules to support wildcard pattern matching, which allows matching multiple files.

fnmatch seems to be the best fit for flexible file names, since it matches names the way shells do.

https://man7.org/linux/man-pages/man3/fnmatch.3.html
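As a rough illustration of how `fnmatch` behaves on paths like the ones in this PR, here is a small wrapper. `matches_pattern` is a hypothetical helper, and the flags the plugin actually passes are not stated here, so the default of `0` is an assumption.

```cpp
#include <fnmatch.h>
#include <string>

// Returns true when `filename` matches the shell-style `pattern`.
// fnmatch() returns 0 on a match and FNM_NOMATCH otherwise.
bool matches_pattern(const std::string& pattern,
                     const std::string& filename,
                     int flags = 0) {
    return fnmatch(pattern.c_str(), filename.c_str(), flags) == 0;
}
```

With `flags = 0`, `*` also matches `/`, so `./toy/inputs/*` matches files in nested subdirectories. Passing `FNM_PATHNAME` restricts `*` to a single path component, which is closer to how a shell expands the pattern.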
...

Test plan

With LAVA, I will test running with a pattern such as ./toy/inputs/*, which should taint files such as ./toy/inputs/small-1.bin and ./toy/inputs/small-2.bin.

I can confirm, based on LAVA logs, that two files, testbig.bin and testsmall.bin, were tainted using the wildcard. Additionally, it appears that taint2 works on both files. I added a debug print on label_ram JUST to make sure.

When testing originally, I saw the taint2 hypercall warning, hence that debug message, but I guess I was needlessly spooked.

bug_mining.log

Also, I added PANDA logging for each newly tainted file; see the attached screenshot:
[image]

This is useful when your log captures multiple file taints, because you can figure out which taints belong to which files!

PSA: If you use Python to convert the PANDA log to JSON, you MUST use the updated version of PyPanda that will be built with this PR. You should then be able to see the FileMatchTaint instance.

On the Python side, you can find this in pandare/plog_pb2.py by searching for "file_taint_match".

...

Closing issues

N/A
...

@AndrewQuijano AndrewQuijano marked this pull request as ready for review November 28, 2025 21:39
@AndrewQuijano AndrewQuijano changed the title File Taint: Adding ARM and Wildcard File Taint: Adding ARM for Linux and Wildcard for Files Dec 2, 2025
@AndrewQuijano AndrewQuijano force-pushed the file_regex branch 2 times, most recently from a9b7af5 to 4219acd Compare December 6, 2025 01:05
Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds ARM architecture support (32-bit and 64-bit) for Linux file tainting in the file_taint plugin and replaces exact filename matching with wildcard pattern matching using POSIX fnmatch.

Key Changes:

  • Adds ARM register handling (R0 for 32-bit ARM, X0 for 64-bit ARM) to extract syscall return values for read operations
  • Replaces substring-based filename matching with fnmatch-based wildcard pattern matching to support multiple files
  • Includes numerous code style improvements (brace formatting consistency)
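The difference between substring matching and `fnmatch`-based matching can be sketched as below. This is an illustration under assumptions, not the plugin's implementation: `substring_match` and `select_by_pattern` are hypothetical helpers, the file list is made up from the names in the test plan, and `fnmatch` is called with flags `0`.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Substring rule: the needle may appear anywhere in the filename.
bool substring_match(const std::string& needle, const std::string& filename) {
    return filename.find(needle) != std::string::npos;
}

// Wildcard rule: the whole filename must match the shell-style pattern.
// Returns every opened file that the pattern selects.
std::vector<std::string> select_by_pattern(
        const std::string& pattern,
        const std::vector<std::string>& opened) {
    std::vector<std::string> selected;
    for (const auto& name : opened) {
        if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0) {
            selected.push_back(name);
        }
    }
    return selected;
}
```

In this sketch a bare name such as `testbig.bin` matches `./toy/inputs/testbig.bin` under the substring rule but not under `fnmatch`, which compares against the whole string; a leading `*` (for example `*testbig.bin`) would be needed to get the old behavior.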

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| panda/plugins/file_taint/file_taint.cpp | Adds ARM support for Linux read syscalls, replaces filename matching with wildcard pattern matching using fnmatch, adds empty filename validation |
| panda/plugins/file_taint/README.md | Updates documentation to describe new wildcard matching behavior with examples and usage guidance |
| panda/plugins/taint2/taint_api.cpp | Code formatting improvements and adds debug print statement in taint2_label_ram |
| panda/plugins/taint2/taint2.cpp | Code style improvements (brace formatting) and adds informational print statements |
| panda/plugins/taint2/taint2_hypercalls.cpp | Code formatting improvement for multi-line function call |
| panda/debian/setup.sh | Adds TARGET_LIST build argument hardcoded to x86_64-softmmu |


@AndrewQuijano AndrewQuijano force-pushed the file_regex branch 3 times, most recently from 011af6b to c8de23b Compare December 11, 2025 22:48
@AndrewQuijano AndrewQuijano force-pushed the file_regex branch 2 times, most recently from 5e31d1c to 0311fda Compare December 24, 2025 05:55
@AndrewQuijano AndrewQuijano changed the title File Taint: Adding ARM for Linux and Wildcard for Files File Taint: Adding ARM for Linux and Wildcard for Files, with PANDA logging Dec 24, 2025
@lacraig2 lacraig2 merged commit 497f017 into dev Jan 12, 2026
3 checks passed
@lacraig2 lacraig2 deleted the file_regex branch January 12, 2026 19:01

3 participants