Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display analysis information #2134

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

fariss
Copy link
Collaborator

@fariss fariss commented Jun 10, 2024

Closes #857.

This commit introduces two new metadata fields to result_document. Would this be considered a breaking change?

This would require regenrating the rdoc test files. see mandiant/capa-testfiles#239.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

fariss added 4 commits June 9, 2024 23:08
This commit introduces two new metadata fields:
- apicall_count: total count of all API calls made in the sample
- import_count: total count of Import symbols in the sample
Note, when using rutils.warn(), flake8 raises an error. So using
rutils.bold() for now.
Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, pending the tests to succeed

capa/render/proto/capa.proto Outdated Show resolved Hide resolved
@mr-tz
Copy link
Collaborator

mr-tz commented Jun 10, 2024

I think this requires regenerating the files in tests/data/rd/

@fariss
Copy link
Collaborator Author

fariss commented Jun 10, 2024

Should be good to go once mandiant/capa-testfiles#239 is merged.

@@ -96,7 +98,7 @@ def find_basic_block_capabilities(

def find_code_capabilities(
ruleset: RuleSet, extractor: StaticFeatureExtractor, fh: FunctionHandle
) -> Tuple[MatchResults, MatchResults, MatchResults, int]:
) -> Tuple[MatchResults, MatchResults, MatchResults, FeatureSet]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing the signature of a function is a breaking change, so this should wait until the next major release.

capa/capabilities/static.py Show resolved Hide resolved
feature_counts.file = feature_count

# cumulatively count the total number of Import features
for feature, _ in file_features.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use .keys() here to indicate that you won't use the value

@@ -19,6 +19,9 @@

tabulate.PRESERVE_WHITESPACE = True

MIN_LIBFUNCS_RATIO = 0.4
MIN_API_CALLS = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did these numbers come from? and how should i interpret them?

Copy link
Collaborator Author

@fariss fariss Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. MIN_LIBFUNCS_RATIO: When the total count of library function present in a sample is less then 40%, we inform users that capa might pick false positive matches from other functions that would have been classified as library functions. I don't have any statistical data to back this up other than this hex-rays blogpost.
  2. MIN_API_CALLS: When the sample has very few API calls, it is a strong indication that it might be packed/encrypted as regular programs tend to make a lot more than 10 calls (though, we have to run a benchmark across multiple sample to decide what's a good number here). For example this packed capa-testfile emits 0 API features, luckily we detect that it is packed with UPX. If that weren't the case, this banner could serve as an indication that the sample might packed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good explanations!

would you include the key parts here as a comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i'm interested to see how frequently this message is shown to users. I don't think our dogs will identify 40% of functions in most binaries, so i'm a little concerned this message will be shown too often.

have you had a chance to collect these stats against a large number of samples?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still helpful information since we know there's most likely more library code than we've identified.

@mr-tz
Copy link
Collaborator

mr-tz commented Jun 11, 2024

Stepping back here for a moment, let's consider if we want to implement this differently:

  • add new characteristics: few imports, few detected library functions
  • add new limitation rules using these features
  • update behavior to handle has_file_limitation or similar

That way we can handle the various limitations/warnings consistently. The core extraction logic still resides in capa but we don't have to extend the meta data.

Related: should we provide functionality to easier leverage this in other tools? Right now other tools need to reimplement the logic we have in capa.main to handle special cases/detections.

@williballenthin
Copy link
Collaborator

@mr-tz this would require many fewer breaking changes, which i like

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Display analysis information to user
3 participants