Question: Using gitbom to report/remediate vulnerable dependencies? #3

@imjasonh

From gitbom.dev#why?

By constructing a complete, concise, and verifiable artifact tree for every software artifact, GitBOM enables:

  • Run-time detection of potential vulnerabilities, regardless of the depth in a dependency tree from which that vulnerability originated
  • Post-exploit forensics
    ...

In short, it would let anyone easily answer the question, “Does this product contain log4j?”

From reading around the rest of the site though, it's unclear to me how folks anticipate realizing that vision.

It's still early days, so I expect this is probably just an open area of active discussion and design, and if so, I'd love to hear ideas!


My understanding is that gitbom aims to take the hashes of source files (and inputs in general, but probably typically at the leaves; ideally those are source files?) and concatenate them in a Git-like form into a string, which is then also hashed to produce the ID of the collected thing. If any input's contents change, its hash changes, so the hash of the concatenated data changes, and so the output hash changes. It's like how a Git commit hash changes based on changes to file contents, commit message, etc.
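
To make that understanding concrete, here is a minimal sketch of the hashing scheme as I've described it. Only the Git blob hashing (`sha1("blob <len>\0" + content)`) is real Git behavior; the `artifact_id` helper and the `blob <id>` manifest layout are my own assumptions for illustration, not the actual gitbom format.

```python
import hashlib
from typing import Iterable


def git_blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: sha1(b"blob <len>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def artifact_id(input_contents: Iterable[bytes]) -> str:
    """Hypothetical artifact ID: hash of the sorted list of input IDs.

    The "blob <id>" line layout is an assumption, not the gitbom spec.
    """
    lines = sorted("blob " + git_blob_id(c) for c in input_contents)
    manifest = ("\n".join(lines) + "\n").encode()
    return git_blob_id(manifest)


# Changing the contents of any one input changes the resulting artifact ID.
before = artifact_id([b"int main() { return 0; }\n", b"#define X 5\n"])
after = artifact_id([b"int main() { return 0; }\n", b"#define X 6\n"])
print(before != after)
```

Sorting the input IDs before hashing makes the result independent of the order in which inputs are listed, which seems like a property build tooling would want.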

All of this is ideally done transparently by build tooling (excellent, love it ❤️), and the final single gitbom ID is available alongside (inside?) the artifact.

My question is, what am I then intended to do with this gitbom hash to determine if it contains log4j? There's no way of telling whether some opaque hash a1b2c3... "contains" any particular other component. Is there some index of these IDs that I should consult?
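
For illustration only, this is the kind of index lookup I'm imagining. Everything here is hypothetical: the `index` mapping, the `contains` helper, and the placeholder IDs are made up to show the shape of the question, not taken from any gitbom tooling.

```python
from typing import Dict, List


def contains(index: Dict[str, List[str]], root: str, target: str) -> bool:
    """Return True if `target` is reachable from `root` in the artifact tree.

    `index` is a hypothetical mapping from an artifact ID to the IDs of its
    direct inputs. Leaves (e.g. source files) have no entry.
    """
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(index.get(node, []))
    return False


# Placeholder IDs; real IDs would be opaque hashes like a1b2c3...
index = {
    "app": ["libfoo", "main.o"],
    "libfoo": ["log4j-core.jar"],
}
print(contains(index, "app", "log4j-core.jar"))
```

The point of the sketch is that the single hash alone isn't enough: something has to store and serve the parent-to-input mapping for every artifact in the tree.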

To complicate things even further, "log4j" could mean lots of things -- presumably I'm trying to identify some vulnerable version of log4j, but there are also presumably any number of perfectly acceptable log4j versions. "Versions" isn't even well defined; ideally I'd depend on a specific official release, but I might carry patches, or consume a released version from some intermediary that carries patches, or might depend on an unreleased codebase from head. I know you're aware of all this, and gitbom's approach absolutely seems like it makes the problem of IDing versions less painful, since you don't really care about "versions", just inputs/source files. But it still makes it hard to tell whether my artifact contains vulnerable inputs, since vulnerability reporting still tends to think in terms of released version ranges (vulnerability introduced in v1.2.3, fixed in v1.2.6).

Is the idea that vulnerability reporting should switch to source-based reporting (vulnerability exists in source file with sha f9c1d3...), and gitbom would let me look up whether my artifact contains that source file? That likely becomes infeasible, since trivial changes to the file (e.g., formatting, unrelated code changes) would change the hash without affecting the vulnerable code. A vulnerability report would have to report the hashes of vulnerable_code.h, vulnerable_code_with_one_trailing_whitespace.h, vulnerable_code_with_two_trailing_whitespaces.h, for every line, combinatorially, out to ~infinity. And that's just whitespace. Even more subtly, some other unrelated change could fix (or not!) the vulnerability, so every possibly-trivial change to the input would need to be inspected to tell whether it's vulnerable. If code is "vulnerable" when x == 5, then var x = 5 is vulnerable, as is var x = 2 + 3. But theoretically, due to compiler shenanigans, maybe one could be vulnerable while the other isn't.
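
A tiny sketch of the whitespace problem, using plain SHA-1 over made-up file contents (the snippets and the `sha1_hex` helper are just for illustration):

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Plain SHA-1 of the raw bytes, as a hex string."""
    return hashlib.sha1(content).hexdigest()


# Three variants that a human would consider "the same" vulnerable code.
original = b"int x = 5;\n"
trailing_space = b"int x = 5; \n"
equivalent = b"int x = 2 + 3;\n"

ids = {sha1_hex(v) for v in (original, trailing_space, equivalent)}
print(len(ids))  # 3: every variant gets its own hash
```

A report keyed on the hash of `original` would match neither of the other two, even though all three assign the same value.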

I see in the bomsh repo an example of detecting log4j given some gitbom data, but I'm not sure I understand yet how this answers the questions above. I haven't had a chance to dig deeper into it; if the answer is "RTFM" I'll accept that. 😅

Anyway, at this point, I'm very likely missing something about how this is supposed to work end-to-end, for both BOM generation and inspection. A bunch of smart folks are thinking about it, and I trust y'all to have come up with something to make vulnerability reporting and remediation as simple as you're doing with putting BOM generation in build tools. Help educate me!
