Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: v6 database support, updated matcher interfaces #2311

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

kzantow
Copy link
Contributor

@kzantow kzantow commented Dec 10, 2024

This PR refactors the matching process in an effort to simplify the codebase and to support both V5 and V6 databases.

This change is based on a few philosophical ideals: the Grype pkg.Package should contain all the data used to search by, and the vulnerability.Vulnerability should contain all of the data used to match against. This stance allows the vulnerability.Criteria interface to be very simple -- users can easily write go code to determine if the particular vulnerability object matches whatever criteria set forth which is applied during the vulnerability matching process. By association, this means complex logic in the matching process can be easily tested by providing a list of vulnerabilities representing "everything in the db", and just running the existing matcher code!

At it's core, the following interfaces are updated/introduced:

// match.Matcher
type Matcher interface {
	Type() MatcherType

	// Match is called for every package found, returning any matches and an optional Ignorer which will be applied
	// after all matches are found
	Match(vp vulnerability.Provider, p pkg.Package) ([]Match, []IgnoredMatch, error)
}

This PR refactors the matchers to conform to this interface, which does not include the restriction that a matcher must only apply to specific sets of PackageType()s.

// vulnerability.Criteria
type Criteria interface {
	// MatchesVulnerability returns true if the provided value meets the criteria
	MatchesVulnerability(value Vulnerability) (bool, error)
}

// vulnerability.Provider
type Provider interface {
	// FindVulnerabilities returns vulnerabilities matching all the provided criteria
	FindVulnerabilities(criteria ...Criteria) ([]Vulnerability, error)
}

Additionally, this PR introduces a simple vulnerability Criteria interface and refactors the vulnerability Provider to a simple interface accepting these criteria objects. A number of tests had v5-based vulnerability.Provider/Store/etc. implementations, which I've replaced by a simple MockProvider, which allows specifying vulnerabilities directly. There is a test that remains with a v5 store exercising the v5 implementation of the interface behaving as expected. And an analogous v6 test has been added.

Fixes #2132

Additionally, I think it fixes: #1426 despite not doing exactly the same thing that was discussed as needed; we should revisit this as a team to understand if the ask has been addressed the right way.

kzantow and others added 22 commits December 17, 2024 10:07
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Keith Zantow <[email protected]>
Signed-off-by: Keith Zantow <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
@kzantow kzantow changed the title feat: basic v6 database support feat: v6 database support, updated matcher interfaces Jan 15, 2025
@kzantow kzantow marked this pull request as ready for review January 17, 2025 15:26
@@ -33,16 +34,17 @@ type Package struct {
Name string // the package name
Version string // the version of the package
Locations file.LocationSet // the locations that lead to the discovery of this package (note: this is not necessarily the locations that make up this package)
Language pkg.Language // the language ecosystem this package belongs to (e.g. JavaScript, Python, etc)
Language syftPkg.Language // the language ecosystem this package belongs to (e.g. JavaScript, Python, etc)
Distro *distro.Distro // a specific distro this package originated from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matcher was originally

Match(v5.VulnerabilityProvider, *distro.Distro, pkg.Package) ([]match.Match, error)

which means that it does not necessarily need to be added to the pacakge. When we start matching with per-package distro information keeping these separate means that we can easily separate what was dected from the environment vs what was parsed from the package info (like the pURL). Here they are mixed together.

I don't see a reason why these two changes needed to be made in this PR -- that is I think v6 can be added without needing to break the matcher interface nor add a new type to the core package type.

Copy link
Contributor Author

@kzantow kzantow Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matcher interface needs to change: the existing interface references the V5-specific VulnerabilityProvider, which at a minimum needs to be changed to the vulnerability.Provider (or whatever the V6 compatible thing is).

Conversely to this, if I have a PURL with something like: pkg:deb/debian/[email protected]?arch=all&distro=debian-12, I feel as though this distro information should be associated with the package (although a change to reading PURLs wasn't yet made as part of this PR; oops). Should there be 2 separate places to find distro? I'm not opposed to having some environmental information passed through separately but I couldn't understand why the distro was the only thing. What if we wanted to pass the image name or architecture? By moving this to the package, it seemed to me to be the right delineation that something somewhere determined we should be matching this package with this distro, and every matcher doesn't have to make that determination.

If this is a blocker, we should probably powwow to figure out the path forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I was especially interested in doing was making one (or more) top-level structs here that could be updated post-v1. In this case, maybe the pkg.Context seems like the right thing to pass in MatchVulnerabilities(<maybe-other-stuff>, pkg.Context, pkg.Package). But as you say doesn't need to be changed as part of this PR.

Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify: is the goal to keep the top-level search package? Not a problem if so, just trying to square up the final state of things based on previous conversation.

"github.com/anchore/grype/internal/log"
)

func LoadVulnerabilityDB(cfg distribution.Config, update bool) (*v5.ProviderStore, *distribution.Status, error) {
dbCurator, err := distribution.NewCurator(cfg)
func LoadVulnerabilityDBv6(distCfg v6dist.Config, installCfg v6inst.Config, update bool) (vulnerability.Provider, *v5dist.Status, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about removing all LoadVulnerabilityDB* functions from the top level api? Since they are raising up vulnerability.Provider (which is agnostic) we could change this to return status.Err so that the status type (which is not agnostic) is not really needed. This way these functions can be moved into the v5/v6 packages (something like v5.LoadDB(...) / v6.LoadDB(...).

This would simplify the surface area when making major schema changes like this (instead of having to update the API like this each time).

var vulnSpec *VulnerabilitySpecifier
var pkgSpec *PackageSpecifier
var cpeSpec *cpe.Attributes
var osSpec *OSSpecifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should allow for multiple specifiers (these area or'd within the v6 store)


var out []vulnerability.Vulnerability
for _, criteriaSet := range search.CriteriaIterator(criteria) {
var vulnSpec *VulnerabilitySpecifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should allow for multiple vuln specifiers (these are or'd within the v6 store)

}

//nolint:funlen,gocognit,gocyclo
func (s vulnerabilityProvider) FindVulnerabilities(criteria ...vulnerability.Criteria) ([]vulnerability.Vulnerability, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think how criteria interplay really should be the stores decision in cases when there is conversion to specifiers, and this function should act more of a pass-through for what is already implemented in the store. Today the store functions search per-package, and allow for os and vuln specifiers to refine the search.

@@ -381,6 +297,102 @@ func (m *VulnerabilityMatcher) normalizeByCVE(match match.Match) match.Match {
return match
}

// splitStockMatcher splits matchers into 2 sets: everything other than the stock matcher, and the stock matcher in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes really needed for v6 schema? or is this more of a grype 1.0 change? I'm trying to understand the motivation for the changes for things above the matchers, which I thought would remain relatively unchanged for this PR (but probably get refactored a lot for grype v1).

@kzantow
Copy link
Contributor Author

kzantow commented Jan 21, 2025

is the goal to keep the top-level search package? Not a problem if so, just trying to square up the final state of things based on previous conversation.

Yes, but this package does not have the same purpose as the prior implementation. This is for search.By<something> to read nicely and have pieces of data that can be specified and referenced by both db implementations and vulnerability finding logic. I don't really have strong opinions about the naming of this package aside from it reading well with <name>.By<thing> or some other pattern that we like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify grype DB access abstractions RFC: Explicit reporting of negative matches
2 participants