Consuming dependency SBOMs and Adding dependencies to dependencies #11165

@fviernau

[This proposal is still work in progress and will be updated]

Initial rough outline of the ask

A supplier provides a package 'MGR::foo:1.00' for which no
source code is available, only an SBOM, e.g. 'foo.spdx.json'.

ORT should "know" about the association between the package and its SBOM
to enable the following:

  1. Dependencies from the SBOM should be added to the
    dependency graph as dependencies of MGR::foo:1.00
    along with proper metadata.
  2. The scanner should add the data from the SBOMs for 'MGR::foo:1.00' and its dependencies
    instead of attempting to scan with ScanCode and failing with a scan issue.
  3. The evaluator rules should work for MGR::foo:1.00
    and its dependencies as usual, and should have an
    indication of whether an examined package corresponds to an
    SBOM or to a dependency of an SBOM.
  4. Advisor(s) should query vulnerabilities including those of SBOM dependencies.
  5. The notifier should also have such an indication available, so that vulnerabilities can be propagated to different
    channels.
  6. Produce a combined SBOM. TBC whether it is necessary to produce an archive which
    includes all corresponding original SBOMs in unmodified form.
  7. Produce a NOTICE file including entries for SBOM dependencies.
  8. The turnaround time for making curations / seeing the effect should be low,
    to enable an efficient clearance process. (Could also be solved outside ORT.)
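
The indication asked for in items 3 and 5 could be sketched as follows. This is only an illustration in Python, not ORT code: the `SbomOrigin` enum, the `classify` function and the shape of `sbom_dependencies` are all hypothetical names chosen for this example.

```python
from enum import Enum


class SbomOrigin(Enum):
    """Hypothetical indication an evaluator rule or the notifier could inspect."""
    NONE = "none"                        # regular package, handled as usual
    SBOM_PACKAGE = "sbom-package"        # package that is described by an SBOM
    SBOM_DEPENDENCY = "sbom-dependency"  # package that only appears inside an SBOM


def classify(package_id, sbom_dependencies):
    """Classify a package id.

    `sbom_dependencies` (an assumption) maps the id of a package that ships
    with an SBOM to the ids of the dependencies listed in that SBOM.
    """
    if package_id in sbom_dependencies:
        return SbomOrigin.SBOM_PACKAGE
    if any(package_id in deps for deps in sbom_dependencies.values()):
        return SbomOrigin.SBOM_DEPENDENCY
    return SbomOrigin.NONE


sbom_dependencies = {"MGR::foo:1.00": ["MGR::bar:1.00", "MGR::car:1.00"]}
print(classify("MGR::bar:1.00", sbom_dependencies).value)
```

A rule could then, for example, treat missing source code as expected for anything that is not `SbomOrigin.NONE`.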

Known difficulties for solution design choices

  1. A certain package may be mentioned by multiple SBOMs in different ways, e.g. with conflicting data.
    a.) Data could be merged manually, e.g. into a single curation.
    b.) Data could be namespaced.
  2. Flexibility in SBOM representations: It may not be possible to write a single on-the-fly
    extraction which works for arbitrary SBOMs.
  3. Data in SBOMs may need to be curated.
  4. There are various SBOM formats and versions.
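
To make difficulty 1 more concrete, here is a rough Python sketch of detecting conflicting entries (the input for a manual merge, option a) and of namespacing ids by SBOM (option b). `SbomPackage`, `find_conflicts`, `namespaced` and the `<sbom name>/<id>` key scheme are made up for this example, and only declared licenses are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbomPackage:
    """Hypothetical, format-agnostic view of one package entry in an SBOM."""
    id: str                       # e.g. "MGR::bar:1.00"
    declared_licenses: frozenset  # license identifiers as plain strings


def find_conflicts(sboms):
    """Return the sorted ids for which the SBOMs disagree on declared licenses.

    `sboms` maps an SBOM name to the list of packages it mentions.
    """
    variants_by_id = {}
    for entries in sboms.values():
        for pkg in entries:
            variants_by_id.setdefault(pkg.id, set()).add(pkg.declared_licenses)
    return sorted(i for i, variants in variants_by_id.items() if len(variants) > 1)


def namespaced(sboms):
    """Option b: prefix each id with the SBOM name so conflicting entries coexist."""
    return {
        f"{sbom_name}/{pkg.id}": pkg
        for sbom_name, entries in sboms.items()
        for pkg in entries
    }
```

Note that namespacing keeps every entry, so a package mentioned by two SBOMs shows up twice even when the data is identical.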

Solution approaches

Common principles

  1. Define ORT data structures for the needed data (e.g. curations), which are SBOM-format-agnostic.
  2. The data in these data structures is then assumed to be correct / already curated.
  3. No on-the-fly extraction (+ SBOM curations) from original SBOMs. As a follow-up, it could be an option to look into providing a CLI helper command for this task.
  4. A new dedicated scanner inserts data such as detected copyrights / licenses.
  5. Curation data is centralized, so it has to be added / corrected only once for the entire ORG.
    • Consequence: There are no redundant error-fixing iterations by various teams.
    • data is correct once added, in contrast to on-the-fly parsing and then curating.
    • no SBOMs committed to the project's code repository
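
Principles 1, 2 and 4 could look roughly like this. It is a Python sketch for discussion only: `CuratedPackageData` and `SbomDataScanner` are hypothetical names, the fields simply mirror the ones used in the examples of this proposal, and nothing here reflects an existing ORT scanner interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CuratedPackageData:
    """Hypothetical SBOM-format-agnostic record, assumed to be already curated."""
    purl: str
    declared_licenses: tuple = ()
    detected_licenses: tuple = ()
    copyrights: tuple = ()


class SbomDataScanner:
    """Hypothetical scanner that looks up curated data instead of scanning sources."""

    def __init__(self, store):
        # `store` maps a package id to its curated data, e.g. loaded from a
        # central repository rather than from the project's own repository.
        self._store = dict(store)

    def scan(self, package_id) -> Optional[CuratedPackageData]:
        # Returns None when no curated data exists, so that a caller can
        # fall back to a regular source code scan.
        return self._store.get(package_id)


scanner = SbomDataScanner({
    "MGR::bar:1.00": CuratedPackageData(
        purl="some/purl",
        declared_licenses=("Apache-2.0",),
    )
})
print(scanner.scan("MGR::bar:1.00"))
```

Because the record is trusted as-is, the lookup does no validation; any correction would happen once, in the central store.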

Approach 1 (de-duplicated / merged SBOM data)

Turn the data from an SBOM into the following curations:

  1. Curations to define additional dependencies
    a. Enhance package curations
    id: MGR::foo:1.00
    additional-dependencies:
    - MGR::bar:1.00
    - MGR::car:1.00
    
    b. A hierarchical directory structure which maps Identifier -> Set<Identifier>
  2. A hierarchical directory structure ort-sboms (the file path is the id) with data per package from the SBOM
    purl: "some/purl"
    declared_licenses:
    - Apache-2.0
    labels:
    ...
    detected_licenses:
    ....
    
    If multiple SBOMs contain the same dependency with inconsistent data, the extracted data would need to be
    merged when committing to this file.
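
As a rough illustration of how the two parts of Approach 1 would be used together, the Python sketch below resolves the additional dependencies transitively and maps each id to a file below ort-sboms. The path layout (one directory per id component, `_` for an empty component, `.yml` suffix) and both function names are assumptions made for this example, not an agreed format.

```python
from collections import deque


def id_to_path(identifier, root="ort-sboms"):
    """Map 'type:namespace:name:version' to a file path (layout is an assumption)."""
    parts = identifier.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated parts: {identifier!r}")
    # An empty component, e.g. the namespace in 'MGR::foo:1.00', becomes '_'.
    return "/".join([root] + [part or "_" for part in parts]) + ".yml"


def transitive_dependencies(root_id, additional_dependencies):
    """Collect all ids reachable from root_id, breadth-first, without duplicates.

    `additional_dependencies` maps an id to the ids listed under its
    'additional-dependencies' curation.
    """
    result = []
    seen = {root_id}
    queue = deque(additional_dependencies.get(root_id, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        result.append(dep)
        queue.extend(additional_dependencies.get(dep, []))
    return result


curations = {
    "MGR::foo:1.00": ["MGR::bar:1.00", "MGR::car:1.00"],
    "MGR::bar:1.00": ["MGR::car:1.00"],
}
for dep in transitive_dependencies("MGR::foo:1.00", curations):
    print(dep, "->", id_to_path(dep))
```

Since each id resolves to exactly one file, a package mentioned by several SBOMs is stored only once, which is the de-duplication this approach relies on.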

pros / cons

  • Not possible to have different data for a particular package. This
    seems good, but maybe in edge cases it does not work?
  • de-duplicated data.
  • Will produce a clean product SBOM.
  • Nicer identifiers (no automatic conflict resolution necessary).
  • no redundancy.
  • lower maintenance effort.

Approach 2

Similar to Approach 1, but without the level of indirection.
So, define a hierarchical directory sboms which maps Identifier -> SbomData,
where SbomData would look like:

    id: MGR::foo:1.00
    dependencies:
    - id: MGR::bar:1.00
      purl: "some/purl"
      declared_licenses:
      - Apache-2.0
      labels:
       ....
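
To show what "without the level of indirection" means in practice, the following Python sketch converts nested records of this shape into the two maps used by Approach 1 and reports ids whose data differs between records. It assumes each dependency entry is a mapping with an `id` key next to its data fields; `split_sbom_data` is a hypothetical helper, not part of ORT.

```python
def split_sbom_data(records):
    """Split nested Approach-2-style records into Approach-1-style maps.

    Returns (additional_dependencies, package_data, conflicts), where
    `conflicts` lists dependency ids that occur with differing data.
    The first occurrence of an id wins; later differing ones are only reported.
    """
    additional_dependencies = {}
    package_data = {}
    conflicts = []
    for record in records:
        deps = record.get("dependencies", [])
        additional_dependencies[record["id"]] = [dep["id"] for dep in deps]
        for dep in deps:
            data = {key: value for key, value in dep.items() if key != "id"}
            previous = package_data.setdefault(dep["id"], data)
            if previous != data and dep["id"] not in conflicts:
                conflicts.append(dep["id"])
    return additional_dependencies, package_data, conflicts


records = [
    {"id": "MGR::foo:1.00", "dependencies": [
        {"id": "MGR::bar:1.00", "purl": "some/purl", "declared_licenses": ["Apache-2.0"]},
    ]},
    {"id": "MGR::baz:2.00", "dependencies": [
        {"id": "MGR::bar:1.00", "purl": "some/purl", "declared_licenses": ["MIT"]},
    ]},
]
deps, data, conflicts = split_sbom_data(records)
print(conflicts)
```

In this example 'MGR::baz:2.00' is an invented second package; the point is only that the same dependency can be stored twice with different data.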

pros / cons

  • no de-duplication
  • conflicting dependency ids possible, which may n
  • A lot of redundancy of dependency data. Probably much harder to maintain
    when database grows, e.g. used on ORG wide scale.
  • SBOM data is not scattered, just in one place.
  • possible duplicates in the result and in reports such as the SBOM or NOTICE file, which is not so nice.
