Consuming dependency SBOMs and Adding dependencies to dependencies

_[This proposal is still work in progress and will be updated]_

## Initial rough outline of the ask

A supplier provides the package 'MGR::foo:1.00', for which no
source code but only an SBOM is available, e.g. 'foo.spdx.json'.

ORT should "know" about the association to enable
the following:

1. Dependencies from the SBOM should be added to the
   dependency graph as dependencies of MGR::foo:1.00,
   along with proper metadata.
2. The scanner should add the data from the SBOMs for 'MGR::foo:1.00' and its dependencies
   instead of attempting to scan them with ScanCode and failing with a scan issue.
3. The evaluator rules should work for MGR::foo:1.00
   and its dependencies as usual, and should have an
   indication of whether an examined package corresponds to an
   SBOM or to a dependency of an SBOM.
4. Advisor(s) should query vulnerabilities, including those of SBOM dependencies.
5. The notifier should also have such an indication available, so that vulnerabilities can be propagated to different
   channels.
6. Produce a combined SBOM. TBC whether it is necessary to produce an archive which
   includes all corresponding _original SBOMs in unmodified form_.
7. Produce a NOTICE file including entries for SBOM dependencies.
8. Turnaround time for making curations / seeing their effect should be low,
   to enable an efficient clearance process. (Could also be solved outside ORT.)
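
A minimal sketch of two of the asks above (adding SBOM dependencies to the dependency graph, and indicating whether a package corresponds to an SBOM or to an SBOM dependency). It uses a simplified in-memory graph, not ORT's actual dependency graph API; `Origin`, `Node`, `attach_sbom_dependencies`, `walk` and the project `MGR::my-project:1.0` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    """Hypothetical indicator of where a package's data comes from."""
    RESOLVED = "resolved"                # found by a regular package manager analysis
    SBOM = "sbom"                        # the package itself is described by a supplier SBOM
    SBOM_DEPENDENCY = "sbom-dependency"  # the package is listed as a dependency in an SBOM


@dataclass
class Node:
    id: str
    origin: Origin = Origin.RESOLVED
    children: list[Node] = field(default_factory=list)


def attach_sbom_dependencies(node: Node, sbom_dependency_ids: list[str]) -> None:
    """Mark `node` as SBOM-described and add the SBOM's dependencies as its children."""
    node.origin = Origin.SBOM
    known = {child.id for child in node.children}
    for dep_id in sbom_dependency_ids:
        if dep_id not in known:  # do not add the same dependency twice
            node.children.append(Node(dep_id, Origin.SBOM_DEPENDENCY))
            known.add(dep_id)


def walk(node: Node):
    """Yield all nodes of the (sub-)graph in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


project = Node("MGR::my-project:1.0", children=[Node("MGR::foo:1.00")])
attach_sbom_dependencies(project.children[0], ["MGR::bar:1.00", "MGR::car:1.00"])

# An evaluator rule or the notifier could filter on the indicator, e.g.:
sbom_related = [n.id for n in walk(project) if n.origin is not Origin.RESOLVED]
print(sbom_related)  # ['MGR::foo:1.00', 'MGR::bar:1.00', 'MGR::car:1.00']
```

Keeping the indicator on the node itself means that the advisor, evaluator and notifier could all read the same flag instead of re-deriving SBOM membership.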

## Known difficulties for solution design choices 

1. A certain package may be mentioned by multiple SBOMs in different ways, e.g. with conflicting data:
   a.) Data could be merged manually, e.g. into a single curation.
   b.) Data could be namespaced.
2. Flexibility in SBOM representations: It may not be possible to write a single on-the-fly 
    extraction which works for arbitrary SBOMs.
3. Data in SBOMs may need to be curated.
4. There are various SBOM formats and versions.
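
The two options for conflicting data (merging vs. namespacing) can be contrasted in a small sketch, assuming each SBOM's view of a package is reduced to a flat dictionary of string fields. Merging keeps one record per package and reports the disagreeing fields for manual resolution; namespacing keeps every SBOM's record under a key that includes the SBOM name. The function names, the `<sbom>/<identifier>` key format and the SBOM file `baz.cdx.json` are hypothetical.

```python
from __future__ import annotations


def merge(records: dict[str, dict[str, str]]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Option "merge": combine the data several SBOMs state for one package.

    `records` maps an SBOM name to the fields that SBOM states for the package.
    The first SBOM in name order wins as a placeholder; every field whose values
    disagree is reported so it can be resolved manually, e.g. in a single curation.
    """
    merged: dict[str, str] = {}
    conflicts: dict[str, set[str]] = {}
    for sbom in sorted(records):
        for key, value in records[sbom].items():
            if key not in merged:
                merged[key] = value
            elif merged[key] != value:
                conflicts.setdefault(key, {merged[key]}).add(value)
    return merged, conflicts


def namespace(package_id: str, records: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Option "namespace": keep each SBOM's data under a key that includes the SBOM name."""
    return {f"{sbom}/{package_id}": data for sbom, data in records.items()}


# Two (hypothetical) SBOMs disagree on the declared license of the same dependency.
records = {
    "foo.spdx.json": {"declared_license": "Apache-2.0", "purl": "pkg:generic/bar@1.00"},
    "baz.cdx.json": {"declared_license": "MIT", "purl": "pkg:generic/bar@1.00"},
}

merged, conflicts = merge(records)
print({key: sorted(values) for key, values in conflicts.items()})
# {'declared_license': ['Apache-2.0', 'MIT']}
print(sorted(namespace("MGR::bar:1.00", records)))
# ['baz.cdx.json/MGR::bar:1.00', 'foo.spdx.json/MGR::bar:1.00']
```

Merging corresponds to the de-duplicated data of Approach 1, while namespacing roughly corresponds to storing dependency data per SBOM as in Approach 2.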

## Solution approaches

### Common principles

1. Define ORT data structures for the needed data (e.g. curations), which are SBOM-format-agnostic.
2. The data in these data structures is then assumed to be correct / already curated.
3. No on-the-fly extraction (+ SBOM curations) from original SBOMs. As a follow-up, providing a CLI helper command for this task could be an option to look into.
4. A new dedicated scanner inserts data such as detected copyrights / licenses.
5. Curation data is centralized, so it has to be added / corrected only once for the entire ORG.
    - Consequence: There are no redundant error-fixing iterations by various teams.
    - Data is correct once added, in contrast to on-the-fly parsing followed by curating.
    - No SBOMs are committed to the project's code repository.
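
The format-agnostic data structure and the optional CLI helper mentioned in the principles above could look roughly like this sketch. `SbomPackageData` and `from_spdx_json` are hypothetical names, the record holds only a subset of the fields used in the approaches below, and building identifiers as `<type>::<name>:<version>` is a simplifying assumption. Only these SPDX 2.x JSON fields are read: `packages` (`SPDXID`, `name`, `versionInfo`, `licenseDeclared`, `externalRefs`) and `relationships` of type `DEPENDS_ON`. The output is meant to be reviewed by a human before it is committed, not consumed on the fly.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class SbomPackageData:
    """SBOM-format-agnostic data for one package (hypothetical ORT-side structure)."""
    id: str
    purl: str | None = None
    declared_licenses: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


def from_spdx_json(text: str, package_type: str = "MGR") -> dict[str, SbomPackageData]:
    """Pre-fill records from an SPDX 2.x JSON document, to be reviewed before committing."""
    doc = json.loads(text)
    by_spdx_id: dict[str, SbomPackageData] = {}
    for pkg in doc.get("packages", []):
        # Assumption: identifiers are built as "<type>::<name>:<version>".
        ort_id = f"{package_type}::{pkg['name']}:{pkg.get('versionInfo', '')}"
        purl = next(
            (ref["referenceLocator"] for ref in pkg.get("externalRefs", [])
             if ref.get("referenceType") == "purl"),
            None,
        )
        declared = pkg.get("licenseDeclared")
        licenses = [declared] if declared and declared not in ("NOASSERTION", "NONE") else []
        by_spdx_id[pkg["SPDXID"]] = SbomPackageData(ort_id, purl, licenses)

    # Turn DEPENDS_ON relationships into dependency lists of the parent package.
    for rel in doc.get("relationships", []):
        if rel.get("relationshipType") == "DEPENDS_ON":
            parent = by_spdx_id.get(rel["spdxElementId"])
            child = by_spdx_id.get(rel["relatedSpdxElement"])
            if parent and child:
                parent.dependencies.append(child.id)

    return {data.id: data for data in by_spdx_id.values()}


sample = json.dumps({
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"SPDXID": "SPDXRef-foo", "name": "foo", "versionInfo": "1.00", "licenseDeclared": "NOASSERTION"},
        {"SPDXID": "SPDXRef-bar", "name": "bar", "versionInfo": "1.00", "licenseDeclared": "Apache-2.0",
         "externalRefs": [{"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl",
                           "referenceLocator": "pkg:generic/bar@1.00"}]},
    ],
    "relationships": [
        {"spdxElementId": "SPDXRef-foo", "relationshipType": "DEPENDS_ON", "relatedSpdxElement": "SPDXRef-bar"},
    ],
})
records = from_spdx_json(sample)
print(records["MGR::foo:1.00"].dependencies)  # ['MGR::bar:1.00']
```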
    
### Approach 1 (de-duplicated / merged sbom data)

Turn the data from the SBOMs into the following curations:

1. Curations to define additional dependencies:
   a. Enhance package curations:
      ```yaml
      id: "MGR::foo:1.00"
      additional-dependencies:
        - "MGR::bar:1.00"
        - "MGR::car:1.00"
      ```
   b. A hierarchical directory structure which maps `Identifier -> Set<Identifier>`.
2. A hierarchical directory structure `ort-sboms` (the file path corresponds to the `id`) with the data per package from the SBOM:
   ```yaml
   purl: "some/purl"
   declared_licenses:
     - Apache-2.0
   labels:
     ...
   detected_licenses:
     ...
   ```
   If multiple SBOMs contain the same dependency with inconsistent data, the extracted data would need to be
   merged when committing it to this file.
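
A minimal sketch of how the two directory structures of Approach 1 might be read. It assumes, hypothetically, that an identifier `type:namespace:name:version` maps to the path `<type>/<namespace>/<name>/<version>.yml` below `ort-sboms`, with an empty namespace written as `_`. Since every identifier maps to exactly one file, each package has exactly one record, which is what keeps the data de-duplicated. The additional dependencies (`Identifier -> Set<Identifier>`) are expanded transitively, so dependencies of SBOM dependencies are picked up as well. `sbom_data_path` and `resolve_dependencies` are hypothetical names.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def sbom_data_path(identifier: str, root: str = "ort-sboms") -> PurePosixPath:
    """Map 'type:namespace:name:version' to its (hypothetical) data file path.

    Raises ValueError if the identifier does not have four components.
    """
    type_, namespace, name, version = identifier.split(":", 3)
    return PurePosixPath(root, type_, namespace or "_", name, f"{version}.yml")


def resolve_dependencies(root_id: str, additional: dict[str, set[str]]) -> dict[str, list[str]]:
    """Expand the `Identifier -> Set<Identifier>` mapping transitively, starting at root_id.

    Returns parent -> sorted children for every reachable package. Each package
    is visited once, so shared dependencies and cycles do not cause duplicates.
    """
    graph: dict[str, list[str]] = {}
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in graph:
            continue
        children = sorted(additional.get(current, set()))
        graph[current] = children
        stack.extend(children)
    return graph


additional = {
    "MGR::foo:1.00": {"MGR::bar:1.00", "MGR::car:1.00"},
    "MGR::bar:1.00": {"MGR::car:1.00"},  # hypothetical: taken from bar's own SBOM
}

print(sbom_data_path("MGR::foo:1.00"))  # ort-sboms/MGR/_/foo/1.00.yml
print(resolve_dependencies("MGR::foo:1.00", additional))
```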

**pros / cons**

- It is not possible to have different data for a particular package. This
  seems good, but it might not work in some edge cases.
- De-duplicated data.
- Will produce a clean product SBOM.
- Nicer identifiers (no automatic conflict resolution necessary).
- No redundancy.
- Lower maintenance effort.

### Approach 2

Similar to Approach 1, but without the level of indirection.
So, define a hierarchical directory `sboms` which maps `Identifier -> SbomData`,
where `SbomData` would look like:
```yaml
id: "MGR::foo:1.00"
dependencies:
  - id: "MGR::bar:1.00"
    purl: "some/purl"
    declared_licenses:
      - Apache-2.0
    labels:
      ...
```
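
A small sketch of the `SbomData` structure of Approach 2, assuming the nested YAML is loaded into the hypothetical dataclasses below. It also illustrates the downside listed in the cons: when two SBOMs both list `MGR::bar:1.00`, its data is stored twice, and the copies can disagree, so report generation would have to detect such cases. `DependencyData`, `find_inconsistent` and the second SBOM-described package `MGR::baz:2.0` are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen, so instances are hashable and can be put into sets
class DependencyData:
    id: str
    purl: str
    declared_licenses: tuple[str, ...] = ()


@dataclass
class SbomData:
    id: str  # the package the SBOM describes, e.g. MGR::foo:1.00
    dependencies: list[DependencyData] = field(default_factory=list)


def find_inconsistent(sboms: list[SbomData]) -> dict[str, set[DependencyData]]:
    """Return dependency ids that occur in several SBOMs with differing data.

    Identical copies collapse into one set entry: they are redundant, but not conflicting.
    """
    seen: dict[str, set[DependencyData]] = defaultdict(set)
    for sbom in sboms:
        for dep in sbom.dependencies:
            seen[dep.id].add(dep)
    return {dep_id: variants for dep_id, variants in seen.items() if len(variants) > 1}


sboms = [
    SbomData("MGR::foo:1.00", [DependencyData("MGR::bar:1.00", "pkg:generic/bar@1.00", ("Apache-2.0",))]),
    SbomData("MGR::baz:2.0", [DependencyData("MGR::bar:1.00", "pkg:generic/bar@1.00", ("MIT",))]),
]
print(sorted(find_inconsistent(sboms)))  # ['MGR::bar:1.00']
```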

**pros / cons**

- No de-duplication.
- Conflicting dependency IDs are possible, which may n...
- A lot of redundancy in dependency data. Probably much harder to maintain
  as the database grows, e.g. when used on an ORG-wide scale.
- SBOM data is not scattered but kept in one place.
- Possible duplicates in results and reports such as the SBOM or the NOTICE file, which is not so nice.
 

Consuming dependency SBOMs and Adding dependencies to dependencies #11165

Initial rough outline of the ask

Known difficulties for solution design choices

Solution approaches

Common principles

Approach 1 (de-duplicated / merged sbom data)

Approach 2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consuming dependency SBOMs and Adding dependencies to dependencies #11165

Description

Initial rough outline of the ask

Known difficulties for solution design choices

Solution approaches

Common principles

Approach 1 (de-duplicated / merged sbom data)

Approach 2

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions