Add more rust data #2948
Signed-off-by: C0D3 M4513R <[email protected]>
Thanks so much @C0D3-M4513R for the PR! I see you still have it in draft. Is there anything else you're looking to add before we give it a look and add comments? I did have one question while we wait for the validations to run. Here is the documentation for the "sources" syft supports: Is this what you're referring to when you mention that support for a source is missing? We also have a community meeting every other week. I've linked the calendar invite below if you'd like to come and talk about this change or have any suggestions on how we can make the tool better for Rust after this PR lands:
Here's a list of things that I'd also like to get in:
Also, thanks for pointing me to the "sources" syft supports natively, but Rust's registries don't seem to be listed there.
With go-toml v1, parsing failed when a file contained a line like the following: `[target."cfg(any(target_os = \"linux\", target_os = \"dragonfly\", target_os = \"freebsd\", target_os = \"openbsd\", target_os = \"netbsd\"))".dependencies.accesskit_unix]`
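To illustrate why a header like this is awkward to parse, here is a minimal sketch of splitting a dotted TOML key whose quoted segment contains escaped quotes. `splitDottedKey` is a hypothetical helper written for this comment, not go-toml's implementation; it only understands bare keys and basic (double-quoted) keys with backslash escapes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitDottedKey splits a TOML dotted key into its segments.
// Hypothetical helper for illustration only: it handles bare keys and
// double-quoted keys with backslash escapes, and ignores literal ('...') keys.
func splitDottedKey(key string) ([]string, error) {
	var parts []string
	var cur strings.Builder
	inQuotes := false
	for i := 0; i < len(key); i++ {
		c := key[i]
		switch {
		case inQuotes && c == '\\':
			// Keep the escaped character (e.g. \" becomes ") and skip the backslash.
			if i+1 >= len(key) {
				return nil, errors.New("dangling escape at end of key")
			}
			i++
			cur.WriteByte(key[i])
		case c == '"':
			inQuotes = !inQuotes
		case !inQuotes && c == '.':
			// A dot outside quotes ends the current segment.
			parts = append(parts, cur.String())
			cur.Reset()
		case !inQuotes && (c == ' ' || c == '\t'):
			// Whitespace around bare keys and dots is insignificant.
		default:
			cur.WriteByte(c)
		}
	}
	if inQuotes {
		return nil, errors.New("unterminated quoted key")
	}
	return append(parts, cur.String()), nil
}

func main() {
	header := `target."cfg(any(target_os = \"linux\", target_os = \"freebsd\"))".dependencies.accesskit_unix`
	parts, err := splitDottedKey(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range parts {
		fmt.Println(p)
	}
}
```

The point of the sketch is only that the `cfg(...)` expression must survive as a single key segment, escaped quotes included.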
I am through with what I want to add. Some stuff definitely got a bit ugly (looking at Also I have no idea if different instances of the I only need to fix tests now (since I added information, some cargo tests now fail). Edit: Also it seems like on NixOS
I have now addressed the config, caching and relationship ordering.
Outdated info: Regardless, I've hit an unexpected roadblock with the tests. I'm suddenly getting the following on a couple of cataloger tests. I do not know where that is coming from, so help with this would be appreciated.
Both lints and tests should all pass now. Additionally, I've revisited the Package Verification Code and added the safety check for the crate checksum that we talked about. I still believe that this conforms to the SPDX spec, because:
An external party can also easily reproduce the Package Verification Code for a publicly accessible registry:
I can't find anything in the SPDX documentation for the Package Verification Code that would indicate this is against the SPDX spec when using publicly accessible registries. I'm open to discussion on whether registries needing authentication, local registries and local crates should be included, since for those it is a little more ambiguous whether one can independently reproduce the Package Verification Code. The core questions here are:
Also, another thing to note is that registries could theoretically use a transparent authentication system, such as whitelisting certain IPs, and then mislabel themselves as not needing auth, which would be undetectable by syft.
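For reference, this is roughly how an external party could recompute the value from a downloaded crate. It is a minimal sketch of the Package Verification Code recipe as I read it in SPDX 2.3 (SHA1 each file, sort the hex digests, concatenate them, SHA1 the result), not the code in this PR. The function name, the in-memory `files` map and the `excluded` set are assumptions for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// packageVerificationCode computes a verification code over file contents.
// Hypothetical helper: files maps a path to its content, excluded lists
// paths to skip (for example the SPDX document itself).
func packageVerificationCode(files map[string][]byte, excluded map[string]bool) string {
	digests := make([]string, 0, len(files))
	for name, content := range files {
		if excluded[name] {
			continue
		}
		sum := sha1.Sum(content)
		digests = append(digests, hex.EncodeToString(sum[:]))
	}
	// Sorting the digests makes the result independent of file order.
	sort.Strings(digests)
	final := sha1.Sum([]byte(strings.Join(digests, "")))
	return hex.EncodeToString(final[:])
}

func main() {
	files := map[string][]byte{
		"Cargo.toml": []byte("[package]\nname = \"demo\"\n"),
		"src/lib.rs": []byte("pub fn demo() {}\n"),
	}
	fmt.Println(packageVerificationCode(files, nil))
}
```

Because only file contents go into the hash, anyone who can fetch the same crate archive should arrive at the same code, which is the reproducibility argument made above.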
Hi @C0D3-M4513R -- I'd like to follow up from our discussion about Package Verification Code. From the SPDX 2.3 docs:
... this description mentions "all files in the package". I have always read this to mean each file is an SPDX File entry, but the description doesn't seem to say that explicitly, so calculating this field based on files not included in the SBOM could be acceptable. There seems to be a section in an unpublished version of the SPDX 2.3 spec with some general guidance that uses the wording: "If the files bound by the Package are described in the document...", which seems to support my original impression that the files used to calculate this should also be included. But this is still a bit unclear, so I've posed this question for clarification on the SPDX Slack / will follow up on the mailing list to try to get a definitive answer.

To me, this is the crux of the question about whether we can implement this particular aspect of the feature, since we wouldn't include the files in the file section, as they were not scanned directly on the source. I would also like to point out that in SPDX 3, using the Package Verification Code is apparently discouraged in favor of the content identifier, which looks like it could be a much more useful git hash for the revision, or possibly some other more useful hash code.

All that said, since we're already downloading the package contents, I don't see any issue calculating this for the Syft internal data, regardless of whether we can include it in SPDX, provided that it doesn't end up being particularly expensive for no benefit. Sorry for being a bit long-winded here, I just want to make sure we aren't shoehorning something into our data just for SPDX that isn't really supposed to be there.

... now, on a completely different topic:
This looks like something we might be able to generalize as an internal utility, as the Go license search is able to fetch modules from git to a degree. I would also say this is definitely not required as a part of this PR and could come later if we chose to implement it. From my very brief experience with Rust, I've seen git being used in a number of places and I was under the impression it was fairly common. Or is that not really true and cargo generally fetches things from the registries?
Your take on that is correct, and I agree that it could/should come later (I don't see it as being that simple).
IMO that might be very hard to do. An internal git repo cache might prove useful, but beyond that I don't see there being much common code. As a matter of fact, I already use git for the "repository" source kind.
I'm reading this as: I should include the files I use to calculate the Package Verification Code. But I am including the files in the contains relationship? Or is there a separate mechanism for declaring all the files of a package?
I see that as difficult, because finding a good content identifier might be hard. I'd even say that the current PURL might be actively harmful, since we don't specify any repository information and just assume it's the default cargo one.
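To make the PURL concern concrete, here is a small sketch of how registry information could be carried along, using the `repository_url` qualifier that the purl spec lists for non-default repositories. This is not what syft emits today as far as I know. `cargoPURL`, the `defaultRegistry` constant and the example URL are assumptions for illustration, and `url.QueryEscape` is only an approximation of purl's percent-encoding rules.

```go
package main

import (
	"fmt"
	"net/url"
)

// defaultRegistry is assumed here to be the crates.io index URL as it
// appears in Cargo.lock source strings.
const defaultRegistry = "https://github.com/rust-lang/crates.io-index"

// cargoPURL builds a cargo purl and adds a repository_url qualifier only
// when the crate does not come from the default registry.
// Hypothetical helper, not part of syft.
func cargoPURL(name, version, registry string) string {
	purl := fmt.Sprintf("pkg:cargo/%s@%s", name, url.QueryEscape(version))
	if registry != "" && registry != defaultRegistry {
		purl += "?repository_url=" + url.QueryEscape(registry)
	}
	return purl
}

func main() {
	fmt.Println(cargoPURL("serde", "1.0.197", defaultRegistry))
	fmt.Println(cargoPURL("serde", "1.0.197", "https://index.example.com/"))
}
```

With something like this, two crates with the same name and version from different registries would no longer collapse into one identifier.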
I just dug into this, and it seems that SPDX uses a File tag for this. I would love to be wrong about this, so please correct me if I am. One idea to solve this is to also allow packages to specify additional Also, if you are concerned about the correctness of the files, you should look at the current output. All hashes in the files section are currently wrong: [
{
"fileName":"/.github/workflows/rust.yml",
"SPDXID":"SPDXRef-File-.github-workflows-rust.yml-5218e54acecbaea1",
"checksums":[
{
"algorithm":"SHA1",
"checksumValue":"0000000000000000000000000000000000000000"
}
],
"licenseConcluded":"NOASSERTION",
"licenseInfoInFiles":["NOASSERTION"],
"copyrightText":""
},
{
"fileName":"/Cargo.lock",
"SPDXID":"SPDXRef-File-Cargo.lock-c6bea2c24af05bc1",
"checksums":[
{
"algorithm":"SHA1",
"checksumValue":"0000000000000000000000000000000000000000"
}
],
"licenseConcluded":"NOASSERTION",
"licenseInfoInFiles":["NOASSERTION"],
"copyrightText":""
}
] (from https://github.com/C0D3-M4513R/time/actions/runs/9516743702/artifacts/1602138196)
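If anyone wants to check their own output for the same problem, a quick sketch like the following flags files whose checksum is all zeros. It assumes the input is the JSON array of file entries shown above; the type and function names are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checksum and spdxFile mirror only the fields used from the output above.
type checksum struct {
	Algorithm string `json:"algorithm"`
	Value     string `json:"checksumValue"`
}

type spdxFile struct {
	FileName  string     `json:"fileName"`
	Checksums []checksum `json:"checksums"`
}

// placeholderFiles returns the names of files that carry an all-zero
// checksum. Hypothetical helper for illustration.
func placeholderFiles(filesJSON []byte) ([]string, error) {
	var files []spdxFile
	if err := json.Unmarshal(filesJSON, &files); err != nil {
		return nil, err
	}
	var names []string
	for _, f := range files {
		for _, c := range f.Checksums {
			// A non-empty value consisting only of '0' is a placeholder.
			if c.Value != "" && strings.Trim(c.Value, "0") == "" {
				names = append(names, f.FileName)
				break
			}
		}
	}
	return names, nil
}

func main() {
	input := []byte(`[{"fileName":"/Cargo.lock","checksums":[{"algorithm":"SHA1","checksumValue":"0000000000000000000000000000000000000000"}]}]`)
	names, err := placeholderFiles(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```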
@C0D3-M4513R there are some ways to add files if we determine that's the right thing to do... but the conundrum I have here is that we probably should not include the files because they were not part of the source that was requested to be scanned. So, if we don't add them, is it acceptable to use them to calculate the Package Verification Code? I don't know the right answer. I posed the question to the SPDX mailing list: https://lists.spdx.org/g/spdx/message/1869, so let's wait to see what the guidance is.
I'd really like to be done with this PR sometime soon. Can we just get on with the review and ignore the Package Verification Code discussion for now? Also, I have seen the error in the static analysis, but I just don't know how to properly fix it (all attempts have resulted in a cyclic import error). And by ignore, I specifically mean not adding any SHA1 hashes of the files, so the Package Verification Code doesn't get generated. (I've not been able to interpret the responses on the mailing list.)
@C0D3-M4513R I can help with the static analysis failure, but that is pulling on another thread -- introducing the This will ultimately be a larger refactor of the PR, but I'm happy to make code changes and push up on your branch. In terms of the Package Verification Code, it's essential that we don't include values that we know are not valid relative to these SBOM specs, and we've gotten confirmation that using the Package Verification Code in this way isn't correct. I think if we remove the Package Verification Code logic from this PR it'll be functionally valid to include.
Just a heads up -- I'll be pushing up a set of changes today to this branch that will help it along 👍
Go ahead. I've debated removing the Package Verification Code from this PR myself, but other things came along. I assume that's one of the things you are going to remove. Feel free to push to my branch anytime.
I'm pushing up my changes now, but I realize I've left things in a broken state -- I'll still be working on that today, but wanted to at least push up what I have. Here's the list of changes I made at a high level:
What's still needed:
Here's what I mean with that last point... this is what is being added today in your PR relationship-wise for these hosted archives:
But there are a few issues with this:
We might be able to figure something like:
where I'm handwaving at "hosted-at" as an edge type, and where digests would go, and what
I'm curious about one thing though: what's the use case for showing source-level digests (for each source file) when the package could simply store the sha256 of the source archive itself? Do you have a specific use case / need for having source-level digests for all dependencies, @C0D3-M4513R?
Signed-off-by: Alex Goodman <[email protected]>
This itself also tests that connecting to a repository works and that dependencies can be downloaded.
It definitely was a convenience decision, to get the Package Verification Code to work without much additional code.
The decision to use source-level digests followed logically from the fact that source archives are not always available. This loops back to the missing "source kinds" that I mentioned earlier. Sometimes you actually do have local dependencies (e.g. you have multiple crates in a single repository linked together with a cargo workspace). In order to not have to break anything later, I chose to make the digests on the source level, because that is more universal.
Led me here. Commenting here so I can follow PR updates. I also need this functionality.
This PR attempts to add more Rust information.
Currently this adds:
Package Verification Code?

Things to Note:
`registry` as a specialization of `sources`. "registry", "local-registry" and "sparse" are all a `registry`. "path", "git" and "directory" are only a `source`.
`Cargo.lock`, where those dependencies had no `source` attribute.

Follow-Up Ideas:
Also, if more testing is required, I would appreciate help with what should be tested and what is a priority.
Also, please note that this is my very first time writing Go.
My code might make suboptimal use of Go's language constructs.
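To make the registry/source split from "Things to Note" concrete, here is a minimal sketch of classifying a `Cargo.lock` source string. It assumes the source has the form `<kind>+<url>`, and it labels entries without a `source` attribute as "local", which is a name made up for this example. None of the identifiers below are taken from the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Kinds that count as a registry, and kinds that are only a source,
// following the split described in the notes above.
var registryKinds = map[string]bool{"registry": true, "local-registry": true, "sparse": true}
var plainSourceKinds = map[string]bool{"path": true, "git": true, "directory": true}

// classify returns the source kind and whether it is a registry.
// Hypothetical helper: an empty source is reported as "local".
func classify(source string) (kind string, isRegistry bool, err error) {
	if source == "" {
		return "local", false, nil
	}
	k, _, found := strings.Cut(source, "+")
	if !found {
		return "", false, fmt.Errorf("no kind prefix in %q", source)
	}
	switch {
	case registryKinds[k]:
		return k, true, nil
	case plainSourceKinds[k]:
		return k, false, nil
	default:
		return "", false, fmt.Errorf("unknown source kind %q", k)
	}
}

func main() {
	for _, s := range []string{
		"registry+https://github.com/rust-lang/crates.io-index",
		"git+https://github.com/example/repo#0123abc",
		"",
	} {
		kind, isRegistry, err := classify(s)
		fmt.Println(kind, isRegistry, err)
	}
}
```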