
Method retrieveContent() in FileScancodeRawComponentInfoProvider should handle non-standard text files correctly #277

Open
ohecker opened this issue Jul 24, 2024 · 0 comments
Labels
enhancement New feature or request


The method retrieveContent() in FileScancodeRawComponentInfoProvider is used to read file content (license texts and notice file content) from the downloaded package source files. The files to be read are determined by the scancode result file. Optionally, retrieveContent() may also extract a subset of lines from the read file.

While this works well for "normal" text files (classic source files), the files referenced in the scancode result file might also be text files without classic line feeds (like .js.map files) or even binary files (like .node). Solicitor currently lacks any functionality to transform the contents of such files into a format suitable for printing and/or to extract specific lines from them.
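One possible approach for the "printable content" part is a heuristic that detects binary content and substitutes a short placeholder instead of emitting raw bytes. The sketch below is only an illustration under that assumption; the class name `PrintableContent`, the 5% threshold, and the placeholder format are all hypothetical and not part of Solicitor:

```java
import java.nio.charset.StandardCharsets;

public class PrintableContent {

  // Hypothetical heuristic: treat content as binary if it contains a NUL byte
  // or if more than 5% of the sampled bytes are non-printable control characters.
  static boolean looksBinary(byte[] data) {
    int sample = Math.min(data.length, 8192);
    int nonPrintable = 0;
    for (int i = 0; i < sample; i++) {
      int b = data[i] & 0xFF;
      if (b == 0x00) {
        return true; // NUL byte: almost certainly binary
      }
      // Allow tab (0x09) through carriage return (0x0D); count other controls.
      if (b < 0x09 || (b > 0x0D && b < 0x20)) {
        nonPrintable++;
      }
    }
    return sample > 0 && nonPrintable * 100 / sample > 5;
  }

  // Return the text itself for text files and a short, printable
  // placeholder for binary files instead of raw bytes.
  static String toPrintable(byte[] data) {
    if (looksBinary(data)) {
      return "[binary content, " + data.length + " bytes, omitted]";
    }
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(toPrintable("Hello".getBytes(StandardCharsets.UTF_8)));
    System.out.println(toPrintable(new byte[] { 0x00, 0x01, 0x02, 0x03 }));
  }
}
```

A real implementation would probably also want to respect the file extension (e.g. always treat .node as binary), but a content-based check covers extensions the tool has never seen.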

Specifically, the above-referenced .node files may be very large (e.g. around 100 MB). Within retrieveContent() the referenced file content is always read into main memory in its entirety, so such large files might destabilize Solicitor through out-of-memory (OOM) errors.
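The line-range extraction itself does not require reading the whole file into memory: with a streaming reader, only the requested lines are ever materialized. The following is a minimal sketch of that idea; `LineRangeExtractor` and its method signature are hypothetical and not Solicitor API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

public class LineRangeExtractor {

  // Stream the input and keep only the requested 1-based, inclusive line
  // range in memory, instead of loading the whole file first.
  static String extractLines(Reader source, int from, int to) throws IOException {
    try (BufferedReader reader = new BufferedReader(source)) {
      return reader.lines()
          .skip(from - 1L)
          .limit(to - from + 1L)
          .collect(Collectors.joining("\n"));
    }
  }

  public static void main(String[] args) throws IOException {
    // Lines 2..4 of a five-line input: only three lines are retained.
    String text = "a\nb\nc\nd\ne";
    System.out.println(extractLines(new StringReader(text), 2, 4));
  }
}
```

Note that this only helps for files that actually contain line feeds; for single-line .js.map files or binary .node files the "line" can still be arbitrarily large, which is why a size guard is needed as well.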

FileScancodeRawComponentInfoProvider#retrieveContent() should be improved to:

  • generate printable content also for non-standard text files (like .js.map or .node)
  • support extracting dedicated line ranges from such files
  • avoid the danger of OOM even for large files

Unless support for these file types is implemented, there should at least be a protection against OOM. The easiest approach might be to reject processing of files larger than e.g. 1 MB (assuming that normal source files are smaller than that).
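Such a guard can check the file size via the filesystem before any bytes are read, so an oversized file never touches main memory. A minimal sketch, assuming the 1 MB threshold suggested above (the class name `SizeGuard` and the constant are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeGuard {

  // Hypothetical threshold from the issue text: reject anything above 1 MiB.
  static final long MAX_CONTENT_SIZE = 1024L * 1024L;

  // Check the size via file metadata BEFORE reading, so an oversized
  // file is rejected without ever being loaded into memory.
  static byte[] readGuarded(Path file) throws IOException {
    long size = Files.size(file);
    if (size > MAX_CONTENT_SIZE) {
      throw new IOException("File too large for content retrieval: " + file
          + " (" + size + " bytes, limit " + MAX_CONTENT_SIZE + ")");
    }
    return Files.readAllBytes(file);
  }
}
```

Instead of throwing, the provider could also return a placeholder string (similar to the binary-content case) so that report generation continues with a note rather than failing.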
