scancode-rust

A high-performance code scanning tool written in Rust that detects licenses, copyrights, and other relevant metadata in source code.

Overview

scancode-rust is designed to be a faster alternative to the Python-based ScanCode Toolkit, aiming to produce compatible output formats while delivering significantly improved performance. This tool currently scans codebases to identify:

  • License information
  • File metadata
  • System information

More ScanCode features coming soon!

Features

  • Efficient file scanning with multi-threading
  • Output format compatible with ScanCode Toolkit
  • Progress indication for large scans
  • Configurable scan depth
  • File/directory exclusion patterns

Installation

From Crates.io (Recommended)

cargo install scancode-rust

Download Precompiled Binary

Download the appropriate binary for your platform from the GitHub Releases page:

  • Linux (x64): scancode-rust-x86_64-unknown-linux-gnu.tar.gz
  • Linux (ARM64): scancode-rust-aarch64-unknown-linux-gnu.tar.gz
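  • macOS (Apple Silicon): scancode-rust-aarch64-apple-darwin.tar.gz
    • Intel Macs: this ARM64 binary will not run there, because Rosetta 2 only translates x86_64 binaries on Apple Silicon. Install from Crates.io or build from source instead.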
  • Windows: scancode-rust-x86_64-pc-windows-msvc.zip

Extract the archive and move the binary to a directory on your PATH:

# Example for Linux/macOS
tar xzf scancode-rust-*.tar.gz
sudo mv scancode-rust /usr/local/bin/
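
Release archives are published alongside SHA256 checksums, so it may be worth verifying a download before installing it. The sketch below is one way to do that; `verify_sha256` is a hypothetical helper (not shipped with scancode-rust), and the expected digest has to be copied from the release page, since the exact checksum file name is not stated here.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  file=$1
  expected=$2
  # Linux usually ships sha256sum; macOS ships shasum instead.
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder to be taken from the release page):
#   verify_sha256 scancode-rust-x86_64-unknown-linux-gnu.tar.gz "<digest>" \
#     && tar xzf scancode-rust-x86_64-unknown-linux-gnu.tar.gz
```

Chaining the extraction with `&&` means a mismatched archive is never unpacked.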

Build from Source

git clone https://github.com/mstykow/scancode-rust.git
cd scancode-rust
./setup.sh  # Initialize the submodule and configure sparse checkout
cargo build --release

The compiled binary will be available at target/release/scancode-rust.

Usage

scancode-rust [OPTIONS] <DIR_PATH> --output-file <OUTPUT_FILE>

Options

Options:
  -o, --output-file <OUTPUT_FILE>    Output JSON file path
  -d, --max-depth <MAX_DEPTH>        Maximum directory depth to scan [default: 50]
  -e, --exclude <EXCLUDE>...         Glob patterns to exclude from scanning
      --no-assemble                  Disable package assembly (merging related manifest/lockfiles)
  -h, --help                         Print help
  -V, --version                      Print version

Example

scancode-rust ~/projects/my-codebase -o scan-results.json --exclude "*.git*" "target/*" "node_modules/*"
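
The remaining options can be combined in the same way. As a rough sketch, the wrapper below limits the scan to three directory levels and disables package assembly. `scan_shallow` and the `SCANCODE_BIN` override are hypothetical names introduced for illustration only; the flags themselves come from the option list above.

```shell
# scan_shallow DIR OUT
# Hypothetical wrapper: shallow scan of DIR, writing JSON results to OUT.
# SCANCODE_BIN is an assumed override for the binary name, not a documented setting.
scan_shallow() {
  dir=$1
  out=$2
  "${SCANCODE_BIN:-scancode-rust}" "$dir" \
    --output-file "$out" \
    --max-depth 3 \
    --no-assemble \
    --exclude "*.git*" "target/*"
}

# Usage:
#   scan_shallow ~/projects/my-codebase shallow-results.json
```

Quoting the glob patterns keeps the shell from expanding them before scancode-rust sees them.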

Performance

scancode-rust is designed to be significantly faster than the Python-based ScanCode Toolkit, especially for large codebases, thanks to native Rust performance and parallel processing. See Architecture: Performance Characteristics for details.

Output Format

The tool produces JSON output compatible with ScanCode Toolkit, including:

  • Scan headers with timestamp information
  • File-level data with license and metadata information
  • System environment details

Documentation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Setting Up for Local Development

To contribute to scancode-rust, follow these steps to set up the repository for local development:

  1. Install Rust
    Ensure you have Rust installed on your system. You can install it using rustup:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  2. Clone the Repository
    Clone the scancode-rust repository to your local machine:

    git clone https://github.com/mstykow/scancode-rust.git
    cd scancode-rust
  3. Initialize the License Submodule
    Use the following script to initialize the submodule and configure sparse checkout:

    ./setup.sh
  4. Install Dependencies
    Download and compile the required Rust dependencies using cargo:

    cargo build
  5. Run Tests
    Run the test suite to ensure everything is working correctly:

    cargo test
  6. Set Up Pre-commit Hooks
    This repository uses pre-commit to run checks before each commit:

    # Using pip
    pip install pre-commit
    
    # Or using brew on macOS
    brew install pre-commit
    
    # Install the hooks
    pre-commit install
  7. Start Developing
    You can now make changes and test them locally. Use cargo run to execute the tool:

    cargo run -- [OPTIONS] <DIR_PATH> --output-file <OUTPUT_FILE>

Publishing a Release (Maintainers Only)

Releases are automated using cargo-release and GitHub Actions.

Prerequisites

One-time setup:

  1. Install cargo-release CLI tool:

    cargo install cargo-release
  2. Authenticate with crates.io (one-time only):

    cargo login

    Enter your crates.io API token when prompted. This is stored in ~/.cargo/credentials.toml and persists across sessions.

Release Process

Use the release.sh script:

# Dry-run first (recommended)
./release.sh patch

# Then execute the actual release
./release.sh patch --execute

Available release types:

  • patch: Increments the patch version (0.0.4 → 0.0.5)
  • minor: Increments the minor version (0.0.4 → 0.1.0)
  • major: Increments the major version (0.0.4 → 1.0.0)
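
The increments listed above follow the usual semantic-versioning rule: bumping a component resets every component to its right to zero. A minimal sketch of that rule, assuming a plain MAJOR.MINOR.PATCH version string, is shown here; `bump_version` is a hypothetical helper and is not part of release.sh or cargo-release.

```shell
# bump_version VERSION LEVEL
# Hypothetical helper: prints VERSION with the given LEVEL (patch|minor|major) incremented.
bump_version() {
  version=$1
  level=$2
  # Split "MAJOR.MINOR.PATCH" using parameter expansion.
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case $level in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *) echo "unknown release type: $level" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}

bump_version 0.0.4 patch   # prints 0.0.5
bump_version 0.0.4 minor   # prints 0.1.0
bump_version 0.0.4 major   # prints 1.0.0
```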

What happens automatically:

  1. Updates SPDX license data to the latest version from upstream
  2. Commits the license data update (if changes detected)
  3. cargo-release updates the version in Cargo.toml and Cargo.lock
  4. Creates a git commit: chore: release vX.Y.Z
  5. Creates a GPG-signed git tag: vX.Y.Z
  6. Publishes to crates.io
  7. Pushes commits and tag to GitHub
  8. GitHub Actions workflow is triggered by the tag
  9. Builds binaries for all platforms:
    • Linux: x64 and ARM64
    • macOS: ARM64 (Apple Silicon only; Rosetta 2 does not run ARM64 binaries on Intel Macs)
    • Windows: x64
  10. Creates archives (.tar.gz/.zip) and SHA256 checksums
  11. Creates a GitHub Release with all artifacts and auto-generated release notes

Note: The release script ensures every release ships with the latest SPDX license definitions. It also handles a sparse checkout workaround for cargo-release.

Monitor the GitHub Actions workflow to verify completion.

License

This project is licensed under the Apache License 2.0.
