
feat: add github/openssf scrapers #1875

Open
moshloop wants to merge 5 commits into main from security-scrapers

Conversation


@moshloop moshloop commented Jan 23, 2026

Summary by CodeRabbit

  • New Features

    • Added GitHub Security scraper for Dependabot, code scanning, and secret scanning alerts.
    • Added OpenSSF Scorecard scraper for repository security assessments.
    • Enhanced debugging and result merging capabilities for scraped data.
  • Bug Fixes

    • Improved error handling and validation in JSON processing and AWS Route53 configuration.
  • Chores

    • Added test fixtures and supporting infrastructure for new scrapers.



coderabbitai bot commented Jan 23, 2026

Walkthrough

This PR introduces two new security-focused scrapers (GitHub Security and OpenSSF Scorecard), extends the API with enhanced result formatting methods, adds support for change redirection across configurations, and implements new CLI flags for result export and database persistence.

Changes

  • GitHub Security Scraper (api/v1/github_security.go, scrapers/github/security.go, scrapers/github/security_client.go): New scraper type for monitoring GitHub repository security alerts (Dependabot, code scanning, secret scanning) with incremental scraping support and rate-limit handling.
  • OpenSSF Scorecard Scraper (api/v1/openssf.go, scrapers/openssf/client.go, scrapers/openssf/scorecard.go, scrapers/openssf/compliance.go): New scraper type for assessing repository security posture via the OpenSSF Scorecard API, with caching, retry logic, and compliance framework mappings.
  • Core API Extensions (api/v1/types.go, api/v1/scrapeconfig_types.go, api/context.go): Added GitHubSecurity and OpenSSFScorecard fields to ScraperSpec; introduced IsDebug() and IsTrace() accessor methods on ScrapeConfig and ScrapeContext.
  • Result Formatting & Aggregation (api/v1/interface.go, api/v1/common.go): New methods for rendering ScrapeResult and related types (Debug, Pretty, PrettyRow, PrettyShort); new FullScrapeResults type and MergeScrapeResults function for aggregating multiple scrape outputs.
  • Generated Deep Copy Methods (api/v1/zz_generated.deepcopy.go): Added DeepCopyInto and DeepCopy methods for the new types (GitHubSecurity, OpenSSFScorecard, and related filter and repository types) and updated the ScraperSpec deep copy logic.
  • Change Redirection Support (scrapers/changes/rules.go, scrapers/changes/rules_test.go): Extended ChangeMapping with ConfigID, ConfigType, and ScraperID fields to support redirecting changes across configurations; added test cases for change redirection scenarios.
  • Scraper Registry Integration (scrapers/common.go): Registered the new GitHub Security and OpenSSF Scorecard scrapers in the public All scrapers slice.
  • CLI and Result Output (cmd/run.go): Added --export and --save flags; new result merging logic that aggregates all scrape results and outputs via pretty-printing or database persistence.
  • Processing and Utilities (scrapers/processors/json.go, scrapers/processors/script.go, scrapers/sql/sql.go, scrapers/aws/aws.go): Enhanced error handling and logging in the JSON and script processors; added debug logging for SQL queries; minor AWS Route53 alias handling improvements.
  • Test Fixtures & Data (fixtures/github-security.yaml, fixtures/openssf.yaml, scrapers/github/testdata/security/*, scrapers/openssf/testdata/*): New YAML scraper configurations and JSON test datasets for GitHub security alerts (Dependabot, code scanning, secret scanning) and OpenSSF scorecard responses.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Run Command
    participant Scraper as GitHub Security<br/>Scraper
    participant Client as GitHub<br/>Security Client
    participant API as GitHub API
    participant Formatter as Result<br/>Formatter
    participant Output as Output Handler<br/>(File/DB/Print)

    Runner->>Scraper: Scrape(ctx)
    Scraper->>Client: NewGitHubSecurityClient(config)
    Scraper->>Client: Iterate repositories
    Client->>Client: CheckRateLimit()
    alt Rate Limited
        Client-->>Scraper: Wait duration
        Scraper->>Scraper: Sleep
    end
    Client->>API: GetDependabotAlerts()
    API-->>Client: Alert data
    Client->>API: GetCodeScanningAlerts()
    API-->>Client: Alert data
    Client->>API: GetSecretScanningAlerts()
    API-->>Client: Alert data
    Client-->>Scraper: Aggregated alerts
    Scraper->>Scraper: Calculate health<br/>status
    Scraper-->>Runner: ScrapeResult
    Runner->>Formatter: Render result
    Formatter-->>Output: Formatted output
sequenceDiagram
    participant Runner as Run Command
    participant Scraper as OpenSSF<br/>Scorecard Scraper
    participant Cache as ScrapeTime<br/>Cache
    participant Client as Scorecard<br/>API Client
    participant API as OpenSSF<br/>Scorecard API
    participant Processor as Result<br/>Processor

    Runner->>Scraper: Scrape(ctx)
    Scraper->>Cache: Check LastScorecardScrapeTime
    alt Cached & Valid
        Scraper-->>Runner: Skip (TTL active)
    else Fetch Required
        Scraper->>Client: GetRepositoryScorecard()
        Client->>API: GET /api/projects/{owner}/{repo}
        alt Retryable Error
            Client->>Client: Retry (up to 3x)
        end
        API-->>Client: ScorecardResponse
        Client-->>Scraper: Scorecard data
        Scraper->>Processor: Process checks
        Processor->>Processor: Filter by MinScore
        Processor->>Processor: Create analyses per check
        Processor-->>Scraper: Structured results
        Scraper->>Cache: Update LastScorecardScrapeTime
        Scraper-->>Runner: ScrapeResult[]
    end
sequenceDiagram
    participant Cmd as Run Command
    participant Scrapers as All Scrapers
    participant Merger as Result Merger
    participant Exporter as File Exporter
    participant Printer as Pretty Printer
    participant DB as Duty Database

    Cmd->>Scrapers: Execute all scrapers
    Scrapers-->>Cmd: ScrapeResults[]
    Cmd->>Merger: MergeScrapeResults(results)
    Merger->>Merger: Aggregate Configs
    Merger->>Merger: Aggregate Changes
    Merger->>Merger: Aggregate Analysis
    Merger-->>Cmd: FullScrapeResults
    alt --export flag set
        Cmd->>Exporter: Export resources to files
        Exporter-->>Cmd: Write complete
    end
    alt No --export flag
        Cmd->>Printer: Print merged results
        Printer-->>Cmd: Formatted output
    end
    alt --save flag set & DB configured
        Cmd->>DB: Persist results
        DB-->>Cmd: Save complete
        Cmd->>Printer: Print summary
    end

Possibly related PRs

  • Pr/exec #1787: Modifies ScrapeConfig-related type discrimination logic in api/v1/scrapeconfig_types.go, directly related to the changes introduced for GitHubSecurity and OpenSSFScorecard type recognition.

Suggested reviewers

  • adityathebe

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 16.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The PR title 'feat: add github/openssf scrapers' clearly and concisely summarizes the main change, adding two new scraper implementations (GitHub Security and OpenSSF Scorecard), as evidenced by the multiple new files and type definitions across the codebase.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/v1/types.go (1)

130-136: ApplyPlugin skips GitHubSecurity/OpenSSFScorecard.

New scraper specs won’t receive BaseScraper plugin settings, so plugin labels/transforms won’t apply.

🛠️ Proposed fix
 	for i := range spec.GithubActions {
 		spec.GithubActions[i].BaseScraper = spec.GithubActions[i].BaseScraper.ApplyPlugins(plugins...)
 	}
+
+	for i := range spec.GitHubSecurity {
+		spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
+	}
+
+	for i := range spec.OpenSSFScorecard {
+		spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
+	}
 
 	for i := range spec.Azure {
 		spec.Azure[i].BaseScraper = spec.Azure[i].BaseScraper.ApplyPlugins(plugins...)
 	}
🤖 Fix all issues with AI agents
In `@api/context.go`:
- Around line 161-167: IsDebug() and IsTrace() may dereference a nil
scrapeConfig; update both methods to guard against nil (and nil Spec if Spec is
a pointer) similarly to ScraperID(): check if ctx.scrapeConfig == nil (and
ctx.scrapeConfig.Spec == nil when applicable) and return false if nil, otherwise
call and return ctx.scrapeConfig.Spec.IsDebug() / IsTrace(); this prevents
panics when WithScrapeConfig() hasn't been called.

In `@api/v1/github_security.go`:
- Around line 43-52: ParseMaxAge currently calls time.ParseDuration which
doesn't accept the "d" day suffix; update GitHubSecurityFilters.ParseMaxAge to
detect a trailing "d" (case-insensitive), parse the numeric portion as an
integer or float representing days, convert to a time.Duration via days * 24 *
time.Hour, and return that; for all other inputs keep using time.ParseDuration
and propagate its error. Ensure the implementation handles empty MaxAge (already
returns 0,nil), returns a clear error on invalid numeric parsing, and references
the MaxAge field and ParseMaxAge method when making the change.

In `@api/v1/openssf.go`:
- Around line 10-11: The MinScore field currently uses *float64 which breaks
controller-gen; change MinScore to a string pointer (e.g., *string) and add
validation/tagging as needed in the struct, then update the scraper logic to
parse the string into a float64 (handle nil/parse errors) wherever MinScore is
consumed; alternatively, if you must keep a float, add the kubebuilder CRD
marker // +kubebuilder:validation:Type=number and ensure controller-gen is
invoked with crd:allowDangerousTypes=true—locate and update the MinScore field
in the api/v1/openssf.go struct for these changes.

In `@scrapers/github/security_client.go`:
- Around line 121-133: GetSecurityAdvisories currently omits pagination so
results are limited to the API default; update the request options in
GetSecurityAdvisories to copy AlertListOptions.Page and AlertListOptions.PerPage
into the github.ListRepositorySecurityAdvisoriesOptions (alongside State) before
calling c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories so callers
can control Page and PerPage like the other methods (GetDependabotAlerts,
GetCodeScanningAlerts, GetSecretScanningAlerts).

In `@scrapers/github/security.go`:
- Around line 94-106: ScrapeResult.CreatedAt is typed as *time.Time but the code
passes a time.Time from repo.CreatedAt.GetTime(); fix by creating a local
variable to hold the returned time (e.g., tm := repo.CreatedAt.GetTime()) and
assign CreatedAt to its address (CreatedAt: &tm) when constructing the
v1.ScrapeResult; update the construction in the block that builds the
ScrapeResult (v1.ScrapeResult / CreatedAt) so it uses the pointer to the local
time variable.

In `@scrapers/openssf/compliance.go`:
- Around line 10-11: The file-level comment above ComplianceMappings incorrectly
cites "plugins/vite-plugin-security.js" as the data source; update the comment
to reference the correct source (e.g., the OpenSSF Scorecard documentation or
the actual mapping source used in this repo) so it accurately documents where
the compliance mapping originates; locate the top comment near the
ComplianceMappings declaration in scrapers/openssf/compliance.go and replace the
incorrect reference with the proper source name/URL (or remove the irrelevant
Vite/Vue reference) while keeping the explanatory text about mapping OpenSSF
Scorecard checks to compliance framework controls.

In `@scrapers/openssf/scorecard.go`:
- Around line 274-276: The function scorecardCheckNameToKebab is a no-op (it
replaces "-" with "-" and is unused); either remove the function completely or
implement proper kebab-case conversion and use it where needed—e.g., transform
spaces/underscores to dashes and lowercase (apply to scorecardCheckNameToKebab),
or delete the scorecardCheckNameToKebab declaration and any references to it if
no conversion is required.
🧹 Nitpick comments (4)
scrapers/github/security.go (1)

291-296: createConfigInsights is still a FIXME.

Please implement or track via a follow-up issue so the alert details aren’t silently dropped.

scrapers/github/security_client.go (1)

59-79: Empty state string passed to API unconditionally.

When opts.State is empty, you're passing &opts.State (pointer to empty string) to the API. This may cause unexpected filtering behavior depending on how the GitHub API handles empty state values.

Suggested fix to conditionally set State
 func (c *GitHubSecurityClient) GetDependabotAlerts(ctx context.Context, opts AlertListOptions) ([]*github.DependabotAlert, *github.Response, error) {
 	reqOpts := &github.ListAlertsOptions{
-		State: &opts.State,
 		ListOptions: github.ListOptions{
 			Page:    opts.Page,
 			PerPage: opts.PerPage,
 		},
 	}
 
+	if opts.State != "" {
+		reqOpts.State = &opts.State
+	}
+
 	if opts.Severity != "" {
 		reqOpts.Severity = &opts.Severity
 	}
scrapers/openssf/scorecard.go (2)

74-84: Type assertion without safety check could panic.

Line 76 uses a direct type assertion lastScrape.(time.Time) without the comma-ok pattern. If a non-time.Time value is ever stored in the map (due to a bug or future change), this will panic.

Suggested fix for safer type assertion
 func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
 	if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
-		lastTime := lastScrape.(time.Time)
+		lastTime, ok := lastScrape.(time.Time)
+		if !ok {
+			ctx.Logger.V(3).Infof("invalid cache entry for %s, will refetch", repoFullName)
+			return false, ""
+		}
 		timeSinceLastScrape := time.Since(lastTime)

139-143: Consider if score < 10 is too strict for "failing" status.

A score of 9/10 is marked as "failing", which may be overly strict. Most scoring systems consider 7+ or 8+ as passing. This creates inconsistency with createRepositoryProperties where score >= 7 counts as passing.

Consider aligning with the passing threshold used elsewhere
-		if check.Score < 10 {
+		if check.Score < 7 {
 			analysis.Status = "failing"
 		} else {
 			analysis.Status = "passing"
 		}

Comment on lines 161 to +167
func (ctx ScrapeContext) IsTrace() bool {
	return ctx.scrapeConfig.Spec.IsTrace()
}

func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}

⚠️ Potential issue | 🟠 Major

Potential nil pointer dereference in IsDebug() and IsTrace().

Both methods access ctx.scrapeConfig.Spec without checking if scrapeConfig is nil. Other methods like ScraperID() (lines 144-149) perform this nil check. If IsDebug() or IsTrace() is called before WithScrapeConfig(), this will panic.

Suggested fix for IsDebug (same pattern should be applied to IsTrace)
 func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsDebug()
 }

Comment on lines +43 to +52
	// MaxAge filters alerts by age (e.g., "90d", "30d")
	MaxAge string `yaml:"maxAge,omitempty" json:"maxAge,omitempty"`
}

// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Go time.ParseDuration supported time units documentation

💡 Result:

In Go’s standard library, time.ParseDuration accepts these unit suffixes (as documented):

  • ns
  • us (or µs)
  • ms
  • s
  • m
  • h (hours)

Docs: time.ParseDuration on pkg.go.dev/time (see “Valid time units…”). [1]

Examples: 300ms, -1.5h, 2h45m, 1µs / 1us. [1]

Sources

[1] https://pkg.go.dev/time#ParseDuration


ParseMaxAge cannot handle day units—time.ParseDuration only accepts ns, us, ms, s, m, h.

If the documentation or code examples suggest using "90d" or "30d", they will fail. Either update examples to hours (e.g., 2160h for 90 days) or implement custom day parsing.

🛠️ Proposed fix (support "d" suffix)
 import (
+	"strconv"
+	"strings"
 	"time"
 
 	"github.com/flanksource/duty/types"
 )
@@
 func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
 	if f.MaxAge == "" {
 		return 0, nil
 	}
+	if strings.HasSuffix(f.MaxAge, "d") {
+		daysStr := strings.TrimSuffix(f.MaxAge, "d")
+		days, err := strconv.Atoi(daysStr)
+		if err != nil {
+			return 0, err
+		}
+		return time.Hour * 24 * time.Duration(days), nil
+	}
 	return time.ParseDuration(f.MaxAge)
 }

Comment on lines +10 to +11
	// MinScore optionally filters repositories by minimum score (0-10)
	MinScore *float64 `yaml:"minScore,omitempty" json:"minScore,omitempty"`

⚠️ Potential issue | 🔴 Critical

Fix float type to unblock CRD generation.

The pipeline is failing because controller-gen disallows float types by default due to cross-language serialization inconsistencies. Consider using a string type with validation, or add the // +kubebuilder:validation:Type=number marker with crd:allowDangerousTypes=true in the generator invocation.

🔧 Recommended fix: Use string type with custom unmarshaling
 // OpenSSFScorecard scraper fetches security scorecard data from the OpenSSF Scorecard API
 type OpenSSFScorecard struct {
 	BaseScraper `json:",inline" yaml:",inline"`
 
 	// Repositories is the list of repositories to assess
 	Repositories []OpenSSFRepository `yaml:"repositories" json:"repositories"`
 
-	// MinScore optionally filters repositories by minimum score (0-10)
-	MinScore *float64 `yaml:"minScore,omitempty" json:"minScore,omitempty"`
+	// MinScore optionally filters repositories by minimum score (0-10)
+	// +kubebuilder:validation:Pattern=`^([0-9]|10)(\.[0-9]+)?$`
+	MinScore string `yaml:"minScore,omitempty" json:"minScore,omitempty"`
 }

Then parse the string to float64 in the scraper logic. Alternatively, if floats are truly needed, update the controller-gen invocation to include crd:allowDangerousTypes=true.

🧰 Tools
🪛 GitHub Actions: Test

[error] 11-11: controller-gen failed: CRD generation error for api/v1/openssf.go: found float, the usage of which is highly discouraged, as support for them varies across languages. Please consider serializing your float as string. If you are really sure you want to use them, re-run with crd:allowDangerousTypes=true. (Step: controller-gen crd paths="./api/..." output:crd:artifacts:config=chart/crds)

🪛 GitHub Check: lint

[failure] 11-11:
found float, the usage of which is highly discouraged, as support for them varies across languages. Please consider serializing your float as string instead. If you are really sure you want to use them, re-run with crd:allowDangerousTypes=true

🪛 GitHub Check: test

[failure] 11-11:
found float, the usage of which is highly discouraged, as support for them varies across languages. Please consider serializing your float as string instead. If you are really sure you want to use them, re-run with crd:allowDangerousTypes=true

🪛 GitHub Check: test-prod

[failure] 11-11:
found float, the usage of which is highly discouraged, as support for them varies across languages. Please consider serializing your float as string instead. If you are really sure you want to use them, re-run with crd:allowDangerousTypes=true


Comment on lines +121 to +133
// GetSecurityAdvisories fetches repository security advisories
func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
		State: opts.State,
	}

	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
	if err != nil {
		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
	}

	return advisories, resp, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

go-github ListRepositorySecurityAdvisoriesOptions pagination support

💡 Result:

go-github’s ListRepositorySecurityAdvisoriesOptions supports offset-based pagination the same way most “List…” calls do: via embedded github.ListOptions (page / per_page). ListOptions is the standard pagination struct in go-github and defines Page and PerPage. [1]

So you paginate like:

  • set opt.ListOptions.PerPage (and opt.ListOptions.Page starting at 1)
  • call ListRepositorySecurityAdvisories(...)
  • read resp.NextPage
  • loop until resp.NextPage == 0

This is the documented go-github pagination pattern in general. [2]
And ListRepositorySecurityAdvisoriesOptions is present as an options type alongside the advisory-listing methods. [3]

Sources
[1] (pkg.go.dev)
[2] (sources.debian.org)
[3] (pkg.go.dev)
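
A minimal sketch of that documented loop, assuming the go-github types already used in security_client.go (listAllAdvisories is a hypothetical helper, not code from this PR; imports of context and go-github are implied):

// listAllAdvisories drains every page of repository security advisories by
// following resp.NextPage until it reaches zero, per the pattern above.
func listAllAdvisories(ctx context.Context, client *github.Client, owner, repo string) ([]*github.SecurityAdvisory, error) {
	opts := &github.ListRepositorySecurityAdvisoriesOptions{
		ListOptions: github.ListOptions{Page: 1, PerPage: 100},
	}
	var all []*github.SecurityAdvisory
	for {
		advisories, resp, err := client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, owner, repo, opts)
		if err != nil {
			return nil, err
		}
		all = append(all, advisories...)
		if resp.NextPage == 0 {
			return all, nil
		}
		opts.Page = resp.NextPage
	}
}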


Add pagination support to GetSecurityAdvisories.

GetSecurityAdvisories is the only alert method missing ListOptions pagination. All other methods (GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts) include Page and PerPage from AlertListOptions. Without pagination, results will be limited to the API's default page size.

Suggested fix
 func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}

Comment on lines +94 to +106
	result := v1.ScrapeResult{
		Type:        ConfigTypeGitHubSecurityRepo,
		ID:          fmt.Sprintf("github-security/%s", repoFullName),
		Name:        repoFullName,
		ConfigClass: "Security",
		Config:      repo,
		Tags: v1.Tags{
			{Name: "owner", Value: repoConfig.Owner},
			{Name: "repo", Value: repoConfig.Repo},
			{Name: "security", Value: "true"},
		},
		CreatedAt:  repo.CreatedAt.GetTime(),
		Properties: properties,

⚠️ Potential issue | 🔴 Critical

Fix CreatedAt type mismatch.

ScrapeResult.CreatedAt expects *time.Time, but repo.CreatedAt.GetTime() returns time.Time.

🐛 Proposed fix
-	result := v1.ScrapeResult{
+	createdAt := repo.GetCreatedAt()
+	result := v1.ScrapeResult{
 		Type:        ConfigTypeGitHubSecurityRepo,
 		ID:          fmt.Sprintf("github-security/%s", repoFullName),
 		Name:        repoFullName,
 		ConfigClass: "Security",
 		Config:      repo,
 		Tags: v1.Tags{
 			{Name: "owner", Value: repoConfig.Owner},
 			{Name: "repo", Value: repoConfig.Repo},
 			{Name: "security", Value: "true"},
 		},
-		CreatedAt:  repo.CreatedAt.GetTime(),
+		CreatedAt:  &createdAt,
 		Properties: properties,
 	}

Comment on lines +10 to +11
// ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
// Based on DataSheet reference: plugins/vite-plugin-security.js

⚠️ Potential issue | 🟡 Minor

Incorrect source reference in documentation comment.

The comment references "plugins/vite-plugin-security.js" which appears to be from a different project (JavaScript/Vite). This should reference the actual data source for the compliance mappings.

Suggested fix
 // ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
-// Based on DataSheet reference: plugins/vite-plugin-security.js
+// Based on OpenSSF Scorecard checks documentation
 var ComplianceMappings = map[string]ComplianceMapping{

Comment on lines 274 to 276
func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
}

⚠️ Potential issue | 🟡 Minor

Dead code: scorecardCheckNameToKebab is a no-op.

This function replaces "-" with "-", which has no effect. It's also unused in the file. Remove it or fix the implementation if kebab-case conversion is needed.

Remove unused no-op function
-func scorecardCheckNameToKebab(name string) string {
-	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
-}

Or if actual conversion is needed (e.g., spaces/underscores to dashes):

func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, " ", "-"))
}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Fix all issues with AI agents (hedged example sketches for several of the items below follow this list)
In `@api/context.go`:
- Around line 165-167: The IsDebug method on ScrapeContext can panic if
scrapeConfig is nil; update ScrapeContext.IsDebug() to guard against a nil
ctx.scrapeConfig and return false when unset, otherwise return
ctx.scrapeConfig.Spec.IsDebug(); reference the ScrapeContext type and IsDebug()
method and ensure compatibility with NewScrapeContext which may produce a
context without scrapeConfig.

In `@api/v1/github_security.go`:
- Around line 47-53: The ParseMaxAge method in GitHubSecurityFilters currently
calls time.ParseDuration which rejects "90d"/"30d" as documented; change
ParseMaxAge to accept a trailing 'd' by detecting a suffix of 'd'
(case-insensitive), parsing the numeric portion as an integer/float, converting
days to hours (days * 24h) and returning time.Duration (or fall back to
time.ParseDuration for other valid suffixes), and ensure errors are returned for
invalid inputs; update logic inside ParseMaxAge to handle both "Nd" inputs and
standard time.ParseDuration-compatible strings and reference the
GitHubSecurityFilters.ParseMaxAge function when locating the change.

In `@api/v1/interface.go`:
- Around line 815-831: The PrettyRow method (ScrapeResult.PrettyRow) currently
performs unconditional printing via clicky.MustPrint and accepts an unused
options parameter; remove the two clicky.MustPrint calls so the method only
constructs and returns the row map, and drop the unused options parameter from
the function signature (or if you prefer to keep it, ensure it's used) so
callers control printing/formatting. Update any call sites of
ScrapeResult.PrettyRow to handle printing/formatting themselves (e.g., call
clicky.MustPrint with desired FormatOptions where needed).

In `@api/v1/openssf.go`:
- Around line 10-11: The MinScore field currently uses *float64 which breaks CRD
generation; change MinScore to a safer type (e.g., *string or *int) and update
any callers/serialization accordingly: replace the declaration of MinScore
(*float64) with either *string (to store "7.5") or *int representing a scaled
value (e.g., 75 for 0.0–10.0 scaled to 0–100), keep the existing yaml/json tags,
and adjust validation/usage logic that reads MinScore (parsing string to float
or dividing the int by the scale) so consumers like validation, defaults, and
API docs remain correct; if you must keep a numeric type, add the kubebuilder
marker // +kubebuilder:validation:Type=number and rerun controller-gen with
crd:allowDangerousTypes=true (not recommended).

In `@api/v1/types.go`:
- Around line 71-74: The ScraperSpec additions (fields GitHubSecurity and
OpenSSFScorecard) are not wired into AllScraperConfigs and ApplyPlugin; update
the AllScraperConfigs map to include entries "githubsecurity": GitHubSecurity{}
and "openssfscorecard": OpenSSFScorecard{}, and modify the ApplyPlugin method to
iterate over s.GitHubSecurity and s.OpenSSFScorecard and call ApplyPlugin on
each item (mirroring the existing loops for GithubActions and Azure) so those
plugin configs are applied; reference the AllScraperConfigs map and the
ApplyPlugin method and the types GitHubSecurity and OpenSSFScorecard when
locating code to change.

In `@cmd/run.go`:
- Around line 108-125: The current code always exports/prints regardless of the
--export flag; update the branch to honor the boolean export flag (e.g.,
exportFlag or export) so the export path only runs when exportFlag is true and
the print path runs when exportFlag is false; specifically, wrap the export loop
that calls exportResource(results, outputDir) and the
logger.Infof("Exported...") inside if exportFlag { ... } and move the
clicky.MustPrint(all.*) calls into the corresponding else branch, and also
validate that when exportFlag is true an outputDir is provided (return a clear
error if empty) to avoid silent failures.

In `@scrapers/github/security_client.go`:
- Around line 23-48: In NewGitHubSecurityClient, after resolving token (the
block that sets token from ctx.HydrateConnectionByURL or
ctx.GetEnvValueFromCache), add a check that token is not empty and return a
descriptive error if it is; for example, if token == "" return
fmt.Errorf("missing GitHub token: no connection %q and no personalAccessToken
provided", config.ConnectionName or config.PersonalAccessToken) so the function
fails fast instead of creating a client that will 401 on API calls.
- Around line 121-133: GetSecurityAdvisories ignores pagination and always
fetches the first page; set the ListRepositorySecurityAdvisoriesOptions to
include the requested page and per-page values from AlertListOptions (map
opts.Page and opts.PerPage into the options used for the API call). Update the
reqOpts used in GetSecurityAdvisories to populate the ListOptions/Page and
PerPage fields (same approach as
GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) so the call
to c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories honors
pagination.

In `@scrapers/github/security.go`:
- Around line 59-62: The log call is formatting with "%d" but is concatenating
the alert-count strings, so compute an integer total first by parsing the
individual counts from result.Properties.Find("Critical Alerts").Text, "High
Alerts", "Medium Alerts", and "Low Alerts" (use strconv.Atoi, handle errors
defaulting to 0), sum them into a total int, then call
ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
total) instead of concatenating the Text values.

In `@scrapers/processors/json.go`:
- Around line 599-607: The validation currently checks only ExternalChangeID for
standalone inputs but not ExternalID; update the validator in the same block
that inspects input.Changes (near the lo.Filter usage) to also check for any
change where ExternalID == "" when input.ID is empty (standalone change). Use
the same pattern (e.g., lo.Filter or an additional check) referencing
input.Changes, ExternalID, ExternalChangeID and input.ID, and return the same
formatted error (including e.Config and input.Debug()) when either required
field is missing.

In `@scrapers/processors/script.go`:
- Around line 33-36: The current logger.Infof call in the json.Unmarshal error
block logs the entire script output variable s (risking secrets/PII and large
dumps); update the error handling around json.Unmarshal to avoid printing the
full s — instead log the error and either a safely truncated sample (e.g., first
N characters with "…(truncated)") and the total length, or a redacted/hashed
summary of s, and consider simple redaction for common secrets (tokens,
passwords) before including any sample; modify the logger.Infof invocation that
references err and s to use the new truncated/redacted sample and length-only
information.
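
For the api/v1/interface.go item above, a minimal sketch of the side-effect-free PrettyRow (dropping the unused options parameter is an assumption; printing moves to call sites):

func (s ScrapeResult) PrettyRow() map[string]api.Text {
	row := make(map[string]api.Text)
	row["ID"] = clicky.Text(s.ID)
	row["Name"] = clicky.Text(s.Name)
	row["Type"] = clicky.Text(s.Type)
	row["Health"] = clicky.Text(string(s.Health))
	row["Status"] = clicky.Text(s.Status)
	row["Error"] = clicky.Text("")
	if s.Error != nil {
		row["Error"] = clicky.Text(s.Error.Error())
	}
	// No printing here; callers choose the format, e.g.:
	// clicky.MustPrint([]map[string]api.Text{r.PrettyRow()}, clicky.FormatOptions{Pretty: true})
	return row
}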
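
For the cmd/run.go item, a sketch of the corrected branching; exportFlag, outputDir, exportResource, results, and all follow the names in the prompt and are assumptions about the surrounding command code:

// Honor --export: write files only when the flag is set, print otherwise.
if exportFlag {
	if outputDir == "" {
		return fmt.Errorf("--export requires an output directory") // fail fast instead of silently writing nowhere
	}
	if err := exportResource(results, outputDir); err != nil {
		return err
	}
	logger.Infof("Exported %d results to %s", len(results), outputDir)
} else {
	clicky.MustPrint(all.Configs, clicky.FormatOptions{Pretty: true})
}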
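
For the scrapers/github/security_client.go token check, the guard is small enough to sketch inline (the config field names are taken from the prompt):

// After resolving the token via HydrateConnectionByURL / GetEnvValueFromCache:
if token == "" {
	return nil, fmt.Errorf("missing GitHub token: no connection %q and no personalAccessToken provided", config.ConnectionName)
}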
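
For the scrapers/github/security.go logging item, a sketch of the count aggregation (the Properties.Find accessor and property names come from the prompt; its exact return type is an assumption):

total := 0
for _, name := range []string{"Critical Alerts", "High Alerts", "Medium Alerts", "Low Alerts"} {
	if p := result.Properties.Find(name); p != nil {
		if n, err := strconv.Atoi(p.Text); err == nil {
			total += n // unparsable or missing counts default to 0
		}
	}
}
ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName, total)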
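
For the scrapers/processors/json.go item, a sketch of the extended standalone-change validation (the element type v1.ChangeResult and the error wording are assumptions patterned on the existing ExternalChangeID check):

// Standalone changes (input.ID empty) must carry both identifiers.
if input.ID == "" {
	invalid := lo.Filter(input.Changes, func(c v1.ChangeResult, _ int) bool {
		return c.ExternalID == "" || c.ExternalChangeID == ""
	})
	if len(invalid) > 0 {
		return fmt.Errorf("config %v has standalone changes missing externalID/externalChangeID: %s", e.Config, input.Debug())
	}
}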
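
For the scrapers/processors/script.go item, a sketch of the truncated, length-only logging (the 512-byte cap is an arbitrary assumption):

const maxSample = 512 // assumption: cap the logged sample size
sample := s
if len(sample) > maxSample {
	sample = sample[:maxSample] + "…(truncated)"
}
logger.Infof("failed to unmarshal script output (len=%d): %v, sample: %s", len(s), err, sample)
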
🧹 Nitpick comments (4)
api/v1/scrapeconfig_types.go (1)

70-76: Potential ambiguity with ScraperSpec.IsDebug() and ScraperSpec.IsTrace() methods.

ScrapeConfig.IsDebug() checks annotations while ScraperSpec.IsDebug() (in types.go lines 178-184) checks the LogLevel field. Both methods exist on related types with the same name but different behavior.

Consider documenting or clarifying the distinction, or using different method names to avoid confusion when both are accessible through the context chain.

scrapers/openssf/compliance.go (1)

110-115: Return defensive copies from GetComplianceMappings.

The returned slices share backing arrays with the global map; external mutation can alter shared mappings. Consider returning copies to keep mappings immutable (and optionally keep the map unexported if it’s not required externally).

♻️ Suggested adjustment
 func GetComplianceMappings(checkName string) ComplianceMapping {
 	if mapping, ok := ComplianceMappings[checkName]; ok {
-		return mapping
+		return ComplianceMapping{
+			SOC2:     append([]string(nil), mapping.SOC2...),
+			NISTSSDF: append([]string(nil), mapping.NISTSSDF...),
+			CIS:      append([]string(nil), mapping.CIS...),
+		}
 	}
 	return ComplianceMapping{}
 }
scrapers/openssf/client.go (2)

72-104: Respect context cancellation during retry backoff.

Line 82–86 sleeps unconditionally; if ctx is canceled, callers still block through the backoff. Consider a select on ctx.Done() to exit promptly.

♻️ Proposed fix (ctx-aware backoff)
 	for attempt := 0; attempt < maxRetries; attempt++ {
 		if attempt > 0 {
 			backoff := time.Duration(attempt*attempt) * time.Second
 			c.ctx.Logger.V(3).Infof("retrying scorecard request for %s (attempt %d/%d) after %v",
 				repoFullName, attempt+1, maxRetries, backoff)
-			time.Sleep(backoff)
+			timer := time.NewTimer(backoff)
+			select {
+			case <-ctx.Done():
+				timer.Stop()
+				return nil, ctx.Err()
+			case <-timer.C:
+			}
 		}

144-168: Retry classification is brittle; prefer typed errors and strings.Contains.

Line 149–152 relies on substring matching, and contains/findSubstring reimplements stdlib logic. Consider using errors.Is/As with net.Error for timeouts and strings.Contains for server errors, and drop custom helpers.

♻️ Proposed fix (typed errors + stdlib Contains)
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
+	"net"
 	"net/http"
+	"strings"
 	"time"

 	"github.com/flanksource/config-db/api"
 )
@@
 func (c *ScorecardClient) isRetryable(err error) bool {
 	if err == nil {
 		return false
 	}
-
-	errStr := err.Error()
-	return contains(errStr, "server error") ||
-	       contains(errStr, "timeout") ||
-	       contains(errStr, "connection")
+	if errors.Is(err, context.DeadlineExceeded) {
+		return true
+	}
+	var netErr net.Error
+	if errors.As(err, &netErr) && (netErr.Timeout() || netErr.Temporary()) {
+		return true
+	}
+	return strings.Contains(err.Error(), "server error")
 }
-
-func contains(s, substr string) bool {
-	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
-		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
-		 findSubstring(s, substr)))
-}
-
-func findSubstring(s, substr string) bool {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return true
-		}
-	}
-	return false
-}

Comment on lines +165 to +167
func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}

⚠️ Potential issue | 🟡 Minor

Guard against nil ScrapeConfig.
NewScrapeContext can produce a context without scrapeConfig; calling IsDebug() would panic. Return false when unset.

🐛 Proposed fix
 func (ctx ScrapeContext) IsDebug() bool {
-	return ctx.scrapeConfig.Spec.IsDebug()
+	if ctx.scrapeConfig == nil {
+		return false
+	}
+	return ctx.scrapeConfig.Spec.IsDebug()
 }

Comment on lines +47 to +53
// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain


🌐 Web query:

Go time.ParseDuration supported format "d" day suffix

💡 Result:

Go’s standard library time.ParseDuration does not support a d (day) suffix—the only valid unit suffixes are ns, us (or µs), ms, s, m, h. [1]

The time package also intentionally provides no Day constant (“to avoid confusion across daylight savings time zone transitions”). [1]

What to do instead

  • If you mean exact 24-hour blocks, use hours: 24h, 48h, 720h (for 30 days), etc. [1]
  • If you mean calendar days (variable-length due to DST), use date math like t.AddDate(0, 0, days) rather than time.Duration.


ParseMaxAge rejects the documented 90d/30d inputs.

Line 43 documents examples "90d" and "30d", but line 52 uses time.ParseDuration, which doesn't accept a d suffix. The only supported unit suffixes are ns, us, µs, ms, s, m, h. Either update the documentation to reflect supported formats (e.g., "2160h" for 90 days, "720h" for 30 days) or implement custom parsing to support the d suffix.


Comment on lines +815 to +831
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
	row := make(map[string]api.Text)
	row["ID"] = clicky.Text(s.ID)
	row["Name"] = clicky.Text(s.Name)
	row["Type"] = clicky.Text(s.Type)
	row["Health"] = clicky.Text(string(s.Health))
	row["Status"] = clicky.Text(s.Status)
	if s.Error != nil {
		row["Error"] = clicky.Text(s.Error.Error())
	} else {
		row["Error"] = clicky.Text("")
	}
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})

	return row
}

⚠️ Potential issue | 🟠 Major

PrettyRow prints unconditionally and never uses its options parameter.

Building the row triggers two clicky.MustPrint calls (JSON and pretty) as a side effect, so every caller gets output printed whether it wants it or not, and the options argument is dead. Remove the two clicky.MustPrint calls so the method only constructs and returns the row map, drop (or actually use) the options parameter, and let call sites print with the clicky.FormatOptions they need.
grep -rn "\.PrettyRow(" . --include="*.go" 2>/dev/null || echo "No explicit calls found"

Repository: flanksource/config-db

Length of output: 88


🏁 Script executed:

# Let me check the actual behavior - look at cmd/run.go to understand the flow
cat cmd/run.go | grep -B 10 -A 10 "clicky.MustPrint(all.Configs)"

Repository: flanksource/config-db

Length of output: 615


🏁 Script executed:

# Look for what happens in clicky when MustPrint is called on PrettyRow items
# Check vendor or go.mod for clicky version
cat go.mod | grep clicky

Repository: flanksource/config-db

Length of output: 115


🏁 Script executed:

# Let's check what api.PrettyRow actually is - look for interface definition in imported packages
python3 << 'EOF'
import subprocess

# Search for any interface that might define PrettyRow
result = subprocess.run(['rg', 'type.*PrettyRow.*interface', '--type', 'go'], 
                       capture_output=True, text=True)
if result.stdout:
    print("Found interface definition:")
    print(result.stdout)
else:
    print("No interface definition found in local code")

# Check if it's imported from clicky
result2 = subprocess.run(['rg', 'from.*clicky', '--type', 'go'], 
                        capture_output=True, text=True)
print("\nClicky imports:")
print(result2.stdout[:1000] if result2.stdout else "None")
EOF

Repository: flanksource/config-db

Length of output: 130


🏁 Script executed:

# Let me search the go.mod to see if there's a way to understand the clicky interface better
# First check if we can find any documentation or usage patterns
rg "clicky" -B 2 -A 2 --type go | grep -A 5 -B 5 "PrettyRow\|interface"

Repository: flanksource/config-db

Length of output: 3480


🏁 Script executed:

# Let's look at what the review is actually saying - check if PrettyRow is being called implicitly
# by clicky when it encounters api.PrettyRow types
python3 << 'EOF'
# Since ScrapeResult is assigned to api.PrettyRow slice and no explicit calls found,
# PrettyRow must be a method that clicky calls automatically.
# Let's verify if clicky.MustPrint triggers the PrettyRow method

# Check the actual flow: when does the PrettyRow method get called?
import subprocess

# Look for usage of clicky.MustPrint with different types
result = subprocess.run(['rg', 'clicky.MustPrint', '-B 2', '-A 2', '--type', 'go'], 
                       capture_output=True, text=True)
print("All clicky.MustPrint calls:")
print(result.stdout)
EOF

Repository: flanksource/config-db

Length of output: 93


Remove unconditional prints from PrettyRow method.

The two clicky.MustPrint calls on lines 827–828 execute whenever this method is invoked, causing duplicate output and preventing callers from controlling formatting. The options parameter is also unused. Let the caller decide when and how to print by returning only the row data.

Suggested fix
 func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
 	row := make(map[string]api.Text)
 	row["ID"] = clicky.Text(s.ID)
 	row["Name"] = clicky.Text(s.Name)
 	row["Type"] = clicky.Text(s.Type)
 	row["Health"] = clicky.Text(string(s.Health))
 	row["Status"] = clicky.Text(s.Status)
 	if s.Error != nil {
 		row["Error"] = clicky.Text(s.Error.Error())
 	} else {
 		row["Error"] = clicky.Text("")
 	}
-	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
-	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})
-
 	return row
 }
🤖 Prompt for AI Agents
In `@api/v1/interface.go` around lines 815 - 831, The PrettyRow method
(ScrapeResult.PrettyRow) currently performs unconditional printing via
clicky.MustPrint and accepts an unused options parameter; remove the two
clicky.MustPrint calls so the method only constructs and returns the row map,
and drop the unused options parameter from the function signature (or if you
prefer to keep it, ensure it's used) so callers control printing/formatting.
Update any call sites of ScrapeResult.PrettyRow to handle printing/formatting
themselves (e.g., call clicky.MustPrint with desired FormatOptions where
needed).
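
With the prints removed, call sites batch rows and print once. A minimal caller-side sketch, assuming the clicky.Text/FormatOptions API shown in the diff above; the results slice here is a hypothetical []ScrapeResult:

// Build all rows first, then let the caller pick a single output format.
rows := make([]map[string]api.Text, 0, len(results))
for _, r := range results {
	rows = append(rows, r.PrettyRow(nil))
}
clicky.MustPrint(rows, clicky.FormatOptions{Pretty: true})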

Comment on lines +10 to +11
// MinScore optionally filters repositories by minimum score (0-10)
MinScore *float64 `yaml:"minScore,omitempty" json:"minScore,omitempty"`

⚠️ Potential issue | 🔴 Critical

Fix float type to resolve pipeline failures.

The *float64 type causes CRD generation to fail. Kubernetes controller-gen discourages floats due to inconsistent serialization across languages.

Consider using a string type or an integer representing a scaled value (e.g., 0-100 instead of 0.0-10.0):

🔧 Proposed fix using string type
-	// MinScore optionally filters repositories by minimum score (0-10)
-	MinScore *float64 `yaml:"minScore,omitempty" json:"minScore,omitempty"`
+	// MinScore optionally filters repositories by minimum score (0-10), specified as a string (e.g., "7.5")
+	MinScore *string `yaml:"minScore,omitempty" json:"minScore,omitempty"`

Alternatively, if you prefer to keep the float type, add // +kubebuilder:validation:Type=number marker and re-run controller-gen with crd:allowDangerousTypes=true, though this is not recommended for portability.

🧰 Tools
🪛 GitHub Actions (Lint, Test) and GitHub Checks (lint, test, test-prod) all report the same failure:

[error] 11-11: controller-gen CRD generation failed: found float at line 11:12 in api/v1/openssf.go; the usage of floats is highly discouraged, as support for them varies across languages. Please consider serializing your float as string instead, or re-run with crd:allowDangerousTypes=true.

🤖 Prompt for AI Agents
In `@api/v1/openssf.go` around lines 10 - 11, The MinScore field currently uses
*float64 which breaks CRD generation; change MinScore to a safer type (e.g.,
*string or *int) and update any callers/serialization accordingly: replace the
declaration of MinScore (*float64) with either *string (to store "7.5") or *int
representing a scaled value (e.g., 75 for 0.0–10.0 scaled to 0–100), keep the
existing yaml/json tags, and adjust validation/usage logic that reads MinScore
(parsing string to float or dividing the int by the scale) so consumers like
validation, defaults, and API docs remain correct; if you must keep a numeric
type, add the kubebuilder marker // +kubebuilder:validation:Type=number and
rerun controller-gen with crd:allowDangerousTypes=true (not recommended).
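
If MinScore becomes a *string, the scraper can parse it once at the point where the filter runs. A minimal sketch inside the per-repository loop, assuming hypothetical config and scorecard variable names and the 0-10 score scale:

if config.MinScore != nil {
	threshold, err := strconv.ParseFloat(*config.MinScore, 64)
	if err != nil {
		return nil, fmt.Errorf("invalid minScore %q: %w", *config.MinScore, err)
	}
	if scorecard.Score < threshold {
		continue // below the minimum score; skip this repository
	}
}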

Comment on lines +71 to +74
GithubActions []GitHubActions `json:"githubActions,omitempty" yaml:"githubActions,omitempty"`
GitHubSecurity []GitHubSecurity `json:"githubSecurity,omitempty" yaml:"githubSecurity,omitempty"`
OpenSSFScorecard []OpenSSFScorecard `json:"openssfScorecard,omitempty" yaml:"openssfScorecard,omitempty"`
Azure []Azure `json:"azure,omitempty" yaml:"azure,omitempty"`

⚠️ Potential issue | 🟠 Major

Missing entries in AllScraperConfigs map and ApplyPlugin method.

The new GitHubSecurity and OpenSSFScorecard fields are added to ScraperSpec, but the corresponding entries are missing from:

  1. AllScraperConfigs map (lines 15-31) - should include "githubsecurity": GitHubSecurity{} and "openssfscorecard": OpenSSFScorecard{}
  2. ApplyPlugin method (lines 97-167) - should include plugin application loops for both new types, similar to other scrapers
Proposed additions

Add to AllScraperConfigs map:

 var AllScraperConfigs = map[string]any{
 	"aws":            AWS{},
 	"azure":          Azure{},
 	"azuredevops":    AzureDevops{},
 	"exec":           Exec{},
 	"file":           File{},
 	"gcp":            GCP{},
 	"githubactions":  GitHubActions{},
+	"githubsecurity": GitHubSecurity{},
 	"http":           HTTP{},
 	"kubernetes":     Kubernetes{},
 	"kubernetesfile": KubernetesFile{},
 	"logs":           Logs{},
+	"openssfscorecard": OpenSSFScorecard{},
 	"slack":          Slack{},
 	"sql":            SQL{},
 	"terraform":      Terraform{},
 	"trivy":          Trivy{},
 }

Add to ApplyPlugin method (after line 132):

 	for i := range spec.GithubActions {
 		spec.GithubActions[i].BaseScraper = spec.GithubActions[i].BaseScraper.ApplyPlugins(plugins...)
 	}
+
+	for i := range spec.GitHubSecurity {
+		spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
+	}
+
+	for i := range spec.OpenSSFScorecard {
+		spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
+	}
🤖 Prompt for AI Agents
In `@api/v1/types.go` around lines 71 - 74, The ScraperSpec additions (fields
GitHubSecurity and OpenSSFScorecard) are not wired into AllScraperConfigs and
ApplyPlugin; update the AllScraperConfigs map to include entries
"githubsecurity": GitHubSecurity{} and "openssfscorecard": OpenSSFScorecard{},
and modify the ApplyPlugin method to iterate over s.GitHubSecurity and
s.OpenSSFScorecard and call ApplyPlugin on each item (mirroring the existing
loops for GithubActions and Azure) so those plugin configs are applied;
reference the AllScraperConfigs map and the ApplyPlugin method and the types
GitHubSecurity and OpenSSFScorecard when locating code to change.

Comment on lines +23 to +48
func NewGitHubSecurityClient(ctx api.ScrapeContext, config v1.GitHubSecurity, owner, repo string) (*GitHubSecurityClient, error) {
	var token string
	if connection, err := ctx.HydrateConnectionByURL(config.ConnectionName); err != nil {
		return nil, fmt.Errorf("failed to hydrate connection: %w", err)
	} else if connection != nil {
		token = connection.Password
	} else {
		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
		if err != nil {
			return nil, fmt.Errorf("failed to get token: %w", err)
		}
	}

	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: token},
	)
	tc := oauth2.NewClient(ctx, ts)
	client := github.NewClient(tc)

	return &GitHubSecurityClient{
		ScrapeContext: ctx,
		Client:        client,
		owner:         owner,
		repo:          repo,
	}, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub REST API security alerts code scanning secret scanning authentication requirements

💡 Result:

GitHub REST API: security alerts (code scanning, secret scanning, Dependabot) — auth & access requirements

1) Token types you can use

  • OAuth App token or PAT (classic): supported broadly across these alert APIs (see per-endpoint scope requirements below).
  • Fine-grained tokens / GitHub App tokens: supported for most repo/org alert endpoints, but not for some enterprise-level Dependabot alert endpoints. For example, “List Dependabot alerts for an enterprise” does not work with fine-grained PATs or GitHub App tokens. [3]

2) Scopes / permissions (classic PATs & OAuth tokens)

  • Code scanning alerts: requires security_events (private or public), or public_repo if only accessing public repos. [1]
  • Secret scanning alerts: requires repo or security_events (and public_repo can be sufficient if only accessing public repos). [2]
  • Dependabot alerts: commonly requires security_events (and public_repo can be sufficient if only accessing public repos). Some endpoints (e.g., enterprise-level) accept repo or security_events. [3]

3) Fine-grained PAT / GitHub App permission names (repo-scoped)

  • Code scanning: "Code scanning alerts" repo permission (read for GET/list, write for update). [1]
  • Secret scanning: "Secret scanning alerts" repo permission (read/write depending on endpoint). [2]
  • Dependabot: "Dependabot alerts" repo permission (read/write depending on endpoint). [3]

4) Who is allowed to call the endpoints (role / access gating)

Even with the right token scope/permissions, the caller must have sufficient access:

  • Repository alert visibility/access is generally available to people with write/maintain/admin on the repo (and org owners for org-owned repos). GitHub also lets repo admins/org owners grant additional alert access (with constraints). [4]
  • Some alert endpoints explicitly require elevated roles, e.g.:
    • Secret scanning (org-level listing): caller must be an org admin or security manager. [2]
    • Secret scanning (repo-level listing): caller must be a repo admin (or admin of the owning org). [2]
    • Dependabot (org-level listing): caller must be an org owner or security manager. [3]
    • Dependabot (enterprise-level listing): caller must be an enterprise member, and results are limited to orgs where the caller is an org owner or security manager. [3]

Sources: GitHub REST API docs for Code Scanning [1], Secret Scanning [2], Dependabot Alerts [3], and “Managing security and analysis settings… / Granting access to security alerts” [4].


🏁 Script executed:

find . -name "security_client.go" -type f | head -5

Repository: flanksource/config-db

Length of output: 101


🏁 Script executed:

# Look for the full security_client.go file to understand context
fd security_client.go --type f

Repository: flanksource/config-db

Length of output: 99


🏁 Script executed:

cat -n scrapers/github/security_client.go

Repository: flanksource/config-db

Length of output: 6316


🏁 Script executed:

# Check if there are other places in the codebase that use this client
# to understand the broader context
rg "NewGitHubSecurityClient|GitHubSecurityClient" --type go -A 2 -B 2

Repository: flanksource/config-db

Length of output: 5100


🏁 Script executed:

# Check if token validation patterns exist elsewhere in this project
rg "token == \"\"" --type go

Repository: flanksource/config-db

Length of output: 47


Add token validation to fail fast on missing credentials.

Empty token values proceed silently to client creation, causing all subsequent security alert API calls (Dependabot, code scanning, secret scanning, advisories) to fail with 401 Unauthorized. Validate that a token exists and return an explicit error if neither connection nor personalAccessToken config provides one.

🐛 Proposed fix
 	} else {
 		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
 		if err != nil {
 			return nil, fmt.Errorf("failed to get token: %w", err)
 		}
 	}
+
+	if token == "" {
+		return nil, fmt.Errorf("github personal access token is required (connection or personalAccessToken)")
+	}
🤖 Prompt for AI Agents
In `@scrapers/github/security_client.go` around lines 23 - 48, In
NewGitHubSecurityClient, after resolving token (the block that sets token from
ctx.HydrateConnectionByURL or ctx.GetEnvValueFromCache), add a check that token
is not empty and return a descriptive error if it is; for example, if token ==
"" return fmt.Errorf("missing GitHub token: no connection %q and no
personalAccessToken provided", config.ConnectionName or
config.PersonalAccessToken) so the function fails fast instead of creating a
client that will 401 on API calls.
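
A quick test can lock in the fail-fast behavior, assuming the check above is in place; the empty-config call and assertion below are a hypothetical sketch:

// With no connection and no personalAccessToken, client creation should fail.
_, err := NewGitHubSecurityClient(ctx, v1.GitHubSecurity{}, "flanksource", "config-db")
if err == nil {
	t.Fatal("expected an error when neither connection nor personalAccessToken is set")
}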

Comment on lines +121 to +133
// GetSecurityAdvisories fetches repository security advisories
func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
		State: opts.State,
	}

	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
	if err != nil {
		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
	}

	return advisories, resp, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "*.go" -type f | head -20

Repository: flanksource/config-db

Length of output: 465


🏁 Script executed:

# Search for AlertListOptions definition
rg "type AlertListOptions" -A 10

Repository: flanksource/config-db

Length of output: 857


🏁 Script executed:

# Search for usage of ListRepositorySecurityAdvisoriesOptions
rg "ListRepositorySecurityAdvisoriesOptions" -B 2 -A 5

Repository: flanksource/config-db

Length of output: 782


🏁 Script executed:

# Check the security_client.go file structure
cat -n scrapers/github/security_client.go | head -150

Repository: flanksource/config-db

Length of output: 5681


🌐 Web query:

go-github ListRepositorySecurityAdvisoriesOptions ListOptions pagination

💡 Result:

In go-github, pagination for “list” endpoints is done with github.ListOptions (Page, PerPage). You pass it either directly or via embedding in a more specific *Options struct, then loop using resp.NextPage. [1], [2]

So for ListRepositorySecurityAdvisoriesOptions, you paginate the same way (GitHub’s REST endpoint also supports page / per_page). [1], [3]

opt := &github.ListRepositorySecurityAdvisoriesOptions{
    ListOptions: github.ListOptions{PerPage: 100},
}

for {
    advisories, resp, err := client.SecurityAdvisories.
        ListRepositorySecurityAdvisories(ctx, owner, repo, opt)
    if err != nil { return err }

    // ... use advisories ...

    if resp.NextPage == 0 {
        break
    }
    opt.Page = resp.NextPage // (or opt.ListOptions.Page, depending on embedding)
}

Sources: [1] https://sources.debian.org/src/golang-github-google-go-github/48.1.0-2/README.md (Pagination section) • [2] https://pkg.go.dev/github.com/google/go-github/v30/github (ListOptions) • [3] https://docs.github.com/en/rest/security-advisories/repository-advisories (List repository security advisories)


Pagination options ignored for security advisories.

opts.Page and opts.PerPage are not applied, so only the first page is fetched. This creates a data loss risk for repositories with many advisories. All other alert methods in this file (GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts) correctly include ListOptions; this method should follow the same pattern.

Proposed fix
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
🤖 Prompt for AI Agents
In `@scrapers/github/security_client.go` around lines 121 - 133,
GetSecurityAdvisories ignores pagination and always fetches the first page; set
the ListRepositorySecurityAdvisoriesOptions to include the requested page and
per-page values from AlertListOptions (map opts.Page and opts.PerPage into the
options used for the API call). Update the reqOpts used in GetSecurityAdvisories
to populate the ListOptions/Page and PerPage fields (same approach as
GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) so the call
to c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories honors
pagination.
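
With the fix applied, a caller can drain every page via resp.NextPage, mirroring the loop from the query result above. A sketch, assuming AlertListOptions carries Page/PerPage as the other alert methods do:

opts := AlertListOptions{PerPage: 100}
var all []*github.SecurityAdvisory
for {
	advisories, resp, err := client.GetSecurityAdvisories(ctx, opts)
	if err != nil {
		return nil, err
	}
	all = append(all, advisories...)
	if resp == nil || resp.NextPage == 0 {
		break // last page reached
	}
	opts.Page = resp.NextPage
}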

Comment on lines +59 to +62
results = append(results, result)
ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
	result.Properties.Find("Critical Alerts").Text+result.Properties.Find("High Alerts").Text+
		result.Properties.Find("Medium Alerts").Text+result.Properties.Find("Low Alerts").Text)

⚠️ Potential issue | 🟡 Minor

Fix total-alerts logging (string concat + %d).

The %d verb is given a string: the four alert-count properties are concatenated as text, so the log emits a %!d(string=...) formatting artifact rather than a number. Parse the counts and compute an integer total before logging.

🐛 Suggested fix
 import (
 	"fmt"
+	"strconv"
 	"sync"
 	"time"
-			results = append(results, result)
-			ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
-				result.Properties.Find("Critical Alerts").Text+result.Properties.Find("High Alerts").Text+
-				result.Properties.Find("Medium Alerts").Text+result.Properties.Find("Low Alerts").Text)
+			results = append(results, result)
+			totalAlerts := 0
+			for _, name := range []string{"Critical Alerts", "High Alerts", "Medium Alerts", "Low Alerts"} {
+				if p := result.Properties.Find(name); p != nil {
+					if n, err := strconv.Atoi(p.Text); err == nil {
+						totalAlerts += n
+					}
+				}
+			}
+			ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName, totalAlerts)
🤖 Prompt for AI Agents
In `@scrapers/github/security.go` around lines 59 - 62, The log call is formatting
with "%d" but is concatenating the alert-count strings, so compute an integer
total first by parsing the individual counts from
result.Properties.Find("Critical Alerts").Text, "High Alerts", "Medium Alerts",
and "Low Alerts" (use strconv.Atoi, handle errors defaulting to 0), sum them
into a total int, then call ctx.Logger.V(2).Infof("successfully scraped %s: %d
total alerts", repoFullName, total) instead of concatenating the Text values.

Comment on lines +599 to +607
if len(input.Changes) == 0 {
	return input, fmt.Errorf("no id defined for: %s", input.Debug().ANSI())
}
if len(lo.Filter(input.Changes, func(c v1.ChangeResult, _ int) bool {
	return c.ExternalChangeID == ""
})) > 0 {
	return input, fmt.Errorf("standalone changes must have both an `external_id` and `external_change_id`: %s: %v", input, e.Config)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find ChangeResult struct definition first
echo "=== ChangeResult struct definition ==="
rg -n 'type ChangeResult struct' --type go -A 10

echo -e "\n=== ChangeResult instantiation patterns ==="
rg -n 'ChangeResult\{' --type go -C 2

echo -e "\n=== Standalone changes validation/tests ==="
rg -n 'standalone.*change|ExternalChangeID|ExternalID' --type go -g '*_test.go' -C 1

Repository: flanksource/config-db

Length of output: 19954


🏁 Script executed:

# Get full context around lines 599-607 in json.go
echo "=== Context around json.go lines 599-607 ==="
sed -n '590,620p' scrapers/processors/json.go

# Search for other standalone change validations or references
echo -e "\n=== Standalone changes references ==="
rg -n 'standalone' --type go

# Check if ExternalID is used/populated in json processor
echo -e "\n=== ExternalID usage in json processor ==="
rg -n 'ExternalID' scrapers/processors/json.go -C 2

Repository: flanksource/config-db

Length of output: 1179


🏁 Script executed:

# Check git history/comments around this validation
echo "=== Full function context in json.go ==="
grep -n 'func.*Process' scrapers/processors/json.go | head -5

# Get the function signature and full validation logic
sed -n '550,650p' scrapers/processors/json.go

# Look for documentation or comments about standalone changes
echo -e "\n=== Comments/docs about standalone changes ==="
rg -n -B5 'input.ID == ""' scrapers/processors/json.go | head -20

# Check if ExternalID is required elsewhere in change processing
echo -e "\n=== How changes are processed when ExternalID is populated ==="
rg -n 'ExternalID' scrapers/processors/ --type go

Repository: flanksource/config-db

Length of output: 2907


🏁 Script executed:

# Check if ExternalID is used/populated at the ScrapeResult level
echo "=== ScrapeResult struct definition and ExternalID usage ==="
rg -n 'type ScrapeResult struct' --type go -A 20 api/v1/interface.go

# Check if there's logic that depends on ExternalID when processing changes
echo -e "\n=== How ExternalID is processed in db updates ==="
rg -n 'ExternalID' db/update.go -B 2 -A 2

# Check if there are any tests for this validation
echo -e "\n=== Tests for extractAttributes validation ==="
rg -n 'extractAttributes|standalone changes' --type go -g '*_test.go' -B 2 -A 5

Repository: flanksource/config-db

Length of output: 8488


🏁 Script executed:

# Search for the term "standalone" in documentation or comments
echo "=== Standalone changes documentation/comments ==="
rg -n 'standalone' --type go --type md -B 3 -A 3

# Check if there's a pattern where both ExternalID and ExternalChangeID are populated together
echo -e "\n=== Changes with both ExternalID and ExternalChangeID ==="
rg -n 'ExternalID.*ExternalChangeID|ExternalChangeID.*ExternalID' --type go

Repository: flanksource/config-db

Length of output: 783


Validation is incomplete: missing ExternalID check for standalone changes.

When input.ID is empty (a standalone change), the code only validates that ExternalChangeID is present, even though the error message correctly states that both external_id and external_change_id are required. Without an input ID, ExternalID is the only way to identify the associated config item, so validate both fields.

Proposed fix
 	if input.ID == "" {
 		if len(input.Changes) == 0 {
 			return input, fmt.Errorf("no id defined for: %s", input.Debug().ANSI())
 		}
 		if len(lo.Filter(input.Changes, func(c v1.ChangeResult, _ int) bool {
-			return c.ExternalChangeID == ""
+			return c.ExternalID == "" || c.ExternalChangeID == ""
 		})) > 0 {
 			return input, fmt.Errorf("standalone changes must have both an `external_id` and `external_change_id`: %s: %v", input, e.Config)
-
 		}
 	}
🤖 Prompt for AI Agents
In `@scrapers/processors/json.go` around lines 599 - 607, The validation currently
checks only ExternalChangeID for standalone inputs but not ExternalID; update
the validator in the same block that inspects input.Changes (near the lo.Filter
usage) to also check for any change where ExternalID == "" when input.ID is
empty (standalone change). Use the same pattern (e.g., lo.Filter or an
additional check) referencing input.Changes, ExternalID, ExternalChangeID and
input.ID, and return the same formatted error (including e.Config and
input.Debug()) when either required field is missing.

Comment on lines 33 to +36
if err := json.Unmarshal([]byte(s), &results); err != nil {
	if logger.V(5).Enabled() {
		logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
	}

⚠️ Potential issue | 🟠 Major

Avoid logging full script output (potential secrets/PII leak).

Even at verbose level, logging raw s can expose credentials/configs and dump large payloads. Consider truncating/redacting or logging only a sample/length.

💡 Suggested fix (truncate output)
-		if logger.V(5).Enabled() {
-			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
-		}
+		if logger.V(5).Enabled() {
+			const maxLogBytes = 4096
+			out := s
+			if len(out) > maxLogBytes {
+				out = out[:maxLogBytes] + "...(truncated)"
+			}
+			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, out)
+		}
🤖 Prompt for AI Agents
In `@scrapers/processors/script.go` around lines 33 - 36, The current logger.Infof
call in the json.Unmarshal error block logs the entire script output variable s
(risking secrets/PII and large dumps); update the error handling around
json.Unmarshal to avoid printing the full s — instead log the error and either a
safely truncated sample (e.g., first N characters with "…(truncated)") and the
total length, or a redacted/hashed summary of s, and consider simple redaction
for common secrets (tokens, passwords) before including any sample; modify the
logger.Infof invocation that references err and s to use the new
truncated/redacted sample and length-only information.
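
Beyond truncating, a lightweight redaction pass can run before any sample is logged. A hedged sketch, assuming the pattern below covers the common key=value / key: value secret shapes found in script output:

// secretPattern matches likely credential assignments (token=..., password: ...).
var secretPattern = regexp.MustCompile(`(?i)(token|password|secret|api[_-]?key)["']?\s*[:=]\s*\S+`)

// redact masks the values while keeping key names visible for debugging.
func redact(s string) string {
	return secretPattern.ReplaceAllString(s, "$1=<redacted>")
}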
