Conversation

@mrkeshav-05
Contributor

Proposed change

This PR updates the GitHub sync pipeline to fetch all commits, pull requests, and issues from the last 365 days across the OWASP org.
It also updates member snapshot generation so that contribution heatmaps and entity stats accurately reflect the full yearly activity window.

Resolves #3184

Checklist

  • Required: I read and followed the contributing guidelines
  • Required: I ran make check-test locally and all tests passed
  • I used AI for code, documentation, or tests in this PR

@coderabbitai
Contributor

coderabbitai bot commented Jan 7, 2026

Summary by CodeRabbit

  • New Features

    • Extended GitHub data retrieval window from 30 days to 365 days for milestones, issues, and pull requests.
    • Default date range now uses a rolling 365-day window (end = today, start = 365 days ago) instead of fixed calendar defaults.
  • Documentation / Tests

    • Updated CLI help text and tests to reflect the new 365-day default window.

Walkthrough

Switches GitHub sync defaults and lookback windows to a rolling 365-day range: CLI defaults now compute start = now(UTC) - 365 days and end = now(UTC); filtering for milestones, issues, and pull requests uses a 365-day fallback instead of shorter or fixed-calendar defaults. Tests updated accordingly.
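
As a rough illustration of the rolling default described in the walkthrough, the calculation can be sketched in plain Python. The helper name `default_sync_window` and the `LOOKBACK_DAYS` constant are hypothetical and do not come from the PR; only the arithmetic (end = now in UTC, start = end minus 365 days) is taken from the summary above.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK_DAYS = 365  # window length stated in the walkthrough


def default_sync_window(now=None):
    """Return (start, end) for a rolling window that ends at `now` (UTC).

    `now` is injectable so the calculation can be tested with a fixed clock.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=LOOKBACK_DAYS)
    return start, end


if __name__ == "__main__":
    start, end = default_sync_window()
    print(f"syncing from {start.isoformat()} to {end.isoformat()}")
```

Because the window is anchored to the current time rather than to calendar boundaries, running the command on different days shifts both endpoints by the same amount.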

Changes

Cohort / File(s) Summary
GitHub Sync Command
backend/apps/github/management/commands/github_sync_user.py
Added timedelta import; updated --start-at and --end-at help text; replaced fixed-calendar default computation with end = now(UTC) and start = end - 365 days.
Sync Command Tests
backend/tests/apps/github/management/commands/github_sync_user_test.py
Updated test expectations for help text: --start-at now states "Defaults to 365 days ago from today" and --end-at now states "Defaults to today".
Data Filtering Logic
backend/apps/github/common.py
Increased default lookback from 30 days to 365 days for milestones, issues, and pull requests; renamed month_ago to year_ago and updated references; no other control-flow changes.
Filtering Tests
backend/tests/apps/github/common_test.py
Renamed and adjusted a test to reflect 365-day initial-sync fallback; updated test timestamps (e.g., now - 200 days, now - 400 days) and docstring.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title directly summarizes the main change: updating data sync to include contributions from the last 365 days, which matches the primary objective of the changeset.
Description check ✅ Passed The PR description is relevant and describes the key changes: updating the GitHub sync pipeline to fetch commits, pull requests, and issues from the last 365 days, plus updating member snapshot generation.
Linked Issues check ✅ Passed The code changes fully address issue #3184's objectives: the default sync window is extended from 30 to 365 days in common.py, and the management command now defaults to a 365-day lookback instead of the Jan-Oct calendar window.
Out of Scope Changes check ✅ Passed All changes are scoped to updating the data sync timeframe from 30 days to 365 days and adjusting default date ranges; no unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af62e49 and 162a82f.

📒 Files selected for processing (1)
  • backend/tests/apps/github/common_test.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.

Applied to files:

  • backend/tests/apps/github/common_test.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.

Applied to files:

  • backend/tests/apps/github/common_test.py
🧬 Code graph analysis (1)
backend/tests/apps/github/common_test.py (3)
backend/tests/apps/github/models/repository_test.py (1)
  • mock_gh_repository (31-48)
backend/tests/apps/github/management/commands/github_sync_user_test.py (1)
  • mock_repo (60-62)
backend/apps/github/models/repository.py (1)
  • latest_updated_issue (137-139)
🔇 Additional comments (1)
backend/tests/apps/github/common_test.py (1)

236-253: LGTM!

The test correctly validates the 365-day fallback behavior for initial sync:

  • gh_issue_recent at 200 days (within window) is synced
  • gh_issue_ancient at 400 days (outside window) is skipped

The chosen values provide clear boundary separation from the 365-day threshold, making the test robust against timing edge cases.
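
The behavior this test exercises can be sketched without Django or the GitHub client. This is a simplified model, not the code in `common.py`: the helper names `sync_cutoff` and `should_sync` are hypothetical, and the real implementation may compare timestamps differently (for example, by stopping iteration over results sorted by update time).

```python
from datetime import datetime, timedelta, timezone


def sync_cutoff(latest_updated_at, now, fallback_days=365):
    """Pick the lower bound for a sync.

    If something was synced before, resume from its timestamp; otherwise
    (initial sync) fall back to `fallback_days` before `now`.
    """
    if latest_updated_at is not None:
        return latest_updated_at
    return now - timedelta(days=fallback_days)


def should_sync(item_updated_at, cutoff):
    """An item is synced only if it was updated at or after the cutoff."""
    return item_updated_at >= cutoff


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cutoff = sync_cutoff(None, now)  # no prior data: 365-day fallback
    print(should_sync(now - timedelta(days=200), cutoff))  # inside the window
    print(should_sync(now - timedelta(days=400), cutoff))  # outside the window
```

Values such as 200 and 400 days sit well away from the 365-day boundary, so the outcome does not depend on how long the test takes to run.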



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (1)

90-90: Remove redundant local imports of timedelta.

Since timedelta is now imported at the module level (line 5), these local imports within the methods are redundant.

♻️ Proposed fix

Remove line 90:

-        from datetime import timedelta
-
         # Initialize all dates in range with 0

Remove line 253:

-        from datetime import timedelta
-
         # Initialize all dates in range with 0

Also applies to: 253-253

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 45521fb and 2cc3a34.

📒 Files selected for processing (3)
  • backend/apps/github/management/commands/github_sync_user.py
  • backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
  • backend/tests/apps/github/management/commands/github_sync_user_test.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-12-18T05:39:42.678Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:40-40
Timestamp: 2025-12-18T05:39:42.678Z
Learning: In Django management commands, prefer using self.stdout.write(...) over print(...) for user-facing stdout output. This aligns with Django conventions and improves testability. When emitting messages, consider using self.stdout.write and, for styled messages, use self.style.SUCCESS/ERROR as appropriate to maintain consistent command output formatting. Apply this guideline to all Python files within any project's management/commands directory.

Applied to files:

  • backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
  • backend/tests/apps/github/management/commands/github_sync_user_test.py
  • backend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.

Applied to files:

  • backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
  • backend/tests/apps/github/management/commands/github_sync_user_test.py
  • backend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.

Applied to files:

  • backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
  • backend/tests/apps/github/management/commands/github_sync_user_test.py
  • backend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2026-01-01T18:57:05.007Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/video.py:189-215
Timestamp: 2026-01-01T18:57:05.007Z
Learning: In the OWASP backend area, maintain the established pattern: when dealing with sponsors, include all entries from Sponsor.objects.all() (including NOT_SPONSOR) and perform in-memory sorting using the same criteria/pattern used by the GraphQL sponsor query implemented in backend/apps/owasp/api/internal/queries/sponsor.py. Apply this behavior consistently to files in backend/apps/owasp (not just video.py), and ensure code paths that render sponsor lists follow this in-code sorting approach rather than pre-filtering NOT_SPONSOR entries before sorting.

Applied to files:

  • backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
🔇 Additional comments (7)
backend/tests/apps/github/management/commands/github_sync_user_test.py (1)

158-158: LGTM! Test expectations correctly updated.

The test assertions now verify that the help text reflects the new rolling 365-day window defaults instead of fixed calendar dates.

Also applies to: 163-163

backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (3)

5-5: LGTM! Import addition supports rolling 365-day window.

The timedelta import is necessary for the new default date range calculation.


44-44: LGTM! Help text accurately reflects the new defaults.

The updated help text clearly communicates the rolling 365-day window behavior to users.

Also applies to: 49-49


323-325: LGTM! Default date range correctly implements rolling 365-day window.

The logic properly calculates the start date as 365 days before the end date (today), which aligns with the PR objective to capture all contributions from the last year.

backend/apps/github/management/commands/github_sync_user.py (3)

4-4: LGTM! Import addition supports rolling 365-day window.

The timedelta import is necessary for the new default date range calculation.


42-42: LGTM! Help text accurately reflects the new defaults.

The updated help text clearly communicates the rolling 365-day window behavior to users and is consistent with the implementation in owasp_create_member_snapshot.py.

Also applies to: 47-47


207-209: LGTM! Default date range correctly implements rolling 365-day window.

The implementation properly calculates the start date as 365 days before today, ensuring all contributions from the last year are synchronized. This directly addresses issue #3184 and aligns with the PR objective.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 7, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/apps/github/common.py (2)

79-83: Consider using the year_ago variable for consistency.

The logic correctly extends the lookback window to 365 days, aligning with the PR objectives. However, for consistency with the issues and pull requests sections (which use the year_ago variable defined at line 105), consider moving the year_ago definition earlier and using it here as well.

♻️ Suggested refactor for consistency

Move the year_ago definition before the milestones section and use it:

     if not repository.is_archived:
+        year_ago = timezone.now() - td(days=365)
+
         # GitHub repository milestones.
         kwargs = {
             "direction": "desc",
             "sort": "updated",
             "state": "all",
         }
 
         until = (
             latest_updated_milestone.updated_at
             if (latest_updated_milestone := repository.latest_updated_milestone)
-            else timezone.now() - td(days=365)
+            else year_ago
         )

Then remove the duplicate definition at line 105.


82-82: Monitor API rate limits and sync performance after deployment.

Extending the lookback window from 30 to 365 days will significantly increase the volume of data fetched per sync operation. Consider monitoring:

  • GitHub API rate limit consumption: The API has rate limits (typically 5000 requests/hour for authenticated requests). For organizations with many active repositories, this 12x increase in lookback window could approach or exceed these limits.
  • Sync duration: Initial syncs and syncs for repositories without prior data will fetch up to 365 days of history, potentially increasing sync times substantially.
  • Database growth: Storing a full year of historical data will increase storage requirements.

If rate limits or performance issues emerge, consider:

  • Implementing exponential backoff and retry logic for rate limit errors
  • Batching repository syncs to spread API calls over time
  • Adding progress tracking and resumption capability for long-running syncs
  • Monitoring specific high-activity repositories that might consume disproportionate API quota

Also applies to: 116-116, 167-167
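
One of the mitigations listed above, exponential backoff with retry, could look roughly like the following. This is a generic sketch rather than anything from the PR: `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises when rate limited, and the delay parameters are arbitrary defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call `func`, retrying on RateLimitError with exponential backoff.

    The delay doubles on each retry (capped at `max_delay`) and gets up to
    10% random jitter so parallel workers do not retry in lockstep.
    `sleep` is injectable to keep the function testable.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

A real integration would more likely honor the reset time reported by the API than rely on fixed delays; the sketch only shows the retry shape.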

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc3a34 and af62e49.

📒 Files selected for processing (1)
  • backend/apps/github/common.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.

Applied to files:

  • backend/apps/github/common.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.

Applied to files:

  • backend/apps/github/common.py
🔇 Additional comments (3)
backend/apps/github/common.py (3)

105-105: Good variable extraction for reuse.

The year_ago variable efficiently avoids repeating the 365-day calculation for both issues and pull requests sections.


113-117: LGTM!

The change correctly extends the issues lookback window to 365 days using the year_ago variable, aligning with the PR objectives to capture complete yearly activity for contribution heatmaps.


164-168: LGTM!

The change correctly extends the pull requests lookback window to 365 days using the year_ago variable, consistent with the issues section and aligned with the PR objectives.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 11, 2026
@mrkeshav-05
Contributor Author

Hi @arkid15r,
I’ve revised the PR to correctly handle the daily sync.
Please let me know if this now matches the expected behavior.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 12, 2026
Collaborator

@ahmedxgouda ahmedxgouda left a comment

Good work. Please address these comments.

Collaborator

I think these updates here are unrelated changes.

Collaborator

Also here

Contributor Author

Sure, I will remove the unwanted files.

@mrkeshav-05
Contributor Author

Hi @ahmedxgouda, the PR is ready for review.

Contributor

It's not syncing the commits, right? Don't we need to write the logic for that?

Collaborator

@ahmedxgouda ahmedxgouda left a comment

LGTM

@sonarqubecloud

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update data sync to include all contributions within last 365 days

3 participants