-
-
Notifications
You must be signed in to change notification settings - Fork 423
implement commit syncing and extend sync window to 365 days #3219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
implement commit syncing and extend sync window to 365 days #3219
Conversation
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdded bulk-saving of GitHub commits, extended sync horizon fallback from 30 to 365 days for commits, milestones, issues, and pull requests, introduced a chunk-size constant for commit bulk saves, added Repository.latest_commit property, and updated tests to expect the 365-day fallback. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-12-31T05:17:39.659ZApplied to files:
📚 Learning: 2026-01-01T17:48:23.963ZApplied to files:
🧬 Code graph analysis (2)backend/apps/github/models/repository.py (1)
backend/apps/github/common.py (3)
🔇 Additional comments (4)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @backend/apps/github/management/commands/github_sync_user.py:
- Line 47: The help text for the end date argument in the management command
contains an extra space before "Defaults"; locate the add_argument call (in the
Command class / add_arguments method) where help="End date (YYYY-MM-DD).
Defaults to today." is defined and remove the extra space so it reads "End date
(YYYY-MM-DD). Defaults to today.".
- Around line 207-209: Fix the comment formatting above the default date logic:
change the comment that precedes default_end/default_start from "#Default to
last 365 days" to "# Default to last 365 days" (i.e., add a space after the `#`)
so it follows standard Python comment style.
In @backend/apps/owasp/management/commands/owasp_create_member_snapshot.py:
- Line 44: Fix the extra space in the help text for the start date argument:
locate the add_argument call that defines the start date help (e.g., the
'--start-date' or 'start_date' argument in the Command class in
owasp_create_member_snapshot.py) and change the string from "Start date
(YYYY-MM-DD). Defaults to 365 days ago." to "Start date (YYYY-MM-DD). Defaults
to 365 days ago." so there is only a single space before "365".
🧹 Nitpick comments (1)
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (1)
90-90: Optional: Remove redundant local import.Now that
timedeltais imported at the module level (line 5), the local import on line 90 withingenerate_heatmap_datais redundant. Similarly, line 253 ingenerate_communication_heatmap_dataalso has a redundant import.🔎 Proposed cleanup
Remove the local import on line 90:
- from datetime import timedelta - # Initialize all dates in range with 0And also remove the local import on line 253 (not shown in this segment but present in the file):
- from datetime import timedelta - # Initialize all dates in range with 0
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/apps/github/management/commands/github_sync_user.pybackend/apps/owasp/management/commands/owasp_create_member_snapshot.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-12-18T05:39:42.678Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:40-40
Timestamp: 2025-12-18T05:39:42.678Z
Learning: In Django management commands, prefer using self.stdout.write(...) over print(...) for user-facing stdout output. This aligns with Django conventions and improves testability. When emitting messages, consider using self.stdout.write and, for styled messages, use self.style.SUCCESS/ERROR as appropriate to maintain consistent command output formatting. Apply this guideline to all Python files within any project's management/commands directory.
Applied to files:
backend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.
Applied to files:
backend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.
Applied to files:
backend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/apps/github/management/commands/github_sync_user.py
📚 Learning: 2026-01-01T18:57:05.007Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/video.py:189-215
Timestamp: 2026-01-01T18:57:05.007Z
Learning: In the OWASP backend area, maintain the established pattern: when dealing with sponsors, include all entries from Sponsor.objects.all() (including NOT_SPONSOR) and perform in-memory sorting using the same criteria/pattern used by the GraphQL sponsor query implemented in backend/apps/owasp/api/internal/queries/sponsor.py. Apply this behavior consistently to files in backend/apps/owasp (not just video.py), and ensure code paths that render sponsor lists follow this in-code sorting approach rather than pre-filtering NOT_SPONSOR entries before sorting.
Applied to files:
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
🔇 Additional comments (5)
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (3)
5-5: LGTM: Import addition supports the new rolling window logic.The addition of
timedeltato the top-level imports is necessary for calculating the 365-day default window and aligns with the PR objectives.
49-49: LGTM: Help text accurately reflects the new default.The help text correctly describes the new behavior where
--end-atdefaults to today.
323-325: LGTM: Rolling 365-day window implemented correctly.The default date logic now uses a dynamic 365-day window ending today, replacing the previous fixed calendar range. This aligns with the PR objectives and ensures consistent contribution views across the platform.
backend/apps/github/management/commands/github_sync_user.py (2)
4-4: LGTM: Import addition supports the new rolling window logic.The addition of
timedeltato the imports is necessary for calculating the 365-day default window and mirrors the change in the relatedowasp_create_member_snapshot.pycommand.
42-42: LGTM: Help text accurately reflects the new default.The help text correctly describes the new behavior where
--start-atdefaults to 365 days ago.
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (1)
90-90: Remove redundant local imports oftimedelta.Since
timedeltais already imported at the module level (line 5), the local imports withingenerate_heatmap_dataandgenerate_communication_heatmap_dataare unnecessary and can be removed for cleaner code.🔎 Proposed fix
Remove the local import statements on lines 90 and 253:
def generate_heatmap_data(self, commits, pull_requests, issues, start_at, end_at) -> dict: """Generate heatmap data from contributions within the date range. ... """ - from datetime import timedelta - # Initialize all dates in range with 0 heatmap_data: dict[str, int] = {}def generate_communication_heatmap_data(self, messages, start_at, end_at) -> dict: """Generate communication heatmap data from public Slack channels within date range. ... """ - from datetime import timedelta - # Initialize all dates in range with 0 heatmap_data: dict[str, int] = {}Also applies to: 253-253
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/apps/github/management/commands/github_sync_user.pybackend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/tests/apps/github/management/commands/github_sync_user_test.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-12-18T05:39:42.678Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:40-40
Timestamp: 2025-12-18T05:39:42.678Z
Learning: In Django management commands, prefer using self.stdout.write(...) over print(...) for user-facing stdout output. This aligns with Django conventions and improves testability. When emitting messages, consider using self.stdout.write and, for styled messages, use self.style.SUCCESS/ERROR as appropriate to maintain consistent command output formatting. Apply this guideline to all Python files within any project's management/commands directory.
Applied to files:
backend/apps/github/management/commands/github_sync_user.pybackend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/tests/apps/github/management/commands/github_sync_user_test.py
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.
Applied to files:
backend/apps/github/management/commands/github_sync_user.pybackend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/tests/apps/github/management/commands/github_sync_user_test.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.
Applied to files:
backend/apps/github/management/commands/github_sync_user.pybackend/apps/owasp/management/commands/owasp_create_member_snapshot.pybackend/tests/apps/github/management/commands/github_sync_user_test.py
📚 Learning: 2026-01-01T18:57:05.007Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/video.py:189-215
Timestamp: 2026-01-01T18:57:05.007Z
Learning: In the OWASP backend area, maintain the established pattern: when dealing with sponsors, include all entries from Sponsor.objects.all() (including NOT_SPONSOR) and perform in-memory sorting using the same criteria/pattern used by the GraphQL sponsor query implemented in backend/apps/owasp/api/internal/queries/sponsor.py. Apply this behavior consistently to files in backend/apps/owasp (not just video.py), and ensure code paths that render sponsor lists follow this in-code sorting approach rather than pre-filtering NOT_SPONSOR entries before sorting.
Applied to files:
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py
🔇 Additional comments (7)
backend/apps/owasp/management/commands/owasp_create_member_snapshot.py (3)
5-5: LGTM! Import statement correctly updated.The
timedeltaimport has been properly added to support the new 365-day rolling window logic.
44-44: Help text accurately reflects the new defaults.The updated help descriptions correctly document the 365-day rolling window behavior.
Also applies to: 49-49
323-325: Rolling 365-day window implemented correctly.The default date logic properly calculates a 365-day period ending today, aligning with the PR objectives to replace the fixed calendar range with a dynamic rolling window.
backend/tests/apps/github/management/commands/github_sync_user_test.py (1)
158-163: Test assertions correctly updated to reflect new defaults.The test properly verifies that the command's help text documents the 365-day rolling window behavior. Good coverage of the argument documentation.
backend/apps/github/management/commands/github_sync_user.py (3)
4-4: LGTM! Import statement correctly updated.The
timedeltaimport has been properly added to enable the new 365-day rolling window logic.
42-42: Help text accurately reflects the new defaults.The updated help descriptions correctly document the 365-day rolling window behavior for both start and end dates.
Also applies to: 47-47
207-209: Rolling 365-day window implemented correctly.The default date logic properly calculates a 365-day period ending today, successfully replacing the fixed Jan 1–Oct 1 calendar range with a dynamic rolling window that aligns with the aggregation logic used elsewhere in the platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/apps/github/common.py (1)
79-83: Consider consolidating the 365-day threshold calculation.The 365-day threshold is computed twice: inline at line 82 for milestones and again at line 105 as
year_agofor issues/PRs. You could defineyear_agoearlier and reuse it for all three, reducing duplication:♻️ Suggested consolidation
if not repository.is_archived: + year_ago = timezone.now() - td(days=365) + # GitHub repository milestones. kwargs = { "direction": "desc", "sort": "updated", "state": "all", } until = ( latest_updated_milestone.updated_at if (latest_updated_milestone := repository.latest_updated_milestone) - else timezone.now() - td(days=365) + else year_ago ) # ... milestone sync loop ... # GitHub repository issues. project_track_issues = repository.project.track_issues if repository.project else True - year_ago = timezone.now() - td(days=365)Also applies to: 105-105
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/apps/github/common.pybackend/tests/apps/github/common_test.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.
Applied to files:
backend/tests/apps/github/common_test.pybackend/apps/github/common.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.
Applied to files:
backend/tests/apps/github/common_test.pybackend/apps/github/common.py
🧬 Code graph analysis (1)
backend/tests/apps/github/common_test.py (3)
backend/tests/apps/github/models/repository_test.py (1)
mock_gh_repository(31-48)backend/tests/apps/github/management/commands/github_sync_user_test.py (1)
mock_repo(60-62)backend/apps/github/models/repository.py (1)
latest_updated_issue(137-139)
🔇 Additional comments (4)
backend/tests/apps/github/common_test.py (1)
236-253: LGTM!The test correctly validates the 365-day fallback behavior for initial sync:
- 300 days is within the 365-day window → synced
- 400 days is outside the 365-day window → skipped
The boundary values appropriately exercise the new threshold.
backend/apps/github/common.py (3)
79-83: LGTM!The milestones fallback correctly uses a 365-day window for initial sync, consistent with the PR objectives.
105-117: LGTM!Good approach defining
year_agoonce and reusing it for both issues and pull requests fallbacks. The naming clearly conveys the 365-day window intent.
164-168: LGTM!Pull request sync correctly uses the shared
year_agofallback, maintaining consistency with the issues sync window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
backend/apps/github/constants.py (1)
14-14: Consider grouping with otherGITHUB_constants.The new constant follows the naming convention but is placed after the
OWASP_constants. For better organization, consider moving it near the otherGITHUB_*constants (lines 5-9).Suggested placement
GITHUB_ACTIONS_USER_LOGIN = "actions-user" GITHUB_GHOST_USER_LOGIN = "ghost" GITHUB_ITEMS_PER_PAGE = 100 +GITHUB_COMMITS_BULK_SAVE_CHUNK_SIZE = 50 GITHUB_REPOSITORY_RE = re.compile("^https://github.com/([^/]+)/([^/]+)(/.*)?$") GITHUB_USER_RE = re.compile("^https://github.com/([^/]+)/?$") OWASP_FOUNDATION_LOGIN = "OWASPFoundation" OWASP_GITHUB_IO = "owasp.github.io" OWASP_LOGIN = "owasp" -GITHUB_COMMITS_BULK_SAVE_CHUNK_SIZE = 50backend/apps/github/common.py (1)
110-114: Useyear_agovariable for consistency.Line 113 recalculates
timezone.now() - td(days=365)instead of reusing theyear_agovariable defined on line 74. For consistency with lines 146 and 197, and to avoid potential timing drift, use the existing variable.Suggested fix
until = ( latest_updated_milestone.updated_at if (latest_updated_milestone := repository.latest_updated_milestone) - else timezone.now() - td(days=365) + else year_ago )backend/tests/apps/github/common_test.py (1)
237-254: Test correctly validates 365-day fallback for issues, but consider adding commit sync coverage.The test values (300 and 400 days) are well-chosen boundary values. However, the new commit bulk-save logic (lines 76-101 in
common.py) lacks test coverage for:
- Verifying
Commit.bulk_saveis called- Validating chunk-size batching behavior
- Testing incremental sync when a
latest_commitexistsWould you like me to help generate additional test cases for the commit syncing functionality?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/apps/github/common.pybackend/apps/github/constants.pybackend/tests/apps/github/common_test.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-31T05:17:39.659Z
Learnt from: kart-u
Repo: OWASP/Nest PR: 3101
File: backend/apps/common/extensions.py:92-98
Timestamp: 2025-12-31T05:17:39.659Z
Learning: In this codebase, import OperationType for GraphQL operations from the graphql-core package rather than from strawberry. Use 'from graphql import OperationType'. Strawberry re-exports via graphql-core internally, so relying on strawberry's API may be brittle. Apply this rule to all Python files that deal with GraphQL operation types; ensure imports come from graphql (graphql-core) and not from strawberry packages. This improves compatibility and avoids coupling to strawberry's internals.
Applied to files:
backend/apps/github/constants.pybackend/tests/apps/github/common_test.pybackend/apps/github/common.py
📚 Learning: 2026-01-01T17:48:23.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2948
File: backend/apps/owasp/management/commands/owasp_generate_community_snapshot_video.py:41-47
Timestamp: 2026-01-01T17:48:23.963Z
Learning: In Django code, be aware that a QuerySet's boolean evaluation (e.g., if not queryset) runs a database query to determine emptiness. While it is technically valid to use the queryset in a boolean context, use queryset.exists() for existence checks to avoid unnecessary queries and improve performance. Applicable broadly to Python/Django files rather than just this specific path.
Applied to files:
backend/apps/github/constants.pybackend/tests/apps/github/common_test.pybackend/apps/github/common.py
🧬 Code graph analysis (2)
backend/tests/apps/github/common_test.py (3)
backend/tests/apps/github/models/repository_test.py (1)
mock_gh_repository(31-48)backend/tests/apps/github/management/commands/github_sync_user_test.py (1)
mock_repo(60-62)backend/apps/github/models/repository.py (1)
latest_updated_issue(137-139)
backend/apps/github/common.py (1)
backend/apps/github/models/commit.py (1)
Commit(11-110)
🔇 Additional comments (5)
backend/apps/github/common.py (4)
15-17: LGTM!The new imports for
GITHUB_COMMITS_BULK_SAVE_CHUNK_SIZEandCommitare correctly added to support the bulk commit syncing functionality.
74-101: LGTM! Well-structured bulk commit syncing.The implementation correctly:
- Uses the 365-day fallback for initial sync when no commits exist
- Applies incremental sync based on latest commit timestamp
- Batches saves to avoid memory issues with large repositories
- Flushes remaining commits after the loop
143-147: LGTM!The 365-day fallback for issues correctly uses the
year_agovariable defined earlier.
194-198: LGTM!The 365-day fallback for pull requests correctly uses the
year_agovariable, consistent with the issues fallback.backend/tests/apps/github/common_test.py (1)
18-18: LGTM!Adding the
Commitmock is necessary to support the new commit syncing logic insync_repository.
|
@arkid15r I have updated the PR with the requested changes and implemented the commit syncing logic. Could you please take another look? |
|



Proposed change
Resolves #3184
This PR addresses incomplete contribution heatmaps by fixing two core issues in the GitHub synchronization logic.
Before:

After :

Checklist
make check-testlocally and all tests passed