This is an AI-generated tool created as a one-off hackathon project.
A comprehensive GitHub activity tracking and reporting solution consisting of two integrated components:
- argh-sync: A Go-based tool for scraping GitHub issue activity from multiple repositories into a local SQLite database with support for incremental updates
- argh.py: A Python-based activity report generator that creates detailed, customizable reports with optional LLM-powered analysis
Together, these tools allow for efficient tracking, offline analysis, and insightful reporting of GitHub activity across your repositories.
- Multiple Repository Support: Track issues from any number of GitHub repositories
- Incremental Updates: Efficiently sync only new or updated issues since the last sync
- Pull Request Support: Store pull requests alongside issues with proper identification
- SQLite Database: Lightweight, zero-configuration database stored in a single file
- Flexible Date Ranges: Generate reports for any time period with precise date control
- Repository Filtering: Focus on specific repositories or analyze activity across all
- Contributor Analytics: Track and rank contributors based on various activity metrics
- LLM-Powered Insights: Optional AI analysis of significant developments with technical context and implications
- Performance: The sync tool is written in Go for fast, concurrent scraping; reporting is handled by a lightweight Python script
- Flexible Scheduling: Run manual updates or automate via cron/scheduler for regular reports
- Markdown Output: Generate clean, formatted reports with clickable links to issues and PRs
- Go 1.16 or higher
- GitHub Personal Access Token (for private repositories or to avoid rate limits)
- Clone the repository:
git clone https://github.com/yourusername/argh.git
cd argh
- Build the application:
go build -o argh-sync ./cmd
For the activity report generator, you'll need:
pip install click requests chatlas
(sqlite3 ships with the Python standard library, so it does not need to be installed separately.)
The application uses a JSON configuration file. You can create a default configuration file by running:
./argh-sync -init
This will create a config.json file with the following structure:
{
"github_token": "",
"database_path": "github_issues.db",
"repositories": ["example/repo"]
}
- github_token: Your GitHub Personal Access Token (can be left empty if using the environment variable)
- database_path: Path to the SQLite database file (can be absolute or relative to the config file)
- repositories: List of repositories to track, in the format "owner/name"
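A filled-in example with several repositories (the token is left empty in favor of the environment variable; the repository names are placeholders):

```json
{
  "github_token": "",
  "database_path": "github_issues.db",
  "repositories": [
    "example/backend",
    "example/frontend"
  ]
}
```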
You can set your GitHub token using the environment variable:
export ARGH_GITHUB_TOKEN=your_github_token_here
This is the recommended approach as it keeps your token out of configuration files.
To add a repository to your configuration:
./argh-sync -add-repo owner/repository
To sync all repositories in your configuration:
./argh-sync -sync-all
To sync a specific repository:
./argh-sync -sync-repo owner/repository
By default, the application looks for config.json in the current directory. You can specify a different configuration file:
./argh-sync -config /path/to/config.json -sync-all
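Since the features list mentions automating runs via cron or another scheduler, the sync and report steps can be chained into one job. A minimal Python wrapper sketch; the binary and script paths in the comments are assumptions about your layout, not fixed paths:

```python
import subprocess

def run(cmd):
    """Run one pipeline step, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)

# A typical nightly job: sync all repositories, then write a one-day report.
# Adjust these paths to your own layout before scheduling.
# run(["./argh-sync", "-config", "/path/to/config.json", "-sync-all"])
# run(["python", "argh.py", "--days", "1", "--output", "daily_report.md"])
```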
The argh.py script can generate GitHub activity reports from the database with the following features:
- Filtering by date range or looking back a specific number of days
- Filtering by specific repositories
- Including all contributors with accurate activity counts
- Generating markdown-formatted reports with clickable links
- Advanced LLM analysis focusing on significant developments and their implications
python argh.py --days 7
Options:
--db-path TEXT Path to SQLite database (default: github_issues.db)
--output TEXT Path to save the report (default: print to stdout)
--days INTEGER Number of days to include in the report (default: 7)
--start-date TEXT Start date for the report (format: YYYY-MM-DD). Overrides --days if specified.
--end-date TEXT End date for the report (format: YYYY-MM-DD). Defaults to today if not specified.
--repositories TEXT Comma-separated list of repositories to include (default: all)
--llm-api-key TEXT API key for the LLM (default: LLM_API_KEY environment variable)
--llm-model TEXT Model name for the LLM (default: claude-3-7-sonnet-latest)
--llm-provider [anthropic|openai]
LLM provider to use (default: anthropic)
--dry-run Don't actually send to LLM, just show what would be sent
--custom-prompt TEXT Custom prompt to use for the LLM (overrides the default)
--verbose Include additional details like comment bodies in the report
--help Show this message and exit.
Generate a report for the last 7 days:
python argh.py
Generate a report for a specific date range (end date inclusive through the end of the day):
python argh.py --start-date 2025-03-15 --end-date 2025-03-25
Generate a report for specific repositories:
python argh.py --repositories "owner/repo1,owner/repo2"
Generate a report and analyze with OpenAI:
python argh.py --llm-provider openai --llm-model gpt-4-turbo
Generate a detailed verbose report (includes comment bodies):
python argh.py --verbose --output full_report.md
Preview the LLM prompt without making API calls:
python argh.py --dry-run
The activity report can use LLM capabilities (Claude or OpenAI) to generate insightful analysis of the GitHub activity. The enhanced reports include:
- Comprehensive Metrics: Accurate counts of issues, PRs, comments and complete contributor statistics
- Significant Developments Analysis: In-depth analysis of important changes including:
- Explanation of WHY changes are being made and problems being solved
- Technical insights and architectural implications
- Connection of individual changes to broader themes or project goals
- Future impact assessment of current work
To generate these enhanced reports, ensure you have:
- An API key for either Anthropic (Claude) or OpenAI
- The chatlas Python package installed:
pip install chatlas
Set your API key via environment variable:
export LLM_API_KEY=your_api_key
Or provide it directly:
python argh.py --llm-api-key your_api_key
The argh.py script automatically includes a ranked list of all contributors in the report:
python argh.py
This will show all contributors along with a breakdown of their activity (issues created, PRs submitted, and comments made).
Generate a report for a specific custom date range:
python argh.py --start-date 2025-03-15 --end-date 2025-03-25
End dates include activity through the end of the day (23:59:59), and start dates begin at the start of the day (00:00:00).
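The inclusive day boundaries can be sketched in Python as follows. This is illustrative only; argh.py's internals may compute the range differently:

```python
from datetime import datetime, time

def report_bounds(start_str: str, end_str: str):
    """Map YYYY-MM-DD strings to the inclusive datetime range a report covers.

    The start date begins at 00:00:00 and the end date runs through 23:59:59,
    matching the behavior described above.
    """
    start = datetime.combine(datetime.strptime(start_str, "%Y-%m-%d").date(), time(0, 0, 0))
    end = datetime.combine(datetime.strptime(end_str, "%Y-%m-%d").date(), time(23, 59, 59))
    return start, end

start, end = report_bounds("2025-03-15", "2025-03-25")
print(start, "->", end)  # 2025-03-15 00:00:00 -> 2025-03-25 23:59:59
```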
Focus on specific repositories by providing a comma-separated list:
python argh.py --repositories "owner/repo1,owner/repo2"
Generate a detailed verbose report that includes comment bodies:
python argh.py --verbose --output full_report.md
For more insightful analysis, you can use LLM capabilities:
- Install the chatlas package:
pip install chatlas
- Run the activity report script with LLM parameters:
python argh.py --llm-api-key your_api_key
- Preview the LLM prompt without making API calls:
python argh.py --dry-run
- Use a different LLM provider:
python argh.py --llm-provider openai --llm-model gpt-4-turbo
The SQLite database contains the following tables:
- repositories: Information about each repository being tracked
- users: GitHub users who have created issues, PRs, or comments
- issues: Issues and pull requests (with an is_pull_request flag)
- comments: Comments on issues and pull requests
- labels: Issue/PR labels
- issue_labels: Mapping between issues and labels
- sync_metadata: Information about the last sync time for each repository
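As a rough picture of how those tables relate, here is a minimal schema sketch using Python's built-in sqlite3 module. The column names are inferred from the sample queries in this README, not argh-sync's actual DDL, which may differ:

```python
import sqlite3

# Minimal sketch of the schema described above. Column names are inferred
# from the sample queries in this README; the real database may differ.
SCHEMA = """
CREATE TABLE repositories (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE users        (id INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER REFERENCES repositories(id),
    user_id INTEGER REFERENCES users(id),
    number INTEGER, title TEXT, state TEXT,
    is_pull_request INTEGER,        -- 0 = issue, 1 = pull request
    created_at TEXT, closed_at TEXT
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER REFERENCES issues(id),
    user_id INTEGER REFERENCES users(id)
);
CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE issue_labels (issue_id INTEGER, label_id INTEGER);
CREATE TABLE sync_metadata (repository_id INTEGER, last_synced_at TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```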
Here are some useful SQL queries you can run directly on the database.
Find the ten most-commented issues:
SELECT
issues.number,
issues.title,
repositories.full_name as repo,
COUNT(comments.id) as comment_count
FROM
issues
JOIN
repositories ON issues.repository_id = repositories.id
LEFT JOIN
comments ON issues.id = comments.issue_id
GROUP BY
issues.id
ORDER BY
comment_count DESC
LIMIT 10;
Rank the ten most active contributors:
SELECT
users.login,
COUNT(DISTINCT CASE WHEN issues.is_pull_request = 0 THEN issues.id ELSE NULL END) as issues_opened,
COUNT(DISTINCT CASE WHEN issues.is_pull_request = 1 THEN issues.id ELSE NULL END) as prs_opened,
COUNT(comments.id) as comments_made
FROM
users
LEFT JOIN
issues ON users.id = issues.user_id
LEFT JOIN
comments ON users.id = comments.user_id
GROUP BY
users.id
ORDER BY
(issues_opened + prs_opened + comments_made) DESC
LIMIT 10;
Find the twenty issues that took longest to resolve:
SELECT
repositories.full_name as repo,
issues.number,
issues.title,
issues.created_at,
issues.closed_at,
julianday(issues.closed_at) - julianday(issues.created_at) as days_to_resolve
FROM
issues
JOIN
repositories ON issues.repository_id = repositories.id
WHERE
issues.state = 'closed'
AND issues.is_pull_request = 0
ORDER BY
days_to_resolve DESC
LIMIT 20;
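These queries can also be run from Python with the standard library instead of the sqlite3 shell. The sketch below seeds an in-memory database with one sample row so it is self-contained; against the real github_issues.db you would connect to the file instead:

```python
import sqlite3

# Self-contained demo: a tiny in-memory database with one closed issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repositories (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE issues (
    id INTEGER PRIMARY KEY, repository_id INTEGER, number INTEGER,
    title TEXT, state TEXT, is_pull_request INTEGER,
    created_at TEXT, closed_at TEXT
);
INSERT INTO repositories VALUES (1, 'example/repo');
INSERT INTO issues VALUES
    (1, 1, 42, 'Fix crash', 'closed', 0, '2025-03-01 09:00:00', '2025-03-03 09:00:00');
""")

# The resolution-time query from above, trimmed to the essential columns.
query = """
SELECT repositories.full_name, issues.number,
       julianday(issues.closed_at) - julianday(issues.created_at) AS days_to_resolve
FROM issues
JOIN repositories ON issues.repository_id = repositories.id
WHERE issues.state = 'closed' AND issues.is_pull_request = 0
ORDER BY days_to_resolve DESC;
"""
for repo, number, days in conn.execute(query):
    print(f"{repo}#{number}: resolved in {days:.1f} days")  # example/repo#42: resolved in 2.0 days
```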