This project is a from-scratch data pipeline, written by Jacob Olness for the new MT Legislative Bill Explorer, that scrapes, processes, and organizes data from the Montana Legislature's bill tracker for use in Montana Free Press' Capitol Tracker. It automates downloading, parsing, and transforming legislative data (bills, committees, votes, amendments, and PDFs) so that it matches the data format Capitol Tracker expects, which was shaped by the state's decades-old previous bill tracker.
Montana Free Press is a 501(c)(3) nonprofit newsroom that aims to provide Montanans with in-depth, nonpartisan news coverage.
A live version of the 2025 tracker can be found at https://projects.montanafreepress.org/capitol-tracker-2025/
The pipeline runs automatically via GitHub Actions, configured in `.github/workflows/data.yml`. Two cron schedules are defined: one for active hours during the session and a reduced rate of once per hour after Sine Die. Comment one out and uncomment the other to switch between them.
Wherever possible, caching has been implemented to minimize load on the state's servers while still providing a service to the public in accordance with the "Right To Know" provision of Montana Constitution Article II, § 9. For example:
- PDFs are only downloaded if the latest version isn't stored locally
- The GitHub Actions pipeline runs only during the day
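The PDF caching rule above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the `document_id` parameter, and the filename scheme are all assumptions made for the example.

```python
from pathlib import Path


def pdf_cache_path(download_dir: Path, bill: str, document_id: int) -> Path:
    # Hypothetical naming scheme: putting the document id in the filename
    # means a newer version of a PDF maps to a different local path.
    return download_dir / bill / f"{bill}-{document_id}.pdf"


def needs_download(download_dir: Path, bill: str, document_id: int) -> bool:
    # Fetch only when this exact version is missing locally (or is an
    # empty file left behind by an interrupted download).
    path = pdf_cache_path(download_dir, bill, document_id)
    return not path.exists() or path.stat().st_size == 0
```

A download loop would call `needs_download(...)` before each request and skip the request entirely when it returns `False`, so unchanged PDFs never touch the state's servers.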
legislative-interface-{year}
(modify this to match the forked repo url)
git clone https://github.com/mtfreepress/legislative-interface.git
cd legislative-interface
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
chmod +x ./execute.sh
./execute.sh

The `execute.sh` script runs the entire data collection and processing pipeline in order. Here's what each step does:
| Script | Purpose |
|---|---|
| `interface/get-bill-data.py` | Downloads raw bill data from the Montana Legislature API |
| `interface/split-bills.py` | Splits the large bill JSON into individual files for easier processing |
| `interface/get-legislators.py` | Downloads legislator data and roster information |
| `interface/get-all-committees.py` | Downloads all committee data (standing and non-standing) |
| `interface/get-agencies.py` | Downloads state agency data |
| `interface/generate-bill-list.py` | Creates a list of bills for input into other scripts |
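As an illustration of the splitting step in the table above, the sketch below writes one JSON file per bill. It is not the contents of `interface/split-bills.py`: the function name and the `billType` / `billNumber` fields are hypothetical, chosen only to make the example self-contained.

```python
import json
from pathlib import Path


def split_bills(bills: list[dict], out_dir: Path) -> list[Path]:
    """Write each bill from one large list into its own JSON file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for bill in bills:
        # "billType" and "billNumber" are assumed field names, not the
        # state API's documented schema.
        path = out_dir / f"{bill['billType']}-{bill['billNumber']}.json"
        path.write_text(json.dumps(bill, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

Per-bill files keep later steps simple: a script that only needs one bill can open one small file instead of parsing the whole session's data.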
| Script | Purpose |
|---|---|
| `interface/get-legal-review-notes.py` | Downloads legal review notes and veto letters for bills |
| `interface/get-fiscal-review-notes.py` | Downloads fiscal notes and rebuttals for bills |
| `interface/get-bill-text-pdf.py` | Downloads bill text PDFs |
| `interface/get-amendments.py` | Downloads bill amendments and amendment PDFs |
| Script | Purpose |
|---|---|
| `interface/compress-pdfs.py` | Compresses downloaded PDFs to save space |
python interface/compress-pdfs.py {path/to/pdf-directory}
| Script | Purpose |
|---|---|
| `interface/get-bill-hearings.py` | Downloads committee hearing data for bills |
| `interface/get-votes-json.py` | Downloads vote data for bills (2025+ sessions) |
| `interface/get-executive-actions-json.py` | Downloads executive actions data |
| `interface/get-committees.py` | Downloads committee data by ID |
| `interface/match-votes-actions.py` | Matches votes with bill actions for analysis |
| Script | Purpose |
|---|---|
| `process/process-committees.py` | Processes committee data, applies whitelist filtering, generates committee statistics |
| `process/process-bills.py` | Processes bill data into the final format needed for downstream use |
The vote sheet PDF scripts below are largely deprecated as of July 2025: the state appears to have manually migrated at least the vote counts, dating back to 1999, to the new JSON system.
| Script | Purpose |
|---|---|
| `interface/get-pdf-votesheets.py` | Downloads vote sheet PDFs (pre-2025 sessions only) |
| `process/process-vote-pdfs.py` | Parses vote PDFs into JSON (pre-2025 sessions only) |
| Script | Purpose |
|---|---|
| `process/merge-actions.py` | Merges actions and votes data |
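To make the matching and merging steps more concrete, here is a minimal sketch of attaching a vote to each bill action through a shared identifier. The function name and the `id` and `actionId` fields are hypothetical; the real scripts may match on different keys.

```python
def merge_actions_votes(actions: list[dict], votes: list[dict]) -> list[dict]:
    """Attach the matching vote (if any) to each bill action."""
    # Index votes by the action they belong to; "actionId" is an assumed key.
    votes_by_action = {vote["actionId"]: vote for vote in votes}
    merged = []
    for action in actions:
        row = dict(action)  # copy so the input list is left untouched
        row["vote"] = votes_by_action.get(action["id"])  # None when no vote was taken
        merged.append(row)
    return merged
```

Most actions (referrals, hearings, transmittals) have no recorded vote, so the sketch leaves `vote` as `None` rather than dropping those rows.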
- Session Configuration: Edit the variables at the top of `execute.sh` for different legislative sessions
- Committee Filtering: Edit `COMMITTEE_WHITELIST` in `process/process-committees.py` to control which committees are processed
- Committee Display Names: `interface/downloads/committee_mapping.csv` maps committee keys to display names
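The whitelist and display-name mapping described above might fit together along these lines. This is a sketch under assumptions: the whitelist entries, the CSV column names (`key`, `display_name`), and both function names are invented for illustration and may not match the repository.

```python
import csv
import io

# Hypothetical committee keys; the real list lives in process/process-committees.py
COMMITTEE_WHITELIST = {"house-judiciary", "senate-finance-and-claims"}


def load_display_names(csv_file) -> dict[str, str]:
    # Assumes the mapping CSV has "key" and "display_name" columns
    return {row["key"]: row["display_name"] for row in csv.DictReader(csv_file)}


def filter_committees(committees, whitelist, display_names):
    kept = []
    for committee in committees:
        if committee["key"] not in whitelist:
            continue  # committee is not whitelisted, so it is skipped
        # Fall back to the raw key when the mapping has no entry
        name = display_names.get(committee["key"], committee["key"])
        kept.append({**committee, "name": name})
    return kept
```

With this shape, a committee missing from the output usually means its key is absent from the whitelist or spelled differently from the filename-derived key.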
legislative-interface/
├── execute.sh # Main pipeline script
├── interface/ # Data collection scripts
│ ├── downloads/ # Raw downloaded data
│ ├── get-*.py # Data download scripts
│ ├── /raw-data-dirs # Output directories for steps that write multiple files (e.g. split bills, votes)
│ └── *.json # Output files for steps that write a single file (e.g. the full JSON of all bills)
├── process/ # Data processing scripts
│ ├── cleaned/ # Processed output data
│ └── process-*.py # Data transformation scripts
└── requirements.txt # Python dependencies
For new legislative sessions, update these variables in execute.sh:
sessionId=2 # Session identifier for 2025 for some reason
sessionOrdinal=20251 # Session ordinal number (special session would be 20252)
legislatureOrdinal=69 # Legislature number (2027 will be Montana's 70th legislative session)
- Missing committees in output: Check `COMMITTEE_WHITELIST` in `process/process-committees.py` and ensure filenames match expected keys.
- API rate limiting: The scripts use connection limits and user-agent headers to be respectful of the state's API.
- File path issues: All scripts use relative paths from their directory location to work with the `execute.sh` runner.
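The rate-limiting note above can be illustrated with a small concurrency cap. This sketch uses only the standard library and a caller-supplied `fetch` coroutine as a stand-in for the real HTTP call; `MAX_CONCURRENT`, the `User-Agent` string, and `fetch_all` are hypothetical. With aiohttp, a comparable cap can be set with `aiohttp.TCPConnector(limit=...)` and default headers with `aiohttp.ClientSession(headers=...)`.

```python
import asyncio

MAX_CONCURRENT = 5  # hypothetical cap, not the pipeline's actual setting
HEADERS = {"User-Agent": "legislative-interface (example contact)"}  # hypothetical value


async def fetch_all(urls, fetch, limit=MAX_CONCURRENT):
    """Run fetch(url, headers) for every URL, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # Each request waits here until one of the `limit` slots is free
        async with semaphore:
            return await fetch(url, HEADERS)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Identifying the client in the `User-Agent` header gives the state's administrators a way to recognize the traffic and get in touch if it ever causes problems.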
- Python 3.13+
- aiohttp: For async HTTP requests
- requests: For synchronous HTTP requests
- PyPDF2 or similar: For PDF processing (only if using legacy vote parsing; should be unnecessary, but just in case™)
- The Montana Legislature's API and website structure can change between sessions
- Scripts are designed to be modular - you can run individual components if needed
- The pipeline includes extensive error handling and file existence checks
- PDF downloads include caching to avoid re-downloading existing files
- All scripts log their progress and execution time
- Designed to run via GitHub Actions on a schedule
- Can be run manually with `bash execute.sh`
- Includes timing measurements for performance monitoring
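For readers who prefer Python to shell, the run-in-order-with-timing behavior described above could be approximated as follows. This is a sketch, not a port of `execute.sh`: `run_steps` is a hypothetical helper, and the arguments the real runner passes to each script are not reproduced here.

```python
import subprocess
import sys
import time


def run_steps(steps: list[list[str]]) -> list[tuple[str, float]]:
    """Run each step with the current Python interpreter, in order, and time it."""
    timings = []
    for args in steps:
        start = time.perf_counter()
        # check=True raises CalledProcessError, stopping at the first failing step
        subprocess.run([sys.executable, *args], check=True)
        elapsed = time.perf_counter() - start
        print(f"{args[0]} finished in {elapsed:.2f}s")
        timings.append((args[0], elapsed))
    return timings
```

A call such as `run_steps([["interface/get-bill-data.py"], ["interface/split-bills.py"]])` would run those two scripts back to back; any session arguments they need would have to be appended to each inner list.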
"New" BSD License (aka "3-clause"). See LICENSE for details.
When adding new scripts or modifying existing ones:
- Follow the existing naming convention (`get-*.py` for downloads, `process-*.py` for processing)
- Add the script to `execute.sh` in the appropriate phase
- Update this README with a description of what the script does
- Use relative paths and the established directory structure
For the next legislative session (2027):
- After forking, delete the old data from the last session
- Update session variables
- Check API endpoints for functionality
- Change the URL for Capitol Tracker's legislators in `interface/get-legislators.py` (there is a TODO right above the line), or change it to manage those annotations in this repo instead.