GitHub - code-for-montana/pyrs990: A utility to extract and serialize IRS Form 990 data on nonprofit organizations.

It's a pun. Get it?

A Python application and library that can grab all sorts of IRS Form 990 data on non-profit organizations and put it into a format that can be consumed easily by other applications.

Up and Running

The instructions below should allow you to get the software working for your purpose (user or developer). If you run into trouble please, please let us know so that we can update the instructions (or fix the bug you ran into).

User

Pip Install

Requires Python 3.8 or greater

You can install PyRS990 easily using pip: pip install pyrs990 or pip install --user pyrs990 if you're not in a virtual environment and don't want a global install.

Docker

Grab one of our docker images from https://hub.docker.com/repository/docker/codeformontana/pyrs990 and run it sort of like this:

docker run --mount src="${PWD}/data",target=/data,type=bind codeformontana/pyrs990:latest --help

Instead of --help at the end, add your own command line arguments to make it do whatever it is you want it to do. Use /data wherever a command requires an output path. Files written there will show up in the data directory under your current directory, outside the Docker container.
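If you drive the container from a script, the invocation above can be assembled programmatically. The sketch below only builds the argument list, reusing the image name and mount options from the command above. The helper name docker_command and the example data directory are made up for this illustration and are not part of PyRS990.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def docker_command(data_dir: Path, pyrs990_args: Sequence[str]) -> List[str]:
    """Build the `docker run` argument list for the PyRS990 image.

    Hypothetical helper for illustration; it is not part of PyRS990.
    """
    # Bind-mount the host directory at /data inside the container.
    mount = f"src={data_dir},target=/data,type=bind"
    return [
        "docker", "run",
        "--mount", mount,
        "codeformontana/pyrs990:latest",
        *pyrs990_args,
    ]


# The directory and the arguments are example values only.
cmd = docker_command(Path("/tmp/pyrs990-data"), ["--zip", "59801"])
print(shlex.join(cmd))
```

To actually run the container, the list can be handed to subprocess.run(cmd, check=True).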

Clone the Code

Requires Python 3.8 or greater

You can also clone the repo to use it, but this probably isn't the way to go for non-developers.

1. Make sure you have Python 3.8 available
2. Install Poetry if you don't already have it
3. Clone the whole repo, cd into the pyrs990 directory
4. Install dependencies - poetry install
5. Run it. Some very simple examples are below:
   1. poetry run pyrs990 --zip 59801 --use-disk-cache
   2. poetry run pyrs990 --load-filters examples/has-a-website.json --use-disk-cache
   3. ...more examples coming soon
6. Run the commands again and notice the cache speedup (the cache is set to ./.pyrs990-cache/)

Developer

This project uses Poetry because it's pretty slick and does a lot of stuff automatically and the developers are not usually Python people, so that's great!

1. Make sure you have Python 3.8 available
2. Install Poetry if you don't already have it
3. Clone the whole repo, cd into the pyrs990 directory
4. Install dependencies - poetry install
5. If you need to add dependencies:
   1. poetry add coolpkg
6. Make a pull request!

Development

Docker

We ship Docker containers that are pre-configured to run the application. To build new container images use make docker-build. To push them to Docker Hub, use make docker-push. Be sure to do any necessary version updates (see below) before rebuilding the container images as they will be tagged using the latest version number.

Release Process

# Assuming you've got the latest master and a clean working directory!
# Bump the version
make version-patch # or "major" or "minor"
git commit -a -m 'Bump version for release'
git push
# Maybe wait for CI to pass, or do it manually below
make analyze check-format check
# Now we make all the things public
make build publish
make docker-build docker-push

Versioning

Increment the version using make version-{major, minor, patch} or update it manually in the pyproject.toml file. The pyproject.toml file is the source of truth; everything else will update automatically based on it. If you modify the version manually, be sure to run make store-version to update the code.

Since PyRS990 is both an application and a library, we try to stick to semantic version rules. Increment the major for breaking changes, the minor for new features, and the patch for bug fixes and other "behind the scenes" changes.
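As a small illustration of the rules above, here is a sketch of how each kind of bump changes a version string. It is not the logic behind the make version-* targets; the function name bump is an assumption made for this example.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with the given part incremented.

    Lower-order parts reset to zero, as semantic versioning requires.
    Hypothetical helper, not the project's Makefile logic.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":  # breaking changes
        return f"{major + 1}.0.0"
    if part == "minor":  # new features
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # bug fixes and behind-the-scenes changes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("1.4.2", "major"))  # 2.0.0
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "patch"))  # 1.4.3
```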

About the Data

Right now we pull data that originated with the IRS (hence the silly name), but we get it from a couple of sources, and information about what is actually available is a little spread out as well.

Structure

There are two indices used to narrow down the list of filing documents that must be downloaded to satisfy a given query. The first is an annual index (we refer to it as "Annual" or "Annual Index" in the code). This index contains all filings processed by the IRS for a given calendar year.

Note that this does not necessarily have anything to do with the filing year. An organization might, for example, file its 2016 990 in either 2017 or 2018 (or even later). There is a field, described below, called tax_period that reflects the filing period. In the future, we intend to further abstract this so that it is easier to use.
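To make the distinction concrete, the sketch below splits a tax_period value into a year and a month and compares it with the year of the index it appeared in. It assumes tax_period is a six-digit YYYYMM string; the TaxPeriod type, the parse_tax_period helper, and the sample values are hypothetical and not part of PyRS990.

```python
from typing import NamedTuple


class TaxPeriod(NamedTuple):
    year: int
    month: int


def parse_tax_period(raw: str) -> TaxPeriod:
    """Parse a tax_period value, assumed to look like "201612" (YYYYMM)."""
    raw = raw.strip()
    if len(raw) != 6 or not raw.isdigit():
        raise ValueError(f"expected YYYYMM, got {raw!r}")
    year, month = int(raw[:4]), int(raw[4:])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {raw!r}")
    return TaxPeriod(year, month)


# Hypothetical row: found in the 2018 annual index, but covering a
# tax period that ended in December 2016.
index_year = 2018
period = parse_tax_period("201612")
print(period.year, period.month)  # 2016 12
print(index_year - period.year)   # 2
```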

The annual index also contains a field called object_id that tells us where to find the XML document that corresponds to that row in the index. PyRS990 abstracts this away, but it is still good to be aware of it.

The second index is the "Exempt Organizations Business Master File" distributed by the IRS. We refer to it as the "BMF Index". This index provides the physical address of each organization, along with some other helpful information. This allows the data to be queried by state, zip code, and so on, which greatly reduces the number of filing documents that must be downloaded for many queries.
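The way the two indices work together can be sketched as follows: filter the BMF index by location, then use the matching EINs to pick rows (and their OBJECT_ID values) out of the annual index. This is a simplified illustration rather than PyRS990's actual implementation. The CSV extracts, EINs, and object IDs are invented, and matching on a ZIP prefix is an assumption about how the ZIP column is formatted.

```python
import csv
import io
from typing import Dict, List

# Hypothetical, heavily trimmed index extracts. Real index files have
# many more columns (see the field lists) and far more rows.
BMF_CSV = """EIN,NAME,STATE,ZIP
000000001,EXAMPLE FOOD BANK,MT,59801-1234
000000002,EXAMPLE ARTS COUNCIL,MT,59715-0001
000000003,EXAMPLE TRAILS GROUP,MT,59801-5678
"""

ANNUAL_CSV = """RETURN_ID,EIN,TAX_PERIOD,OBJECT_ID
1,000000001,201612,OBJ-A
2,000000002,201612,OBJ-B
3,000000001,201712,OBJ-C
"""


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def object_ids_for_zip(
    bmf_rows: List[Dict[str, str]],
    annual_rows: List[Dict[str, str]],
    zip_code: str,
) -> List[str]:
    """Return OBJECT_IDs of filings whose organization is in `zip_code`."""
    # Step 1: narrow by location using the BMF index.
    eins = {row["EIN"] for row in bmf_rows if row["ZIP"].startswith(zip_code)}
    # Step 2: join on EIN to find the matching filings in the annual index.
    return [row["OBJECT_ID"] for row in annual_rows if row["EIN"] in eins]


ids = object_ids_for_zip(read_rows(BMF_CSV), read_rows(ANNUAL_CSV), "59801")
print(ids)  # ['OBJ-A', 'OBJ-C']
```

Only two of the three filings need to be downloaded here. The third organization in the ZIP code has no row in the annual index, so it contributes nothing.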

Indices may be used to query filing documents from the command line using various options. Note that there are options for both indices and for the filing documents themselves. If possible, it is a good idea to try to use as many index fields as you can to reduce the number of files you have to download.

See the example queries for more information.

Index Fields

The index fields available for filtering are listed below. Note that, in general, the annual index may be a bit more reliable since it points directly at the filing data files (via OBJECT_ID) and doesn't require joining on the EIN, which we haven't entirely figured out yet (there seem to be EIN values missing from one or the other index in some cases).

BMF Index:

EIN - used to join indices
NAME
ICO
STREET
CITY
STATE
ZIP
GROUP
SUBSECTION
AFFILIATION
CLASSIFICATION
RULING
DEDUCTIBILITY
FOUNDATION
ACTIVITY
ORGANIZATION
STATUS
TAX_PERIOD
ASSET_CD
INCOME_CD
FILING_REQ_CD
PF_FILING_REQ_CD
ACCT_PD
ASSET_AMT
INCOME_AMT
REVENUE_AMT
NTEE_CD
SORT_NAME

Annual Index:

RETURN_ID
FILING_TYPE
EIN - used for joining indices
TAX_PERIOD
SUB_DATE
TAXPAYER_NAME
RETURN_TYPE
DLN
OBJECT_ID - points to filing data
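The EIN mismatch mentioned above is easy to measure once both EIN columns are loaded. The sketch below uses plain set differences; the function name ein_mismatches and the sample EINs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def ein_mismatches(
    bmf_eins: Iterable[str], annual_eins: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (EINs only in the BMF index, EINs only in the annual index)."""
    bmf, annual = set(bmf_eins), set(annual_eins)
    return bmf - annual, annual - bmf


# Invented EINs for illustration only.
only_bmf, only_annual = ein_mismatches(
    ["000000001", "000000003"],
    ["000000001", "000000002"],
)
print(sorted(only_bmf))     # ['000000003']
print(sorted(only_annual))  # ['000000002']
```

Rows that land in either set will be silently dropped by a join on EIN, which is worth keeping in mind when a query returns fewer filings than expected.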

Sources

The IRS BMF index files are hosted by the IRS directly and are available by state and region.

Descriptions of the variables contained in the files and the process used to build them are also available (they are also linked from the page above).

The annual index files come from an AWS S3 bucket managed by the IRS. The contents of the bucket are described there.

There is also a readme that demonstrates how to download the files (it is also linked from the page above).

The filing documents themselves also come from this same AWS S3 bucket in XML format. For the extremely XML-savvy, you can check out the schema documentation on the IRS website. PyRS990 abstracts this away, however, so there's no real need to understand it if you only want to access the data in a convenient format.

Finally, while not strictly a data source, the IRSx documentation created by ProPublica contains descriptions of many of the filing fields in a simple, readable format. For developers, PyRS990 has been designed to work with the exact XPath selectors listed in the IRSx documentation, so if you want to add a field to the Filing object, this is the place to look first.
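For a feel of what such a selector does, here is a minimal sketch that applies an IRSx-style path to a tiny hand-written document using only the standard library. It is not how the Filing object is implemented. The sample XML is invented, the helper find_text is hypothetical, and the sketch assumes selectors of the form /IRS990/WebsiteAddressTxt that are resolved relative to the ReturnData element in the http://www.irs.gov/efile namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace assumed for e-file documents.
NS = {"efile": "http://www.irs.gov/efile"}

# Invented, minimal stand-in for a real filing document.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <WebsiteAddressTxt>www.example.org</WebsiteAddressTxt>
    </IRS990>
  </ReturnData>
</Return>
"""


def find_text(root: ET.Element, selector: str) -> Optional[str]:
    """Look up a selector such as "/IRS990/WebsiteAddressTxt".

    The selector is treated as relative to ReturnData, and each step is
    prefixed with the e-file namespace. Returns None if nothing matches.
    """
    steps = [step for step in selector.split("/") if step]
    path = "efile:ReturnData/" + "/".join(f"efile:{step}" for step in steps)
    node = root.find(path, NS)
    return node.text if node is not None else None


root = ET.fromstring(SAMPLE)
print(find_text(root, "/IRS990/WebsiteAddressTxt"))  # www.example.org
print(find_text(root, "/IRS990/NoSuchField"))        # None
```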
