Basketball Reference is a resource to aggregate statistics on NBA teams, seasons, players, and games. This package provides methods to acquire data for all these categories in pre-parsed and simplified formats.
The primary API used currently is for stats.nba.com, but the website blocks too many requests, hindering those who want to acquire a lot of data. Additionally, scrapers for Basketball Reference do exist, but none of them load dynamically rendered content. These scrapers can only acquire statically loaded content, preventing those who want statistics in certain formats (for example, Player Advanced Stats Per Game).
Most of the scrapers use outdated methodologies of scraping from 'https://widgets.sports-reference.com/'
. This is outdated and Basketball Reference no longer acquires their data from there. Additionally, Sports Reference recently instituted a rate limiter preventing users from making an excess of 20 requests/minute. This package abstracts the waiting logic to ensure you never hit this threshold.
To deploy this CDK Application you will need the following
Note: A Makefile
is included in this project to wrap commands.
You can review the available commands using make help
To initialize the environment to run this sample:
$ make init
NOTE: To add additional dependencies, for example other libraries, you will need to run a poetry
command.
$ poetry add <package_name>
At this point you can now build the Python package for this code.
$ make build
There are basic unit tests included that can be run using the following command:
$ make test
Unit Tests for this project are located in the tests
folder
Alternatively, you can just clone this repo and import the libraries at your own discretion.
This package can also capture dynamically rendered content that is being added to the page via JavaScript, rather than baked into the HTML. To achieve this, it uses Python Selenium. Please refer to their installation instructions and ensure you have Chrome webdriver installed in and in your PATH
variable.
Currently, the package contains 5 modules: teams
, players
, seasons
, box_scores
, pbp
, shot_charts
, and injury_report
.
The package will be expanding to include other content as well, but this is a start.
For full details on the API please refer to the documentation.
This repository is a fork of the work done in this repository. This repository can build a package but it is not currently published anywhere
- upgraded to use Python 3.12
- changed to use Poetry for project/package management
- fixed and updated unit tests
- adding new methods to enhance functionality
- bumped the package version to 3.x.x because of the number of changes