forked from Jman4190/nba-sql
Home
Matthew Pope edited this page Apr 10, 2021 · 3 revisions
This is the nba-sql database.
The project grew out of the desire for a free NBA dataset that is queryable using SQL.
- Existing solutions, like nba_api, looked interesting and feature-rich but had several issues with relational queryability.
- Existing databases are built off of similar data but are hidden behind paywalls. Providing the code (and potentially only the code) to build a similar database is preferable to a centrally hosted database with paywalled access. Most (if not all) personal use cases do not need the power and reliability of a hosted database. So far, hosting locally has been fast and flexible.
- Existing websites are rich in features for player and team comparisons, but doing analysis with them is extremely cumbersome. Their data may go back further, but it is impossible to use that data with tools like Apache Superset or Tableau.
Go to the Releases section to find the latest release. This is a file named nba.sql.xz, compressed using xz. The following bash commands can be used to uncompress the file and load it into your database.
xz -d nba.sql.xz
psql -U <USERNAME> <DBNAME> < nba.sql
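Where the xz command-line tool is not available, the decompression step can be approximated with Python's standard lzma module. This is only a minimal sketch and not part of the project: the function name decompress_xz is hypothetical, and the file names are taken from the release described above. Loading the resulting file into the database is still done with psql.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src: str, dst: str) -> Path:
    """Decompress the .xz file at src into dst.

    The data is streamed in chunks, so a large SQL dump is never held
    fully in memory. Unlike `xz -d`, the compressed file is kept.
    """
    with lzma.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return Path(dst)
```

Assuming the release file has been downloaded into the current directory, calling decompress_xz("nba.sql.xz", "nba.sql") would produce the nba.sql file used by the psql command.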
- Reduce data duplication.
- The NBA APIs return some data items excessively. I can only assume this is to reduce the number of requests required to populate their webpage. Things like player_name, age, team_name, etc. are returned with most API requests. If included in the database this would require extra space. So these values are abstracted away into general player and team tables.
- Efficient indexing.
- We want queries against this data to be fast and, as a side effect of the first goal, we want to store only unique data. Several tables use composite primary keys, which enforce strict uniqueness constraints on the data.
- Ease of use.
- If our current schema poses issues, please file an issue. An open discussion of how this data is organized is welcome.
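The first two goals can be illustrated with a small, self-contained sketch. It is not the project's actual schema: only the player and team table names and the player_name and team_name columns come from the text above, while player_game_log, game_id, pts, and the helper functions are hypothetical. SQLite (through Python's sqlite3 module) stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema for illustration only; the real nba-sql tables and
# columns may differ. Names live once in player/team instead of on every row.
SCHEMA = """
CREATE TABLE team (
    team_id   INTEGER PRIMARY KEY,
    team_name TEXT NOT NULL
);
CREATE TABLE player (
    player_id   INTEGER PRIMARY KEY,
    player_name TEXT NOT NULL
);
CREATE TABLE player_game_log (
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    game_id   INTEGER NOT NULL,
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    pts       INTEGER NOT NULL,
    -- Composite primary key: at most one row per player per game.
    PRIMARY KEY (player_id, game_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO team VALUES (1, 'Example Team')")
    conn.execute("INSERT INTO player VALUES (10, 'Example Player')")
    conn.execute("INSERT INTO player_game_log VALUES (10, 100, 1, 25)")
    conn.execute("INSERT INTO player_game_log VALUES (10, 101, 1, 31)")
    return conn


def points_by_player(conn: sqlite3.Connection) -> list:
    """Join names in at query time rather than storing them per game row."""
    return conn.execute(
        """
        SELECT p.player_name, t.team_name, SUM(g.pts)
        FROM player_game_log AS g
        JOIN player AS p ON p.player_id = g.player_id
        JOIN team AS t ON t.team_id = g.team_id
        GROUP BY p.player_name, t.team_name
        """
    ).fetchall()


def insert_duplicate(conn: sqlite3.Connection) -> bool:
    """Return True if the composite key rejects a repeated (player, game) row."""
    try:
        conn.execute("INSERT INTO player_game_log VALUES (10, 100, 1, 99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

In this sketch the composite key serves both as the uniqueness constraint and as the index for lookups by player and game, which is the trade-off the goals above describe.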
I'm not very good at organizing wikis, so check the sidebar for available pages.