Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety data schema #2735

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from
Open

Conversation

ziv17
Copy link
Collaborator

@ziv17 ziv17 commented Dec 13, 2024

Please review the suggested schema. If the direction is OK I will prepare poc tables and prepare POC of the /involved (replacing the /accidents) query.

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 81.39535% with 8 lines in your changes missing coverage. Please review.

Project coverage is 52.75%. Comparing base (400af13) to head (005fee4).
Report is 148 commits behind head on dev.

Files with missing lines Patch % Lines
anyway/flask_app.py 20.00% 8 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #2735      +/-   ##
==========================================
- Coverage   53.22%   52.75%   -0.47%     
==========================================
  Files         119      122       +3     
  Lines        9924    10256     +332     
==========================================
+ Hits         5282     5411     +129     
- Misses       4642     4845     +203     
Flag Coverage Δ
unittests 52.75% <81.39%> (-0.47%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ziv17 ziv17 force-pushed the safety-data-schema branch from e9cc41d to 005fee4 Compare December 23, 2024 18:10
@ziv17
Copy link
Collaborator Author

ziv17 commented Dec 23, 2024

Hi @atalyaalon , @citizen-dror ,
I added initial involved and accidents tables, and sample code that loads some data to the tables, and a simple query. The vehicles is not there yet.

@data-for-change data-for-change deleted a comment from LIRONBEN3 Dec 29, 2024
@atalyaalon atalyaalon linked an issue Dec 29, 2024 that may be closed by this pull request
@atalyaalon
Copy link
Collaborator

atalyaalon commented Dec 31, 2024

@ziv17 this pr looks like a great start!

I suggest the following changes before merging it:

Tables structure: We have 2 options -

  1. Three data tables OR one flat table

Three data tables:

The safety data tables will 3 data tables rather than 2 (added vehicle): safety_data_accident, safety_data_involved, safety_data_vehicles.
Each of these tables is a smaller table created by (markers, involved, vehicles tables. It contains numeric fields only.

Each of these tables will contain the following fields for join to hold:
accident_id (in markers it's id, in other tables it's accident_id)
provider_code
provider_and_id
accident_year

We can use foreign key constraints (hence relationships) and index these for each table for a better performance, like this example

The safety_data_accident, safety_data_involved, safety_data_vehicles will be the facts table, and we can also create relationships with dimension tables (Road Type, Accident Type, etc) meaning we'll have foreign key constraints for each dimension table (using star schema)

Note that I considered using materialized views, but they cannot directly maintain relationships.

One flat table
We can create one flat table created (either directly from involved that has a relationship with markers and vehicles or from from involved_markers_hebrew) and make sure that the relevant numeric fields are present in the table.

In addition I suggest that this table will have relationships with dimension tables (Road Type, Accident Type, etc) meaning we'll have foreign key constraints for each dimension table (using star schema)

  1. ETL flow: I suggest adding the generation/table update to the code of this ETL flow either during or after cbs-import-from-s3 task.

  2. Just to be on the same page - the columns we need for the Safety Data API:

3.1 value fields to be joined with Dimension tables for hebrew/english:
injury_severity
injured_type
age_group
sex
population_type
day_in_week
day_night
accident_yishuv_symbol (Hebrew field is accident_yishuv_name)
street1
street2
location_accuracy
road_segment_id (Hebrew field is road_segment_name)
accident_type
vehicle_vehicle_type
road_type
one_lane
multi_lane
road_width
speed_limit

3.2 fields w/o hebrew:
accident_year
accident_timestamp
road1
road2
latitude
longitude

3.3 id fields:
accident_id
provider_code
provider_and_id
id (the id in involved table) - replaces _id in current API (it's unique in Involved table)

3.4 Fields to be calculated (Calculation will be sent in a different issue)
vehicles

Note - the fields needed to create relationships with dimension tables, in order to query the hebrew fields are:indexed in all tables:
provider_code
accident_year
id (value) in facts table
Hence provider_code and accident_year should be indexed in all tables with such relationships

@ziv17
Copy link
Collaborator Author

ziv17 commented Jan 1, 2025

Hi @atalyaalon ,
Why do we need provider_and_id field? Aren't provider_code, accident_id and accident_year suffice?

@atalyaalon
Copy link
Collaborator

atalyaalon commented Jan 2, 2025

Hi @atalyaalon , Why do we need provider_and_id field? Aren't provider_code, accident_id and accident_year suffice?

The are suffice. We don't need this field in safety data. (Perhaps in the future we'll remove it from the "hebrew" tables, however right now it's used in various queries so we need to make sure we keep functionality beforehand, and that it's not needed by the data team)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Safety Data new API
3 participants