Performance Manager is an application to measure rail performance on the MBTA transit system.
column name | data type | nullable | description |
---|---|---|---|
pm_event_id | integer | false | auto increment primary key |
service_date | integer | false | |
pm_trip_id | integer | false | integer key used to join event record to trip data in vehicle_trips table |
stop_id | string | false | |
stop_sequence | small integer | false | |
canonical_stop_sequence | small integer | false | stop sequence based on "typical" route trip as defined in route_patterns.txt table |
sync_stop_sequence | small integer | false | stop_sequence that is consistent across all branches of a trunk for a particular parent_station |
parent_station | string | false | |
previous_trip_stop_pm_event_id | integer | true | pm_event_id of previous stop of pm_trip_id grouping |
next_trip_stop_pm_event_id | integer | true | pm_event_id of next stop of pm_trip_id grouping |
vp_move_timestamp | integer | true | earliest moving-status timestamp found from GTFS-RT Vehicle Position events |
vp_stop_timestamp | integer | true | earliest stopped-status timestamp found from GTFS-RT Vehicle Position events |
tu_stop_timestamp | integer | true | earliest timestamp found from GTFS-RT Trip Update events |
travel_time_seconds | integer | true | seconds of time the vehicle spends traveling to station |
dwell_time_seconds | integer | true | seconds of time that vehicle spends waiting at station |
headway_trunk_seconds | integer | true | departure to departure, parent_station wait time for vehicles traveling on trunk_route_id |
headway_branch_seconds | integer | true | departure to departure, parent_station wait time for vehicles traveling on branch_route_id |
updated_on | timestamp | false | timestamp field that is auto updated on any record change |
column name | data type | nullable | description |
---|---|---|---|
pm_trip_id | integer | false | auto increment key used to join trips to records in vehicle_events table |
direction_id | boolean | false | |
route_id | string | false | |
branch_route_id | string | true | |
trunk_route_id | string | true | |
service_date | integer | false | |
start_time | integer | false | |
vehicle_id | string | false | |
stop_count | small integer | true | |
trip_id | string | true | |
vehicle_label | string | true | |
vehicle_consist | string | true | |
direction | string | true | |
direction_destination | string | true | |
static_trip_id_guess | string | true | matched trip_id from GTFS static_trips table |
static_start_time | integer | true | start_time of static_trip_id_guess trip |
static_stop_count | small integer | true | expected stop count from static_trip_id_guess trip |
first_last_station_match | boolean | false | true if static_trip_id_guess is exact match to trip_id |
static_version_key | integer | false | GTFS static schedule version key for trip |
updated_on | timestamp | false | timestamp field that is auto updated on any record change |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
feed_start_date | integer | false | |
feed_end_date | integer | false | |
feed_version | string | false | |
feed_active_date | integer | false | integer date pulled from feed_version value |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
created_on | datetime | false | datetime field that is auto updated on record insert |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
route_id | string | false | |
branch_route_id | string | true | |
trunk_route_id | string | true | |
service_id | string | false | |
trip_id | string | false | |
direction_id | boolean | false | |
block_id | string | true | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
route_id | string | false | |
agency_id | small integer | false | |
route_short_name | string | true | |
route_long_name | string | true | |
route_desc | string | true | |
route_type | small integer | false | |
route_sort_order | integer | false | |
route_fare_class | string | false | |
line_id | string | true | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
stop_id | string | false | |
stop_name | string | false | |
stop_desc | string | true | |
platform_code | string | true | |
platform_name | string | true | |
parent_station | string | true | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
trip_id | string | false | |
arrival_time | integer | false | |
departure_time | integer | false | |
scheduled_travel_time_seconds | integer | true | expected travel time for trip/stop generated by performance manager |
scheduled_headway_trunk_seconds | integer | true | expected trunk headway for parent_station generated by performance manager |
scheduled_headway_branch_seconds | integer | true | expected branch headways for parent_station generated by performance manager |
stop_id | string | false | |
stop_sequence | small integer | false | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
service_id | string | false | |
monday | boolean | false | |
tuesday | boolean | false | |
wednesday | boolean | false | |
thursday | boolean | false | |
friday | boolean | false | |
saturday | boolean | false | |
sunday | boolean | false | |
start_date | integer | false | |
end_date | integer | false | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
service_id | string | false | |
date | integer | false | |
exception_type | small integer | false | |
holiday_name | string | true | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
route_id | string | false | |
direction_id | boolean | false | |
direction | string | false | |
direction_destination | string | false | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
column name | data type | nullable | description |
---|---|---|---|
pk_id | integer | false | auto incremented primary key |
route_id | string | false | |
direction_id | boolean | false | |
route_pattern_typicality | smallint | true | |
representative_trip_id | string | false | |
static_version_key | integer | false | key used to link GTFS static schedule versions between tables |
GTFS-realtime (GTFS-RT) is provided by MBTA as an industry standard for distributing realtime transit data.
The Performance Manager application uses two MBTA GTFS-RT feeds:
Trip Updates are predictive data used to supplement Vehicle Positions data when the Vehicle Positions feed does not record an expected stop event.
All timestamp values used, or created by, Performance Manager are saved as whole second integers.
The CTD Delta application is responsible for reading GTFS-RT updates from the MBTA V3 API and saving them to an AWS S3 Bucket, as gzipped JSON files, for use by LAMP.
The LAMP Ingestion application aggregates gzipped GTFS-RT update files, saved on S3 by Delta, into partitioned parquet files that are also saved to an S3 bucket. The parquet files are partitioned by GTFS-RT feed type and grouped into hourly chunks.
The Performance Manager application reads the GTFS-RT partitioned parquet files for the Vehicle Positions and Trip Updates feeds and passes them through an aggregation and manipulation pipeline before inserting them into the vehicle_events and vehicle_trips table schemas described above.
When reading the GTFS-RT parquet files, records related to BUS data are removed to reduce database table sizes and processing time.
Performance Manager matches all GTFS-RT records to their applicable GTFS Static Schedule with the static_version_key
field in the vehicle_trips table.
The static_version_key
field allows GTFS-RT records to join to any GTFS Static data found in the static_
database tables.
The GTFS-RT partitioned parquet files have a large amount of redundant timestamp information that is not reasonable to save in an RDS database schema.
Performance Manager compresses GTFS-RT Vehicle Positions event records to store in the vehicle_events database table.
Initially, Vehicle Positions events are grouped by unique trip-stop columns:
- parent_station
- route_id
- service_date
- trip_id
For each trip_stop event, the earliest vehicle_timestamp
for a current_status
indicating the vehicle is stopped is saved as the vp_stop_timestamp
in the vehicle_events table. The earliest vehicle_timestamp
for a current_status
indicating the vehicle is moving is saved as the vp_move_timestamp
.
This process is repeated for Trip Updates event records. Trip Updates can only record stopped status timestamps, so the earliest arrival-timestamp
for each trip-stop event is saved as the tu_stop_timestamp
in the vehicle_events table.
For calculations/metrics that require a vehicle stop timestamp, Performance Manager defaults to using the vp_stop_timestamp
value generated from the Vehicle Positions GTFS-RT feed. If a vp_stop_timestamp
is not available for a trip-stop record, then the tu_stop_timestamp
value is used.
Travel times represent the amount of time a vehicle spent moving to the current station, from the previous station.
Travel times are saved as travel_time_seconds
in the vehicle_events table:
t_stop = coalesce(vp_stop_timestamp, tu_stop_timestamp)
t_move = vp_move_timestamp
travel_time_seconds = t_move - t_stop
Any travel_time_seconds
calculated as a negative value is not saved.
Dwell times represent the amount of time a vehicle spent waiting at a station. A dwell time is not calculated for the first or last stop of a trip.
Dwell times are saved as dwell_time_seconds
in the vehicle_events table:
t_dwell_start = coalesce(vp_stop_timestamp, tu_stop_timestamp) # for current stop of vehicle
t_station_depart = vp_move_timestamp # for next vehicle move event i.e. current station departure
t_dwell_end = t_station_depart
dwell_time_seconds = t_dwell_start - t_dwell_end
Any dwell_time_seconds
calculated as a negative value is not saved.
Headways represent the platform wait time a rider would experience on a route. This is a "departure-to-departure" calculation, equivalent to the duration between a vehicle leaving a parent_station
and the previous vehicle on the route leaving the same parent_station
in the same direction. The first stop at a parent_station
for each service_date
will not have a calculated headway.
Headways are calculated for branch_route_id
and trunk_route_id
designations and saved as headway_branch_seconds
and headway_trunk_seconds
, respectively, in the vehicle_events table:
t_station_depart = vp_move_timestamp # for next vehicle move event i.e. current station departure
t_headway_start = t_station_depart
t_headway_end = t_station_depart # for previous vehicle at parent_station on same branch/trunk_route_id in same direction
headway_branch/trunk_seconds = t_headway_start - t_headway_end
If the branch_route_id
for an event is NULL
, then headway_branch_seconds
will also be NULL
GTFS Static Zip files are generated by MBTA for internal and external distribution. When a new GTFS Static Zip file is generated, the CTD Delta application writes it to an AWS S3 bucket for use by LAMP.
The LAMP Ingestion application converts GTFS Zip files, saved on S3 by Delta, to partitioned parquet files that are also saved to an S3 bucket.
The Performance Manager application reads the GTFS partitioned parquet files, from S3, and inserts the contents into the static database schema tables described above.
During database insertion, GTFS records related to BUS data are removed from the static_routes, static_trips, and static_stop_times tables. This is done to reduce database table sizes.
For each GTFS Static schedule the following metrics are pre-calculated and stored in the static_stop_times table:
- scheduled_travel_time_seconds
- scheduled_headway_trunk_seconds
- scheduled_headway_branch_seconds
The business logic of these calculations follows the same rules as the GTFS-RT metrics, except that headway metrics are partitioned by parent_station
, service_id
, direction_id
and trunk/branch_route_id
.
For information regarding data exports, refer to https://performancedata.mbta.com and our Data Dictionary documentation.
The Performance Manager application image is described in a Dockerfile. This image is used for local testing and AWS deployment.
The docker-compose.yml file, found in the project root directory, describes a local environment for testing the performance manager application. This environment includes a local PostgreSQL database and seed_metadata
application that can be used to configure / migrate the local database and seed it with selected AWS S3 object paths.
To build application images run the following in the project root directory:
docker-compose build
To seed the local PostgreSQL database with AWS S3 objects paths, add selected parameters to ENTRYPOINT
for the seed_metadata
application in docker-compose.yml and run the following in the project root directory:
docker-compose up seed_metadata
--clear-rt
(reset Real-Time RDS tables, leaving GTFS Static tables intact)--clear-static
(reset Real-Time and Static RDS tables, full database reset)--seed-file
(path to json seed file, inside of docker image, that will be loaded intometadata_log
table of the metadata RDS.)
To start a local version of the performance manager application run the following in the project root directory:
docker-compose up performance_manager
The performance manager application will chronologically process any un-processed S3 object paths contained in the metadata_log
table of the metadata RDS.