-
Notifications
You must be signed in to change notification settings - Fork 0
Database Schema
The database is SQLite-based and inherits some of its structure from its spreadsheet predecessor. As, in my day job, I have to work with a database with no constraints beyond primary keys and nullity checks, I've somewhat gone to town (perhaps excessively) to show how even something as lightweight as SQLite can do things properly.
SQLite doesn't have a datetime
type, so all temporal data is stored as
its respective Unix time (i.e., an integer). When times do not have a
date component, we normalise them about the Unix epoch (e.g., so 12:34
is 1970-01-01 12:34
, which is Unix time 45240
); dates without times
are normalised to midnight.
When we need to model minimum an maximum dates, we can take the minimum
to be -1
(i.e., 1969-12-31 23:59:59) and the maximum, arbitrarily, at
10000000000
(2286-11-20 17:46:40).
Stations in the UK, amongst other things, are all given a three
character code, which is standardised by the government. For example,
London Waterloo is WAT
; the coding tends to be semantically similar to
the stations' actual names. While it's not necessary to use these
standard codes, it makes things easier and, either way, the table forces
you to at least enter a three character string in upper case.
create table stations (
code text primary key not null check (length(code) = 3) check (code = upper(code)),
name text not null
);
stations.code
is used as a foreign key in subsequent tables.
Train routes are modelled as a (potentially cyclic) directed graph with a star topology. This model serves our purpose.
All routes start from an origin station at a particular time. Train
companies change their routes periodically, so an origin must be given a
validity period, whose expiry date can be left null
if this data is
not yet known. Thus, all route origins ("trains") are constrained to be
unique with regard to these three data (station, departure time and
validity start).
Consequently, the system can't model routes that depart from a station at the same time: As we are confining ourselves to monitoring a small number of set routes, this is of little concern. Moreover, the system does not allow for different routes on different days of the week (e.g., distinguishing between weekday and weekend routes): Again, as the purpose of the system is to log commutes, this isn't a huge problem.
create table trains (
id integer primary key asc autoincrement,
stationFrom text not null constraint validOrigin references stations (code),
departT integer not null,
validFrom integer not null,
validTo integer default null,
constraint validValidity check (validTo is null or validTo > validFrom),
constraint uniqueOrigin unique (stationFrom, departT, validFrom)
);
The route each train takes is defined by creating waypoint records. Each
waypoint must have an arrival time after the train's departure time
(constrained with a trigger) and can only be visited once. (Note that
the waypoint can be the origin station.) If the train's arrival time
crosses midnight, then it must be stored with a date of 1970-01-02
.
create table routes (
train integer not null references trains (id),
stationTo text not null constraint validWaypoint references stations (code),
arriveT integer not null,
constraint uniqueRoute primary key (train, stationTo)
);
create trigger validateWaypoint
before insert on routes for each row
when (select departT from trains where id = new.train) >= new.arriveT
begin
select raise(abort, 'Train cannot go back in time!');
end;
Recording journeys along set routes is now a simple case of entering the departure station -- with the journey date and the train's scheduled departure time -- and the destination and actual arrival date and time. Optionally, a free text field exists to record any excuses given by the guard on why the service is so bad, as well as a field to mark the journey as subject to cancellations.
You'll note that this signature departs somewhat from the route setup we defined earlier. Indeed, this table is not fully normalised: This is for the sake of user input and output. You can see -- without having to do complicated joins -- the nature of each journey.
That said, logic does exist in the database to constrain input appropriately. Notably, the destination station must be different -- and at a later time -- than the departure station. (This implies that the cyclicity of the route graph cannot be manifested in journey logs.) Moreover, the table has a trigger that ensures the route entered is valid, allowing for the possibility of catching a train not at its origin (e.g., catching a Waterloo to Kingston train at Wimbledon, providing that route is set).
create table log (
stationFrom text not null constraint validStationFrom references stations (code),
stationTo text not null constraint validStationTo references stations (code),
scheduleT integer not null unique,
arriveT integer not null,
excuse text,
cancelled integer not null default 0 check (cancelled in (0, 1)),
constraint uniqueLog primary key (stationFrom, stationTo, scheduleT),
constraint differentStations check (stationFrom != stationTo),
constraint journeyTime check (arriveT > scheduleT)
);
create trigger validateRoute
before insert on log for each row
when (
select count(*)
from (
select id as train, stationFrom, departT from trains
union
select train, stationTo as stationFrom, arriveT as departT from routes
) as origin
join routes as destination
on destination.train = origin.train
and destination.stationTo = new.stationTo
join trains
on trains.id = origin.train
where origin.stationFrom = new.stationFrom
and origin.departT = strftime('%s', '1970-01-01 ' || time(new.scheduleT, 'unixepoch'))
and new.scheduleT between trains.validFrom and ifnull(trains.validTo, 10000000000)
) = 0
begin
select raise(abort, 'Not a valid route!');
end;
As mentioned, the journey log is not fully normalised. This will thus
have to rely on logic on the application side to properly query. An
example of such logic can be found in the journeys
view, which is a
simple report of each journey's lateness.
create view journeys
select date(log.scheduleT, 'unixepoch') as date,
time(log.scheduleT, 'unixepoch') as time,
origin.name as origin,
destination.name as destination,
(
strftime('%s', '1970-01-01 ' || time(log.arriveT, 'unixepoch')) +
strftime('%s', date(log.arriveT, 'unixepoch')) - strftime('%s', date(log.scheduleT, 'unixepoch')) -
routes.arriveT
) / 60 as minutesLate,
log.excuse,
case log.cancelled
when 1 then 'Yes'
else 'No'
end as cancelled
from log
join stations as origin
on origin.code = log.stationFrom
join stations as destination
on destination.code = log.stationTo
join trains
on trains.stationFrom = log.stationFrom
and trains.departT = strftime('%s', '1970-01-01 ' || time(log.scheduleT, 'unixepoch'))
and log.scheduleT between trains.validFrom and ifnull(trains.validTo, 10000000000)
left join trains as latestRoute
on latestRoute.stationFrom = trains.stationFrom
and latestRoute.departT = trains.departT
and log.scheduleT between latestRoute.validFrom and ifnull(latestRoute.validTo, 10000000000)
and latestRoute.validFrom > trains.validFrom
join routes
on routes.train = trains.id
and routes.stationTo = log.stationTo
where latestRoute.id is null
order by log.scheduleT;
As you can see, the logic required to determine the correct route is quite involved.
[BUG: This view only shows journeys that departed from a recorded origin, it doesn't allow for intraroute journeys.]