Database Schema

The database is SQLite-based and inherits some of its structure from its spreadsheet predecessor. As, in my day job, I have to work with a database with no constraints beyond primary keys and nullity checks, I've somewhat gone to town (perhaps excessively) to show how even something as lightweight as SQLite can do things properly.

A Note on Times and Dates

SQLite doesn't have a datetime type, so all temporal data is stored as its respective Unix time (i.e., an integer). When times do not have a date component, we normalise them about the Unix epoch (e.g., so 12:34 is 1970-01-01 12:34, which is Unix time 45240); dates without times are normalised to midnight.

When we need to model minimum an maximum dates, we can take the minimum to be -1 (i.e., 1969-12-31 23:59:59) and the maximum, arbitrarily, at 10000000000 (2286-11-20 17:46:40).

Stations

Stations in the UK, amongst other things, are all given a three character code, which is standardised by the government. For example, London Waterloo is WAT; the coding tends to be semantically similar to the stations' actual names. While it's not necessary to use these standard codes, it makes things easier and, either way, the table forces you to at least enter a three character string in upper case.

create table stations (
  code text primary key not null check (length(code) = 3) check (code = upper(code)),
  name text not null
);

stations.code is used as a foreign key in subsequent tables.

Modelling Train Routes

Train routes are modelled as a (potentially cyclic) directed graph with a star topology. This model serves our purpose.

All routes start from an origin station at a particular time. Train companies change their routes periodically, so an origin must be given a validity period, whose expiry date can be left null if this data is not yet known. Thus, all route origins ("trains") are constrained to be unique with regard to these three data (station, departure time and validity start).

Consequently, the system can't model routes that depart from a station at the same time: As we are confining ourselves to monitoring a small number of set routes, this is of little concern. Moreover, the system does not allow for different routes on different days of the week (e.g., distinguishing between weekday and weekend routes): Again, as the purpose of the system is to log commutes, this isn't a huge problem.

create table trains (
  id          integer primary key asc autoincrement,
  stationFrom text    not null constraint validOrigin references stations (code),
  departT     integer not null,
  validFrom   integer not null,
  validTo     integer default null,

  constraint validValidity check (validTo is null or validTo > validFrom),
  constraint uniqueOrigin unique (stationFrom, departT, validFrom)
);

The route each train takes is defined by creating waypoint records. Each waypoint must have an arrival time after the train's departure time (constrained with a trigger) and can only be visited once. (Note that the waypoint can be the origin station.) If the train's arrival time crosses midnight, then it must be stored with a date of 1970-01-02.

create table routes (
  train     integer not null references trains (id),
  stationTo text    not null constraint validWaypoint references stations (code),
  arriveT   integer not null,

  constraint uniqueRoute primary key (train, stationTo)
);

create trigger validateWaypoint
before insert on routes for each row
when (select departT from trains where id = new.train) >= new.arriveT
begin
  select raise(abort, 'Train cannot go back in time!');
end;

Journey Log

Recording journeys along set routes is now a simple case of entering the departure station -- with the journey date and the train's scheduled departure time -- and the destination and actual arrival date and time. Optionally, a free text field exists to record any excuses given by the guard on why the service is so bad, as well as a field to mark the journey as subject to cancellations.

You'll note that this signature departs somewhat from the route setup we defined earlier. Indeed, this table is not fully normalised: This is for the sake of user input and output. You can see -- without having to do complicated joins -- the nature of each journey.

That said, logic does exist in the database to constrain input appropriately. Notably, the destination station must be different -- and at a later time -- than the departure station. (This implies that the cyclicity of the route graph cannot be manifested in journey logs.) Moreover, the table has a trigger that ensures the route entered is valid, allowing for the possibility of catching a train not at its origin (e.g., catching a Waterloo to Kingston train at Wimbledon, providing that route is set).

create table log (
  stationFrom text    not null constraint validStationFrom references stations (code),
  stationTo   text    not null constraint validStationTo references stations (code),
  scheduleT   integer not null unique,
  arriveT     integer not null,
  excuse      text,
  cancelled   integer not null default 0 check (cancelled in (0, 1)),
  
  constraint uniqueLog         primary key (stationFrom, stationTo, scheduleT),
  constraint differentStations check (stationFrom != stationTo),
  constraint journeyTime       check (arriveT > scheduleT)
);

create trigger validateRoute
before insert on log for each row
when (
  select count(*)
  from   (
           select id as train, stationFrom, departT from trains
           union
           select train, stationTo as stationFrom, arriveT as departT from routes
         ) as origin
  join   routes as destination
  on     destination.train     = origin.train
  and    destination.stationTo = new.stationTo
  join   trains
  on     trains.id             = origin.train
  where  origin.stationFrom    = new.stationFrom
  and    origin.departT        = strftime('%s', '1970-01-01 ' || time(new.scheduleT, 'unixepoch'))
  and    new.scheduleT         between trains.validFrom and ifnull(trains.validTo, 10000000000)
) = 0
begin
  select raise(abort, 'Not a valid route!');
end;

As mentioned, the journey log is not fully normalised. This will thus have to rely on logic on the application side to properly query. An example of such logic can be found in the journeys view, which is a simple report of each journey's lateness.

create view journeys  
  select    date(log.scheduleT, 'unixepoch') as date,
            time(log.scheduleT, 'unixepoch') as time,
  
            origin.name as origin,
            destination.name as destination,
  
            (
              strftime('%s', '1970-01-01 ' || time(log.arriveT, 'unixepoch')) +
              strftime('%s', date(log.arriveT, 'unixepoch')) - strftime('%s', date(log.scheduleT, 'unixepoch')) -
              routes.arriveT
            ) / 60 as minutesLate,
  
            log.excuse,
  
            case log.cancelled
              when 1 then 'Yes'
              else 'No'
            end as cancelled
  
  from      log
  
  join      stations as origin
  on        origin.code      = log.stationFrom
  join      stations as destination
  on        destination.code = log.stationTo
  
  join      trains
  on        trains.stationFrom = log.stationFrom
  and       trains.departT     = strftime('%s', '1970-01-01 ' || time(log.scheduleT, 'unixepoch'))
  and       log.scheduleT      between trains.validFrom and ifnull(trains.validTo, 10000000000)
  
  left join trains as latestRoute
  on        latestRoute.stationFrom = trains.stationFrom
  and       latestRoute.departT     = trains.departT
  and       log.scheduleT           between latestRoute.validFrom and ifnull(latestRoute.validTo, 10000000000)
  and       latestRoute.validFrom   > trains.validFrom
  
  join      routes
  on        routes.train     = trains.id
  and       routes.stationTo = log.stationTo
  
  where     latestRoute.id is null
  
  order by  log.scheduleT;

As you can see, the logic required to determine the correct route is quite involved.

[BUG: This view only shows journeys that departed from a recorded origin, it doesn't allow for intraroute journeys.]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database Schema

Database Schema

A Note on Times and Dates

Stations

Modelling Train Routes

Journey Log

Clone this wiki locally