
Longitudinal Updates

Billy Charlton edited this page Apr 4, 2017 · 2 revisions

Longitudinal data, sometimes referred to as panel data, track the same sample at different points in time. The sample can consist of individuals, households, establishments, and so on. In contrast, repeated cross-sectional data also cover a long time span, but give the same survey to a different sample each time.

Longitudinal data

Separation of datasets into data dimensions and data fact tables enables longitudinal data updates. The "dimensions" generally don't change: who a person is, the extent of a CMP segment, etc. "Fact tables" are the more transactional-type data points: the recorded speed at a given time on a CMP segment, the number of trips on a bus route, etc.
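As a rough illustration of that split, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are made up for this example and are not our actual schema: one dimension table describes each CMP segment once, and one fact table holds the speed observations that pile up over time.

```python
import sqlite3

def build_schema(conn):
    """Create one dimension table and one fact table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE cmp_segment_dim (
            segment_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            from_node  TEXT,
            to_node    TEXT
        );
        CREATE TABLE cmp_speed_fact (
            segment_id  INTEGER NOT NULL REFERENCES cmp_segment_dim(segment_id),
            observed_at TEXT NOT NULL,   -- ISO 8601 timestamp
            speed_mph   REAL NOT NULL
        );
    """)

conn = sqlite3.connect(":memory:")
build_schema(conn)

# The dimension row is written once: it says what the segment is.
conn.execute("INSERT INTO cmp_segment_dim VALUES (1, 'Example St', 'A', 'B')")

# Fact rows accumulate: each one is a measurement on that segment.
conn.executemany(
    "INSERT INTO cmp_speed_fact VALUES (?, ?, ?)",
    [(1, "2015-04-01T08:00", 12.5), (1, "2017-04-01T08:00", 11.0)],
)

# Join facts back to their dimension to get a readable time series.
rows = conn.execute("""
    SELECT d.name, f.observed_at, f.speed_mph
    FROM cmp_speed_fact f JOIN cmp_segment_dim d USING (segment_id)
    ORDER BY f.observed_at
""").fetchall()
```

The point of the layout is that a new observation never touches the segment's description; it only adds a row to the fact table.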

In the case where the dimensions themselves are changing, very smart people have come up with "temporal patterns" for recording change in a database.

  • A long read but fascinating source is here: https://martinfowler.com/eaaDev/timeNarrative.html
  • The TL;DR version is that you can add just two timestamp fields to your dimension tables, one recording when a change actually occurred and one recording when you found out about it, and that covers just about every case.
    • For example, if you have a person record and that person has a home address attached, and then the address changes, you can record the new address, when that address became the true address, and when you "found out about it". You need both if you want to answer questions like "Where did Bob live on April 1st" as well as "What address did we use when we sent Bob an invoice on April 15th?"
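The Bob example can be sketched in a few lines of Python. This is only an illustration of the two-timestamp idea, not a proposed implementation: the record type, field names, function name, addresses, and dates are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class AddressRecord:
    person: str
    address: str
    valid_from: date    # when the address became the true address
    recorded_on: date   # when we found out about it

def address_as_of(records, person, actual, known=None):
    """Address that was true on `actual`, using only records we had by `known`.

    With known=None, use everything recorded so far.
    """
    candidates = [
        r for r in records
        if r.person == person
        and r.valid_from <= actual
        and (known is None or r.recorded_on <= known)
    ]
    if not candidates:
        return None
    # The latest valid_from wins; ties go to the most recent recording.
    return max(candidates, key=lambda r: (r.valid_from, r.recorded_on)).address

# Made-up data: Bob moved on March 20th, but we only heard about it on April 20th.
records = [
    AddressRecord("Bob", "12 Oak St", date(2015, 1, 1), date(2015, 1, 1)),
    AddressRecord("Bob", "98 Pine Ave", date(2017, 3, 20), date(2017, 4, 20)),
]

# "Where did Bob live on April 1st?" uses everything we know today.
where_bob_lived = address_as_of(records, "Bob", date(2017, 4, 1))

# "What address did we use for the April 15th invoice?" uses only what we knew then.
invoice_address = address_as_of(
    records, "Bob", date(2017, 4, 15), known=date(2017, 4, 15)
)
```

With this data the first question returns the new address and the second returns the old one, which is exactly the distinction that needs both timestamps.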

I really don't think we need to agonize over this. If we end up needing the timestamp fields, they are easy to add later.

Cross-sectional Updates

A more typical case is cross-sectional updates, which are basically just additional data collected from new samples over time. Our database system can easily digest new samples: new data simply require additional records in the appropriate dimension and fact tables.

  • For example, new CMP data in 2017 will provide additional congestion data at existing locations. This data should be represented as additional rows in the database CMP fact tables.
  • Fast-trips isn't really cross-sectional, but one can imagine new model runs as simply additional data being fed into the database.
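A minimal sketch of what "additional rows" could look like in practice, again with Python's sqlite3 module. The table names, the `append_sample` helper, and the numbers are assumptions made for illustration only; the real loader and schema may differ.

```python
import sqlite3

def append_sample(conn, segments, observations):
    """Append a new sample without modifying existing rows.

    Segments not seen before are added to the dimension table;
    every observation becomes a new row in the fact table.
    """
    conn.executemany(
        "INSERT OR IGNORE INTO segment_dim (segment_id, name) VALUES (?, ?)",
        segments,
    )
    conn.executemany(
        "INSERT INTO speed_fact (segment_id, survey_year, speed_mph) "
        "VALUES (?, ?, ?)",
        observations,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE segment_dim (
        segment_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE speed_fact (
        segment_id  INTEGER NOT NULL,
        survey_year INTEGER NOT NULL,
        speed_mph   REAL NOT NULL
    );
""")

# Earlier survey, then a 2017 survey at the same (existing) location.
append_sample(conn, [(1, "Example St")], [(1, 2015, 12.5)])
append_sample(conn, [(1, "Example St")], [(1, 2017, 11.0)])

dim_count = conn.execute("SELECT COUNT(*) FROM segment_dim").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM speed_fact").fetchone()[0]
```

After both loads the dimension table still has a single row for the segment, while the fact table has one row per survey year.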