README
Path: README
Modified: Tue Mar 18 17:47:08 GMT 2003
TopLevel

Bi-Temporal PostgreSQL - A PostgreSQL Specific Bi-Temporal Database Emulator


Author:Ara Howard ahoward@fsl.noaa.gov or howardat@colorado.edu

Introduction


The BtPgsql module contains a set of classes used to manage bitemporal relations stored in a PostgreSQL (www.postgresql.org) database.

Prerequisites


Topics


The Bi-Temporal Problem Domain


It is tedious to explain the theory of bitemporal databases without some real world examples and in all of the following examples we will consider a data model which records the elevation and a description of meteorlogical weather stations. Each station is assumed to be designated by a unique non-null four letter code. In a traditional RDBMS the SQL used to construct a relation (table) used to store information about these stations might look like :

  create table stations
    (
      sid text,
      elevation integer,
      description text,

      primary key (sid)
    );

The stations relation would contain (in a traditional RDBMS) tuples (rows) resembling :

  stations
  ------------------------------------------------
  sid  | elevation | description
  ---  |           |
  ------------------------------------------------
       |           |
  PANC | 40        | ANCHORAGE INTL ARPT
  KDEN | 1656      | DENVER INTL
       |           |

Valid Time Relations


Valid time relations have tuples, identified by a primary key, which change over time. In otherwords the relation's primary key is not unique within the table but is unique in the table when any given point in time is considered. For example, a valid time stations relation might contain the tuples :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end
  ---  |           |                     |           |
  -----------------------------------------------------------
       |           |                     |           |
  PANC | 40        | ANCHORAGE INTL ARPT | -infinity | infinity
  KDEN | 1600      | DENVER INTL         | -infinity | 1999
  KDEN | 1656      | DENVER INTL         | 1999      | infinity
       |           |                     |           |

Where vt_begin (valid time begining) is an inclusive timestamp demarking the instant the fact became true in the real world and vt_end (valid time ending) is an exclusive timestamp demarking the instant the fact stopped being true in the the real world. The stations relation shows that station PANC sits at 40 meters above sea level and that this has been true from the beginning of time (-infinity), and will remain, as is best known, true forever (infinity).

It is often the case in valid time relations that the time a fact became true is not known and so it is simply entered as true since the beginning of time.

The tuples describing KDEN are more interesting; they show that KDEN was (in the real world) situated at an elevation of 1600 meters above sea level starting from -infinity (we don't really know when), but that in 1999 it moved to 1656 meters above sea level, and has remained there since. This is shown by the fact the the vt_end time is infinity, implying that we do not know, at present, when this fact will stop being true.

Graphically, one might envision this history as :

  KDEN ->
          -infinity
            |
            |    1600 meters
            |
          1999
            |
            |    1656 meters
            |
            V

          infinity

Or more compactly as :

  KDEN -> [-i  (1600)  1999)[1999  (1656)  i)

Where the symbols '[' and ')' designate the respective inclusive starting and exclusive ending points of a temporal period, and (xxxx) implies the datum indicated by the primary key for that period. This notation borrows from the mathmatical notation for a range where :

  [0 - 42)

would indicate all those numbers 0,1,2,3,...,41 but not, unfortunately, including 42 (www.bbc.co.uk/cult/hitchhikers/).

This table would allow such queries to be made as :

What was the elevation of KDEN in 1998?

  select elevation from
    stations
    where
      sid = "KDEN" and
      vt_begin <= "1998" and    -- inclusive
      "1998" < vt_end;          -- exclusive

or :

What is the elevation of KDEN now?

  select elevation from
    stations
    where
      sid = "KDEN" and
      vt_begin <= now()  and    -- inclusive
       now() < vt_end;          -- exclusive

to be easily answered with simple sql queries.

Notice that for these queries to be meaningfull it is absolutly critical that the following situation never arise :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end
  ---  |           |                     |           |
  -----------------------------------------------------------
       |           |                     |           |
  KDEN | 1600      | DENVER INTL         | -infinity | 1999
  KDEN | 1656      | DENVER INTL         | 1980      | infinity
       |           |                     |           |

Notice that in the above situation TWO tuples would be associated with the primary key KDEN for the period [1990 - 1999), implying that it was simoultaneously at an elevation of 1600 and 1656 meters above sea level! This violates the semantic notion of a primary key - that it uniquely identifies a tuple and is non-null. For valid time relations is it necesary to ammend this definition to read thus :

The primary key of a valid time relation is non-null and uniquely identifies a tuple given a specific moment in time.

Later we will see how the BtPgsql::Relation class is able to interpret and enforce this meaning on temporal relations.

We now begin to see why the normal RDBMS referential integrity constructs (like primary key) are insufficient to handle valid time relations

Further complicating matters is the fact that valid time relations make possible the notion of updating certain rows over a range of time. For example, imagine that it was discoved today that the elevation of KDEN had actually been 1597 meters above sea level during the period lasting from 1999 to 2000 while the station was operating at a temporary location before it's move the present location which sits at 1656 meters above sea level.

Such information can be modeled in a valid time relation thus :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end
  ---  |           |                     |           |
  -----------------------------------------------------------
       |           |                     |           |
  PANC | 40        | ANCHORAGE INTL ARPT | -infinity | infinity
  KDEN | 1600      | DENVER INTL         | -infinity | 1999
  KDEN | 1597      | DENVER INTL         | 1999      | 2000
  KDEN | 1656      | DENVER INTL         | 2000      | infinity
       |           |                     |           |

Affecting this update is another matter entirely!

These types operation can certainly be done via standard SQL, but each update must be hand written - an error prone and tedious task. We will see later that a Ruby BtPgsql::Relation can handle the details of this update automatically.

One last concept, and major source of complexity, with valid time relations is the concept of valid time foreign keys. In a non-temporal RDBMS the concept of a foreign key referes to a field in one relation which refers to a primary key field in another (possibly the same) relation. Thus, by extension a foreign key, like a primary key, uniquely identifies exactly one tuple, though in another table. Foreign key relationships between two valid time relations have a slightly different definition :

A valid time foreign key must identify a set of tuples in the referenced relation whose valid time periods completely cover that of the referencing tuple

That's a somewhat confusing definition.

This example should clarify. Assume a valid time stations relation :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end
  ---  |           |                     |           |
  -----------------------------------------------------------
       |           |                     |           |
  KDEN | 1600      | DENVER INTL         | -infinity | 2000
  KDEN | 1656      | DENVER INTL         | 2000      | infinity
       |           |                     |           |

where sid is the valid time primary key.

Further, assume a station_managers relation :

  station_managers
  -----------------------------------------------------------
  mid | sid  | name         | vt_begin  | vt_end
  --- | ^^^  |              |           |
  -----------------------------------------------------------
      |      |              |           |
  42  | KDEN | Ford Prefect | -infinity | infinity
      |      |              |           |

where mid is the valid time primary key and sid is a valid time foreign key referencing stations.

This is a valid relationship because, at any given moment, the foreign key sid refers to exactly one tuple in stations. Note that this foreign key identifies a set of tuples spanning the entire timeline of the referencing tuple.

Personally, I like to visualize this as in :

  station_managers

    42, KDEN  ->   [-infinity        --        infinity)
                   |                                   |
                   |           IS COVERED BY           |
  stations         |                                   |
                   |                                   |
    KDEN      ->   [-infinity -- 2000)[2000 -- infinity)

But consider if stations was instead :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end
  ---  |           |                     |           |
  -----------------------------------------------------------
       |           |                     |           |
  KDEN | 1600      | DENVER INTL         | -infinity | 1990
  KDEN | 1656      | DENVER INTL         | 2000      | infinity
       |           |                     |           |

And station_managers was, as before :

  station_managers
  -----------------------------------------------------------
  mid | sid  | name         | vt_begin  | vt_end
  --- | ^^^  |              |           |
  -----------------------------------------------------------
  42  | KDEN | Ford Prefect | -infinity | infinity

Note the vt_end of 1990 in the first tuple of stations indicating that KDEN did not exist during the period [1990 - 2000)!

One could visualize this as :

  station_managers

    42, KDEN  ->     [-infinity          --          infinity)
                     |                                       |
                     |              NOT COVERED BY           |
  stations           |              ^^^                      |
                     |                                       |
    KDEN      ->     [-infinity -- 1990)    [2000 -- infinity)
                                        ^^^^

Notice the gap!

This gap shows an error in the valid time referential model - namely that Ford Prefect was the manager of a non-existent station during the period [1990 - 2000). This is a type of referential itegrity not easily managable within the confines of ANSI-92 SQL.

We will see later how a BtPgsql::Relation can prevent this sort of referential itegrity anomoly from occuring.

Transaction Time Relations


Transaction time relations are similar to valid time relations except that they track the state of the actual relations, not that of the real world. These are sometimes known as audit logs as these types of relations can be used to track the history of a table's inserts, updates, and deletions and can, therefore, be queried as to the particular state of a table at some instant in the past.

Their structure is also similar to valid time relations, for example :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | tt_start | tt_stop
  ---  |           |                     |          |
  -----------------------------------------------------------
       |           |                     |          |
  PANC | 40        | ANCHORAGE INTL ARPT | 1970     | infinity
  KDEN | 1600      | DENVER INTL         | 1970     | 1999
  KDEN | 1656      | DENVER INTL         | 2000     | infinity
       |           |                     |          |

Shows that tuple identified by station PANC, at 40 meters above sea level, was inserted at the inclusive transaction start time (tt_start) of 1970, and has never been deleted, shown by an exclusive transaction stop time (tt_stop) of infinity.

The transaction time relation stations also shows that the tuple identified by station KDEN was initially inserted into the stations relation in 1970, but was then virtually deleted from the relation in 1999 and then reinserted into stations 2000 where it has remained since. Note that ther are no actual deletions done from transaction time relations! It is this property which allows one to reconstruct a relation from a prior period using a simple SQL statement such as :

  select * from
    stations
    where
      tt_start <= 1975 and  -- inclusive
      1975 < tt_stop;       -- exclusive

which would yield the relation as it existed in 1975.

Using the graphical notation from above we can show this transaction time relation with respect to KDEN as :

  KDEN ->
          1970
            |
            |    1600 meters
            |
          1999

                  !! NO DATA HERE !!
          2000
            |
            |    1656 meters
            |
            V

or more compactly as :

  KDEN -> [1970  (1600)  1999)    [2000  (1656)  i)
                              ^^^^

Important observations regarding transaction time relations are

This last feature can be quite usefull. For example, assume we know, at the present moment, that KDEN will be changing locations in 2004 and that the elevation will then be 1777 meters above sea level. We may then execute the following at any point prior to that :

  -- virtually delete the extant KDEN row in the future

  update stations
    set tt_stop = "2004"
    where
      sid = "KDEN" and
      tt_stop = "infinity";
 -- insert a row which will not "appear" untill 2004

  insert into stations
    values ("KDEN", "1777", "DENVER INTL", "2004", "infinity");

Resulting in stations becoming :

  stations
  -----------------------------------------------------------
  sid  | elevation | description         | tt_start | tt_stop
  ---  |           |                     |          |
  -----------------------------------------------------------
       |           |                     |          |
  PANC | 40        | ANCHORAGE INTL ARPT | 1970     | infinity
  KDEN | 1600      | DENVER INTL         | 1970     | 1999
  KDEN | 1656      | DENVER INTL         | 2000     | 2004
  KDEN | 1777      | DENVER INTL         | 2004     | infinity
       |           |                     |          |

Which will commit the information and yet processes which may execute queries like :

  -- get the current elevation of KDEN

  select * from
    stations
      where
        sid = "KDEN" and
        tt_start <= now() and
        now() < tt_stop;

will continute to see the elevation of 1656 untill 2004 has arrived, at which point the 1777 tuple will become visible using the same query as above. This feature can be used to maintain a single relation for both testing and production software versions since production processes may ask for the currently active tuple while testing processes may ask for the latest tuple even if that tuple technically will not exist untill a future date. If testing is successfull production processes will automatically begin using the new tuple when the specified time has elapsed.

We will not cover transaction time relations further because the next section describes the focus of BtPgsql, bi-temporal Relations, which encompass both vaild time relations and transaction time relations in one incredibly difficult to understand bundle ;-).

Bi-Temporal Relations


Bi-temporal relations model both the history of a relation and the history of the facts modeled by that relation - they track the history of the table and record a history of a reality which changes over time (is there any other?). In otherwords they at once exihibit all the traits of valid time and transaction time relations.

For example, consider this sequence of events committed against a bi-temporal relation :

Event 0) The year is 1970 and it is decided that, as is best known, the KDEN station is sitting at 1600 meters. It is not really known when this became true (the station began operating) but it is condsidered to have been in existence from the beginning of time. It is not known when the station will move and so a vt_end time of infinity is stored. This fact is entered with a tt_start time of NOW (1970) and the tt_stop field is left as infinity, since the tuple has not yet been virtually deleted.

  stations
  ---------------------------------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end   | tt_start | tt_stop
  ---  |           |                     |           |          |          |
  ---------------------------------------------------------------------------------
       |           |                     |           |          |          |
  KDEN | 1600      | DENVER INTL         | -infinity | infinity | 1970     | infinity
       |           |                     |           |          |          |

Event 1) In 1980 it is discovered that the station actually moved in 1979, and that it is now located at an elevation of 1656 meters. The relation can model this as :

  stations
  ---------------------------------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end   | tt_start | tt_stop
  ---  |           |                     |           |          |          |
  ---------------------------------------------------------------------------------
       |           |                     |           |          |          |
  KDEN | 1600      | DENVER INTL         | -infinity | infinity | 1970     | 1980
  KDEN | 1600      | DENVER INTL         | -infinity | 1979     | 1980     | infinity
  KDEN | 1656      | DENVER INTL         | 1979      | infinity | 1980     | infinity
       |           |                     |           |          |          |

This set of tuples would allow the following queries to be answered :

What was the elevation of KDEN in 1965 as was best known in 1973?

  select * from
    stations
      where
        sid = "KDEN" and
        vt_begin <= "1965" and
        "1965" < vt_end and
        tt_start <= "1973" and
        "1973" < tt_stop;

Which would identify this tuple :

  stations
  ---------------------------------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end   | tt_start | tt_stop
  ---  |           |                     |           |          |          |
  ---------------------------------------------------------------------------------
       |           |                     |           |          |          |
  KDEN | 1600      | DENVER INTL         | -infinity | infinity | 1970     | 1980
       |           |                     |           |          |          |

Or this query :

What was the elevation of KDEN in 1981 as is best known now?

  select * from
    stations
      where
        sid = "KDEN" and
        vt_begin <= "1981" and
        "1981" < vt_end and
        tt_start <= now() and
        now() < tt_stop;

Which would identify this tuple :

  stations
  ---------------------------------------------------------------------------------
  sid  | elevation | description         | vt_begin  | vt_end   | tt_start | tt_stop
  ---  |           |                     |           |          |          |
  ---------------------------------------------------------------------------------
       |           |                     |           |          |          |
  KDEN | 1656      | DENVER INTL         | 1979      | infinity | 1980     | infinity
       |           |                     |           |          |          |

Hopefully it is clear at this point that bi-temporal relations model information about the history of the real world and also when this information was inserted/deleted/updated in a relation. These two timelines, that of the real world (valid time) and that of the relation (transaction time) are completely orthogonal.

It is easy to imagine all of this becoming very confusing, especially when relationships (foreign keys) between bi-temporal relations are considered. It would be beyond the scope of this doccument to completely described the semantics of all possible bi-temporal relationships but note that BtPgsql handles all of them automatically for you in most cases and that all that is required by users of the classes in this module is an understanding of what to do - BtPgsql takes care of how.

BtPgsql - Bi-Temporal Management Made Easy(ier)


The How


Before moving along to what BtPgsql can do, the reader would be advised to puruse the explanation provided of how it accomplishes it's function described in the DESIGN overview.

The What


At this point we are in a position to discuss some of the things BtPgsql can do for you. This part the doccumentation defers to that of the BtPgsql module and it's fundemental class - BtPgsql::Relation. We simply enumerate here, some of the more important methods of this class :

Summary


BtPgsql can substantially reduce the headaches associated with managing bi-temporal relational database tables with hand written sql.

Suggested Reading


The inspiration for BtPgsql was the excellent, Developing Time-Oriented Database Applications in SQL by Richard T. Snodgras. Several logic errors in that text have been correctly implemented in BtPgsql and surely several have been introduced - it is a difficult set of rules to correctly code.