= Bi-Temporal PostgreSQL - A PostgreSQL Specific Bi-Temporal Database Emulator
---
Author:: Ara Howard mailto:ahoward@fsl.noaa.gov or mailto:howardat@colorado.edu
== Introduction
---
The BtPgsql module contains a set of classes used to manage bitemporal
relations stored in a PostgreSQL (http://www.postgresql.org) database.
== Prerequisites
---
* Running PostgreSQL installation (http://www.postgresql.org/)
* Ruby >= 1.6.6 installation (http://www.ruby-lang.org)
* Ruby PostgreSQL binding from the RAA (http://raa.ruby-lang.org/list.rhtml?name=postgres)
* Basic understanding of Relational Database Management Systems (RDBMSs) (http://techdocs.postgresql.org/college/002_referentialintegrity/index.php) - Note: this link requires libflash.so configured in your browser.
* Patience!
== Topics
---
* DISCLAIMER - Provisions for using this software
* README - A general overiew of the problem domain (This file)
* INSTALL - Installation instructions
* DESIGN - Design overview
* BtPgsql - Details of the API
* TODO - It just keeps growing
* BUGS - Hopefull not growing (*Please* report any to me)
* DEPENDS - A short list of them
* TUTORIAL - A simple tutorial
== The Bi-Temporal Problem Domain
---
It is tedious to explain the theory of bitemporal databases without some real
world examples and in all of the following examples we will consider a data
model which records the elevation and a description of meteorlogical weather
stations. Each station is assumed to be designated by a unique non-null four
letter code. In a traditional RDBMS the SQL used to construct a relation
(table) used to store information about these stations might look like :
create table stations
(
sid text,
elevation integer,
description text,
primary key (sid)
);
The stations relation would contain (in a traditional RDBMS) tuples (rows) resembling :
stations
------------------------------------------------
sid | elevation | description
--- | |
------------------------------------------------
| |
PANC | 40 | ANCHORAGE INTL ARPT
KDEN | 1656 | DENVER INTL
| |
=== Valid Time Relations
---
Valid time relations have tuples, identified by a primary key, which change
over time. In otherwords the relation's primary key is *not* unique within
the table but *is* unique in the table when any given point in time is
considered. For example, a valid time stations relation might contain the
tuples :
stations
-----------------------------------------------------------
sid | elevation | description | vt_begin | vt_end
--- | | | |
-----------------------------------------------------------
| | | |
PANC | 40 | ANCHORAGE INTL ARPT | -infinity | infinity
KDEN | 1600 | DENVER INTL | -infinity | 1999
KDEN | 1656 | DENVER INTL | 1999 | infinity
| | | |
Where _vt_begin_ (valid time begining) is an *inclusive* timestamp demarking the
instant the fact became true in the real world and _vt_end_ (valid time ending)
is an *exclusive* timestamp demarking the instant the fact stopped being true
in the the real world. The stations relation shows that station _PANC_ sits at
40 meters above sea level and that this has been true from the beginning of
time (-_infinity_), and will remain, as is best known, true forever (_infinity_).
It is often the case in valid time relations that the time a fact became true
is not known and so it is simply entered as true since the beginning of time.
The tuples describing _KDEN_ are more interesting; they show that _KDEN_ was
(in the real world) situated at an elevation of 1600 meters above sea
level starting from -_infinity_ (we don't really know when), but that in
1999 it moved to 1656 meters above sea level, and has remained
there since. This is shown by the fact the the _vt_end_ time is _infinity_,
implying that we do not know, at present, when this fact will stop being true.
Graphically, one might envision this history as :
KDEN ->
-infinity
|
| 1600 meters
|
1999
|
| 1656 meters
|
V
infinity
Or more compactly as :
KDEN -> [-i (1600) 1999)[1999 (1656) i)
Where the symbols '[' and ')' designate the respective inclusive
starting and exclusive ending points of a temporal period, and
(xxxx) implies the datum indicated by the primary key for that
period. This notation borrows from the mathmatical notation for a range where
:
[0 - 42)
would indicate all those numbers 0,1,2,3,...,41 but not, unfortunately,
including 42 (http://www.bbc.co.uk/cult/hitchhikers/).
This table would allow such queries to be made as :
What was the elevation of KDEN in 1998?
select elevation from
stations
where
sid = "KDEN" and
vt_begin <= "1998" and -- inclusive
"1998" < vt_end; -- exclusive
or :
What is the elevation of KDEN now?
select elevation from
stations
where
sid = "KDEN" and
vt_begin <= now() and -- inclusive
now() < vt_end; -- exclusive
to be easily answered with simple sql queries.
Notice that for these queries to be meaningfull it is absolutly *critical*
that the following situation *never* arise :
stations
-----------------------------------------------------------
sid | elevation | description | vt_begin | vt_end
--- | | | |
-----------------------------------------------------------
| | | |
KDEN | 1600 | DENVER INTL | -infinity | 1999
KDEN | 1656 | DENVER INTL | 1980 | infinity
| | | |
Notice that in the above situation _TWO_ tuples would be associated with the
primary key _KDEN_ for the period [1990 - 1999), implying that it was
simoultaneously at an elevation of 1600 and 1656 meters above
sea level! This violates the semantic notion of a primary key - that it
uniquely identifies a tuple and is non-null. For valid time relations is it
necesary to ammend this definition to read thus :
The primary key of a valid time relation is non-null and uniquely
identifies a tuple given a specific moment in time.
Later we will see how the BtPgsql::Relation class is able to interpret and
enforce this meaning on temporal relations.
We now begin to see why the normal RDBMS referential integrity constructs
(like primary key) are insufficient to handle valid time relations
Further complicating matters is the fact that valid time relations make
possible the notion of updating certain rows over a range of time. For
example, imagine that it was discoved today that the elevation of _KDEN_ had
actually been 1597 meters above sea level during the period lasting
from 1999 to 2000 while the station was operating at a temporary
location before it's move the present location which sits at 1656
meters above sea level.
Such information can be modeled in a valid time relation thus :
stations
-----------------------------------------------------------
sid | elevation | description | vt_begin | vt_end
--- | | | |
-----------------------------------------------------------
| | | |
PANC | 40 | ANCHORAGE INTL ARPT | -infinity | infinity
KDEN | 1600 | DENVER INTL | -infinity | 1999
KDEN | 1597 | DENVER INTL | 1999 | 2000
KDEN | 1656 | DENVER INTL | 2000 | infinity
| | | |
Affecting this update is another matter entirely!
These types operation can certainly be done via standard SQL, but each update
must be hand written - an error prone and tedious task. We will see later
that a Ruby BtPgsql::Relation can handle the details of this update
automatically.
One last concept, and major source of complexity, with valid time relations is
the concept of valid time foreign keys. In a non-temporal RDBMS the concept
of a foreign key referes to a field in one relation which refers to a primary
key field in another (possibly the same) relation. Thus, by extension a
foreign key, like a primary key, uniquely identifies exactly one tuple, though
in another table. Foreign key relationships between two valid time relations
have a slightly different definition :
A valid time foreign key must identify a set of tuples in the referenced
relation whose valid time periods completely _cover_ that of the referencing
tuple
That's a somewhat confusing definition.
This example should clarify. Assume a valid time stations relation :
stations
-----------------------------------------------------------
sid | elevation | description | vt_begin | vt_end
--- | | | |
-----------------------------------------------------------
| | | |
KDEN | 1600 | DENVER INTL | -infinity | 2000
KDEN | 1656 | DENVER INTL | 2000 | infinity
| | | |
where _sid_ is the valid time primary key.
Further, assume a station_managers relation :
station_managers
-----------------------------------------------------------
mid | sid | name | vt_begin | vt_end
--- | ^^^ | | |
-----------------------------------------------------------
| | | |
42 | KDEN | Ford Prefect | -infinity | infinity
| | | |
where _mid_ is the valid time primary key and _sid_ is a valid time foreign
key referencing stations.
This is a valid relationship because, at any given moment, the foreign
key _sid_ refers to exactly one tuple in stations. Note that this foreign key
identifies a *set* of tuples spanning the entire timeline of the referencing
tuple.
Personally, I like to visualize this as in :
station_managers
42, KDEN -> [-infinity -- infinity)
| |
| IS COVERED BY |
stations | |
| |
KDEN -> [-infinity -- 2000)[2000 -- infinity)
But consider if stations was instead :
stations
-----------------------------------------------------------
sid | elevation | description | vt_begin | vt_end
--- | | | |
-----------------------------------------------------------
| | | |
KDEN | 1600 | DENVER INTL | -infinity | 1990
KDEN | 1656 | DENVER INTL | 2000 | infinity
| | | |
And station_managers was, as before :
station_managers
-----------------------------------------------------------
mid | sid | name | vt_begin | vt_end
--- | ^^^ | | |
-----------------------------------------------------------
42 | KDEN | Ford Prefect | -infinity | infinity
Note the vt_end of 1990 in the first tuple of stations indicating that KDEN
did not exist during the period [1990 - 2000)!
One could visualize this as :
station_managers
42, KDEN -> [-infinity -- infinity)
| |
| NOT COVERED BY |
stations | ^^^ |
| |
KDEN -> [-infinity -- 1990) [2000 -- infinity)
^^^^
Notice the *gap*!
This gap shows an error in the valid time referential model - namely that
Ford Prefect was the manager of a non-existent station during the
period [1990 - 2000). This is a type of referential itegrity not
easily managable within the confines of ANSI-92 SQL.
We will see later how a BtPgsql::Relation can prevent this sort of referential
itegrity anomoly from occuring.
=== Transaction Time Relations
---
Transaction time relations are similar to valid time relations except that
they track the state of the actual relations, not that of the real
world. These are sometimes known as audit logs as these types of relations
can be used to track the history of a table's inserts, updates, and deletions
and can, therefore, be queried as to the particular state of a table at some
instant in the past.
Their structure is also similar to valid time relations, for example :
stations
-----------------------------------------------------------
sid | elevation | description | tt_start | tt_stop
--- | | | |
-----------------------------------------------------------
| | | |
PANC | 40 | ANCHORAGE INTL ARPT | 1970 | infinity
KDEN | 1600 | DENVER INTL | 1970 | 1999
KDEN | 1656 | DENVER INTL | 2000 | infinity
| | | |
Shows that tuple identified by station _PANC_, at 40 meters above sea
level, was *inserted* at the *inclusive* transaction start time (_tt_start_)
of 1970, and has never been deleted, shown by an *exclusive*
transaction stop time (_tt_stop_) of infinity.
The transaction time relation _stations_ also shows that the tuple identified
by station _KDEN_ was initially inserted into the stations relation in
1970, but was then virtually deleted from the relation in
1999 and then reinserted into stations 2000 where it has
remained since. Note that ther are no actual deletions done from
transaction time relations! It is this property which allows one to
*_reconstruct_* a relation from a prior period using a simple SQL statement
such as :
select * from
stations
where
tt_start <= 1975 and -- inclusive
1975 < tt_stop; -- exclusive
which would yield the relation as it existed in 1975.
Using the graphical notation from above we can show this transaction time
relation with respect to _KDEN_ as :
KDEN ->
1970
|
| 1600 meters
|
1999
!! NO DATA HERE !!
2000
|
| 1656 meters
|
V
or more compactly as :
KDEN -> [1970 (1600) 1999) [2000 (1656) i)
^^^^
Important observations regarding transaction time relations are
* they model the state of the database, *not* the external (modeled) reality
* they have similar issues regarding primary keys, foreign keys, and
referential integrity as valid time relations
* tt_start is always greater then, or equal to, _NOW_ - this is because
although we cannot travel back in time to enter a tuple (_tt_start_ < _NOW_)
we *can* enter a tuple with a _tt_start_ time in the future, thereby
commiting an insertion to a future date.
This last feature can be quite usefull. For example, assume we know, at the
present moment, that KDEN will be changing locations in 2004 and that the
elevation will then be 1777 meters above sea level. We may then execute the
following at any point prior to that :
-- virtually delete the extant KDEN row in the future
update stations
set tt_stop = "2004"
where
sid = "KDEN" and
tt_stop = "infinity";
-- insert a row which will not "appear" untill 2004
insert into stations
values ("KDEN", "1777", "DENVER INTL", "2004", "infinity");
Resulting in stations becoming :
stations
-----------------------------------------------------------
sid | elevation | description | tt_start | tt_stop
--- | | | |
-----------------------------------------------------------
| | | |
PANC | 40 | ANCHORAGE INTL ARPT | 1970 | infinity
KDEN | 1600 | DENVER INTL | 1970 | 1999
KDEN | 1656 | DENVER INTL | 2000 | 2004
KDEN | 1777 | DENVER INTL | 2004 | infinity
| | | |
Which will commit the information and yet processes which may execute queries
like :
-- get the current elevation of KDEN
select * from
stations
where
sid = "KDEN" and
tt_start <= now() and
now() < tt_stop;
will *continute* to see the elevation of 1656 untill 2004 has
arrived, at which point the 1777 tuple will become visible using the
same query as above. This feature can be used to maintain a single relation
for both testing and production software versions since production processes
may ask for the currently active tuple while testing processes may ask
for the latest tuple even if that tuple technically will not exist
untill a future date. If testing is successfull production processes will
automatically begin using the new tuple when the specified time has elapsed.
We will not cover transaction time relations further because the next section
describes the focus of BtPgsql, bi-temporal Relations, which encompass
both vaild time relations and transaction time relations in one incredibly
difficult to understand bundle ;-).
=== Bi-Temporal Relations
---
Bi-temporal relations model both the history of a relation and the history of
the facts modeled by that relation - they track the history of the table and
record a history of a reality which changes over time (is there any other?).
In otherwords they at once exihibit all the traits of valid time and
transaction time relations.
For example, consider this sequence of events committed against a bi-temporal
relation :
Event 0)
The year is 1970 and it is decided that, as is best known, the
KDEN station is sitting at 1600 meters. It is not really known
when this became true (the station began operating) but it is condsidered to
have been in existence from the beginning of time. It is not known when the
station will move and so a vt_end time of infinity is stored.
This fact is entered with a tt_start time of *NOW* (1970) and
the tt_stop field is left as infinity, since the tuple has not
yet been virtually deleted.
stations
---------------------------------------------------------------------------------
sid | elevation | description | vt_begin | vt_end | tt_start | tt_stop
--- | | | | | |
---------------------------------------------------------------------------------
| | | | | |
KDEN | 1600 | DENVER INTL | -infinity | infinity | 1970 | infinity
| | | | | |
Event 1)
In 1980 it is discovered that the station actually moved in 1979, and that it
is now located at an elevation of 1656 meters. The relation can model this as
:
stations
---------------------------------------------------------------------------------
sid | elevation | description | vt_begin | vt_end | tt_start | tt_stop
--- | | | | | |
---------------------------------------------------------------------------------
| | | | | |
KDEN | 1600 | DENVER INTL | -infinity | infinity | 1970 | 1980
KDEN | 1600 | DENVER INTL | -infinity | 1979 | 1980 | infinity
KDEN | 1656 | DENVER INTL | 1979 | infinity | 1980 | infinity
| | | | | |
This set of tuples would allow the following queries to be answered :
What was the elevation of KDEN in 1965 as was best known in 1973?
select * from
stations
where
sid = "KDEN" and
vt_begin <= "1965" and
"1965" < vt_end and
tt_start <= "1973" and
"1973" < tt_stop;
Which would identify this tuple :
stations
---------------------------------------------------------------------------------
sid | elevation | description | vt_begin | vt_end | tt_start | tt_stop
--- | | | | | |
---------------------------------------------------------------------------------
| | | | | |
KDEN | 1600 | DENVER INTL | -infinity | infinity | 1970 | 1980
| | | | | |
Or this query :
What was the elevation of KDEN in 1981 as is best known now?
select * from
stations
where
sid = "KDEN" and
vt_begin <= "1981" and
"1981" < vt_end and
tt_start <= now() and
now() < tt_stop;
Which would identify this tuple :
stations
---------------------------------------------------------------------------------
sid | elevation | description | vt_begin | vt_end | tt_start | tt_stop
--- | | | | | |
---------------------------------------------------------------------------------
| | | | | |
KDEN | 1656 | DENVER INTL | 1979 | infinity | 1980 | infinity
| | | | | |
Hopefully it is clear at this point that bi-temporal relations model
information about the history of the real world and also when this information
was inserted/deleted/updated in a relation. These two timelines, that of the
real world (valid time) and that of the relation (transaction time) are
completely orthogonal.
It is easy to imagine all of this becoming very confusing, especially when
relationships (foreign keys) between bi-temporal relations are considered. It
would be beyond the scope of this doccument to completely described the
semantics of all possible bi-temporal relationships but note that BtPgsql
handles all of them automatically for you in most cases and that all that is
required by users of the classes in this module is an understanding of *what*
to do - BtPgsql takes care of *how*.
== BtPgsql - Bi-Temporal Management Made Easy(ier)
---
=== The _How_
---
Before moving along to *what* BtPgsql can do, the reader would be advised to
puruse the explanation provided of *how* it accomplishes it's function
described in the DESIGN overview.
=== The _What_
---
At this point we are in a position to discuss some of the things BtPgsql can
do for you. This part the doccumentation defers to that of the BtPgsql
module and it's fundemental class - BtPgsql::Relation. We simply enumerate
here, some of the more important methods of this class :
* BtPgsql::Relation.seq_insert
* BtPgsql::Relation.seq_update
* BtPgsql::Relation.seq_delete
* BtPgsql::Relation.seq_push
== Summary
---
BtPgsql can substantially reduce the headaches associated with managing
bi-temporal relational database tables with hand written sql.
== Suggested Reading
---
The inspiration for BtPgsql was the excellent, Developing Time-Oriented
Database Applications in SQL by Richard T. Snodgras. Several logic errors
in that text have been correctly implemented in BtPgsql and surely several
have been introduced - it is a difficult set of rules to correctly code.