Decomposed storage has several advantages over traditional horizontal storage, including better space efficiency when storing sparse data and more efficient schema modifications. Furthermore, PostgreDynamic implements decomposed storage at a low level in the database engine and automatically converts data to the standard horizontal format, which gives it a significant advantage over vertical storage in the ease of querying and maintaining the data.
PostgreDynamic was designed with YCMI's SenseLab and TrialDB databases as target applications, but it is suitable for any database application that stores heterogeneous data or whose schema changes frequently.
The following describes our implementation of the decomposed storage algorithms in detail.
Let r be an (n+1)-ary relation consisting of n attributes plus an object identifier. To implement the decomposed storage of r, we create n two-column ``attribute'' tables and a single one-column ``object'' table. Each two-column table stores pairs of object identifiers and attribute values. Null values of an attribute are not explicitly stored; their presence is inferred from the absence of an object-attribute pair in the attribute table. The one-column table stores a list of all object identifiers present in the relation. Finally, the correspondence between r, r's object table, and r's attribute tables is stored in a system catalog.
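The decomposition above can be sketched with SQLite standing in for the engine-level machinery; the table and catalog names below (obj_r, attr_r_*, dyn_catalog) are illustrative assumptions, not PostgreDynamic's actual identifiers:

```python
# Minimal sketch of decomposed storage, using SQLite in place of the
# PostgreSQL heap layer. All table and catalog names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

def create_decomposed(conn, rel, attrs):
    """Create one object table, n attribute tables, and catalog entries
    mapping the virtual relation `rel` to its physical tables."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS dyn_catalog (rel TEXT, attr TEXT, tbl TEXT)")
    # One-column object table: every object identifier in the relation.
    cur.execute(f"CREATE TABLE obj_{rel} (oid INTEGER PRIMARY KEY)")
    for a in attrs:
        tbl = f"attr_{rel}_{a}"
        # Two-column attribute table; NULLs are never stored here, so a
        # missing (oid, value) pair means the attribute is NULL.
        cur.execute(f"CREATE TABLE {tbl} (oid INTEGER PRIMARY KEY, val)")
        cur.execute("INSERT INTO dyn_catalog VALUES (?, ?, ?)", (rel, a, tbl))
    conn.commit()

# Decompose a 3-ary relation r(oid, name, dose).
create_decomposed(conn, "r", ["name", "dose"])
```

The catalog row per attribute is what lets the system later reassemble the virtual horizontal schema from the physical tables.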
The object table was not present in Copeland and Khoshafian's original decomposed storage model. Adding it allows us to quickly determine whether a row is present in a relation, avoiding a potential scan of each of the attribute tables. It also allows us to maintain a single virtual location for a tuple, set to the physical location of its object identifier in the object table, and enables us to use a left outer join instead of a full outer join when computing query results. Additionally, the object table was devised to allow future versions of the dynamic table infrastructure to provide row-level security and versioning, which are present in the EAV/CR data model.
We implemented dynamic tables at the heap-access layer of PostgreSQL, which handles operations on individual tuples in a relation, namely inserting, updating, deleting, and retrieving a tuple. When the database user creates a new dynamic table, we create the attribute and object tables in a hidden system namespace, exposing only the virtual, horizontal schema given as the table's definition in the CREATE TABLE statement.
To retrieve a tuple for a query on r, we first scan the object table to find a qualifying object identifier, o. If found, we use o to query each of r's attribute tables. If the pair (o, a) is present in an attribute table for some value a, we set the corresponding attribute in the returned tuple equal to a. Otherwise, the tuple's attribute value is set to NULL. To facilitate the efficient lookup of object IDs in the attribute tables, we maintain a B-tree index on object IDs for each attribute table.
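The two-step retrieval can be sketched as follows, again over hypothetical SQLite tables (obj_r, attr_r_*); the PRIMARY KEY on oid plays the role of the B-tree index on object IDs, and in PostgreDynamic this logic lives inside the heap-access layer rather than in application code:

```python
# Sketch of tuple retrieval from decomposed storage. Table names and
# sample data are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj_r (oid INTEGER PRIMARY KEY);
    CREATE TABLE attr_r_name (oid INTEGER PRIMARY KEY, val);
    CREATE TABLE attr_r_dose (oid INTEGER PRIMARY KEY, val);
    INSERT INTO obj_r VALUES (1);
    INSERT INTO attr_r_name VALUES (1, 'aspirin');
    -- no attr_r_dose row for oid 1: its dose is NULL by absence
""")

def get_tuple(conn, rel, oid, attrs):
    """Reassemble the horizontal tuple for `oid`, or return None if the
    object identifier is not present in the object table."""
    cur = conn.cursor()
    # Step 1: look for a qualifying object identifier o in the object table.
    if cur.execute(f"SELECT 1 FROM obj_{rel} WHERE oid = ?", (oid,)).fetchone() is None:
        return None
    # Step 2: probe each attribute table; an absent (o, a) pair means NULL.
    tup = {}
    for a in attrs:
        row = cur.execute(
            f"SELECT val FROM attr_{rel}_{a} WHERE oid = ?", (oid,)).fetchone()
        tup[a] = row[0] if row else None
    return tup
```

Here get_tuple(conn, "r", 1, ["name", "dose"]) returns {"name": "aspirin", "dose": None}, illustrating how a stored pair and an absent pair map to a value and a NULL, respectively.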
Likewise, inserting a new tuple into r is implemented by inserting a new identifier into r's object table; then, for each of the new tuple's attributes, if the attribute value is non-NULL, we insert the new object ID and attribute value pair into r's corresponding attribute table. For deletes, given an object ID, we simply delete the matching tuple from each of r's object and attribute tables. Updates are slightly more complex in that the operation to perform depends on both the old and new values of each attribute. The semantics for updating an attribute a to a' in r are shown below:
|                | a is NULL | a is non-NULL  |
|----------------|-----------|----------------|
| a' is NULL     | No action | Delete a       |
| a' is non-NULL | Insert a' | Update a to a' |

Action performed when updating attribute a to a' on a dynamic table.
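The insert and delete procedures, and the four update cases in the table above, can be sketched as follows against hypothetical SQLite tables (obj_r, attr_r_*) standing in for the engine-level implementation:

```python
# Sketch of insert, delete, and per-attribute update on decomposed
# storage. Table names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj_r (oid INTEGER PRIMARY KEY);
    CREATE TABLE attr_r_name (oid INTEGER PRIMARY KEY, val);
    CREATE TABLE attr_r_dose (oid INTEGER PRIMARY KEY, val);
""")

def insert_tuple(conn, rel, oid, values):
    """Insert the new oid, then one pair per non-NULL attribute value."""
    conn.execute(f"INSERT INTO obj_{rel} (oid) VALUES (?)", (oid,))
    for attr, v in values.items():
        if v is not None:  # NULLs are represented by absence
            conn.execute(f"INSERT INTO attr_{rel}_{attr} VALUES (?, ?)", (oid, v))

def delete_tuple(conn, rel, oid, attrs):
    """Remove the oid from the object table and every attribute table."""
    conn.execute(f"DELETE FROM obj_{rel} WHERE oid = ?", (oid,))
    for attr in attrs:
        conn.execute(f"DELETE FROM attr_{rel}_{attr} WHERE oid = ?", (oid,))

def update_attr(conn, rel, oid, attr, old, new):
    """The four cases for updating an attribute value `old` to `new`."""
    if old is None and new is None:
        return                                            # no action
    if old is None:                                       # insert a'
        conn.execute(f"INSERT INTO attr_{rel}_{attr} VALUES (?, ?)", (oid, new))
    elif new is None:                                     # delete a
        conn.execute(f"DELETE FROM attr_{rel}_{attr} WHERE oid = ?", (oid,))
    else:                                                 # update a to a'
        conn.execute(f"UPDATE attr_{rel}_{attr} SET val = ? WHERE oid = ?",
                     (new, oid))

# A NULL dose produces no row in attr_r_dose.
insert_tuple(conn, "r", 1, {"name": "aspirin", "dose": None})
```

Note that a NULL-to-NULL update touches no table at all, and a non-NULL-to-NULL update shrinks the attribute table, which is what makes this layout space-efficient for sparse data.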
Given the evolving nature of the data we intend to store, it is important that schema modifications on dynamic tables be implemented efficiently and that we avoid the cost of copying an entire table whenever possible. We support the schema operations of adding and removing columns and changing a column's type, in addition to trivial schema operations such as renaming a column or renaming the entire table. We implement the non-trivial operations as follows: