
Contrast With Existing Semistructured Data Models

Our proposed model is not the only one capable of representing meta-data; existing semistructured data models with simple string labels can also capture meta-data explicitly. For example, the "property" information in a label could be encoded by splitting an edge into separate data and meta-data edges, with the properties branching from the end of the meta-data edge. But there are at least two problems with encoding the meta-data together with the data in this way.

First, meta-data has no special status in such a model, so a query that involves a wildcard (which matches any label) may unintentionally access meta-data rather than data. A user could formulate a query that follows only data edges, but this is challenging and, we believe, unnecessary. It should not be left to the user to guess how the meta-data is represented in the database and to write queries to explicitly avoid such data.

A second, more fundamental problem is that some of the meta-data has special semantics that must be accounted for in queries. For instance, assume that in a semistructured database with simple string labels, the transaction time for an object is represented as a ttime edge from that object's node. As discussed above, a path is only valid if its edges are concurrent in the database; any other semantics is incorrect. Below we give a Lorel-like query that correctly retrieves only those movie.star.names that are concurrent (assuming that the INTERSECT operation computes the intersection of two time intervals).

SELECT N
FROM movie M, M.star S, S.name N
WHERE NOT_EMPTY(M.ttime INTERSECT 
            S.ttime INTERSECT N.ttime)
The WHERE clause tests the transaction times of objects along the path to ensure that they are concurrent.
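To make the concurrency test concrete, here is a small sketch of what INTERSECT and NOT_EMPTY might compute. It assumes that a transaction time is a single half-open interval of integers, `[start, end)`; the source does not fix a representation, and the function names are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # assumed representation: half-open [start, end)

def intersect(a: Optional[Interval], b: Optional[Interval]) -> Optional[Interval]:
    """Intersection of two intervals, or None if either is empty or they are disjoint."""
    if a is None or b is None:
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def concurrent(first: Interval, *rest: Interval) -> bool:
    """True if all given transaction times share at least one instant."""
    common: Optional[Interval] = first
    for ttime in rest:
        common = intersect(common, ttime)
    return common is not None

# Transaction times of the movie, star, and name objects along one path.
valid_path = concurrent((1, 10), (5, 20), (8, 12))     # all overlap on [8, 10)
fictive_path = concurrent((1, 10), (5, 20), (15, 18))  # name object never coexists with the movie
```

Under these assumptions, the WHERE clause corresponds to keeping a binding of M, S, and N only when `concurrent` holds for their three transaction times.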

Although a user may explicitly formulate each query to correctly manipulate the transaction time and other properties, such a strategy has several highly undesirable features. First, all properties must be accounted for in all queries. For example, the query given above is incorrect since it does not handle the security property. Second, the semantics of a property cannot be enforced. For example, a user could simply omit the WHERE clause in the query given above, or test some other condition on transaction time. The query will run to completion and return a result, but since the semantics of the transaction time property has not been observed, the result may include fictive paths. Third, naive users cannot formulate queries. A user has to know which properties exist, be familiar with the semantics of those properties, and contend appropriately with all properties in every query. Fourth, queries become brittle. Even correctly formed queries will have a short shelf-life, since adding a new property or deleting an existing one can break existing queries.

In summary, representing and querying properties in an ordinary semistructured database is theoretically possible, but it is unattractive and beyond the capabilities of users. The extensible data model presented in this paper can be viewed as, and perhaps can even be implemented as, a layer on top of a normal semistructured data model. The layer implements the semantics for each property and correctly translates queries and results between the user and the underlying database.
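One way to picture such a layer is as a registry of per-property checks that is consulted for every path, so that individual queries never mention the properties. The sketch below is only an illustration under stated assumptions, not the implementation described in this paper: the class `PropertyLayer`, the dictionary representation of edge properties, and in particular the clearance-based security check are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Props = Dict[str, Any]                    # assumed: one property dictionary per edge
PathCheck = Callable[[List[Props]], bool]  # a check sees every edge on a path

class PropertyLayer:
    """Hypothetical layer that enforces every registered property on every path."""

    def __init__(self) -> None:
        self._checks: Dict[str, PathCheck] = {}

    def register(self, name: str, check: PathCheck) -> None:
        self._checks[name] = check

    def valid(self, path: List[Props]) -> bool:
        # A path is returned to the user only if all property semantics hold.
        return all(check(path) for check in self._checks.values())

def ttime_check(path: List[Props]) -> bool:
    """Edges are concurrent if their half-open [start, end) intervals overlap."""
    start = max(p["ttime"][0] for p in path)
    end = min(p["ttime"][1] for p in path)
    return start < end

def security_check_for(clearance: int) -> PathCheck:
    """Stand-in semantics (an assumption): every edge level must not exceed the clearance."""
    return lambda path: all(p.get("security", 0) <= clearance for p in path)

layer = PropertyLayer()
layer.register("ttime", ttime_check)
path = [{"ttime": (1, 10), "security": 1}, {"ttime": (5, 20), "security": 3}]
before = layer.valid(path)   # only transaction time is enforced
layer.register("security", security_check_for(2))
after = layer.valid(path)    # the same call now also enforces security
```

The design point is that registering the security property changes the result of the same `valid` call, so adding or deleting a property does not require rewriting existing queries.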




Copyright © 1998. Curtis E. Dyreson, Michael H. B&. All rights reserved.