Many labels consist of several properties. For example, the two edges from the &Willis node shown in Figure 2 have the same value for the name property, but different transaction times. The most common property is name; only in unusual circumstances will an edge be unnamed.
The ability to accommodate the schema irregularities found in web data is an important feature of a semistructured (or unstructured) data model. In keeping with this requirement, the data model presented in the paper has several features worth mentioning.
One feature is that a property found in one label can be missing from other labels. In Figure 2, the transaction time property appears in only a few of the labels. Generally, a property is missing because it is don't-care information: it is not germane to, or would not improve, the description of the data.
Another feature is that a property can be specified as being required. A required property must be matched in a query to gain access to the nodes below that edge, but otherwise behaves like any other property. The security property on the edge to &Color of Night is a required property (indicated by affixing an `!' to the property name). It is meant to indicate that a user must have a matching security clearance, i.e., an appropriate certificate, to traverse that edge. Further details on required properties are presented in Section 3.2.2.
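The role of a required property can be illustrated with a minimal sketch. It assumes, purely for illustration, that a label is a mapping from property names to values, that required properties are recorded as a set of names, and that a required property is matched when the query supplies an equal value. The names Label and can_traverse and the values "movie" and "restricted" are hypothetical; the actual matching semantics are those given in Section 3.2.2.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """An edge label: property names mapped to values (illustrative only)."""
    properties: dict
    # Names of the properties marked with `!' in the figure.
    required: frozenset = frozenset()


def can_traverse(label, query):
    """Return True if the query matches every required property of the label.

    Assumption: a required property is matched when the query supplies
    an equal value. Properties that are not required never block traversal.
    """
    return all(
        name in query and query[name] == label.properties[name]
        for name in label.required
    )


# Hypothetical label for the edge to &Color of Night.
edge = Label(
    properties={"name": "movie", "security": "restricted"},
    required=frozenset({"security"}),
)
```

Under these assumptions, a query that supplies a matching security value may traverse the edge, while a query that mentions only the name property may not. A label with no required properties can always be traversed.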
There are few restrictions on the properties in labels. Common properties may be shared by a number of labels. Meta-data is often specified for a bag or container for a collection of objects. Since a label is a set, it can easily be shared, in part or in whole, among a number of labels. In addition, multiple edges may connect the same pair of nodes with overlapping or redundant labels. Requiring labels to contain disjoint descriptions would be an unnecessary restriction.
Multiple properties in a label can capture more data semantics, but existing query languages, which consider only a single property per label, handle them incorrectly. To take one example, consider the path from &movies through &Star Wars IV to the misspelled value Bruce Wilis. It would be easy to retrieve that path by using an appropriate regular expression over the name property in each label (e.g., movie.stars.name). Although this is a path, it is not a valid path, because the transaction times of the first and last edges in the path are disjoint: when the first edge in the path was inserted, the final edge was already deleted. So at no time did the two edges coexist in the current database state.
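The validity test in this example can be sketched as an interval intersection. The sketch assumes that a transaction time is a half-open interval (start, stop) of timestamps, that math.inf marks an edge that has not been deleted, and that an edge without a transaction time places no constraint on the path. The function name path_is_valid and the timestamps below are hypothetical and are not taken from Figure 2.

```python
import math


def path_is_valid(labels):
    """Return True if all transaction times along the path share an instant.

    Each label is a dict of properties. A label with no "transaction_time"
    property is treated as don't-care and is skipped (an assumption).
    """
    intervals = [
        label["transaction_time"]
        for label in labels
        if "transaction_time" in label
    ]
    if not intervals:
        return True
    latest_start = max(start for start, _ in intervals)
    earliest_stop = min(stop for _, stop in intervals)
    # Half-open intervals overlap only if some instant lies in all of them.
    return latest_start < earliest_stop


# Hypothetical timestamps: the last edge was deleted at time 5,
# before the first edge was inserted at time 7.
invalid_path = [
    {"name": "movie", "transaction_time": (7, math.inf)},
    {"name": "stars"},
    {"name": "name", "transaction_time": (1, 5)},
]
```

A regular expression over the name property alone would accept invalid_path, whereas the interval test rejects it because the two transaction times never overlap.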
This paper offers a collection of query language operators that manipulate the extended edge labels more correctly. Each operator is extensible in the sense that the semantics of properties are not fixed in the data model; rather, the meaning is supplied by a database designer. For instance, to test the validity of a path, the transaction time property will be tested quite differently than the name property. Several new query operators are also described. Match matches a so-called path regular expression to the labels along a path in the semistructure. Collapse collapses entire paths to single edges that have their labels computed from the labels on each edge in the path. Coalesce computes the value of a property that is distributed among a number of different labels on edges between the same pair of nodes. Finally, Slice restructures the labels along paths by slicing a portion from selected properties in each label.
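The idea of designer-supplied property semantics can be illustrated with a rough sketch in the spirit of Collapse. This is not the operator's definition, which is given later in the paper. The sketch assumes that the designer registers one combining function per property, that names are joined with a dot, that transaction times are (start, stop) intervals combined by intersection, and that properties with no registered function are dropped. All identifiers and sample values are hypothetical.

```python
def concat_names(left, right):
    """Assumed semantics for name: join the names along the path."""
    return f"{left}.{right}"


def intersect_times(left, right):
    """Assumed semantics for transaction time: interval intersection.

    The result may be empty (start >= stop) if the inputs are disjoint.
    """
    return (max(left[0], right[0]), min(left[1], right[1]))


# Designer-supplied table: one combining function per property.
COMBINERS = {
    "name": concat_names,
    "transaction_time": intersect_times,
}


def collapse(path_labels, combiners=COMBINERS):
    """Fold the labels along a path into a single label.

    Properties without a registered combiner are dropped (an assumption).
    A property missing from some labels is combined over the labels
    that do carry it.
    """
    collapsed = {}
    for label in path_labels:
        for prop, value in label.items():
            if prop not in combiners:
                continue
            if prop in collapsed:
                collapsed[prop] = combiners[prop](collapsed[prop], value)
            else:
                collapsed[prop] = value
    return collapsed


path = [
    {"name": "movie", "transaction_time": (1, 9)},
    {"name": "stars"},
    {"name": "name", "transaction_time": (3, 5)},
]
single_edge_label = collapse(path)
```

Because each property is handled by its own function, supporting a new property in this sketch only requires registering another entry in the table; the traversal code is unchanged.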