next up previous
Next: Path Match Up: Retrieving Information From Semistructures Previous: Retrieving Information From Semistructures

   
Path Validity

Only some of the paths in the semistructure are valid. Intuitively a path is valid if it transits through properties that share some ``commonality.'' This commonality is computed by collapsing the labels on the path.

Consider the case of a path to a movie star's name. One such path is shown in Figure 3, composed by the solid lines. Intuitively, the path forms a virtual edge from &movies to Bruce Willis. In the figure, the virtual edge is depicted as a dashed line. The virtual edge should have a label that describes it, just like any other edge. This label is determined by collapsing the labels along the path into a single label.


  
Figure 3: A (Virtual) Edge for the Name of a Movie Star
\scalebox{.6}{\includegraphics{idraw/virtualEdge.ps}}

The operation described below collapses a path by recursively collapsing the labels along the path. A pair of labels is collapsed by determining their common properties. If only one of the labels has some property, that property is propagated to the collapsed label. A missing property in a label is interpreted as ``don't care information,'' meaning that any value of the missing property is acceptable for the label. For properties that appear in both labels, a property-specific collapsing constructor is used to compute the value of the property. This constructor could result in an undefined value, which signifies that these labels do not have any commonality for that property. The path is collapsed backwards, that is, from the sink to the source, which effectively means that each collapsing constructor is left-associative.

[ ${{\it ClPt\/}\xspace_{\Gamma}}:
{\it PATH} \rightarrow {\it EDGE}$]

Collapse path ( ${{\it ClPt\/}\xspace_{\Gamma}}$) takes a path and computes the label for the virtual edge between the first and last nodes in the path. The operation is extensible in that it depends on the semantics of the properties as given by $\Gamma$. Each constructor ${\it PrCl\/}\xspace_p$ in $\Gamma$ is property-specific and is used to collapse a pair of property values for property p. In this operation, required properties are treated the same as other properties.


		${\it ClPt\/}\xspace_{\Gamma}(v~{\mathop{\longrightarrow}\limits^{\cal L}}~w) \stackrel{\scriptstyle \triangle}{=}v~{\mathop{\longrightarrow}\limits^{{\cal L}}}~w$

${\it ClPt\/}\xspace_{\Gamma}( v~{\mathop{\longrightarrow}\limits^{{\cal L}_1}}~...
...{\longrightarrow}\limits^{{\cal L}_2}}~w) \stackrel{\scriptstyle \triangle}{=}~$ $v~{\mathop{\longrightarrow}\limits^{\cal L}}~w$ where
${\cal L} = $$\{ (p{\rm :}~{\it PrCl\/}\xspace_p(x,~y)~\vert~(p{\rm :}~x) \in {\cal L}_1~\land~(p{\rm :}~y) \in {\cal L}_2 \}~\cup $
$\{ (p{\rm :}~x)~\vert~(p{\rm :}~x) \in {\cal L}_1~\land~(p{\rm :}~y) \not\in {\cal L}_2 \}~\cup $
$\{ (p{\rm :}~y)~\vert~(p{\rm :}~x) \not\in {\cal L}_1~\land~(p{\rm :}~y) \in {\cal L}_2 \}$
${\it ClPt\/}\xspace_{\Gamma}( v~{\mathop{\longrightarrow}\limits^{{\cal L}_1}}~...
...p{\longrightarrow}\limits^{{\cal L}_m}}~w) \stackrel{\scriptstyle \triangle}{=}$
$ {\it ClPt\/}\xspace_{\Gamma}( v~{\mathop{\longrightarrow}\limits^{{\cal L}_1}}...
...\limits^{{\cal L}_2}}~\ldots~{\mathop{\longrightarrow}\limits^{{\cal L}_m}}~w))$ `$\mathchoice{\raisebox{-.3ex}{\vbox{\hrule width5pt height2.5pt
\hrule width5pt...
...dth5pt height2.5pt
\hrule width5pt height2.5pt
\hrule width5pt height2.5pt}}}$2

The collapsing constructor, ${\it PrCl\/}\xspace_p$, depends on the semantics of the property. Table 1 suggests constructors for a few common properties. In general, since each property is collapsed independently, the collapse constructor for a property should either be a mutator, which transforms one domain value into another, e.g., concatenation, or a restrictor, which reduces the extent of the domain value, e.g., time interval intersection.

transaction time property in the collapsed path in Figure 3 is [31/Jul/1998 - uc]. This is the intersection of the transaction times on the edges on the path. It follows that the value Bruce Willis was described in the database as a movie.stars.name from 31/Jul/1998 to the current time (until it is changed). Note that this is not an exclusive description--a different movie.stars.name path (through &Color of Night) is current over a slightly longer transaction-time interval. $\mathchoice{\raisebox{-.3ex}{\vbox{\hrule width5pt height2.5pt
\hrule width5pt...
...dth5pt height2.5pt
\hrule width5pt height2.5pt
\hrule width5pt height2.5pt}}}$2


To determine if a path is valid, the path is collapsed and then each property is checked to ensure that it is defined.

[ ${\it Valid\/}\xspace_{\Gamma}: {\it PATH} \rightarrow {\it BOOLEAN}$]

A path, P, is valid if after collapsing the path, there are no properties with undefined values.


		${\it Valid\/}\xspace_{\Gamma}(P) \stackrel{\scriptstyle \triangle}{=}\forall p \; [ \: (p{\rm :} {\it undefined}) \not \in {\cal L}~\land$

$ (p{\rm !} {\it undefined}) \not \in {\cal L}~\land$
$ v~{\mathop{\longrightarrow}\limits^{\cal L}}~w = {\it ClPt\/}\xspace_{\Gamma}(P) \: ]$ `$\mathchoice{\raisebox{-.3ex}{\vbox{\hrule width5pt height2.5pt
\hrule width5pt...
...dth5pt height2.5pt
\hrule width5pt height2.5pt
\hrule width5pt height2.5pt}}}$2


 Consider the path from &movies through &Star Wars IV to the misspelled value Bruce Wilis in Figure 2. When the path is collapsed, the name property in the resulting label has the value movie.stars.name. The transaction time property is undefined. The transaction times of the first and last edges in the path are disjoint, so their intersection does not produce a valid transaction time value. Consequently the path is invalid. $\mathchoice{\raisebox{-.3ex}{\vbox{\hrule width5pt height2.5pt
\hrule width5pt...
...dth5pt height2.5pt
\hrule width5pt height2.5pt
\hrule width5pt height2.5pt}}}$2


The cost of checking path validity is ${\cal O}(n \cdot m)$, where nis the length of the path and m is the number of properties in a label. We expect that m will usually be much smaller than n. Path validity can be checked as a path is matched, as discussed next.


next up previous
Next: Path Match Up: Retrieving Information From Semistructures Previous: Retrieving Information From Semistructures

Copyright © 1998. Curtis E. Dyreson, Michael H. B&. All rights reserved.