AUCQL's Future Plans
A Query Language for Semistructured Data with Metadata Properties
Curtis Dyreson

AUCQL is only a prototype, and an incomplete one at that. Many features have yet to be implemented.
  • Full SELECT. Only a limited part of SQL's SELECT statement has been implemented. The GROUP BY, HAVING, and ORDER BY clauses have not been implemented, nor have SQL types and type checking. WHERE clause expressions have been simplified, as have SELECT clause expressions. Nested SELECTs have also not been implemented.

    In the opinion of some of the authors (OK, only Curtis's :-)), a full SELECT implementation may not even be desirable for semistructured databases. The end user will be a WWW user, not a database programmer. For WWW users, search will most likely be the most important part of a semistructured query language, so GUI or search-engine styles of query language are preferable to SQL. Nor is SQL desirable for hard-core programmers; an object-oriented API with hooks into AUCQL's algebraic operations would be better.

    So a complete implementation of SQL's SELECT will be put on hold.

  • Path indexes. We have yet to address the issue of building path indexes in property space.
  • Culling intermediate results. There is one problematic query for the current prototype of AUCQL. The query is quite short.
       SELECT *
       FROM   ()* All,
    This selects everything in the database and computes the TT for it. This is a problem because every path to every node must be computed and then retained for coalescing. In other operations, paths are computed, used, and then discarded (so only one set of variable assignments is ever in memory). But coalescing needs to know about all the paths between a pair of nodes, so all of them must be retained. Coalescing is therefore an expensive but necessary operation in semistructured databases, just as it is in relational databases.
  • Name space confusion. This is more of a language issue. What is a variable vs. what is a required NAME property? Consider the following expressions.
      FROM movie  Movie,
    Variable Review1 is independent of variable Movie, but Review2 extends the path in variable Movie. There are also some FROM clause ordering constraints: the variable Movie must be defined before it can be used in the FROM clause (the compiler won't complain; it will happily generate null values for that variable). Review3 is also independent of Movie, but is problematic because the user gets no warning that they mistyped or misused a variable name ('Move' defaults to a match on the NAME property, (NAME! Move)). Finally, the compiler will barf on Nodes, since Nodes is now a reserved word (the NODES operation). But this is confusing because of the changing case sensitivity/insensitivity. Using explicit MATCH operations avoids all of this trouble, but in order to be more like Lorel we had to add some fancy syntactic sugar, and the namespace confusion came with it.

    It is not clear to us how to best resolve this design issue.

  • Parenthesis nightmare. We should have used {} or square brackets rather than () as the enclosing delimiters for descriptors. The reason is that () is massively overloaded: it appears in expressions, in regular expressions, and now in descriptors. So a single-token lookahead parser has no way of disambiguating the many uses of (). We currently do a horrible hack, err, a quite sensible munging of the input stream (we make one pass over the input stream prior to parsing to transform the descriptor delimiters to {}). This should be sanitized.
  • Aggregates. The code and syntax are in place for aggregates; we just haven't debugged them yet. For some reason, a semantic error is generated.
  • SELECT DISTINCT. Should be simple to add (hash the output lines).
  • GROUP BY, HAVING, ORDER BY. After aggregates and SELECT DISTINCT are implemented...
  • Sanity checking on dimensions. The issue here is that some dimensions, e.g., time and security, need special operations, e.g., OVERLAPS. A few of these are built into AUCQL, but semantic checking to ensure that the operations are used properly is not currently in place. So if you want to, you can do '12 OVERLAPS 14', and most likely the run-time evaluator will die a horrible death.
  • Testing. No extensive, rigorous testing has been done (that's what the WWW interface is for!), so some stuff just may be buggy. We'll fix it if you find it. Just e-mail Curtis.Dyreson at
  • FIXED Nov. 23 1998. Nested operations. Currently, you can't do the following, which you should be able to do.
    Instead you have to use an intermediate variable.
       MATCH(...) X
       MATCH(X, ...
    On the TODO list.
  • FIXED Nov. 23 1998. Cycles. We assume that the input is an acyclic semistructure. Cycles will cause a problem (non-termination) when doing certain Kleene closure matching, but are OK for all other operations. Cycles can be broken by marking visited nodes to prevent revisiting, which we plan to add in the future.
  • PARTIALLY FIXED Nov. 23 1998. Type checking. Only limited type checking is performed. More advanced type checking is needed to prevent brain-dead code like the following.
           MATCH(TT, ...
    Some types are checked, just not all currently.
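
The visited-node marking described in the Cycles item above can be sketched as follows. This is only an illustration in Python, not AUCQL's code: the dictionary-of-lists graph, the node names, and the function name reachable_closure are all assumptions made for the example. It computes the nodes reachable by zero or more edges (a Kleene closure over edges) and terminates even though the sample data contains a cycle.

```python
from collections import deque


def reachable_closure(graph, start):
    """Return the set of nodes reachable from `start` by zero or more edges.

    `graph` is a hypothetical adjacency mapping (node -> list of children);
    it is not AUCQL's internal representation.
    """
    visited = {start}  # marking a node here is what prevents revisiting it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in visited:  # skip nodes already marked
                visited.add(child)
                queue.append(child)
    return visited


# Hypothetical semistructure with a cycle: movie -> sequel -> movie.
graph = {
    "root": ["movie"],
    "movie": ["review", "sequel"],
    "sequel": ["movie"],
    "review": [],
}
nodes = reachable_closure(graph, "root")
```

Without the visited set, the movie/sequel cycle would keep the traversal running forever, which is the non-termination problem described above. Note that this sketch collects nodes only; an operation that needs every path between two nodes, such as coalescing, would need a different bookkeeping scheme.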

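The "hash the output lines" idea from the SELECT DISTINCT item above might look like the sketch below. It assumes, purely for illustration, that each output line is available as a Python string; the function name distinct_lines is hypothetical and nothing here reflects how AUCQL actually produces its output.

```python
def distinct_lines(lines):
    """Yield each output line once, in first-seen order.

    A set is a hash table, so each membership test is a hash lookup
    rather than a scan of everything emitted so far.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Hypothetical query output containing a duplicate line.
output = ["Movie1 Review1", "Movie2 Review2", "Movie1 Review1"]
unique = list(distinct_lines(output))
```

The cost is memory proportional to the number of distinct lines, since every line emitted so far has to stay in the set.
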
Curtis E. Dyreson, Michael H. Böhlen, and Christian S. Jensen © 1998-2000. All rights reserved.
  E-mail questions or comments to Curtis.Dyreson at