The code is available in both Java and Perl.
How to use the cube
The code that you download is already configured to
build an example cube (the README file in the release package
provides more details).
Currently, the incomplete data cube is designed to obtain data from
a text file; the example cube parses an HTTPD access log.
Specifications for units and measures in three dimensions (Time,
Machines, and Pages) are also provided.
There are three phases to using an incomplete data cube.
- The dimensions of the cube must be constructed.
A dimension consists of units and measures.
The data cube implementor (DCI) writes one text file
to describe each dimension. These specifications are
then parsed and (internally) stored as graphs and tables
of names within the cube store.
- The DCI edits a list of cubette specifications.
These specifications are then parsed and (internally) stored
within the cube store. During parsing, source for flex (a
lexical analyser generator) is produced to sieve data
from a text file and populate the incomplete data cube.
The flex source is compiled and input data is passed
through the resulting lexical analyser to populate the cube.
(Only flex was up to the task of handling
the hundreds of thousands of regular expressions that are
potentially produced during construction of the data sieve.
Unfortunately, this introduces a dependency outside Java and Perl,
so the code for the sieve is not 100% pure Java or Perl.)
- Queries can now be made on the populated cube either
through a GUI or by calling the appropriate methods directly.
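The second and third phases can be illustrated with a small Java sketch. It is not the released implementation: it swaps the generated flex analyser for java.util.regex (which would not cope with hundreds of thousands of expressions), and the class CubeSketch, the cubette names, the regular expressions, and the log lines are all hypothetical. A hit count per cubette stands in for the data stored in the cube.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CubeSketch {

    /** One cubette: a named cell plus the expression that sieves log lines into it. */
    static final class Cubette {
        final String name;
        final Pattern sieve;
        long count = 0;

        Cubette(String name, String regex) {
            this.name = name;
            this.sieve = Pattern.compile(regex);
        }
    }

    /** Stand-in for parsed cubette specifications (names and patterns are assumptions). */
    static List<Cubette> exampleCubettes() {
        List<Cubette> cubettes = new ArrayList<>();
        // Machines in .au requesting any page during October 1997.
        cubettes.add(new Cubette("au-oct-1997",
                "^\\S+\\.au \\S+ \\S+ \\[\\d{2}/Oct/1997:"));
        // Any machine requesting /index.html during 1997.
        cubettes.add(new Cubette("index-1997",
                "^\\S+ \\S+ \\S+ \\[\\d{2}/\\w{3}/1997:[^\\]]*\\] \"GET /index\\.html "));
        return cubettes;
    }

    /** A few made-up HTTPD access log lines in common log format. */
    static List<String> exampleLog() {
        return Arrays.asList(
            "host1.example.au - - [01/Oct/1997:10:15:02 +1000] \"GET /index.html HTTP/1.0\" 200 2048",
            "host2.example.com - - [15/Mar/1997:08:00:00 +0000] \"GET /index.html HTTP/1.0\" 200 2048",
            "host3.example.au - - [20/Oct/1997:23:59:59 +1000] \"GET /papers/cube.ps HTTP/1.0\" 200 90112",
            "host1.example.au - - [02/Jan/1998:09:30:00 +1000] \"GET /index.html HTTP/1.0\" 200 2048",
            "host4.example.org - - [03/Oct/1997:12:00:00 +0000] \"GET /index.html HTTP/1.0\" 200 2048");
    }

    /** Pass every input line through every sieve; a line may fall into several cubettes. */
    static void populate(List<Cubette> cubettes, List<String> logLines) {
        for (String line : logLines) {
            for (Cubette c : cubettes) {
                if (c.sieve.matcher(line).find()) {
                    c.count++;
                }
            }
        }
    }

    /** Query the populated cube by calling a method directly. */
    static long count(List<Cubette> cubettes, String name) {
        for (Cubette c : cubettes) {
            if (c.name.equals(name)) {
                return c.count;
            }
        }
        throw new IllegalArgumentException("no such cubette: " + name);
    }

    public static void main(String[] args) {
        List<Cubette> cubettes = exampleCubettes();
        populate(cubettes, exampleLog());
        for (Cubette c : cubettes) {
            System.out.println(c.name + ": " + c.count);
        }
    }
}
```

With the five sample lines, main prints "au-oct-1997: 2" and "index-1997: 3". Testing each line against each pattern in turn is only workable for a handful of cubettes; a generated lexical analyser matches all expressions in a single pass, which is presumably why flex is used for the real sieve.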
Below we outline the current state of the implementation.
While the cube is currently functional, much remains to be done.
Implemented features
- unlimited number of dimensions
- unlimited number of measures (e.g., days, years, countries)
- unlimited number of units (e.g., '1 October 1997', 1995, Australia)
- insertion of new units/measures
- Unicode for all names of measures, units, etc. (Java version only)
- any user-specified measures and units (none are "built-in")
- input data parsed from a text file
- data is "filtered" through cubette specifications
- data cube data, units, and measures stored in a generic database
(for speed, the database is currently configured to use Unix gdbm files)
- query satisfaction algorithm
- sum queries
- GUI for query engine
Unimplemented features (wish list)
- incomplete data cube store configured to use DB API (e.g., JDBC)
- input data to be retrieved from database using DB API (e.g., JDBC)
- GUI to modify cubette store
- deletion/renaming of units/measures
- min/max queries
- upper/lower bounds on query results
- completeness measures for query results
- suggestion of alternate answers
Environment and dependencies
- development environment is UNIX (SunOS5); untested in other environments
(but, hey, it is almost 100% Java, so there should be few portability issues!)
- flex - sigh, needed to construct data sieve
- JavaLex/JavaCup - only if you would like to change how the
specification files are parsed
- Jigsaw/jdbm - the Java implementation of gdbm is a (small) part of Jigsaw
Curtis E. Dyreson
© 1995-2001. All rights reserved.