A Benchmark for XPath Evaluation
The fast-growing use of XML increases the need for efficient, flexible
query languages specifically designed for XML. Several query languages for
XML data collections have been proposed, including XML-QL, LOREL, XSL,
XQL, and XQuery.
Among these query languages, XQuery is being developed by the W3C and is
likely to become the most widely used, just as SQL is among database
languages. An important component of many XML query languages, especially
those promulgated by the W3C, is XPath. XPath is a language for addressing
parts of an XML document and was developed in part by the XML Query and
XSL working groups. In addition to being used in XQuery, XPath is a core
component of XSL Transformations (XSLT) and XPointer. Because XQuery is
still a working draft and most of the products that claim XQuery support
are at an early stage of development, we decided to build a benchmark for
XPath, for which a number of query engines with full support already exist.
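To make "addressing parts of an XML document" concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. That module supports only a limited subset of XPath (child and descendant steps plus simple predicates), so it is an illustration rather than one of the full XPath engines the benchmark targets; the document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document; element and attribute names are illustrative only.
DOC = """
<library>
  <book year="2001"><title>A</title></book>
  <book year="2002"><title>B</title></book>
  <shelf>
    <book year="2002"><title>C</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)

# Child axis: titles of book elements that are direct children of the root.
child_titles = [t.text for t in root.findall("./book/title")]

# Descendant axis (//) with an attribute predicate: every book from 2002,
# at any depth below the root, in document order.
titles_2002 = [b.find("title").text for b in root.findall(".//book[@year='2002']")]

print(child_titles)  # ['A', 'B']
print(titles_2002)   # ['B', 'C']
```

The two paths differ only in their axis, yet the descendant form has to look below `shelf` as well, which is the kind of traversal cost the benchmark sets out to measure.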
Since there is still no commonly agreed set of standard XML application
scenarios, we report on a generic benchmark that focuses on measuring the
cost of query processing. XPath queries are evaluated against a tree-like
data model and typically traverse only part of the tree, so the efficiency
of tree traversal has a major impact on the cost of query processing. The
tree can vary in depth, density, size, and the kind of information in
each node.
information in each node. We designed an XML document generator which
generates XML documents that conform to several factors which control the
shape and size of the tree. By varying only one of the control factors
(e.g., tree depth) and keeping the other factors constant the benchmark is
able to isolate the impact of that factor on query performance. The
benchmark also includes a suite of query templates that can be
instantiated to produce a set of benchmark queries. Overall, the benchmark
is designed to assess the impact of trees of different sizes and shapes on
query performance. This will help query engine developers understand and
evaluate implementation alternatives, and also help users to decide which
query engine best fits their needs.
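The combination of a parameterized generator, query templates, and one-factor-at-a-time variation can be sketched as follows. This is not the benchmark's actual generator. It is a minimal illustration that assumes three hypothetical control factors (`depth`, `fanout` for density, and `text_len` for the information stored in each leaf), two hypothetical templates, and Python's standard `xml.etree.ElementTree` as a stand-in query engine.

```python
from __future__ import annotations

import time
import xml.etree.ElementTree as ET

# Hypothetical query templates; {level} and {pos} are filled in per run.
TEMPLATES = {
    "descendant": ".//l{level}",
    "predicate": ".//l{level}[@pos='{pos}']",
}


def generate(depth: int, fanout: int, text_len: int) -> ET.Element:
    """Hypothetical generator with three control factors: tree depth,
    fan-out (density), and the amount of text stored in each leaf."""
    def build(level: int, pos: int) -> ET.Element:
        # Each element records its position among its siblings in @pos.
        elem = ET.Element(f"l{level}", pos=str(pos))
        if level < depth:
            for i in range(fanout):
                elem.append(build(level + 1, i))
        else:
            elem.text = "x" * text_len
        return elem

    return build(0, 0)


def run(root: ET.Element, query: str, repeats: int = 5) -> tuple[int, float]:
    """Evaluate `query` with ElementTree's limited XPath subset and return
    the number of matches and the mean wall-clock time per evaluation."""
    result: list[ET.Element] = []
    start = time.perf_counter()
    for _ in range(repeats):
        result = root.findall(query)
    elapsed = (time.perf_counter() - start) / repeats
    return len(result), elapsed


if __name__ == "__main__":
    # Vary one factor (depth) while holding fan-out and text length constant.
    for depth in (4, 6, 8):
        root = generate(depth=depth, fanout=3, text_len=10)
        for name, template in TEMPLATES.items():
            query = template.format(level=depth, pos=0)
            count, secs = run(root, query)
            print(f"depth={depth} {name:<10} {query:<20} matches={count} {secs * 1e3:.3f} ms")
```

Timings depend on the machine, so no output is shown. Because `findall` supports only a subset of XPath, an actual benchmark run would plug in a full XPath engine at the point where `run` evaluates the query (for example, lxml's `xpath()` method). The sketch only illustrates the structure of holding all factors but one constant.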