Jumping Spider Overview
An intranet index and search engine
Curtis Dyreson

A search engine can index a concept that appears entirely on a single page, but concepts can span several pages. For instance, a page on trees may be linked from a page of lecture notes for a data structures course. If the trees page does not itself mention lecture notes, then a search engine query for "lecture notes on trees" will, at best, only partially match each page.

Our goal is to develop a strategy to index concepts that span more than one page. Our strategy assumes that a multi-page concept is created by a concept-path: a path of one or more hyperlinks that passes through pages with specific content. For instance, there must be a concept-path from the lecture notes page to the trees page to create the concept "lecture notes on trees".
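To make the idea concrete, here is a minimal sketch of checking for a concept-path at query time. It assumes a toy web in which each page is reduced to a set of terms and a list of outgoing links; all page names, the `max_hops` bound, and the function names are hypothetical and are not taken from the Jumping Spider itself. No single page contains both "lecture notes" and "trees", but a short hyperlink path joins a page with the first concept to a page with the second.

```python
from collections import deque

# Hypothetical toy web: page -> set of terms that appear on it.
PAGES = {
    "cs2420/index.html": {"lecture", "notes", "data", "structures"},
    "cs2420/week5.html": {"week", "five"},
    "cs2420/trees.html": {"trees", "binary", "search"},
    "garden/trees.html": {"trees", "oak", "maple"},
}

# Hypothetical hyperlinks: page -> pages it links to.
LINKS = {
    "cs2420/index.html": ["cs2420/week5.html"],
    "cs2420/week5.html": ["cs2420/trees.html"],
    "cs2420/trees.html": [],
    "garden/trees.html": [],
}


def single_page_matches(terms):
    """Pages that contain every query term by themselves."""
    return [p for p, words in PAGES.items() if terms <= words]


def find_concept_path(first, second, max_hops=3):
    """Return a hyperlink path from a page containing all terms in `first`
    to a page containing all terms in `second`, using at most `max_hops`
    links, or None if no such concept-path exists."""
    starts = [p for p, words in PAGES.items() if first <= words]
    for start in starts:
        # Breadth-first search, so the shortest concept-path is found first.
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if second <= PAGES[path[-1]]:
                return path
            if len(path) - 1 == max_hops:
                continue  # hop budget spent; do not extend this path
            for nxt in LINKS.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return None
```

Here `single_page_matches({"lecture", "notes", "trees"})` returns an empty list, while `find_concept_path({"lecture", "notes"}, {"trees"})` returns `["cs2420/index.html", "cs2420/week5.html", "cs2420/trees.html"]`. The unrelated `garden/trees.html` is never reached, which is the kind of filtering a concept-path provides.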

The key to indexing multi-page concepts is to find the right concept-paths. The paths must be relatively few (certainly far fewer than the total number of paths in the World Wide Web), or the cost of the index will be too great. At the same time, the paths must be easy to identify, so that they can be computed and indexed automatically and quickly. Finally, the paths must be viable, in the sense that they really do connect multi-page concepts.

At this site, you can explore the Jumping Spider, a system that indexes concept-paths. The Jumping Spider uses a search engine to find the starting point of a concept-path and then jumps to other pages in the concept-path that have the desired content. It precomputes and stores the possible jumps to achieve reasonable query efficiency.
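The split between precomputed jumps and query-time lookups can be sketched as follows. This is an illustration under stated assumptions, not the Jumping Spider's actual jump-selection rule: it assumes a "jump" is simply any pair of pages connected by 1 to `max_hops` hyperlinks, and it uses a small inverted index as a stand-in for the search engine. The toy pages and all function names are hypothetical. The graph is traversed once, offline; answering a query then needs only index and table lookups.

```python
from collections import defaultdict, deque

# Hypothetical toy web: page -> terms on the page, and page -> outgoing links.
PAGES = {
    "cs2420/index.html": {"lecture", "notes", "data", "structures"},
    "cs2420/week5.html": {"week", "five"},
    "cs2420/trees.html": {"trees", "binary", "search"},
    "garden/trees.html": {"trees", "oak", "maple"},
}
LINKS = {
    "cs2420/index.html": ["cs2420/week5.html"],
    "cs2420/week5.html": ["cs2420/trees.html"],
    "cs2420/trees.html": [],
    "garden/trees.html": [],
}


def build_inverted_index(pages):
    """Stand-in for the search engine: term -> set of pages containing it."""
    index = defaultdict(set)
    for page, terms in pages.items():
        for term in terms:
            index[term].add(page)
    return index


def precompute_jumps(links, max_hops=3):
    """Offline step (assumed rule): page -> every page reachable in
    1..max_hops hyperlinks. Bounding the hops keeps the table small."""
    jumps = {}
    for start in links:
        reached = set()
        frontier = deque([(start, 0)])
        while frontier:
            page, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for nxt in links.get(page, []):
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append((nxt, hops + 1))
        jumps[start] = reached
    return jumps


def query(terms, index, jumps):
    """Answer a multi-page query such as ["notes", "trees"]: start at pages
    holding the first term, then jump to pages holding each later term.
    Only dictionary lookups happen here; no links are followed."""
    current = set(index.get(terms[0], set()))
    for term in terms[1:]:
        targets = index.get(term, set())
        current = {dst for src in current
                   for dst in jumps.get(src, set()) if dst in targets}
    return current
```

For example, with `index = build_inverted_index(PAGES)` and `jumps = precompute_jumps(LINKS)`, the call `query(["notes", "trees"], index, jumps)` returns `{"cs2420/trees.html"}`. The trade-off mirrors the one described above: a larger `max_hops` finds longer concept-paths but makes the stored jump table grow.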


Curtis E. Dyreson © 1997-2001. All rights reserved.
  E-mail questions or comments to Curtis.Dyreson at usu.edu