The Zebra system is designed to support a wide range of data management
applications. The system can be configured to handle virtually any
kind of structured data. Each record in the system is associated with
a record schema which lends context to the data
elements of the record.
Any number of record schemas can coexist in the system.
Although it may be wise to use only a single schema within
one database, the system imposes no such restriction.
Records pass through three different states during processing in the
system.
When records are accessed by the system, they are represented
in their local, or native, format. This might be SGML or HTML files,
News or Mail archives, or MARC records. If the system doesn't already
know how to read the type of data you need to store, you can set up an
input filter by preparing conversion rules based on regular
expressions and possibly augmented by a flexible scripting language
(Tcl).
The input filter produces as output an internal representation,
a tree structure.
When records are processed by the system, they are represented
as a tree structure, built from tagged data elements hanging off a
root node. The tagged elements may contain data or yet more tagged
elements in a recursive structure. The system performs various
actions on this tree structure (indexing, element selection, schema
mapping, etc.).
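The recursive structure described above can be pictured with a small Python sketch. This is only an illustration of the idea, not Zebra's internal representation: the `Element` class, the helper functions `text_of` and `find`, and the `author` and `name` tags are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """A tagged node (hypothetical model). Children are either data
    (plain strings) or further tagged elements."""
    tag: str
    children: List[Union[str, "Element"]] = field(default_factory=list)


def text_of(node: Element) -> str:
    """Collect all data found beneath a node, depth first."""
    parts = []
    for child in node.children:
        parts.append(child if isinstance(child, str) else text_of(child))
    return " ".join(parts)


def find(node: Element, tag: str) -> List[Element]:
    """A simple form of element selection: every descendant with this tag."""
    hits = []
    for child in node.children:
        if isinstance(child, Element):
            if child.tag == tag:
                hits.append(child)
            hits.extend(find(child, tag))
    return hits


# A root node with tagged elements hanging off it; one of them nests further.
record = Element("record", [
    Element("title", ["Zen and the Art of Motorcycle Maintenance"]),
    Element("author", [Element("name", ["Robert M. Pirsig"])]),
])
```

Operations such as indexing or element selection then become walks over this tree, for example `find(record, "title")` to pick out the title element.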
Before records are transmitted to the client, they are
converted from the internal structure to a form suitable for exchange
over the network, according to the Z39.50 standard.
<title>
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance</>
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel</>
</title>
Variant elements are terminated by the general end-tag </>, by
the variant end-tag </var>, by the appearance of another variant
tag with the same class and value settings, or by the appearance of
another, normal tag. In other words, the end-tags for
the variants used in the example above could have been omitted.
Variant elements can be nested. The element
<title>
<var lang lang "eng"><var body iana "text/plain">
Zen and the Art of Motorcycle Maintenance
</title>
associates two variant components with the variant list for the title
element.
Given the nesting rules described above, we could write
<title>
<var body iana "text/plain">
<var lang lang "eng">
Zen and the Art of Motorcycle Maintenance
<var lang lang "dan">
Zen og Kunsten at Vedligeholde en Motorcykel
</title>
The title element above comes in two variants. Both have the IANA body
type "text/plain", but one is in English, and the other in
Danish. The client, using the element selection mechanism of Z39.50,
can retrieve information about the available variant forms of data
elements, or it can select specific variants based on the requirements
of the end-user.
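To make the idea of variant selection concrete, the two forms of the title element can be modelled as data paired with their variant settings. The Python sketch below is a loose illustration only: the data layout and the functions `list_variants` and `select_variant` are assumptions made for this example, and they do not reproduce the Z39.50 element selection mechanism or any Zebra API.

```python
from typing import Dict, List, Optional, Tuple

# Variant settings are keyed by (class, type), mirroring the tag
# <var class type "value">. This layout is an assumption of the sketch.
Settings = Dict[Tuple[str, str], str]
Variant = Tuple[Settings, str]

title_variants: List[Variant] = [
    ({("body", "iana"): "text/plain", ("lang", "lang"): "eng"},
     "Zen and the Art of Motorcycle Maintenance"),
    ({("body", "iana"): "text/plain", ("lang", "lang"): "dan"},
     "Zen og Kunsten at Vedligeholde en Motorcykel"),
]


def list_variants(variants: List[Variant]) -> List[Settings]:
    """Report which variant forms exist, without returning the data."""
    return [settings for settings, _ in variants]


def select_variant(variants: List[Variant], wanted: Settings) -> Optional[str]:
    """Return the first form whose settings match every requested value."""
    for settings, data in variants:
        if all(settings.get(key) == value for key, value in wanted.items()):
            return data
    return None
```

A client that prefers Danish would ask for `{("lang", "lang"): "dan"}`. A request for a language that is not present returns `None`, leaving the caller to decide on a fallback.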
BEGIN { begin record wais }
/^From:/ BODY /$/ { data -element name $1 }
/^Subject:/ BODY /$/ { data -element title $1 }
/^Date:/ BODY /$/ { data -element lastModified $1 }
/\n\n/ BODY END {
   begin element bodyOfDisplay
   begin variant body iana "text/plain"
   data -text $1
   end record
}
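For readers unfamiliar with the filter syntax, the effect of the rules above on a mail message can be approximated in Python. This is a rough sketch of the intent, not of how Zebra evaluates the rules: the function `parse_message` and the flat dictionary it returns are hypothetical, the variant information attached to the body is left out, and trimming whitespace from header values is an assumption of the sketch.

```python
import re
from typing import Dict

# Header name -> element name, as in the three header rules of the filter.
HEADER_FIELDS = {"From": "name", "Subject": "title", "Date": "lastModified"}


def parse_message(raw: str) -> Dict[str, str]:
    """Map a mail message to a flat dictionary of element name -> data."""
    # The first blank line separates the headers from the body,
    # like the /\n\n/ BODY END rule.
    head, separator, body = raw.partition("\n\n")
    record: Dict[str, str] = {}
    for header, element in HEADER_FIELDS.items():
        # Like /^Header:/ BODY /$/ : the rest of the line after the header name.
        match = re.search(rf"^{header}:(.*)$", head, flags=re.MULTILINE)
        if match:
            # Stripping surrounding whitespace is a choice made for this sketch.
            record[element] = match.group(1).strip()
    if separator:
        record["bodyOfDisplay"] = body
    return record
```

Given a message with `From:`, `Subject:` and `Date:` headers followed by a blank line, the result holds the elements name, title, lastModified and bodyOfDisplay.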
If Zebra is compiled with support for Tcl (Tool Command Language)
enabled, the statements described above are supplemented with a complete
scripting environment, including control structures (conditional
expressions and loop constructs), and powerful string manipulation
mechanisms for modifying the elements of a record. Tcl is a popular
scripting environment, with several tutorials available both online
and in hardcopy.