Specifies indexing for record nodes given by
xpath. Unlike directive
elm, this directive allows you to index attribute
contents. The xpath uses
a syntax similar to XPath. The attributes
have the same syntax and meaning as in directive elm, except that !
refers to the nodes selected by xpath.
encoding encodingname
This directive specifies the character encoding for external records.
For records such as XML that specify their encoding within the
file via a header, this directive is ignored.
If this directive is not given and no encoding is set
within the external records, ISO-8859-1 encoding is assumed.
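The precedence just described can be summarized in a few lines. The following Python sketch only illustrates the rule; it is not Zebra's code, and the function name effective_encoding and its two parameters are hypothetical.

```python
from typing import Optional

DEFAULT_ENCODING = "ISO-8859-1"  # fallback named in the text


def effective_encoding(record_encoding: Optional[str],
                       directive_encoding: Optional[str]) -> str:
    """Pick the encoding used to read an external record.

    record_encoding: encoding declared inside the record itself
        (for example in an XML header), or None if there is none.
    directive_encoding: argument of the `encoding` directive,
        or None if the directive is not given.
    """
    if record_encoding:
        # A declaration inside the record wins; the directive is ignored.
        return record_encoding
    if directive_encoding:
        return directive_encoding
    return DEFAULT_ENCODING
```

For example, effective_encoding(None, None) falls through both checks and returns the ISO-8859-1 default.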
xpath enable/disable
If this directive is followed by enable,
then extra indexing is performed to allow for XPath-like queries.
If this directive is not specified, which is equivalent to
disable, no extra XPath indexing is performed.
Note:
The mechanism for controlling indexing is not adequate for
complex databases, and will probably be moved into a separate
configuration table eventually.
The following is an excerpt from the abstract syntax file for the GILS
profile.
name gils
reference GILS-schema
attset gils.att
tagset gils.tag
varset var1.var
maptab gils-usmarc.map
# Element set names
esetname VARIANT gils-variant.est # for WAIS-compliance
esetname B gils-b.est
esetname G gils-g.est
esetname F @
elm (1,10) rank -
elm (1,12) url -
elm (1,14) localControlNumber Local-number
elm (1,16) dateOfLastModification Date/time-last-modified
elm (2,1) title w:!,p:!
elm (4,1) controlIdentifier Identifier-standard
elm (2,6) abstract Abstract
elm (4,51) purpose !
elm (4,52) originator -
elm (4,53) accessConstraints !
elm (4,54) useConstraints !
elm (4,70) availability -
elm (4,70)/(4,90) distributor -
elm (4,70)/(4,90)/(2,7) distributorName !
elm (4,70)/(4,90)/(2,10) distributorOrganization !
elm (4,70)/(4,90)/(4,2) distributorStreetAddress !
elm (4,70)/(4,90)/(4,3) distributorCity !
name gils
reference GILS-attset
include bib1.att
att 2001 distributorName
att 2002 indextermsControlled
att 2003 purpose
att 2004 accessConstraints
att 2005 useConstraints
name tagsetg
reference TagsetG
type 2
tag 1 title string
tag 2 author string
tag 3 publicationPlace string
tag 4 publicationDate string
tag 5 documentId string
tag 6 abstract string
tag 7 name string
tag 8 date generalizedtime
tag 9 bodyOfDisplay string
tag 10 organization string
name variant-1
reference Variant-1
class 1 variantId
type 1 variantId octetstring
class 2 body
type 1 iana string
type 2 z39.50 string
type 3 other string
This directive introduces a
sort index. The argument is a one-character code to be used in the
.abs file to select this particular index type. The corresponding
use attribute must be used in the sort request to refer to this
particular sort index. The corresponding character map (see below)
is used in the sort process.
completeness boolean
This directive enables or disables complete field indexing.
The value of the boolean should be 0
(disable) or 1 (enable). If completeness is enabled, the index entry will
contain the complete contents of the field (up to a limit), with words
(non-space characters) separated by single space characters
(normalized to " " on display). When completeness is
disabled, each word is indexed as a separate entry. Complete subfield
indexing is most useful for fields which are typically browsed (e.g.
titles, authors, or subjects), or instances where a match on a
complete subfield is essential (e.g. exact title searching). For fields
where completeness is disabled, the search engine will interpret a
search containing space characters as a word proximity search.
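The difference between the two modes can be shown with a short Python sketch. It is an illustration under stated assumptions, not Zebra's indexing code: the function name index_entries is hypothetical, the max_len value stands in for the unspecified limit mentioned above, and words are separated on plain whitespace rather than by a character map.

```python
from typing import List


def index_entries(field: str, completeness: bool,
                  max_len: int = 256) -> List[str]:
    """Return the index entries produced for one field value.

    max_len is a made-up placeholder for the size limit that
    applies to complete-field entries.
    """
    words = field.split()  # runs of whitespace separate words
    if not words:
        return []
    if completeness:
        # One entry: the whole field, words joined by single spaces.
        return [" ".join(words)[:max_len]]
    # Otherwise every word becomes its own entry.
    return words
```

With completeness enabled, a title yields a single browsable entry; with it disabled, the same title yields one entry per word.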
charmap filename
This is the filename of the character
map to be used for this index or field type.
The contents of the character map files are structured as follows:
lowercase value-set
This directive introduces the basic value set of the field type.
The format is an ordered list (without spaces) of the
characters which may occur in "words" of the given type.
The order of the entries in the list determines the
sort order of the index. In addition to single characters, the
following combinations are legal:
Backslashes may be used to introduce three-digit octal or
two-digit hex (preceded by x) representations of single
characters.
In addition, the combinations
\\, \r, \n, \t, and \s (space; remember that real
space characters may not occur in the value definition)
are recognized, with their usual interpretation.
Curly braces {} may be used to enclose ranges of single
characters (possibly using the escape convention described in the
preceding point), e.g. {a-z} to introduce the
standard range of ASCII characters.
Note that the interpretation of such a range depends on
the concrete representation in your local, physical character set.
Parentheses () may be used to enclose multi-byte characters,
e.g. diacritics or special national combinations (e.g. Spanish
"ll"). When found in the input stream (or a search term),
these characters are viewed and sorted as a single character, with a
sorting value depending on the position of the group in the value
statement.
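The value-set notation described above (single characters, backslash escapes, {} ranges and () groups) can be illustrated with a small parser. The following Python sketch is not Zebra's implementation. The names read_char and parse_value_set are hypothetical, escapes are assumed to be written with a single backslash, and ranges are expanded by Unicode code point, whereas the real interpretation depends on the local character set.

```python
from typing import List, Tuple

# Named escapes; \s stands for a space character.
SIMPLE_ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t", "s": " "}


def read_char(text: str, pos: int) -> Tuple[str, int]:
    """Read one (possibly escaped) character; return it and the next position."""
    ch = text[pos]
    if ch != "\\":
        return ch, pos + 1
    nxt = text[pos + 1]
    if nxt == "x":                       # \xHH: two hex digits
        return chr(int(text[pos + 2:pos + 4], 16)), pos + 4
    if nxt in "01234567":                # \ooo: three octal digits
        return chr(int(text[pos + 1:pos + 4], 8)), pos + 4
    # Named escape; any other escaped character stands for itself.
    return SIMPLE_ESCAPES.get(nxt, nxt), pos + 2


def parse_value_set(text: str) -> List[str]:
    """Expand a value-set string into its ordered list of members."""
    members: List[str] = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "{":                    # range, e.g. {a-z}
            first, pos = read_char(text, pos + 1)
            if text[pos] != "-":
                raise ValueError("expected '-' in range")
            last, pos = read_char(text, pos + 1)
            if text[pos] != "}":
                raise ValueError("expected '}' to close range")
            pos += 1
            members.extend(chr(c) for c in range(ord(first), ord(last) + 1))
        elif ch == "(":                  # multi-character group, e.g. (ll)
            group = []
            pos += 1
            while text[pos] != ")":
                c, pos = read_char(text, pos)
                group.append(c)
            pos += 1
            members.append("".join(group))
        else:
            c, pos = read_char(text, pos)
            members.append(c)
    return members
```

The position of each member in the returned list corresponds to its sorting value, so a group such as (ll) occupies exactly one slot.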
uppercase value-set
This directive introduces the
upper-case equivalents of the value set (if any). The number and
order of the entries in the list should be the same as in the
lowercase directive.
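Because the two lists are paired by position, folding and sort order both follow from simple lookups. The Python sketch below is an illustration, not Zebra's code: the names case_fold_table and sort_key are hypothetical, both value sets are assumed to be already expanded into lists of single characters, and characters outside the value set are simply skipped, which is a simplification made for this sketch.

```python
from typing import Dict, List


def case_fold_table(lowercase: List[str],
                    uppercase: List[str]) -> Dict[str, str]:
    """Pair each upper-case member with the lower-case member at the same position."""
    if len(lowercase) != len(uppercase):
        raise ValueError("lowercase and uppercase must have the same length")
    return dict(zip(uppercase, lowercase))


def sort_key(word: str, lowercase: List[str],
             uppercase: List[str]) -> List[int]:
    """Map a word to the positions of its folded characters in the value set."""
    fold = case_fold_table(lowercase, uppercase)
    rank = {member: i for i, member in enumerate(lowercase)}
    # Simplification of this sketch: characters not in the value set are skipped.
    return [rank[fold.get(ch, ch)] for ch in word if fold.get(ch, ch) in rank]
```

With a value set that places æ, ø, å after the basic letters, sorting by sort_key orders words by value-set position rather than by raw character code.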
space value-set
This directive introduces the characters
which separate words in the input stream. Depending on the
completeness mode of the field in question, these characters either
terminate an index entry, or delimit individual "words" in
the input stream. The order of the elements is not significant —
otherwise the representation is the same as for the
uppercase and lowercase
directives.
map value-set target
This directive introduces a
mapping from each of the members of the value-set on the left to
the character on the right. The character on the right must occur in
the value set (the lowercase directive) of
the character set, but
it may be a parenthesis-enclosed multi-octet character. This directive
may be used to map diacritics to their base characters, or to map
HTML-style character-representations to their natural form, etc.
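The effect of a set of map directives can be sketched as a lookup table applied to the input. This Python example is an illustration only, not Zebra's implementation: the names add_mapping and apply_map are hypothetical, the value sets are assumed to be already expanded into lists of members, and the longest-match-first rule is an assumption made so that multi-character members such as &eacute; are handled predictably.

```python
from typing import Dict, List


def add_mapping(table: Dict[str, str], members: List[str],
                target: str, lowercase: List[str]) -> None:
    """Record one `map` directive: every member maps to target."""
    if target not in lowercase:
        # The text requires the target to occur in the lowercase value set.
        raise ValueError("map target must occur in the lowercase value set")
    for member in members:
        if not member:
            raise ValueError("empty member in value set")
        table[member] = target


def apply_map(text: str, table: Dict[str, str]) -> str:
    """Replace every mapped member found in text, trying longer members first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(text):
        for key in keys:
            if text.startswith(key, pos):
                out.append(table[key])
                pos += len(key)
                break
        else:
            # No member matches here; keep the character unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)
```

A table built this way can map both accented letters and HTML-style representations onto members of the lowercase value set in a single pass.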