Since RPN or reverse polish notation is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is one of simple laziness - it's somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
The PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second does not. The first
set is more flexible than the second, which is obsolete and is
provided only to ensure backwards compatibility.
The functions in the first set all operate on a PQF parser handle:
#include <yaz/pquery.h>
YAZ_PQF_Parser yaz_pqf_create (void);
void yaz_pqf_destroy (YAZ_PQF_Parser p);
Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
Odr_oid **attributeSetId, const char *qbuf);
int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
A PQF parser is created and destroyed by the functions
yaz_pqf_create and
yaz_pqf_destroy respectively.
The function yaz_pqf_parse parses the query given
by the string qbuf. If parsing succeeds, it returns
a Z39.50 RPN query allocated on the ODR stream
o. If parsing fails, it returns a NULL pointer.
The function yaz_pqf_scan takes a scan query in
qbuf. If parsing succeeds, the function
returns a pointer to the attributes plus term and sets
attributeSetId to the attribute set for the
scan request; both are allocated on the ODR stream o.
If parsing fails, yaz_pqf_scan returns a NULL pointer.
Error information for bad queries can be obtained by a call to
yaz_pqf_error, which returns an error code,
sets *msg to point to an error description,
and sets *off to the offset within the last
query where parsing failed.
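The values delivered by yaz_pqf_error lend themselves to a report that points at the failing position. The following is a minimal sketch of such a report, not part of YAZ: format_pqf_error is a hypothetical helper, and it assumes that the offset counts bytes from the start of the query string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a YAZ function): formats an error code,
 * message and offset, such as those obtained from yaz_pqf_error, as
 *
 *   PQF error <code>: <msg>
 *   <query>
 *   <spaces>^
 *
 * Assumes that off is a byte offset from the start of query.
 * Returns 0 on success and -1 if buf is too small. */
int format_pqf_error(char *buf, size_t bufsz, const char *query,
                     int code, const char *msg, size_t off)
{
    size_t len;
    int n;

    assert(buf != NULL && query != NULL && msg != NULL);

    len = strlen(query);
    if (off > len)
        off = len;          /* clamp so the caret stays under the query */

    /* "%*s" with an empty string prints exactly off spaces. */
    n = snprintf(buf, bufsz, "PQF error %d: %s\n%s\n%*s^\n",
                 code, msg, query, (int) off, "");
    return (n >= 0 && (size_t) n < bufsz) ? 0 : -1;
}
```

A caller would typically invoke the helper only after yaz_pqf_parse or yaz_pqf_scan has returned a NULL pointer, passing along the code, message and offset reported for that query.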
The second set of functions is declared as follows:
#include <yaz/pquery.h>
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
Odr_oid **attributeSetP, const char *qbuf);
int p_query_attset (const char *arg);
The function p_query_rpn() takes as arguments an
ODR stream (see section The ODR Module)
to provide a memory source (the structure created is released on
the next call to odr_reset() on the stream), a
protocol identifier (one of the constants PROTO_Z3950 and
PROTO_SR), and finally a null-terminated string
holding the query string.
If the parse went well, p_query_rpn() returns a
pointer to a Z_RPNQuery structure which can be
placed directly into a Z_SearchRequest.
If parsing failed due to a syntax error, a NULL pointer is returned.
The function p_query_attset specifies which attribute set
to use if the query doesn't specify one with the
@attrset operator.
It returns 0 if the argument is a valid attribute set specifier;
otherwise it returns -1.
The grammar of the PQF is as follows:
query ::= top-set query-struct.
top-set ::= [ '@attrset' string ]
query-struct ::= attr-spec | simple | complex | '@term' term-type
attr-spec ::= '@attr' [ string ] string query-struct
complex ::= operator query-struct query-struct.
operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
simple ::= result-set | term.
result-set ::= '@set' string.
term ::= string.
proximity ::= exclusion distance ordered relation which-code unit-code.
exclusion ::= '1' | '0' | 'void'.
distance ::= integer.
ordered ::= '1' | '0'.
relation ::= integer.
which-code ::= 'known' | 'private' | integer.
unit-code ::= integer.
term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
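To illustrate why a prefix format is simple to interpret, here is a minimal recursive-descent sketch that recognizes a subset of the grammar above and counts the leaf operands (terms and result-set references) of a query. It is not the YAZ parser: the names pqf_next, pqf_struct and pqf_count_operands are hypothetical, the @prox and @term operators are deliberately rejected, and the tokenizer assumes that tokens are separated by blanks and that double quotes delimit a single token.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TOK_MAX 128

/* Read the next token: either a double-quoted string or a run of
 * non-blank characters. Returns 1 on success, 0 at end of input and
 * -1 for an unterminated or over-long token. */
static int pqf_next(const char **pp, char *tok, int *quoted)
{
    const char *p = *pp;
    size_t n = 0;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;
    *quoted = (*p == '"');
    if (*quoted)
        p++;
    while (*p != '\0' && (*quoted ? *p != '"' : !isspace((unsigned char) *p))) {
        if (n + 1 >= TOK_MAX)
            return -1;
        tok[n++] = *p++;
    }
    if (*quoted) {
        if (*p != '"')
            return -1;              /* closing quote missing */
        p++;
    }
    tok[n] = '\0';
    *pp = p;
    return 1;
}

/* query-struct: returns the number of leaf operands, or -1 on error. */
static int pqf_struct(const char **pp)
{
    char tok[TOK_MAX];
    int quoted, left, right;

    if (pqf_next(pp, tok, &quoted) != 1)
        return -1;                  /* an operand is missing */
    if (quoted || tok[0] != '@')
        return 1;                   /* a plain term */
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not")) {
        left = pqf_struct(pp);
        right = left < 0 ? -1 : pqf_struct(pp);
        return right < 0 ? -1 : left + right;
    }
    if (!strcmp(tok, "@set"))
        return pqf_next(pp, tok, &quoted) == 1 ? 1 : -1;
    if (!strcmp(tok, "@attr")) {
        if (pqf_next(pp, tok, &quoted) != 1)
            return -1;
        /* An optional attribute set name precedes the type-value pair. */
        if (!strchr(tok, '=') && pqf_next(pp, tok, &quoted) != 1)
            return -1;
        if (!strchr(tok, '='))
            return -1;
        return pqf_struct(pp);      /* attributes apply to this sub query */
    }
    return -1;                      /* @prox, @term: outside this sketch */
}

/* query ::= top-set query-struct. Returns the number of leaf operands,
 * or -1 if the query is not well-formed within the supported subset. */
int pqf_count_operands(const char *query)
{
    char tok[TOK_MAX];
    const char *p = query, *save = query;
    int quoted, n;

    assert(query != NULL);
    if (pqf_next(&p, tok, &quoted) == 1 && !quoted && !strcmp(tok, "@attrset")) {
        if (pqf_next(&p, tok, &quoted) != 1)
            return -1;              /* attribute set name missing */
    } else {
        p = save;                   /* no top-set: rewind */
    }
    n = pqf_struct(&p);
    if (n < 0)
        return -1;
    return pqf_next(&p, tok, &quoted) == 0 ? n : -1;  /* no trailing tokens */
}
```

Because every operator announces itself before its operands, each grammar rule becomes one branch that consumes a fixed number of sub-structures, and no operator stack or precedence handling is needed.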
The @attr operator is followed by an attribute specification
(attr-spec above). The specification consists
of an optional attribute set, an attribute type-value pair and
a sub query. The attribute type-value pair is packed in one string:
an attribute type, an equals sign, followed by an attribute value
(for example 1=4).
The type is always an integer but the value may be either an
integer or a string (if it doesn't start with a digit character).
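As a rough illustration of the type-value rule, the following sketch splits a specification such as 1=4 or 1=/book/title, as used with @attr in the example queries, into its parts. It is not taken from YAZ: struct attr_pair and parse_attr_pair are hypothetical names, and rejecting a value that starts with a digit but continues with other characters is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one parsed type-value pair (not a YAZ type). */
struct attr_pair {
    int type;               /* attribute type, always an integer */
    int is_numeric;         /* 1 if the value is an integer, 0 if a string */
    long num_value;         /* valid when is_numeric is 1 */
    const char *str_value;  /* points into the input when is_numeric is 0 */
};

/* Split a specification such as "1=4" or "1=/book/title".
 * Returns 0 on success and -1 if the specification is malformed;
 * the contents of *out are unspecified on failure. */
int parse_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    const char *value;
    long type;

    assert(spec != NULL && out != NULL);

    /* The type must be an integer directly followed by '='. */
    if (!isdigit((unsigned char) spec[0]))
        return -1;
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;

    value = end + 1;
    out->type = (int) type;
    out->num_value = 0;
    out->str_value = NULL;

    if (isdigit((unsigned char) value[0])) {
        /* A value that starts with a digit is read as an integer.
         * Assumption of this sketch: trailing non-digits are an error. */
        out->num_value = strtol(value, &end, 10);
        if (*end != '\0')
            return -1;
        out->is_numeric = 1;
    } else {
        /* Anything else is kept as a string value. */
        out->is_numeric = 0;
        out->str_value = value;
    }
    return 0;
}
```

Note that an optional attribute set name, such as gils in the example queries, is a separate token and would be handled before this function is called.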
Z39.50 version 3 defines various encodings of terms.
Use the @term operator to indicate the encoding type:
general, numeric,
string (for InternationalString), and so on; the full list is
given by term-type in the grammar above.
If no term type is given, the general form
is used, which is the only encoding allowed in both version 2 and
version 3 of the Z39.50 standard.
The following are all examples of valid queries in the PQF.
dylan
"bob dylan"
@or "dylan" "zimmerman"
@set Result-1
@or @and bob dylan @set Result-1
@attr 1=4 computer
@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
@attr 4=1 @attr 1=4 "self portrait"
@prox 0 3 1 2 k 2 dylan zimmerman
@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
@term string "a UTF-8 string, maybe?"
@attr 1=/book/title computer
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you for some reason or other need to provide a
symbolic language for expressing boolean query structures.
The EUROPAGATE
research project working under the Libraries programme
of the European Commission's DG XIII has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of YAZ (the Z_RPNQuery structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available under a liberal
license, it is included as a supplement to YAZ.
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by the lines prefixed by
--.
CCL-Find ::= CCL-Find Op Elements
| Elements.
Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.
Elements ::= '(' CCL-Find ')'
| Set
| Terms
| Qualifiers Relation Terms
| Qualifiers Relation '(' CCL-Find ')'
| Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).
Set ::= 'set' '=' string
-- Reference to a result set
Terms ::= Terms Prox Term
| Term
-- Proximity of terms.
Term ::= Term string
| string
-- This basically means that a term may include a blank
Qualifiers ::= Qualifiers ',' string
| string
-- Qualifiers is a list of strings separated by comma
Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777
-- standard.
Prox ::= '%' | '!'
-- Proximity operator
The following queries are all valid:
dylan
"bob dylan"
dylan or zimmerman
set=1
(dylan and bob) or set=1
Assuming that the qualifiers ti, au
and date are defined we may use:
ti=self portrait
au=(bob dylan and slow train coming)
date>1980 and (ti=((self portrait)))
Qualifiers are used to direct the search to a particular searchable
index, such as title (ti) and author indexes (au). The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that would set, for example, the
structure-attribute.
Consider a scenario where the target supports ranked searches in the
title index. In this case, the user could specify a query that
combines the qualifiers ti and ranked, where ranked would map to
relation=relevance (2=102) and ti would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
default.bib. Each line in the file has the form:
qualifier-name type=val type=val ...
where qualifier-name is the name of the
qualifier to be used (e.g. ti),
type is a BIB-1 category type and
val is the corresponding BIB-1 attribute
value.
The type can be either numeric or one of the letters
u (use), r (relation),
p (position), s (structure),
t (truncation) or c (completeness).
The qualifier-name term
has a special meaning: the types and values for this definition
are used when no qualifiers are present.
Consider the following definition:
ti u=4 s=1
au u=1 s=1
term s=105
Two qualifiers are defined, ti and au.
Both set the structure-attribute to phrase (1).
ti sets the use-attribute to 4; au sets the
use-attribute to 1.
When no qualifiers are used in the query, the structure-attribute is
set to free-form-text (105).
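The profile line format can be made concrete with a small sketch that reads one line and shows the attributes it stands for in PQF notation. This is not the CCL module's own interface: struct ccl_qual_def, ccl_parse_qual_line and ccl_qual_to_pqf are hypothetical names, and the sketch assumes that attribute values are integers and that fields are separated by blanks. The letters are mapped to the BIB-1 attribute type numbers 1 (use) through 6 (completeness).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical in-memory form of one profile line (not a YAZ type). */
struct ccl_qual_def {
    char name[32];
    int n_attrs;
    int type[MAX_ATTRS];    /* BIB-1 attribute type number */
    int value[MAX_ATTRS];   /* BIB-1 attribute value */
};

/* Map "u", "r", "p", "s", "t", "c" or a number to a BIB-1 type number. */
static int ccl_type_number(const char *s)
{
    static const char letters[] = "urpstc";  /* use=1 ... completeness=6 */
    const char *hit;
    char *end;
    long n;

    if (isdigit((unsigned char) s[0])) {
        n = strtol(s, &end, 10);
        return *end == '\0' ? (int) n : -1;
    }
    if (s[0] == '\0' || s[1] != '\0')
        return -1;                           /* must be a single letter */
    hit = strchr(letters, s[0]);
    return hit != NULL ? (int) (hit - letters) + 1 : -1;
}

/* Parse "qualifier-name type=val type=val ...". Returns 0 or -1.
 * Uses strtok, so it is not reentrant. */
int ccl_parse_qual_line(const char *line, struct ccl_qual_def *def)
{
    char buf[256], *tok, *eq, *end;
    long val;
    int type;

    assert(line != NULL && def != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof def->name)
        return -1;
    strcpy(def->name, tok);
    def->n_attrs = 0;

    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        eq = strchr(tok, '=');
        if (eq == NULL || eq == tok || eq[1] == '\0' || def->n_attrs == MAX_ATTRS)
            return -1;
        *eq = '\0';                          /* split "type" from "val" */
        type = ccl_type_number(tok);
        val = strtol(eq + 1, &end, 10);
        if (type < 0 || *end != '\0')
            return -1;
        def->type[def->n_attrs] = type;
        def->value[def->n_attrs] = (int) val;
        def->n_attrs++;
    }
    return def->n_attrs > 0 ? 0 : -1;
}

/* Write the attributes as a PQF prefix such as "@attr 1=4 @attr 4=1 ". */
int ccl_qual_to_pqf(const struct ccl_qual_def *def, char *out, size_t outsz)
{
    size_t used = 0;
    int i, n;

    assert(def != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < def->n_attrs; i++) {
        n = snprintf(out + used, outsz - used, "@attr %d=%d ",
                     def->type[i], def->value[i]);
        if (n < 0 || (size_t) n >= outsz - used)
            return -1;                       /* output buffer too small */
        used += (size_t) n;
    }
    return 0;
}
```

With the definition above, the line for ti yields the prefix @attr 1=4 @attr 4=1, which has the same shape as the attribute prefixes in the PQF examples earlier in this chapter.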