Pears Index Routines
Contents
Introduction
[index_definition] for Index Routine
[index_definition] for Query Normalizer
Index Routine Class Diagram
Notes for the Index Routine Reference
Index Routines for Phrase Indexes
Index Routines for Keyword Indexes
Other Index Routines
Introduction
As described in the Pears System Overview, a Pears index routine is a Java class that can play two roles: index routine and term normalizer.
When creating a Pears database, an index routine creates index terms from the specified field(s) in a database record.
When searching a Pears database, an index routine may also serve as a query (term) normalizer in WebZ or in Database Builder's Record Builder application. A query normalizer lets you manipulate patrons' search terms so that they better match specific database indexes.
This document is a comprehensive reference to the index routines shipped with SiteSearch 4.2.0. It is intended for users who want to use these index routines when creating Pears databases or when creating WebZ or Record Builder database configuration files for Pears databases. If you need to extend one of these index routines, see the Javadoc for more technical information.
[index_definition] for Index Routine
In a database description configuration file, an [index_definition] section defines how to create an index for a Pears database. A database description configuration file usually contains many [index_definition] sections, each with a unique name. An index definition section always includes the following variables:
index = index_id
routine = Pears_index_routine
tagpath* = BER_tagpath OR none
...
These are followed by variables that define parameters for the Pears_index_routine named in the routine variable.
[index_definition] for Query Normalizer
To use an index routine as a query normalizer in WebZ or Record Builder, name the index routine in the filter variable of the index's [index_definition] section in the database configuration file, like this:
filter = ORG.oclc.pears.IndexRoutines.Words
Follow the filter variable with any variables that supply parameters for the index routine. These are usually the same parameters you specified when creating the index. The Pears-Newton Index Routine Comparison contains examples, including a few exceptions to this guideline.
Index Routine Class Diagram
The class diagram shows the relationships among the index routine classes shipped with SiteSearch 4.2.0, all of which belong to the ORG.oclc.pears.IndexRoutines package. Note that:
- Phrase and all classes that extend Phrase implement ORG.oclc.TermNormalizer. This allows these index routines to serve as query normalizers in the WebZ environment.
- Phrase and Words (which extends Phrase) serve as the basis for most Pears index routines.
Notes for the Index Routine Reference
This section provides links to the index routines shipped with SiteSearch 4.2.0, organized by function. Each entry gives the index routine's name, the class it extends, a description of the index routine, its parameters, and any parameters it inherits. Some entries also link to examples or other notes.
Optional and Required Parameters
All parameters are optional and apply to all databases unless otherwise indicated. When a parameter applies to a specific type of database, this information appears under its name.
Repeatable Parameters
When a parameter can appear more than once in an index definition, the word "Repeatable" appears under the parameter's name. Put each instance of a repeatable parameter on a separate line. Either end the parameter's name with an asterisk (such as indexUpTo*) or number the instances sequentially (such as indexUpTo1, indexUpTo2, indexUpTo3, and so on).
Unicode Characters in Index Definitions
To include the Unicode representation of a character in a Pears or WebZ index definition, use \unnnn, where nnnn is the character's four-digit hexadecimal Unicode value. For example, use \u0020 to represent a space.
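The \unnnn notation can be illustrated with a short Python sketch. This is not SiteSearch code: the function name expand_unicode_escapes is hypothetical, and the sketch says nothing about how the actual configuration parser treats malformed escapes.

```python
import re

# Matches a backslash, the letter u, and exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def expand_unicode_escapes(value: str) -> str:
    """Replace each four-digit Unicode escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


# A value of two escaped spaces, as in the joinFieldsWith example.
print(repr(expand_unicode_escapes(r"\u0020\u0020")))
```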
Examples
The Examples section, if present, provides links to index definitions in the Pears-Newton Index Routine Comparison. Although that document deals specifically with using Pears index routines to replicate commonly used Newton index routines, its examples also illustrate the use of index routines more generally.
Index Routines for Phrase Indexes
Index Routines for Keyword Indexes
Some of these index routines extend Phrase directly, rather than extending Words, but we classify them as keyword routines because they create keyword indexes.
Other Index Routines
Index Routines for Phrase Indexes
Phrase
|
Extends |
|
ORG.oclc.pears.util.IndexRoutine
|
Description
|
The Phrase
class is an index routine that normalizes input fields and builds
lists of terms for phrase indexes. It takes fields from BER records
that have been converted to appropriate UTF8-encoded (Unicode)
data and converts them to index terms. It is the base index routine
that most other index routines extend.
|
Parameters
|
Parameter
|
Description
|
bounds
|
Specifies a pair of characters that surround data within a field. Depending on the index routine, the data between these characters is either included in or excluded from the index term.
Examples: |
|
bounds = ""
bounds = ()
bounds = ""()[]{} |
|
collapse |
Indicates
that Phrase should remove all of the characters in the collapse
list from the field when it creates an index term.
Example: |
|
collapse
= ?!,"':;&_-< >[] |
|
|
debugPhrase |
Turns
on debugging for all processing performed by the Phrase class.
Possible values are:
TRUE |
|
Turns
on debugging. |
FALSE |
|
Performs
no debugging. (default)
|
|
extraIndex |
Indicates that another index should also receive any terms extracted for this index. The value is the unique ID of another index defined for the database (as defined in the index parameter of its [index_definition] section).
Example: |
|
extraIndex* = 20 |
|
extraTrimChars |
Adds
characters to the default list of trimChars for the current
index only. This may be simpler than modifying the value of
trimChars if you only want to add
one or two more trim characters.
Example: |
|
extraTrimChars = @^ |
|
indexAfter |
Contains
a character or string that indicates the start of the phrase
to index. Use a separate line for each character or string.
Examples: |
|
indexAfter* = (
indexAfter* = \u0020 |
Note: |
|
This
is less efficient than using a bounds
parameter. |
|
indexUpTo |
Contains
a character or string that indicates the end of the phrase
to index. Use a separate line for each character or string.
Examples: |
|
indexUpTo* = )
indexUpTo* = \u0020 |
Note: |
|
This
is less efficient than using a bounds
parameter. |
|
indicator1 |
Optional
for:
MARC |
|
Indicates
that indicator1 for this field must match the specified indicator(s)
for the field to be indexed.
|
indicator2 |
Optional
for:
MARC |
|
Indicates
that indicator2 for this field must match the specified indicator(s)
for the field to be indexed.
|
indicators |
Optional
for:
MARC |
|
Requires
that both indicator1 and indicator2 for this field must match
the specified indicators for the field to be indexed.
|
joinFieldsWith |
Indicates
the character or string to insert between subfields when creating
a single index term from one or more subfields, as specified
with the subfield* parameter.
Example: |
|
joinFieldsWith = \u0020\u0020
|
Note: |
|
Do
not separate characters with spaces unless you want to
use a space as part of the string. |
|
maxLength |
Shortens
the index term to the specified number of characters. The
practical limit for an index term is approximately 2000 characters.
|
nonFilingIndicator1
nonFilingIndicator2 |
Optional
for:
MARC |
|
Indicates
that the value of the field's first indicator (for nonFilingIndicator1)
or second indicator (for nonFilingIndicator2) determines the
number of characters to remove from the beginning of the field.
Possible values are:
TRUE |
|
Use
the value of nonFilingIndicator1 or nonFilingIndicator2
to determine the number of characters to remove from
the beginning of the field. |
FALSE |
|
Do
not use the value of nonFilingIndicator1 or nonFilingIndicator2
to determine the number of characters to remove from
the beginning of the field. (default)
|
Example: |
|
nonFilingIndicator1 = true |
|
notIndicators |
Optional
for:
MARC |
|
Specifies
that the two indicators for this field must not be equal to
(or must be different from) the specified indicator(s) for
the field to be indexed.
Example: |
|
notIndicators = 04 |
|
replace |
Indicates
that you want to replace characters when indexing a term.
Refers to another section in the index definition with a list
of characters and the replacement value for each character
in the indexed term.
Example: |
|
replace = replaceSect |
|
replaceSect |
Required:
When using replace |
|
Section that lists characters and the replacement value to use when indexing each of them. Use with replace, whose value names this section (for the example below, replace = replaceChars).
Example: |
|
[replaceChars]
\u03B1 = alpha
\u03B2 = beta
\u03B3 = gamma |
|
startOffset |
Indicates
the number of characters to ignore at the beginning of the
field when creating an index term.
Example: |
|
startOffset
= 10 |
|
stripHTML |
Indicates
whether to remove any HTML codes from the term to be indexed,
where an HTML code is a word or phrase within angle brackets
(<>). Possible values are:
TRUE |
|
Remove
HTML codes from the term. |
FALSE |
|
Do
not remove HTML codes from the term. (default)
|
|
subfield |
Indicates
a subfield to include in an index term. Use to selectively
include subfields from a field when you don't want to use
all subfields in a field. Use in conjunction with joinFieldsWith.
Note: |
|
If
you use this parameter, remember to specify a tagpath
parameter one level higher than the subfields. For example,
to index subfields 1 and 2 from the MARC 245 field:
tagpath* = 245
subfield* = 1
subfield* = 2
|
|
trimChars |
Indicates
the characters to remove from the beginning or end of the
field when creating an index term. Type the list of characters
without spaces between them. If you embed a space between
two other characters, a space becomes one of the trim characters.
Example: |
|
trimChars = _
'&.,:* (default) |
Notes: |
|
To
add characters only, use extraTrimChars
instead. Phrase (or other index routines that extend
Phrase) adds the characters in extraTrimChars to the
default list of trim characters.
To
modify the default list of trim characters in any other
way (such as removing trim characters or both adding
and removing trim characters), specify a value for trimChars
that includes all trim characters for the index.
|
|
|
Examples
|
The Pears-Newton Index Routine Comparison shows how to use Phrase to emulate these Newton index routines: |
|
combad()
ddc()
greekphrase()
phrase2()
phrbhyp()
|
|
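The effect of a few Phrase parameters can be modeled with the Python sketch below. It is an illustration of the parameter descriptions above, not the Java implementation. The function names are hypothetical, and because this reference does not state the order in which Phrase applies its parameters, each one is shown on its own.

```python
def trim(term: str, trim_chars: str) -> str:
    """trimChars: remove the listed characters from both ends of the term."""
    return term.strip(trim_chars)


def collapse(term: str, collapse_chars: str) -> str:
    """collapse: remove every occurrence of the listed characters."""
    return "".join(ch for ch in term if ch not in collapse_chars)


def drop_start_offset(term: str, start_offset: int) -> str:
    """startOffset: ignore the first start_offset characters of the field."""
    return term[start_offset:]


def apply_max_length(term: str, max_length: int) -> str:
    """maxLength: shorten the term to at most max_length characters."""
    return term[:max_length]


print(trim("..Hamlet, ", " .,"))
print(collapse("O'Neil-Smith", "'-"))
```

Note that trimming only affects the ends of a term, while collapsing removes the listed characters wherever they occur.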
PhraseMinusBoundPhrases
|
Extends |
|
Phrase
|
Description
|
PhraseMinusBoundPhrases
creates an index term from a field that includes all the data in the
field except for data contained within user-defined bounds characters.
If the field contains more than one set of bounds characters, it removes
only the data within the first set of bounds characters before creating
the index term.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugPhraseMinusBoundPhrases |
Adds
debugging statements generated by PhraseMinusBoundPhrases
to the standard output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
PhraseMinusBoundPhrases. To see debugging statements generated
by Phrase, set debugPhrase=TRUE
as well. |
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
Notes
|
When using
this index routine, specify the bounds characters with the bounds
parameter (from Phrase) in the index definition.
|
Examples
|
Click here
to see how to use PhraseMinusBoundPhrases to emulate the Newton govtdoc()
index routine. |
PhraseWithinBoundPhrases
|
Extends |
|
Phrase
|
Description
|
PhraseWithinBoundPhrases
creates an index term from the data contained within a user-defined
set of bounds characters. It ignores the rest of the data in the field,
including the bounds characters. If the field contains more than one
set of bounds characters, it indexes only the data within the first
set of bounds characters.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugPhraseWithinBoundPhrases |
Adds
debugging statements generated by PhraseWithinBoundPhrases
to the standard output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
PhraseWithinBoundPhrases. To see debugging statements
generated by Phrase, set debugPhrase=TRUE
as well. |
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
Notes |
When using
this index routine, specify the bounds characters with the bounds
parameter (from Phrase) in the index definition.
|
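The difference between PhraseMinusBoundPhrases and PhraseWithinBoundPhrases can be sketched in Python as follows. This models only the descriptions in this reference. The function names are hypothetical, and two details are assumptions: that PhraseMinusBoundPhrases also drops the bounds characters themselves, and that a field without a complete pair of bounds characters yields the whole field (minus) or an empty term (within).

```python
def split_on_first_bounds(field, opening, closing):
    """Return (before, inside, after) for the first bounded span, or None."""
    start = field.find(opening)
    if start == -1:
        return None
    end = field.find(closing, start + 1)
    if end == -1:
        return None
    return field[:start], field[start + 1:end], field[end + 1:]


def phrase_within_bounds(field, opening, closing):
    """Keep only the data inside the first set of bounds characters."""
    parts = split_on_first_bounds(field, opening, closing)
    return parts[1] if parts else ""  # assumption: no bounds, no term


def phrase_minus_bounds(field, opening, closing):
    """Keep everything except the first bounded span."""
    parts = split_on_first_bounds(field, opening, closing)
    return parts[0] + parts[2] if parts else field  # assumption: no bounds, keep all


field = "Annual report (1999) (rev.)"
print(phrase_within_bounds(field, "(", ")"))
print(phrase_minus_bounds(field, "(", ")"))
```

In both cases only the first set of bounds characters is considered, so "(rev.)" is left untouched by the minus variant and ignored by the within variant.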
SimplePatterns
|
Extends |
|
Phrase
|
Description
|
SimplePatterns creates index terms from words that match a pattern defined in the index definition. For example, if the pattern is "isbn*9", it generates index terms only for words that begin with "isbn" and end with "9". It would therefore create an index term for "isbn077821278909", but not for "077821278909" or "isbn077821278905".
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugSimplePatterns |
Adds
debugging statements generated by SimplePatterns to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
SimplePatterns. To see debugging statements generated
by Phrase, set debugPhrase=TRUE
as well. |
|
pattern |
Contains
a pattern that the field must contain for SimplePatterns to
generate an index term from a word. The pattern can include
the wildcard characters "*" and "?".
Example: |
|
pattern = isbn*9 |
|
maxWordLength |
Specifies
the maximum number of characters that a word can contain to
be indexed. SimplePatterns ignores any words that exceed the
maxWordLength value.
Example: |
|
maxWordLength =
15 |
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
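The pattern and maxWordLength parameters can be modeled with Python's standard fnmatch module, as in the sketch below. This is only an approximation of the behavior described above, not the SimplePatterns implementation: the function name is hypothetical, case-sensitive matching is an assumption, and fnmatch also gives square brackets a special meaning that this reference does not mention for SimplePatterns.

```python
from fnmatch import fnmatchcase


def simple_pattern_terms(words, pattern, max_word_length=None):
    """Return the words that match the pattern and do not exceed the length limit."""
    terms = []
    for word in words:
        # maxWordLength: words longer than the limit are ignored entirely.
        if max_word_length is not None and len(word) > max_word_length:
            continue
        # "*" matches any run of characters, "?" matches a single character.
        if fnmatchcase(word, pattern):
            terms.append(word)
    return terms


words = ["isbn077821278909", "077821278909", "isbn077821278905"]
print(simple_pattern_terms(words, "isbn*9"))
```

With a length limit of 15, the 16-character word "isbn077821278909" would be skipped even though it matches the pattern.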
Index Routines for Keyword Indexes
Words
|
Extends |
|
Phrase
|
Description
|
Words
is a commonly-used index routine that extracts and stores individual
terms from fields in a record. Words is the basis for a number of
other keyword index routines.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugWords |
Adds
debugging statements generated by Words to the standard output.
Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
Words. To see debugging statements generated by Phrase,
set debugPhrase = TRUE as well. |
|
delimiters
|
List
of characters that separate words in a field. Used to determine
where one word ends and the next word starts.
Default: |
delimiters = \t\n\r+-=<>(){}[]:;/\\\"!?", |
|
Note: |
|
Use extraDelimiters to specify additional delimiters for a specific index. Use removeDelimiters to specify characters to remove from the default list for a specific index. Otherwise, the value of delimiters must list every delimiter applicable to the index. |
|
extraDelimiters |
Indicates
character(s) to add to the default list of delimiters
for a specific index.
Example: |
extraDelimiters = . |
|
|
maxWordLength
|
Defines the maximum length for a word used as an index term. The Words routine truncates a word that has more characters than the maxWordLength value. The default is to index words of any length.
|
maxWords
|
Specifies
the maximum number of words to index in a field. The Words
routine ignores other words in the field once the number of
words indexed reaches the maxWords value. The default is to
index all words in a field.
|
minWordLength
|
Specifies
the minimum number of characters that a word must contain
to be indexed. The default is to index words of any length.
|
removeDelimiters |
Indicates
character(s) to remove from the default list of delimiters
for a specific index.
Example: |
removeDelimiters = " |
|
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
Examples
|
The Pears-Newton Index Routine Comparison shows how to use Words to emulate Newton index routines. |
|
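The Words parameters interact as modeled in the Python sketch below. It is not the Words implementation. The function and argument names are hypothetical, the delimiter list is passed in explicitly rather than taken from the documented default, and the order of the length checks is an assumption.

```python
def words_terms(field, delimiters, extra="", remove="",
                min_len=0, max_len=None, max_words=None):
    """Split a field into words and apply the Words length and count limits."""
    # extraDelimiters adds to the list; removeDelimiters takes away from it.
    active = (set(delimiters) | set(extra)) - set(remove)

    words, current = [], []
    for ch in field:
        if ch in active:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))

    terms = []
    for word in words:
        if max_words is not None and len(terms) >= max_words:
            break                      # maxWords: ignore the remaining words
        if len(word) < min_len:
            continue                   # minWordLength: too short to index
        if max_len is not None:
            word = word[:max_len]      # maxWordLength: truncate, do not drop
        terms.append(word)
    return terms


print(words_terms("war and peace: a novel", " :", min_len=2))
```

Unlike SimplePatterns, which ignores words longer than maxWordLength, Words keeps such words in truncated form.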
BerInteger
|
Extends |
|
ORG.oclc.pears.util.IndexRoutine
|
Description
|
BerInteger
allows you to index numeric fields that contain integer values.
For example, a database may contain OCLC numbers stored as integers
rather than character strings. BerInteger creates index terms from
the integer values.
|
Parameters
|
Parameter
|
Description
|
debugBerInteger |
Adds
debugging statements generated by BerInteger to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
|
|
LCCardNumber
|
Extends |
|
Phrase
|
Description
|
LCCardNumber
creates an index term for a Library of Congress Control Number (LCCN).
It creates a 12-character index term, which does not contain optional
suffixes that begin in character position 13 of the LCCN.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugLCCardNumber |
Adds
debugging statements generated by LCCardNumber to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
LCCardNumber. To see debugging statements generated by
Phrase, set debugPhrase=TRUE
as well. |
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
LCClass
|
Extends |
|
Phrase
|
Description
|
LCClass constructs a standard Library of Congress (LC) classification number from the data in a field, in the format AAA1111.222.a333, where:
AAA | represents one to three alphabetic characters, padded on the right with underscores (_). All characters are converted to lower case.
1111 | represents four numerals, padded on the left with zeros (0).
222 | represents three numerals, padded on the right with zeros (0).
a | represents a single alphabetic character.
333 | represents one to three numerals, with no padding. This section is truncated to three numerals if the data contains more than three.
If a field has an LC number in the format AAA1111, LCClass does not append any additional numerals or alphabetic characters when creating the index term.
If a field has an LC number with an a333 section, the index term also contains the 222 section, filled with zeros if the field has none.
Input/output examples
Input | Output
PR5398 | pr_5398
PN1992.8.S35 | pn_1992.800.s35
PS3573.I456213 | ps_3573.000.i456
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugLCClass |
Adds
debugging statements generated by LCClass to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
LCClass. To see debugging statements generated by Phrase,
set debugPhrase=TRUE as well. |
|
|
Other
Parameters
|
Parameters
inherited from Phrase
|
Examples
|
Click
here to see how to use LCClass to emulate the Newton lcclass()
index routine.
|
Notes
|
Even though LCClass creates keyword indexes, it extends Phrase rather than Words so that segments of an LC number separated by spaces are not processed as individual index terms. |
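The AAA1111.222.a333 format can be made concrete with the Python sketch below, which reproduces the three input/output examples given for LCClass. It is a model of the documented format rather than the LCClass code. The function name and the regular expression are hypothetical, and truncating a 222 section longer than three numerals is an assumption.

```python
import re

# Letters, whole number, optional decimal part, optional letter-plus-digits section.
_LC_NUMBER = re.compile(
    r"^\s*([A-Za-z]{1,3})\s*(\d{1,4})(?:\.(\d+))?(?:\s*\.?([A-Za-z])(\d+))?"
)


def lc_class_term(field):
    """Build an AAA1111.222.a333 style term, or return None if nothing matches."""
    match = _LC_NUMBER.match(field)
    if match is None:
        return None
    letters, number, decimal, cutter_letter, cutter_digits = match.groups()

    # AAA: lower case, padded on the right with underscores; 1111: zero-padded left.
    term = letters.lower().ljust(3, "_") + number.rjust(4, "0")

    # 222: present whenever the input has a decimal part or an a333 section.
    if decimal is not None or cutter_letter is not None:
        term += "." + (decimal or "")[:3].ljust(3, "0")

    # a333: one letter plus at most three numerals, no padding.
    if cutter_letter is not None:
        term += "." + cutter_letter.lower() + cutter_digits[:3]
    return term


for value in ("PR5398", "PN1992.8.S35", "PS3573.I456213"):
    print(value, "->", lc_class_term(value))
```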
MarcBibliographicLevel
|
Extends |
|
Words
|
Description
|
Generates an index term for a record based on the value of the bibliographic level byte (07) in the leader of a MARC record, as follows:
Value | Index term created | Bibliographic level
a | analytic | analytic monograph
b | analytic | analytic serial
m | monograph | monograph
s | serial | serial
c | collection | collection
d | subunit | subunit
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugMarcBibliographicLevel |
Adds
debugging statements generated by MarcBibliographicLevel to
the standard output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This parameter only affects debugging statements specific to MarcBibliographicLevel. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
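The mapping from the bibliographic level byte to an index term amounts to a table lookup, sketched below in Python. This is an illustration, not the MarcBibliographicLevel code. The names are hypothetical, the leader is treated as a plain string with positions counted from zero, and returning None for an unlisted value is an assumption.

```python
# Bibliographic level (leader byte 07) to index term, as listed for this routine.
BIB_LEVEL_TERMS = {
    "a": "analytic",
    "b": "analytic",
    "m": "monograph",
    "s": "serial",
    "c": "collection",
    "d": "subunit",
}


def bibliographic_level_term(leader):
    """Return the index term for leader byte 07, or None if it is not listed."""
    if len(leader) <= 7:
        return None
    return BIB_LEVEL_TERMS.get(leader[7])


print(bibliographic_level_term("00000nam a2200000 a 4500"))
```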
MarcFormat
|
Extends |
|
Words
|
Description
|
Generates an index term for a record based on the values of the type of record (06) and bibliographic level (07) bytes in the leader of a MARC record, as follows:
Record type(s) | Bibliographic level(s) | Index term created | Refers to
a, t | m, c, a, d | bks | books
e, f | any | map | maps
p, b | any | mix | mixed materials
m | any | com | computer files
c, d | any | sco | scores
any | s, b | ser | serials
i, j | any | rec | sound recordings
g, k, o, r | any | vis | visual materials
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugMarcFormat |
Adds
debugging statements generated by MarcFormat to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
MarcFormat. To see debugging statements generated by Words
and/or Phrase, set debugWords=TRUE
and/or debugPhrase=TRUE as
well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
MarcLanguage
|
Extends |
|
Words
|
Description
|
MarcLanguage
generates an index term for a record based on the value of the three-character
language code in the 008 field of a MARC record. For example, for
the language code "fre", it generates the index term "french".
See MARC
Code List for Languages for a list of the language codes.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugMarcLanguage |
Adds
debugging statements generated by MarcLanguage to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
MarcLanguage. To see debugging statements generated by
Words and/or Phrase, set debugWords=TRUE
and/or debugPhrase=TRUE as
well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
Examples
|
Click
here to see how to use MarcLanguage to emulate the Newton marcla()
index routine. |
MarcTypeOfMaterial
|
Extends |
|
Words
|
Description
|
Generates an index term for the record based on the value of the type of record byte (06) in the leader of a MARC record, as follows:
Code | Index term created | Type of material
a, t | bks | books
e, f | map | maps
p | mix | mixed materials
m | com | computer files
c, d | sco | scores
s | ser | serials
i, j | rec | sound recordings
g, k, o, r | vis | visual materials
Note: Set tagpath = 0 when using this index routine.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugMarcTypeOfMaterial |
Adds
debugging statements generated by MarcTypeOfMaterial to the
standard output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
MarcTypeOfMaterial. To see debugging statements generated
by Words and/or Phrase, set debugWords=TRUE
and/or debugPhrase=TRUE as
well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
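The type-of-record lookup for MarcTypeOfMaterial can be sketched in Python in the same way. Again, this only models the table given for this routine. The names are hypothetical, the leader is treated as a plain string with positions counted from zero, and returning None for an unlisted code is an assumption.

```python
# Type of record (leader byte 06) to index term, as listed for this routine.
TYPE_OF_MATERIAL_TERMS = {
    "a": "bks", "t": "bks",
    "e": "map", "f": "map",
    "p": "mix",
    "m": "com",
    "c": "sco", "d": "sco",
    "s": "ser",
    "i": "rec", "j": "rec",
    "g": "vis", "k": "vis", "o": "vis", "r": "vis",
}


def type_of_material_term(leader):
    """Return the index term for leader byte 06, or None if it is not listed."""
    if len(leader) <= 6:
        return None
    return TYPE_OF_MATERIAL_TERMS.get(leader[6])


print(type_of_material_term("00000nam a2200000 a 4500"))
```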
Numbers
|
Extends |
|
Words
|
Description
|
Numbers removes all but numeric characters (0 through 9) from a field and creates an index term from the remaining digits.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugNumbers |
Adds
debugging statements generated by Numbers to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (default)
|
Note: |
|
This
parameter only affects debugging statements specific to
Numbers. To see debugging statements generated by Words
and/or Phrase, set debugWords=TRUE
and/or debugPhrase=TRUE as
well. |
|
zeropad |
Specifies
a minimum term length. If a generated term has fewer digits
than the number specified in the zeropad parameter, Numbers
adds enough zeros to the left of the term to reach the required
length.
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
Examples
|
The Pears-Newton Index Routine Comparison shows how to use Numbers to emulate Newton index routines.
|
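The combined effect of digit extraction and the zeropad parameter can be shown with a small Python sketch. It follows the description of Numbers above and is not the actual implementation. The function name is hypothetical, and returning None for a field without digits is an assumption.

```python
def numbers_term(field, zeropad=0):
    """Keep only the digits 0 through 9 and left-pad with zeros to the zeropad length."""
    digits = "".join(ch for ch in field if ch in "0123456789")
    if not digits:
        return None  # assumption: no digits, no index term
    # zeropad is a minimum length; longer terms are left unchanged.
    return digits.rjust(zeropad, "0")


print(numbers_term("ocm 12345", zeropad=8))
```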
PluralWords
|
Extends |
|
Words
|
Description
|
Removes plural
endings ('s', 'es', or 'ies') from words before indexing them. By
subsequently using this class as a query normalizer in WebZ, your
patrons can enter plural forms of a search term and receive results
that contain either the singular or plural forms of the term.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugPluralWords |
Adds
debugging statements generated by PluralWords to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (Default)
|
Notes: |
|
(1) This parameter only affects debugging statements specific to PluralWords. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.
|
(2) Consider the type of field you are indexing when deciding whether to use this index routine. For example, it is not appropriate for a field that contains names, because it removes endings that must remain in the index term for a search to succeed. "Jones", for example, becomes "Jon". |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
Notes
|
For all
indexes you generate with this index routine, remember to specify
PluralWords as the query normalizer in the [index_definition]
sections of their WebZ database configuration file as follows:
filter = ORG.oclc.pears.IndexRoutines.PluralWords
|
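The reason PluralWords must also be used as the query normalizer can be seen in the Python sketch below. It applies a literal reading of the description (strip a trailing 'ies', 'es', or 's') and is not the actual PluralWords algorithm, which may treat individual words differently. The function names are hypothetical.

```python
def strip_plural(word):
    """Remove a trailing 'ies', 'es', or 's', trying the longest ending first."""
    for ending in ("ies", "es", "s"):
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word


def normalize_query(query):
    """Apply the same stripping to each word of a patron's search."""
    return [strip_plural(word) for word in query.split()]


indexed_term = strip_plural("books")   # term stored when the database is built
print(indexed_term, normalize_query("book"), normalize_query("books"))
```

Because the stored term is already stripped, a query for "books" only matches if the same stripping is applied to the search term. The example also shows why name fields are a poor fit: "Jones" is reduced to "Jon".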
PublicationDate
|
Extends |
|
Words
|
Description
|
PublicationDate
removes non-numeric characters from a date field, such as the MARC
008 fixed field, and creates an index term for the remaining numeral.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugPublicationDate |
Adds
debugging statements generated by PublicationDate to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (Default)
|
Note: |
|
This
parameter only affects debugging statements specific to
PublicationDate. To see debugging statements generated
by Words and/or Phrase, set debugWords=TRUE
and/or debugPhrase=TRUE as
well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
|
Example
|
Click
here to see how to use PublicationDate to emulate the Newton pubdate()
index routine. |
WordsMinusBoundPhrases
|
Extends |
|
Words
|
Description
|
WordsMinusBoundPhrases
indexes all the words in a field except for the words between user-defined
bound characters, such as the left and right parentheses.
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugWordsMinusBoundPhrases |
Adds
debugging statements generated by WordsMinusBoundPhrases to
the standard output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (Default)
|
Note: |
|
This parameter only affects debugging statements specific to WordsMinusBoundPhrases. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well. |
|
|
Other
Parameters
|
Parameters
inherited from Words
Parameters inherited from Phrase
|
Notes
|
When using
this index routine, specify the bounds characters with the bounds
parameter (from Phrase) in the index definition.
|
Example |
Click
here to see how to use WordsMinusBoundPhrases to emulate the Newton
isbn() index routine. |
YearRange
|
Extends |
|
PublicationDate
|
Description
|
YearRange creates a series of numeric index terms based on two numbers within a field. The numbers are the earliest and latest dates in a date range. Optionally, YearRange creates these terms only if the field also contains the character(s) specified in the mustContain parameter.
For example, if the field contains the date range "1962-1966", YearRange creates the index terms "1962", "1963", "1964", "1965", and "1966". YearRange always right-pads the numbers with enough zeros to create a four-digit number.
Note: |
|
YearRange may yield unpredictable results unless both numbers contain four digits.
For example, if the field contains the date range "1962-66", it pads "66" so it becomes "6600". Instead of creating index terms for all the years between 1962 and 6600, it uses the maxTerms parameter to determine the maximum number of terms to create for any date range. If maxTerms = 50, YearRange creates the index terms "1962", "1963", "1964", "1965", "1966", "1967", "1968" ... "2010", "2011". Here, the indexed years extend well beyond the range specified in the field.
|
|
Parameters
Unique to This Class
|
Parameter
|
Description
|
debugYearRange |
Adds
debugging statements generated by YearRange to the standard
output. Possible values are:
TRUE |
|
Turn
on debugging. |
FALSE |
|
Do
not turn on debugging. (Default)
|
|
mustContain |
Designates
a character(s) that the field must contain for YearRange to
generate index terms over the date range specified in a field.
A typical value is a hyphen.
|
maxTerms |
Designates
the maximum number of index terms created for any date range.
Example: |
|
maxTerms = 100
(default)
|
|
|
Other
Parameters
|
Parameters
inherited from PublicationDate
Parameters inherited from Words
Parameters inherited from Phrase
|
Examples
|
Click
here to see how to use YearRange to emulate the Newton numrang()
index routine. |
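The padding and maxTerms behavior described for YearRange can be reproduced with the Python sketch below. It models the description in this reference and is not the YearRange implementation. The function name is hypothetical, and returning no terms when the field holds fewer than two numbers is an assumption.

```python
import re


def year_range_terms(field, must_contain=None, max_terms=100):
    """Expand the first two numbers in a field into a capped run of year terms."""
    # mustContain: generate range terms only if the field has this text.
    if must_contain is not None and must_contain not in field:
        return []
    numbers = re.findall(r"\d+", field)
    if len(numbers) < 2:
        return []  # assumption: a range needs two numbers

    # Right-pad each number with zeros to four digits ("66" becomes "6600").
    first, last = (int(n[:4].ljust(4, "0")) for n in numbers[:2])

    terms = []
    year = first
    while year <= last and len(terms) < max_terms:
        terms.append(str(year))
        year += 1
    return terms


print(year_range_terms("1962-1966"))
print(year_range_terms("1962-66", max_terms=50)[-1])
```

The second call shows the two-digit case from the note: the range is cut off by maxTerms after 50 terms, so the last term is "2011".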
Other Index Routines
StopwordEnforcer
|
Extends |
|
ORG.oclc.pears.util.IndexRoutine
|
Description
|
StopwordEnforcer
removes stopwords from keyword indexes. Stopwords are words that
you want to exclude from an index because they occur too frequently
or because they are not significant as search terms. Typical stopwords
include words such as "a", "and", "an",
"the", and "but".
For efficiency,
StopwordEnforcer removes stopwords from all keyword indexes in one
pass after index terms for all the indexes defined in a database
description configuration file have been created.
|
Parameters
|
Parameter
|
Description
|
|
Specifies
a global stopword that you want to exclude from all indexes. |
|
Notes
|
In the [index_definition] section for global stopwords, use these values for the tagpath and index parameters:
index = 0 | "0" indicates that no index should be created from this index definition.
tagpath = none | "none" indicates that this index definition should not be executed until all indexes have been created.
|
See Also
Pears System Overview
Pears-Newton Index Routine Comparison
Pears Database Description Configuration File
Creating a New Pears Database
Converting a Newton Database to a Pears Database