Pears Index Routines

Contents

Introduction
[index_definition] for Index Routine
[index_definition] for Query Normalizer
Index Routine Class Diagram
Notes for the Index Routine Reference
Index Routines for Phrase Indexes
Index Routines for Keyword Indexes
Other Index Routines

Introduction

As described in the Pears System Overview, a Pears index routine is a Java class that can play two roles – index routine and term normalizer.

When creating a Pears database, an index routine creates index terms from specified field(s) in a database record.

When searching a Pears database, an index routine may also serve as a query (term) normalizer in WebZ or Database Builder's Record Builder application. A query normalizer allows you to manipulate patrons' search terms to better match specific database indexes.

This document is a comprehensive reference to the index routines shipped with SiteSearch 4.2.0. It is designed for users who wish to use these index routines when creating Pears databases or when creating WebZ or Record Builder database configuration files for Pears databases. Should you need to extend one of these index routines, see the Javadoc for more technical information.

[index_definition] for Index Routine

In a database description configuration file, an [index_definition] section defines how to create an index for a Pears database. A database description configuration file usually contains many [index_definition] sections, each with a unique name. An index definition section always includes the following variables:

Next, there are variables that define parameters for the Pears_index_routine specified in the routine variable.

[index_definition] for Query Normalizer

To use an index routine as a query normalizer in WebZ or Record Builder, specify the index routine in the filter variable of the index's [index_definition] section in the database configuration file.

Follow the filter variable with variables that contain parameters applicable to the index routine, as needed. These are usually the same parameters you specified when creating the index. See examples in Pears-Newton Index Routine Comparison, including a few exceptions to the above guideline.

Index Routine Class Diagram

The following diagram shows the relationships among the index routine classes shipped with SiteSearch 4.2.0, which all belong to the ORG.oclc.pears.IndexRoutines class package. Click a class name to jump to a description of the class and its required and optional parameters. Note that:

Phrase and all classes that extend Phrase implement ORG.oclc.TermNormalizer. This allows these index routines to serve as query normalizers in the WebZ environment.
Phrase and Words (which extends Phrase) serve as the basis for most Pears index routines.

Notes for the Index Routine Reference

This section provides links to the index routines shipped with SiteSearch 4.2.0, organized by function. Click a link for information about an index routine. This includes the index routine's name, the class it extends, a description of the index routine, its parameters, and any parameters it inherits. For some index routines, there may also be links to examples or other notes.

Optional and Required Parameters

All parameters are optional and apply to all databases unless otherwise indicated. When a parameter applies to a specific type of database, this information appears under its name.

Repeatable Parameters

When a parameter can appear more than once in an index definition, the word "Repeatable" appears under the parameter's name. If a parameter is repeatable, use a separate line for each instance of the parameter. Use an asterisk (*) at the end of the parameter's name (such as indexUpTo*) or number each instance sequentially (such as indexUpTo1, indexUpTo2, indexUpTo3, and so on).
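
As a rough illustration of the two spellings, the Python sketch below gathers every value of a repeatable parameter from the lines of an index definition. The function name and the parsing rules are assumptions made for this example; they do not describe how Pears itself reads configuration files.

```python
import re


def repeatable_values(lines, name):
    """Collect values for `name` written as name*, or as name1, name2, ..."""
    # Accept "name*" or "name" followed by digits, then "=" and the value.
    pattern = re.compile(r"^\s*" + re.escape(name) + r"(?:\*|\d+)\s*=\s?(.*)$")
    values = []
    for line in lines:
        match = pattern.match(line)
        if match:
            values.append(match.group(1))
    return values


definition = [
    "indexUpTo* = )",
    "indexUpTo* = ;",
    "maxLength = 80",
]
print(repeatable_values(definition, "indexUpTo"))  # [')', ';']
```

Both spellings produce the same list, so the choice between them is a matter of readability.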

Unicode Characters in Index Definitions

To include the Unicode representation of a character in a Pears or WebZ index_definition, use \unnnn, where nnnn is the character's four-digit hex Unicode equivalent. For example, use \u0020 to represent a space.
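
To make the notation concrete, the following Python sketch expands \unnnn escapes in a parameter value. The helper name is hypothetical and the code only illustrates the notation; it is not taken from the Pears or WebZ source.

```python
import re


def decode_unicode_escapes(value):
    r"""Replace each \unnnn escape (four hex digits) with its character."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        value,
    )


print(repr(decode_unicode_escapes(r"\u0020\u0020")))  # '  '
print(decode_unicode_escapes(r"\u03B1"))              # α
```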

Examples

The Examples section, if present, provides links to index definitions in the Pears-Newton Index Routine Comparison. Although that document pertains specifically to using Pears index routines to replicate commonly-used Newton index routines, its examples also illustrate use of index routines more generally.

Index Routines for Phrase Indexes

Phrase
PhraseMinusBoundPhrases
PhraseWithinBoundPhrases
SimplePatterns

Index Routines for Keyword Indexes

Words
BerInteger
LCCardNumber
LCClass
MarcBibliographicLevel
MarcFormat
MarcLanguage
MarcTypeOfMaterial
Numbers
PluralWords
PublicationDate
SmartWords
WordsMinusBoundPhrases
YearRange

Some of these index routines extend Phrase directly, rather than extending Words, but we classify them as keyword routines because they create keyword indexes.

Other Index Routines

StopwordEnforcer

Index Routines for Phrase Indexes

Phrase

Extends
ORG.oclc.pears.util.IndexRoutine

Description

The Phrase class is an index routine that normalizes input fields and builds lists of terms for phrase indexes. It takes fields from BER records whose data has been converted to UTF-8-encoded Unicode and converts them to index terms. It is the base index routine that most other index routines extend.

Parameters

Parameter

Description

bounds

Specifies one or more pairs of characters that surround data within a field. Depending on the index routine, the enclosed data is either the only data included in the index term or the data excluded from it.

Examples:    bounds = ""
             bounds = ()
             bounds = ""()[]{}

Note:    The Phrase routine checks for this parameter but does not use it. It is available for routines that extend Phrase, such as PhraseMinusBoundPhrases, PhraseWithinBoundPhrases, and WordsMinusBoundPhrases.

collapse
Indicates that Phrase should remove all of the characters in the collapse list from the field when it creates an index term.

Example:    collapse = ?!,"':;&_-< >[]

debugPhrase
Turns on debugging for all processing performed by the Phrase class. Possible values are:

TRUE    Turns on debugging.

FALSE    Performs no debugging. (default)

extraIndex*

Repeatable

Indicates that another index should also receive any terms extracted for this index. Specify the unique ID of the other index defined for the database (as defined in the index parameter of its [index_definition] section).

Example:    extraIndex* = 20

extraTrimChars
Adds characters to the default list of trimChars for the current index only. This may be simpler than modifying the value of trimChars if you only want to add one or two more trim characters.

Example:    extraTrimChars = @^

indexAfter*

Repeatable

Contains a character or string that indicates the start of the phrase to index. Use a separate line for each character or string.

Examples:    indexAfter* = (
indexAfter* = \u0020

Note:    This is less efficient than using a bounds parameter.

indexUpTo*

Repeatable

Contains a character or string that indicates the end of the phrase to index. Use a separate line for each character or string.

Examples:    indexUpTo* = )
indexUpTo* = \u0020

Note:    This is less efficient than using a bounds parameter.

indicator1

Optional for:
MARC

Indicates that indicator1 for this field must match the specified indicator(s) for the field to be indexed.

Example:    indicator1 = 1

indicator2

Optional for:
MARC

Indicates that indicator2 for this field must match the specified indicator(s) for the field to be indexed.

Example:    indicator2 = 2

indicators

Optional for:
MARC

Requires that both indicator1 and indicator2 for this field match the specified indicators for the field to be indexed.

Example:    indicators = 12

joinFieldsWith
Indicates the character or string to insert between subfields when creating a single index term from one or more subfields, as specified with the subfield* parameter.

Example:    joinFieldsWith = \u0020\u0020

Note:    Do not separate characters with spaces unless you want to use a space as part of the string.

maxLength
Shortens the index term to the specified number of characters. The practical limit for an index term is approximately 2000 characters.

Example:    maxLength = 80

nonFilingIndicator1
nonFilingIndicator2

Optional for:
MARC

Indicates whether the value of the field's first indicator (for nonFilingIndicator1) or second indicator (for nonFilingIndicator2) determines the number of characters to remove from the beginning of the field. Possible values are:

TRUE    Remove from the beginning of the field the number of characters given by the field's first indicator (nonFilingIndicator1) or second indicator (nonFilingIndicator2).

FALSE    Do not use the indicator value to remove characters from the beginning of the field. (default)

Example:    nonFilingIndicator1 = true

notIndicators

Optional for:
MARC

Specifies that the two indicators for this field must differ from the specified indicator(s) for the field to be indexed.

Example:    notIndicators = 04

replace
Indicates that you want to replace characters when indexing a term. Refers to another section in the index definition with a list of characters and the replacement value for each character in the indexed term.

Example:    replace = replaceSect

replaceSect

Required:
When using replace

Section that specifies a list of characters and the replacement values to use when indexing these characters. Use with replace.

Example:    [replaceSect]
\u03B1 = alpha
\u03B2 = beta
\u03B3 = gamma

startOffset
Indicates the number of characters to ignore at the beginning of the field when creating an index term.

Note:    Phrase counts off the startOffset for a field before removing any trim characters (see trimChars and extraTrimChars) from the start or end of a field or removing any collapse characters from the field.

Example:    startOffset = 10

stripHTML
Indicates whether to remove any HTML codes from the term to be indexed, where an HTML code is a word or phrase within angle brackets (<>). Possible values are:

TRUE    Remove HTML codes from the term.

FALSE    Do not remove HTML codes from the term. (default)

subfield*

Repeatable

Indicates a subfield to include in an index term. Use to selectively include subfields from a field when you don't want to use all subfields in a field. Use in conjunction with joinFieldsWith.

Note:
If you use this parameter, remember to specify a tagpath parameter one level higher than the subfields. For example, to index subfields 1 and 2 from the MARC 245 field:

tagpath* = 245
subfield* = 1
subfield* = 2

trimChars
Indicates the characters to remove from the beginning or end of the field when creating an index term. Type the list of characters without spaces between them. If you embed a space between two other characters, a space becomes one of the trim characters.

Example:    trimChars = _ '&.,:* (default)

Notes:
To add characters only, use extraTrimChars instead. Phrase (or other index routines that extend Phrase) adds the characters in extraTrimChars to the default list of trim characters.

To modify the default list of trim characters in any other way (such as removing trim characters or both adding and removing trim characters), specify a value for trimChars that includes all trim characters for the index.
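
The Python sketch below shows how several of the Phrase parameters above might combine when producing a term. It is only an illustration: the function name is hypothetical, and the order of the trim, collapse, and maxLength steps relative to one another is an assumption. The reference states only that startOffset is counted before trimming and collapsing.

```python
DEFAULT_TRIM_CHARS = "_ '&.,:*"  # default listed under trimChars


def phrase_term(field, start_offset=0, trim_chars=DEFAULT_TRIM_CHARS,
                extra_trim_chars="", collapse="", max_length=None):
    """Apply startOffset, trimChars/extraTrimChars, collapse, maxLength."""
    # startOffset: ignore characters at the start of the field first.
    text = field[start_offset:]
    # trimChars plus extraTrimChars: strip from both ends only.
    text = text.strip(trim_chars + extra_trim_chars)
    # collapse: remove these characters wherever they occur (assumed order).
    if collapse:
        text = text.translate({ord(ch): None for ch in collapse})
    # maxLength: shorten the finished term (assumed to happen last).
    if max_length is not None:
        text = text[:max_length]
    return text


print(phrase_term("0123456789 The Title.", start_offset=10))  # The Title
print(phrase_term("U.S.A.", collapse="."))                     # USA
```

Passing extra_trim_chars leaves the default list intact, which mirrors the advice to prefer extraTrimChars when you only want to add characters.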

Examples
Click the links below to see how to use Phrase to emulate these Newton index routines:

combad()

ddc()

greekphrase()

phrase2()

phrbhyp()

repnum()

sgmlphrase()

ugsbjcl()

uptoparen()

PhraseMinusBoundPhrases

Extends

Phrase

Description

PhraseMinusBoundPhrases creates an index term from a field that includes all the data in the field except for data contained within user-defined bounds characters. If the field contains more than one set of bounds characters, it removes only the data within the first set of bounds characters before creating the index term.
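
A minimal Python sketch of this behavior follows. It assumes that the bounds characters themselves are removed along with the enclosed data and that a field without a complete pair is left unchanged; neither detail is confirmed by this reference, and the function name is hypothetical.

```python
def remove_first_bound_phrase(field, bounds="()"):
    """Drop the first bounded segment (delimiters included) from `field`."""
    # `bounds` lists opening/closing characters in pairs, e.g. '()[]'.
    pairs = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds) - 1, 2)]
    best = None
    for opener, closer in pairs:
        start = field.find(opener)
        if start == -1:
            continue
        end = field.find(closer, start + 1)
        if end == -1:
            continue
        # Keep whichever complete pair starts earliest in the field.
        if best is None or start < best[0]:
            best = (start, end)
    if best is None:
        return field
    start, end = best
    return field[:start] + field[end + 1:]


print(remove_first_bound_phrase("abc(def)ghi(jkl)"))  # abcghi(jkl)
```

Only the first bounded segment is removed; the second pair of parentheses stays in the term.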

Parameters Unique to This Class

Parameter

Description

debugPhraseMinusBoundPhrases

Adds debugging statements generated by PhraseMinusBoundPhrases to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to PhraseMinusBoundPhrases. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Phrase

Notes

When using this index routine, specify the bounds characters with the bounds parameter (from Phrase) in the index definition.

Example:

bounds = ()

Examples

Click here to see how to use PhraseMinusBoundPhrases to emulate the Newton govtdoc() index routine.

PhraseWithinBoundPhrases

Extends

Phrase

Description

PhraseWithinBoundPhrases creates an index term from the data contained within a user-defined set of bounds characters. It ignores the rest of the data in the field, including the bounds characters. If the field contains more than one set of bounds characters, it indexes only the data within the first set of bounds characters.
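
The following Python sketch illustrates the idea. The function name is hypothetical, and returning None when the field has no complete pair of bounds characters is an assumption; this reference does not say what the routine does in that case.

```python
def first_bound_phrase(field, bounds="()"):
    """Return the data inside the first bounded segment, or None."""
    # `bounds` lists opening/closing characters in pairs, e.g. '""()'.
    pairs = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds) - 1, 2)]
    best = None
    for opener, closer in pairs:
        start = field.find(opener)
        if start == -1:
            continue
        end = field.find(closer, start + 1)
        if end == -1:
            continue
        # Keep whichever complete pair starts earliest in the field.
        if best is None or start < best[0]:
            best = (start, end)
    if best is None:
        return None
    start, end = best
    return field[start + 1:end]


print(first_bound_phrase("abc(def)ghi(jkl)"))  # def
```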

Parameters Unique to This Class

Parameter

Description

debugPhraseWithinBoundPhrases

Adds debugging statements generated by PhraseWithinBoundPhrases to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to PhraseWithinBoundPhrases. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Phrase

Notes

When using this index routine, specify the bounds characters with the bounds parameter (from Phrase) in the index definition.

Example:

bounds = ()

SimplePatterns

Extends

Phrase

Description

SimplePatterns creates index terms from words that match a pattern defined in the index definition. For example, if the pattern is "isbn*9", it generates index terms only for fields that begin with "isbn" and end with "9". Therefore, it would create an index term for "isbn077821278909", but not for "077821278909" or "isbn077821278905".
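
The matching rule can be sketched in Python as below. The sketch assumes the conventional wildcard meanings ("*" for any run of characters, including none, and "?" for exactly one character); this reference does not define them, and the function name is hypothetical.

```python
import re


def matches_pattern(word, pattern):
    """Return True if `word` matches `pattern` with * and ? wildcards."""
    # Translate the wildcard pattern into a regular expression.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, word, flags=re.DOTALL) is not None


for candidate in ("isbn077821278909", "077821278909", "isbn077821278905"):
    print(candidate, matches_pattern(candidate, "isbn*9"))
```

Only the first candidate matches, which agrees with the example in the description.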

Parameters Unique to This Class

Parameter

Description

debugSimplePatterns

Adds debugging statements generated by SimplePatterns to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to SimplePatterns. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

pattern

Contains a pattern that the field must contain for SimplePatterns to generate an index term from a word. The pattern can include the wildcard characters "*" and "?".

Example:

pattern = isbn*9

maxWordLength

Specifies the maximum number of characters that a word can contain to be indexed. SimplePatterns ignores any words that exceed the maxWordLength value.

Example:

maxWordLength = 15

Other Parameters

Parameters inherited from Phrase

Index Routines for Keyword Indexes

Words

Extends

Phrase

Description

Words is a commonly-used index routine that extracts and stores individual terms from fields in a record. Words is the basis for a number of other keyword index routines.

Parameters Unique to This Class

Parameter

Description

debugWords

Adds debugging statements generated by Words to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to Words. To see debugging statements generated by Phrase, set debugPhrase = TRUE as well.

delimiters

List of characters that separate words in a field. Used to determine where one word ends and the next word starts.

Default:

delimiters = \t\n\r+-=<>(){}[]:;/\\\"!?",

Note:

Use extraDelimiters to specify additional delimiters for a specific index. Use removeDelimiters to specify characters to remove from the default list for a specific index. Otherwise, the list of characters in this parameter must contain all delimiters applicable to a specific index.

extraDelimiters

Indicates character(s) to add to the default list of delimiters for a specific index.

Example:

extraDelimiters = .

maxWordLength

Defines the maximum length for a word used as an index term. The Words routine truncates a word that has more characters than the maxWordLength value. The default is to index words of any length.

maxWords

Specifies the maximum number of words to index in a field. The Words routine ignores other words in the field once the number of words indexed reaches the maxWords value. The default is to index all words in a field.

minWordLength

Specifies the minimum number of characters that a word must contain to be indexed. The default is to index words of any length.

removeDelimiters

Indicates character(s) to remove from the default list of delimiters for a specific index.

Example:

removeDelimiters = "
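
To show how these parameters interact, here is a small Python sketch of word splitting. The delimiter set in the code is a short illustrative stand-in, not the real default, and the function name and the order in which the length limits are applied are assumptions.

```python
ILLUSTRATIVE_DELIMITERS = " \t\n\r,;:()"  # stand-in, not the Pears default


def split_words(field, delimiters=ILLUSTRATIVE_DELIMITERS,
                extra_delimiters="", remove_delimiters="",
                min_word_length=1, max_word_length=None, max_words=None):
    """Split `field` into words using the Words-style parameters."""
    # extraDelimiters adds to the list; removeDelimiters takes away from it.
    active = (set(delimiters) | set(extra_delimiters)) - set(remove_delimiters)
    raw, current = [], []
    for ch in field:
        if ch in active:
            if current:
                raw.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        raw.append("".join(current))

    words = []
    for word in raw:
        if len(word) < min_word_length:
            continue  # minWordLength: too short to index
        if max_word_length is not None:
            word = word[:max_word_length]  # maxWordLength: truncate
        words.append(word)
        if max_words is not None and len(words) >= max_words:
            break  # maxWords: ignore the rest of the field
    return words


print(split_words("The quick, brown fox", max_words=3))
# ['The', 'quick', 'brown']
```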

Other Parameters

Parameters inherited from Phrase

Examples

See the Pears-Newton Index Routine Comparison for index definitions that use Words to emulate Newton index routines.

BerInteger

Extends

ORG.oclc.pears.util.IndexRoutine

Description

BerInteger allows you to index numeric fields that contain integer values. For example, a database may contain OCLC numbers stored as integers rather than character strings. BerInteger creates index terms from the integer values.

Parameters

Parameter

Description

debugBerInteger

Adds debugging statements generated by BerInteger to the standard output. Possible values are TRUE and FALSE.

LCCardNumber

Extends

Phrase

Description

LCCardNumber creates an index term for a Library of Congress Control Number (LCCN). It creates a 12-character index term, which does not contain optional suffixes that begin in character position 13 of the LCCN.

Parameters Unique to This Class

Parameter

Description

debugLCCardNumber

Adds debugging statements generated by LCCardNumber to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to LCCardNumber. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Phrase

LCClass

Extends

Phrase

Description

LCClass constructs a standard Library of Congress (LC) classification number from the data in a field, in the format AAA1111.222.a333, where:

If a field has an LC number in the format AAA1111, LCClass does not append any additional numerals or alphabetic characters when creating the index term.

If a field has an LC number with an a333 section, it must also contain the 222 section.

Input/output examples

Input		Output
PR5398		pr_5398
PN1992.8.S35		pn_1992.800.s35
PS3573.I456213		ps_3573.000.i456
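
The Python sketch below reproduces the three input/output pairs in the table. Its rules are inferred from those examples alone (lowercase letters padded to three characters with "_", a decimal part padded to three digits with zeros, and a cutter truncated to a letter plus three digits), so treat it as a guess at the behavior rather than the LCClass algorithm. The function name is hypothetical, and the class number itself is left exactly as it appears in the field.

```python
import re

# Letters, class number, optional decimal part, optional cutter (e.g. .S35).
_LC_NUMBER = re.compile(
    r"^([A-Za-z]{1,3})(\d{1,4})(?:\.(\d+))?(?:\.?([A-Za-z])(\d+))?"
)


def lc_class_term(field):
    """Build an AAA1111.222.a333-style term from an LC class number."""
    match = _LC_NUMBER.match(field.strip())
    if match is None:
        return None  # assumption: unparseable input yields no term
    letters, number, decimal, cutter_letter, cutter_digits = match.groups()
    term = letters.lower().ljust(3, "_") + number
    if decimal is None and cutter_letter is None:
        return term  # AAA1111 only: nothing is appended
    # An a333 section requires a 222 section, so pad a missing one to 000.
    term += "." + (decimal or "").ljust(3, "0")[:3]
    if cutter_letter is not None:
        term += "." + cutter_letter.lower() + cutter_digits[:3]
    return term


for value in ("PR5398", "PN1992.8.S35", "PS3573.I456213"):
    print(value, "->", lc_class_term(value))
```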

Parameters Unique to This Class

Parameter

Description

debugLCClass

Adds debugging statements generated by LCClass to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to LCClass. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Phrase

Examples

Click here to see how to use LCClass to emulate the Newton lcclass() index routine.

Notes

Even though LCClass creates keyword indexes, it extends Phrase rather than Words so that segments of an LC number separated by spaces are not processed as individual index terms.

MarcBibliographicLevel

Extends

Words

Description

Generates an index term for a record based on the value of the bibliographic level byte (07) in the leader of a MARC record, as follows:

For this value ...	the index term created is ...	which refers to this bibliographic level...
a	analytic	analytic monograph
b	analytic	analytic serial
m	monograph	monograph
s	serial	serial
c	collection	collection
d	subunit	subunit
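
Expressed as a lookup, the table above might look like the Python sketch below. The function name is hypothetical, and returning None for a value that is not in the table is an assumption.

```python
BIB_LEVEL_TERMS = {
    "a": "analytic",    # analytic monograph
    "b": "analytic",    # analytic serial
    "m": "monograph",
    "s": "serial",
    "c": "collection",
    "d": "subunit",
}


def bibliographic_level_term(leader):
    """Map leader byte 07 (zero-based position 7) to an index term."""
    if len(leader) <= 7:
        return None
    return BIB_LEVEL_TERMS.get(leader[7])


print(bibliographic_level_term("00000nam a2200000 a 4500"))  # monograph
```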

Parameters Unique to This Class

Parameter

Description

debugMarcBibliographicLevel

Adds debugging statements generated by MarcBibliographicLevel to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to MarcBibliographicLevel. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

MarcFormat

Extends

Words

Description

Generates an index term for a record based on the values of the type of record (06) and bibliographic level (07) bytes in the leader of a MARC record, as follows:

For these record types ...	and these bibliographic levels ...	the index term created is ...	which refers to ...
a, t	m, c, a, d	bks	books
e, f	any	map	maps
p, b	any	mix	mixed materials
m	any	com	computer files
c, d	any	sco	scores
any	s, b	ser	serials
i, j	any	rec	sound recordings
g, k, o, r	any	vis	visual materials

Parameters Unique to This Class

Parameter

Description

debugMarcFormat

Adds debugging statements generated by MarcFormat to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to MarcFormat. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

MarcLanguage

Extends

Words

Description

MarcLanguage generates an index term for a record based on the value of the three-character language code in the 008 field of a MARC record. For example, for the language code "fre", it generates the index term "french".

See MARC Code List for Languages for a list of the language codes.

Parameters Unique to This Class

Parameter

Description

debugMarcLanguage

Adds debugging statements generated by MarcLanguage to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to MarcLanguage. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Examples

Click here to see how to use MarcLanguage to emulate the Newton marcla() index routine.

MarcTypeOfMaterial

Extends

Words

Description

Generates an index term for the record based on the value of the type of record byte (06) in the leader of a MARC record, as follows:

For this code ...	the index term created is ...	which refers to this type of material...
a, t	bks	books
e, f	map	maps
p	mix	mixed materials
m	com	computer files
c, d	sco	scores
s	ser	serials
i, j	rec	sound recordings
g, k, o, r	vis	visual materials

Note:

Set tagpath = 0 when using this index routine.

Parameters Unique to This Class

Parameter

Description

debugMarcTypeOfMaterial

Adds debugging statements generated by MarcTypeOfMaterial to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to MarcTypeOfMaterial. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Numbers

Extends

Words

Description

Numbers removes all but numeric characters (0 through 9) from a field and creates an index term from the remaining numeral.

Parameters Unique to This Class

Parameter

Description

debugNumbers

Adds debugging statements generated by Numbers to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to Numbers. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

zeropad

Specifies a minimum term length. If a generated term has fewer digits than the number specified in the zeropad parameter, Numbers adds enough zeros to the left of the term to reach the required length.

Example:

zeropad = 2
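
A short Python sketch of the combined effect of Numbers and zeropad follows. The function name is hypothetical, and returning None for a field with no digits is an assumption.

```python
def numbers_term(field, zeropad=0):
    """Keep only the digits 0-9, then left-pad with zeros to `zeropad`."""
    digits = "".join(ch for ch in field if ch in "0123456789")
    if not digits:
        return None  # assumption: no digits means no index term
    return digits.zfill(zeropad)


print(numbers_term("v. 7", zeropad=2))  # 07
print(numbers_term("ocm12345"))         # 12345
```

A term that already has at least zeropad digits is left unchanged.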

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Examples

See the Pears-Newton Index Routine Comparison for index definitions that use Numbers to emulate Newton index routines.

PluralWords

Extends

Words

Description

Removes plural endings ('s', 'es', or 'ies') from words before indexing them. By subsequently using this class as a query normalizer in WebZ, your patrons can enter plural forms of a search term and receive results that contain either the singular or plural forms of the term.

Parameters Unique to This Class

Parameter

Description

debugPluralWords

Adds debugging statements generated by PluralWords to the standard output. Possible values are TRUE and FALSE.

Notes:

(1) This parameter only affects debugging statements specific to PluralWords. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

(2) Consider the type of field you are indexing when deciding whether to use this index routine. For example, it is not appropriate for a field that contains names, because it removes endings that must remain in the index term for a search to succeed. It changes "Jones" to "Jon", for instance.
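
The Python sketch below illustrates why the same routine has to run at both indexing and query time. It assumes the longest matching ending is removed and nothing is added back; the exact stemming rules of PluralWords are not given in this reference, and the function name is hypothetical.

```python
def strip_plural(word):
    """Remove a trailing 'ies', 'es', or 's' (longest match first)."""
    for ending in ("ies", "es", "s"):
        if word.lower().endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word


# Index time and query time use the same function, so both forms meet.
print(strip_plural("books"), strip_plural("book"))  # book book
print(strip_plural("Jones"))                        # Jon
```

The second line shows the name problem described in the note above: a surname loses its ending just as a plural noun does.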

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Notes

For every index you generate with this index routine, remember to specify PluralWords as the query normalizer in the corresponding [index_definition] section of the WebZ database configuration file, using the filter variable described in [index_definition] for Query Normalizer.

PublicationDate

Extends

Words

Description

PublicationDate removes non-numeric characters from a date field, such as the MARC 008 fixed field, and creates an index term for the remaining numeral.

Parameters Unique to This Class

Parameter

Description

debugPublicationDate

Adds debugging statements generated by PublicationDate to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to PublicationDate. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Example

Click here to see how to use PublicationDate to emulate the Newton pubdate() index routine.

SmartWords

Extends

PluralWords

Description

SmartWords is a refinement of PluralWords that ensures that words contain at least two characters (after removing plural endings) before creating index terms for them.

Parameters Unique to This Class

Parameter

Description

debugSmartWords

Adds debugging statements generated by SmartWords to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to SmartWords. To see debugging statements generated by PluralWords, Words and/or Phrase, set debugPluralWords=TRUE, debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from PluralWords
Parameters inherited from Words
Parameters inherited from Phrase

WordsMinusBoundPhrases

Extends

Words

Description

WordsMinusBoundPhrases indexes all the words in a field except for the words between user-defined bounds characters, such as the left and right parentheses.

Parameters Unique to This Class

Parameter

Description

debugWordsMinusBoundPhrases

Adds debugging statements generated by WordsMinusBoundPhrases to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to WordsMinusBoundPhrases. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

Other Parameters

Parameters inherited from Words
Parameters inherited from Phrase

Notes

When using this index routine, specify the bounds characters with the bounds parameter (from Phrase) in the index definition.

Example:

bounds = ()

Examples

Click here to see how to use WordsMinusBoundPhrases to emulate the Newton isbn() index routine.

YearRange

Extends

PublicationDate

Description

YearRange creates a series of numeric index terms based on two numbers within a field. The numbers are the earliest and latest dates in a date range. Optionally, YearRange creates these terms only if the field also contains the character(s) specified in the mustContain parameter. For example, if the field contains the date range "1962-1966", YearRange creates the index terms "1962", "1963", "1964", "1965", and "1966". YearRange always right-pads each number with enough zeros to create a four-digit number.

Note:

YearRange may yield unpredictable results if both numbers do not contain four digits.

For example, if the field contains the date range "1962-66", YearRange pads "66" so it becomes "6600". Instead of creating index terms for all the years between 1962 and 6600, it uses the maxTerms parameter to determine the maximum number of terms to create for any date range. If maxTerms = 50, YearRange creates the index terms "1962", "1963", "1964", "1965", "1966", "1967", "1968" ... "2010", "2011". These terms do not reflect the year range specified in the field.
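
The padding and maxTerms behavior can be reproduced with the Python sketch below. The function name is hypothetical, and two details are assumptions: a field that lacks the mustContain characters, or that holds fewer than two numbers, yields no terms here, whereas the real routine may handle those cases differently.

```python
import re


def year_range_terms(field, must_contain=None, max_terms=100):
    """Expand the first two numbers in `field` into one term per year."""
    if must_contain and must_contain not in field:
        return []  # assumption: no range terms without mustContain
    numbers = re.findall(r"[0-9]+", field)
    if len(numbers) < 2:
        return []  # assumption: a range needs two numbers
    # Right-pad each number with zeros to four digits, e.g. "66" -> "6600".
    start, end = (int(n[:4].ljust(4, "0")) for n in numbers[:2])
    years = range(start, end + 1)
    # maxTerms caps how many terms are created for the range.
    return [f"{year:04d}" for year in years[:max_terms]]


print(year_range_terms("1962-1966", must_contain="-"))
# ['1962', '1963', '1964', '1965', '1966']
terms = year_range_terms("1962-66", must_contain="-", max_terms=50)
print(len(terms), terms[0], terms[-1])  # 50 1962 2011
```

The second call shows the two-digit pitfall: the terms run from 1962 to 2011 instead of 1962 to 1966.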

Parameters Unique to This Class

Parameter

Description

debugYearRange

Adds debugging statements generated by YearRange to the standard output. Possible values are TRUE and FALSE.

Note:

This parameter only affects debugging statements specific to YearRange. To see debugging statements generated by PublicationDate, Words and/or Phrase, set debugPublicationDate=TRUE, debugWords=TRUE and/or debugPhrase=TRUE as well.

mustContain

Designates the character(s) that the field must contain for YearRange to generate index terms over the date range specified in a field. A typical value is a hyphen.

Example:

mustContain = -

maxTerms

Designates the maximum number of index terms created for any date range.

Example:

maxTerms = 100 (default)

Other Parameters

Parameters inherited from PublicationDate
Parameters inherited from Words
Parameters inherited from Phrase

Examples

Click here to see how to use YearRange to emulate the Newton numrang() index routine.

Other Index Routines

StopwordEnforcer

Extends

ORG.oclc.pears.util.IndexRoutine

Description

StopwordEnforcer removes stopwords from keyword indexes. Stopwords are words that you want to exclude from an index because they occur too frequently or because they are not significant as search terms. Typical stopwords include words such as "a", "and", "an", "the", and "but".

For efficiency, StopwordEnforcer removes stopwords from all keyword indexes in one pass after index terms for all the indexes defined in a database description configuration file have been created.
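
Conceptually, the single pass resembles the Python sketch below, which filters every index once all terms exist. The data layout (a dictionary of term lists), the function name, and the case-insensitive comparison are assumptions made for this illustration.

```python
def enforce_stopwords(indexes, stopwords):
    """Remove stopwords from every index after all terms are built."""
    blocked = {word.lower() for word in stopwords}
    return {
        name: [term for term in terms if term.lower() not in blocked]
        for name, terms in indexes.items()
    }


built = {
    "title": ["the", "art", "of", "war"],
    "subject": ["war", "and", "strategy"],
}
print(enforce_stopwords(built, ["the", "and", "of"]))
# {'title': ['art', 'war'], 'subject': ['war', 'strategy']}
```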

Parameters

Parameter

Description

stopword*

Repeatable

Specifies a global stopword that you want to exclude from all indexes.

Notes

In the [index_definition] section for global stopwords, use these values for the index and tagpath parameters:

index = 0    "0" indicates that no index should be created from this index definition.

tagpath = none    "none" indicates that this index definition should not be executed until all indexes have been created.
