Pears Index Routines

Contents

Introduction
[index_definition] for Index Routine
[index_definition] for Query Normalizer
Index Routine Class Diagram
Notes for the Index Routine Reference
Index Routines for Phrase Indexes
Index Routines for Keyword Indexes
Other Index Routines


Introduction

As described in the Pears System Overview, a Pears index routine is a Java class that can play two roles – index routine and term normalizer.

When creating a Pears database, an index routine creates index terms from specified field(s) in a database record.

When searching a Pears database, an index routine may also serve as a query (term) normalizer in WebZ or Database Builder's Record Builder application. A query normalizer allows you to manipulate patrons' search terms to better match specific database indexes.

This document is a comprehensive reference to the index routines shipped with SiteSearch 4.2.0. It is designed for users who wish to use these index routines when creating Pears databases or when creating WebZ or Record Builder database configuration files for Pears databases. Should you need to extend one of these index routines, see the Javadoc for more technical information.


[index_definition] for Index Routine

In a database description configuration file, an [index_definition] section defines how to create an index for a Pears database. A database description configuration file usually contains many [index_definition] sections, each with a unique name. An index definition section always includes the following variables:

index = index_id
routine = Pears_index_routine
tagpath* = BER_tagpath OR none
...

Next, there are variables that define parameters for the Pears_index_routine specified in the routine variable.

[index_definition] for Query Normalizer

To use an index routine as a query normalizer in WebZ or Record Builder, specify the index routine in the filter variable of the index's [index_definition] section in the database configuration file, like this:

filter = ORG.oclc.pears.IndexRoutines.Words

Follow the filter variable with variables that contain parameters applicable to the index routine, as needed. These are usually the same parameters you specified when creating the index. See examples in Pears-Newton Index Routine Comparison, including a few exceptions to the above guideline.


Index Routine Class Diagram

The following diagram shows the relationships among the index routine classes shipped with SiteSearch 4.2.0, which all belong to the ORG.oclc.pears.IndexRoutines class package. Click a class name to jump to a description of the class and its required and optional parameters. Note that:

  • Phrase and all classes that extend Phrase implement ORG.oclc.TermNormalizer. This allows these index routines to serve as query normalizers in the WebZ environment.
  • Phrase and Words (which extends Phrase) serve as the basis for most Pears index routines.


Notes for the Index Routine Reference

This section provides links to the index routines shipped with SiteSearch 4.2.0, organized by function. Click a link for information about an index routine. This includes the index routine's name, the class it extends, a description of the index routine, its parameters, and any parameters it inherits. For some index routines, there may also be links to examples or other notes.

Optional and Required Parameters

All parameters are optional and apply to all databases unless otherwise indicated. When a parameter applies to a specific type of database, this information appears under its name.

Repeatable Parameters

When a parameter can appear more than once in an index definition, the word "Repeatable" appears under the parameter's name. If a parameter is repeatable, use a separate line for each instance of the parameter. Use an asterisk (*) at the end of the parameter's name (such as indexUpTo*) or number each instance sequentially (such as indexUpTo1, indexUpTo2, indexUpTo3, and so on).
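
The two naming conventions can be illustrated with a short Python sketch that groups repeatable parameters from the lines of an index definition. The function name, the list of repeatable names, and the parsing rules are hypothetical; this is not the parser that SiteSearch uses, and it assumes one name = value pair per line.

```python
import re

# Parameters marked "Repeatable" in this reference, plus tagpath*.
REPEATABLE = {"extraIndex", "indexAfter", "indexUpTo", "subfield", "tagpath"}

def collect_parameters(lines, repeatable=REPEATABLE):
    """Group repeatable parameters into lists (hypothetical helper)."""
    params = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank lines and section headers
        name, value = (part.strip() for part in line.split("=", 1))
        # Drop a trailing asterisk or sequence number to find the base name.
        base = re.sub(r"(\*|\d+)$", "", name)
        if base in repeatable:
            params.setdefault(base, []).append(value)
        else:
            # Names such as indicator1 end in a digit but are not repeatable.
            params[name] = value
    return params
```

Keeping an explicit list of repeatable names avoids confusing numbered instances (indexUpTo1, indexUpTo2) with parameters whose names simply end in a digit (indicator1, indicator2).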

Unicode Characters in Index Definitions

To include the Unicode representation of a character in a Pears or WebZ index_definition, use \unnnn, where nnnn is the character's four-digit hex Unicode equivalent. For example, use \u0020 to represent a space.
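
To check which character an escape stands for, a few lines of Python are enough. This is only an illustration of the \unnnn notation; the function name is made up, and it says nothing about how Pears or WebZ decode the value internally.

```python
import re

def decode_unicode_escapes(value):
    """Replace each \\unnnn escape with the character it names."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        value,
    )
```

For instance, decoding the value \u0020\u0020 yields a string of two spaces.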

Examples

The Examples section, if present, provides links to index definitions in the Pears-Newton Index Routine Comparison. Although that document pertains specifically to using Pears index routines to replicate commonly-used Newton index routines, its examples also illustrate use of index routines more generally.

Index Routines for Phrase Indexes

Index Routines for Keyword Indexes

Some of these index routines extend Phrase directly, rather than extending Words, but we classify them as keyword routines because they create keyword indexes.

Other Index Routines


Index Routines for Phrase Indexes

Phrase
 
Extends  

ORG.oclc.pears.util.IndexRoutine
 

Description

The Phrase class is an index routine that normalizes input fields and builds lists of terms for phrase indexes. It takes fields from BER records that have been converted to appropriate UTF8-encoded (Unicode) data and converts them to index terms. It is the base index routine that most other index routines extend.
 

Parameters

Parameter

Description

bounds

Specifies one or more pairs of characters that surround data within a field. Depending on the index routine, the data between a pair is either included in or excluded from the index term.

Examples:    bounds = ""
bounds = ()
bounds = ""()[]{}

Note:    The Phrase routine checks for this parameter, but does not use it. It is available for routines that extend Phrase, such as PhraseMinusBoundPhrases, PhraseWithinBoundPhrases, and WordsMinusBoundPhrases.
collapse

Indicates that Phrase should remove all of the characters in the collapse list from the field when it creates an index term.

Example:    collapse = ?!,"':;&_-< >[]  
debugPhrase

Turns on debugging for all processing performed by the Phrase class. Possible values are:

TRUE    Turns on debugging.
FALSE   Performs no debugging. (default)

extraIndex*

Repeatable

Indicates that another index should receive any terms extracted for this index. Specify the unique ID of another index defined for the database (as defined in the index parameter of its [index_definition] section).

Example:    extraIndex* = 20
extraTrimChars

Adds characters to the default list of trimChars for the current index only. This may be simpler than modifying the value of trimChars if you only want to add one or two more trim characters.

Example:    extraTrimChars = @^ 
indexAfter*

Repeatable

Contains a character or string that indicates the start of the phrase to index. Use a separate line for each character or string.

Examples:    indexAfter* = (
indexAfter* = \u0020

Note:    This is less efficient than using a bounds parameter.
indexUpTo*

Repeatable

Contains a character or string that indicates the end of the phrase to index. Use a separate line for each character or string.

Examples:    indexUpTo* = )
indexUpTo* = \u0020

Note:    This is less efficient than using a bounds parameter.
indicator1

Optional for:
MARC

Indicates that indicator1 for this field must match the specified indicator(s) for the field to be indexed.

Example:    indicator1 = 1  
indicator2

Optional for:
MARC

Indicates that indicator2 for this field must match the specified indicator(s) for the field to be indexed.

Example:    indicator2 = 2 
indicators

Optional for:
MARC

Requires that both indicator1 and indicator2 for this field must match the specified indicators for the field to be indexed.

Example:    indicators = 12 
joinFieldsWith

Indicates the character or string to insert between subfields when creating a single index term from one or more subfields, as specified with the subfield* parameter.

Example:    joinFieldsWith = \u0020\u0020

Note:    Do not separate characters with spaces unless you want to use a space as part of the string.
maxLength

Shortens the index term to the specified number of characters. The practical limit for an index term is approximately 2000 characters.

Example:    maxLength = 80
nonFilingIndicator1
nonFilingIndicator2

Optional for:
MARC

Indicates that the value of the field's first indicator (for nonFilingIndicator1) or second indicator (for nonFilingIndicator2) determines the number of characters to remove from the beginning of the field. Possible values are:

TRUE    Use the value of nonFilingIndicator1 or nonFilingIndicator2 to determine the number of characters to remove from the beginning of the field.
FALSE   Do not use the value of nonFilingIndicator1 or nonFilingIndicator2 to determine the number of characters to remove from the beginning of the field. (default)

Example:    nonFilingIndicator1 = true
notIndicators

Optional for:
MARC

Specifies that the two indicators for this field must not be equal to (or must be different from) the specified indicator(s) for the field to be indexed.

Example:    notIndicators = 04
replace

Indicates that you want to replace characters when indexing a term. Refers to another section in the index definition with a list of characters and the replacement value for each character in the indexed term.

Example:    replace = replaceSect
replaceSect

Required:
When using replace

Section that specifies a list of characters and the replacement values to use when indexing these characters. Use with replace.

Example:    [replaceSect]
\u03B1 = alpha
\u03B2 = beta
\u03B3 = gamma
startOffset

Indicates the number of characters to ignore at the beginning of the field when creating an index term.

Note:    Phrase counts off the startOffset for a field before removing any trim characters (see trimChars and extraTrimChars) from the start or end of a field or removing any collapse characters from the field.

Example:    startOffset = 10
stripHTML

Indicates whether to remove any HTML codes from the term to be indexed, where an HTML code is a word or phrase within angle brackets (<>). Possible values are:

TRUE    Remove HTML codes from the term.
FALSE   Do not remove HTML codes from the term. (default)

subfield*

Repeatable

Indicates a subfield to include in an index term. Use to selectively include subfields from a field when you don't want to use all subfields in a field. Use in conjunction with joinFieldsWith.

Note:   

If you use this parameter, remember to specify a tagpath parameter one level higher than the subfields. For example, to index subfields 1 and 2 from the MARC 245 field:

tagpath* = 245
subfield* = 1
subfield* = 2

trimChars

Indicates the characters to remove from the beginning or end of the field when creating an index term. Type the list of characters without spaces between them. If you embed a space between two other characters, a space becomes one of the trim characters.

Example:    trimChars = _ '&.,:* (default)

Notes:   

To add characters only, use extraTrimChars instead. Phrase (or other index routines that extend Phrase) adds the characters in extraTrimChars to the default list of trim characters.

To modify the default list of trim characters in any other way (such as removing trim characters or both adding and removing trim characters), specify a value for trimChars that includes all trim characters for the index.
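
The interaction of startOffset, trimChars, extraTrimChars, collapse, and maxLength can be sketched in Python as follows. This is not the Phrase implementation: the function name is hypothetical, only the rule that startOffset is counted first comes from this reference, and the order of the remaining steps is an assumption.

```python
DEFAULT_TRIM_CHARS = "_ '&.,:*"  # the default trimChars value shown above

def normalize_phrase(field, start_offset=0, trim_chars=DEFAULT_TRIM_CHARS,
                     extra_trim_chars="", collapse="", max_length=None):
    """Sketch of how a few Phrase parameters could combine."""
    # startOffset is counted before any trimming or collapsing (documented).
    term = field[start_offset:]
    # trimChars and extraTrimChars apply only to the ends of the field.
    term = term.strip(trim_chars + extra_trim_chars)
    # collapse removes its characters wherever they occur (order assumed).
    term = "".join(ch for ch in term if ch not in collapse)
    # maxLength shortens the finished term (order assumed).
    if max_length is not None:
        term = term[:max_length]
    return term
```

With start_offset=10 and collapse="-", the field "0012345678 Moby-Dick:" becomes "MobyDick": the first ten characters are skipped, the leading space and trailing colon are trimmed, and the hyphen is collapsed.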


 

Examples

The Pears-Newton Index Routine Comparison shows how to use Phrase to emulate these Newton index routines:
 
  • combad()
  • ddc()
  • greekphrase()
  • phrase2()
  • phrbhyp()


     
    PhraseMinusBoundPhrases
     
    Extends   

    Phrase
     

    Description

    PhraseMinusBoundPhrases creates an index term from a field that includes all the data in the field except for data contained within user-defined bounds characters. If the field contains more than one set of bounds characters, it removes only the data within the first set of bounds characters before creating the index term.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugPhraseMinusBoundPhrases

    Adds debugging statements generated by PhraseMinusBoundPhrases to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to PhraseMinusBoundPhrases. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

     

    Other Parameters
     

    Parameters inherited from Phrase
     

    Notes

    When using this index routine, specify the bounds characters with the bounds parameter (from Phrase) in the index definition.

    Example:    bounds =  ()
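
The behavior described above can be approximated in Python. This is a sketch under stated assumptions, not the shipped class: the function names are hypothetical, the "first set" is taken to be the pair whose opening character appears earliest in the field, and the bounds characters themselves are removed along with the data they enclose.

```python
def find_first_bound_phrase(field, bounds):
    """Return (start, end) of the first bounded span, or None.

    `bounds` lists opening and closing characters in pairs, e.g. '()'.
    """
    pairs = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds) - 1, 2)]
    best = None
    for opener, closer in pairs:
        start = field.find(opener)
        if start == -1:
            continue
        end = field.find(closer, start + 1)
        if end == -1:
            continue
        if best is None or start < best[0]:
            best = (start, end)
    return best

def phrase_minus_bound_phrase(field, bounds):
    """Drop the first bounded span; later spans are left in place."""
    span = find_first_bound_phrase(field, bounds)
    if span is None:
        return field
    start, end = span
    return field[:start] + field[end + 1:]
```

Note that only the first bounded span is removed, so a second parenthesized phrase in the same field stays in the term.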

     

    Examples

    See the Pears-Newton Index Routine Comparison for how to use PhraseMinusBoundPhrases to emulate the Newton govtdoc() index routine.


     
    PhraseWithinBoundPhrases
     
    Extends   

    Phrase
     

    Description

    PhraseWithinBoundPhrases creates an index term from the data contained within a user-defined set of bounds characters. It ignores the rest of the data in the field, including the bounds characters. If the field contains more than one set of bounds characters, it indexes only the data within the first set of bounds characters.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugPhraseWithinBoundPhrases

    Adds debugging statements generated by PhraseWithinBoundPhrases to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to PhraseWithinBoundPhrases. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Phrase

    Notes

    When using this index routine, specify the bounds characters with the bounds parameter (from Phrase) in the index definition.

    Example:    bounds = ()
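
A minimal Python sketch of this extraction follows. It is illustrative only: the function name is hypothetical, the "first set" is assumed to be the pair whose opening character appears earliest in the field, and returning None when no bounded data is found is an assumption about a case this reference does not describe.

```python
def phrase_within_bound_phrase(field, bounds):
    """Return the data inside the first bounded span, without the bounds."""
    pairs = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds) - 1, 2)]
    candidates = []
    for opener, closer in pairs:
        start = field.find(opener)
        end = field.find(closer, start + 1) if start != -1 else -1
        if end != -1:
            candidates.append((start, end))
    if not candidates:
        return None  # assumed behavior when the field has no bounded data
    start, end = min(candidates)  # earliest opening character wins
    return field[start + 1:end]
```

With bounds = ()[] the field "Annual report (1998) [microform]" yields the term "1998"; the bracketed text is ignored because the parentheses come first.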



    SimplePatterns
     
    Extends   

    Phrase
     

    Description

    SimplePatterns creates index terms from words that match a pattern defined in the index definition. For example, if the pattern is "isbn*9", it generates index terms only for fields that begin with "isbn" and end with "9". Therefore, it would create an index term for "isbn077821278909", but not for "077821278909" or "isbn077821278905".
     

    Parameters Unique to This Class

    Parameter

    Description

    debugSimplePatterns

    Adds debugging statements generated by SimplePatterns to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to SimplePatterns. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.
    pattern

    Contains a pattern that the field must contain for SimplePatterns to generate an index term from a word. The pattern can include the wildcard characters "*" and "?".

    Example:    pattern = isbn*9
    maxWordLength

    Specifies the maximum number of characters that a word can contain to be indexed. SimplePatterns ignores any words that exceed the maxWordLength value.

    Example:    maxWordLength =  15
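
The pattern and maxWordLength parameters can be tried out with Python's standard fnmatch module, which understands the same "*" and "?" wildcards. This is a sketch, not the SimplePatterns code: the function name is hypothetical, splitting the field on whitespace is an assumption, and fnmatch also accepts [seq] character classes, which this reference does not mention.

```python
from fnmatch import fnmatchcase

def simple_pattern_terms(field, pattern, max_word_length=None):
    """Return the words in `field` that match `pattern`."""
    terms = []
    for word in field.split():  # whitespace splitting is an assumption
        if max_word_length is not None and len(word) > max_word_length:
            continue  # maxWordLength: ignore words that are too long
        if fnmatchcase(word, pattern):
            terms.append(word)
    return terms
```

One consequence worth noting: "isbn077821278909" is 16 characters long, so combining pattern = isbn*9 with maxWordLength = 15 would exclude it.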

     

    Other Parameters

    Parameters inherited from Phrase



    Index Routines for Keyword Indexes

    Words
     
    Extends   

    Phrase
     

    Description

    Words is a commonly-used index routine that extracts and stores individual terms from fields in a record. Words is the basis for a number of other keyword index routines.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugWords

    Adds debugging statements generated by Words to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to Words. To see debugging statements generated by Phrase, set debugPhrase = TRUE as well.

    delimiters

    List of characters that separate words in a field. Used to determine where one word ends and the next word starts.

    Default:  delimiters = \t\n\r+-=<>(){}[]:;/\\\"!?",   

    Note:    Use extraDelimiters to specify additional delimiters for a specific index. Use removeDelimiters to specify characters to remove from the default list for a specific index. Otherwise, the list of characters in this parameter must contain all delimiters applicable to a specific index.
    extraDelimiters

    Indicates character(s) to add to the default list of delimiters for a specific index.

    Example:  extraDelimiters = .  

    maxWordLength

    Defines the maximum length for a word used as an index term. The Words routine truncates a word with more characters than the maxWordLength value. The default is to index words of any length.

    maxWords

    Specifies the maximum number of words to index in a field. The Words routine ignores other words in the field once the number of words indexed reaches the maxWords value. The default is to index all words in a field.

    minWordLength

    Specifies the minimum number of characters that a word must contain to be indexed. The default is to index words of any length.

    removeDelimiters

    Indicates character(s) to remove from the default list of delimiters for a specific index.

    Example:  removeDelimiters = "  
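
The Words parameters described above can be pictured with the following Python sketch. It is a simplified model rather than the shipped routine: the function name is hypothetical, the delimiter set (including the space character) is assumed for illustration, and the order in which the length limits and maxWords are applied is also an assumption.

```python
# Assumed delimiter set for illustration; see delimiters above for the default.
DEFAULT_DELIMITERS = " \t\n\r+-=<>(){}[]:;/\\\"!?,"

def words(field, delimiters=DEFAULT_DELIMITERS, extra_delimiters="",
          remove_delimiters="", min_word_length=None, max_word_length=None,
          max_words=None):
    """Split a field into keyword terms."""
    active = (set(delimiters) | set(extra_delimiters)) - set(remove_delimiters)

    # Break the field into words wherever an active delimiter occurs.
    tokens, current = [], []
    for ch in field:
        if ch in active:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))

    terms = []
    for word in tokens:
        if max_words is not None and len(terms) >= max_words:
            break  # maxWords: ignore the rest of the field
        if min_word_length is not None and len(word) < min_word_length:
            continue  # minWordLength: skip short words
        if max_word_length is not None:
            word = word[:max_word_length]  # maxWordLength: truncate
        terms.append(word)
    return terms
```

For example, adding a period with extra_delimiters="." splits "U.S. history" into "U", "S", and "history", while remove_delimiters='"' keeps quotation marks attached to the words they surround.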

     

    Other Parameters
     

    Parameters inherited from Phrase
     



    BerInteger
     
    Extends   

    ORG.oclc.pears.util.IndexRoutine
     

    Description

    BerInteger allows you to index numeric fields that contain integer values. For example, a database may contain OCLC numbers stored as integers rather than character strings. BerInteger creates index terms from the integer values.
     

    Parameters

    Parameter

    Description

    debugBerInteger

    Adds debugging statements generated by BerInteger to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)


     



    LCCardNumber
     
    Extends    Phrase
     

    Description

    LCCardNumber creates an index term for a Library of Congress Control Number (LCCN). It creates a 12-character index term, which does not contain optional suffixes that begin in character position 13 of the LCCN.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugLCCardNumber

    Adds debugging statements generated by LCCardNumber to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to LCCardNumber. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Phrase




    LCClass
     
    Extends   

    Phrase
     

    Description

    LCClass constructs a standard Library of Congress (LC) classification number from the data in a field, in the format AAA1111.222.a333, where:

    AAA     represents one to three alphabetic characters, padded on the right with underscores (_). All characters are converted to lower case.
    1111    represents four numerals, padded on the left with zeros (0).
    222     represents three numerals, padded on the right with zeros (0).
    a       represents a single alphabetic character.
    333     represents one to three numerals, with no padding. This section is truncated to three numerals if the data contains more than three.

    If a field has an LC number in the format AAA1111, LCClass does not append any additional numerals or alphabetic characters when creating the index term.

    If a field has an LC number with an a333 section, it must also contain the 222 section.

    Input/output examples

    Input             Output
    PR5398            pr_5398
    PN1992.8.S35      pn_1992.800.s35
    PS3573.I456213    ps_3573.000.i456
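
The format rules and the three input/output examples above can be reproduced with a short Python sketch. It is not the LCClass source: the function name and regular expression are hypothetical, and the handling of inputs outside the documented examples (such as a 222 section with no a333 section) is an assumption.

```python
import re

LC_PATTERN = re.compile(
    r"^\s*([A-Za-z]{1,3})\s*(\d{1,4})"  # AAA and 1111
    r"(?:\.(\d{1,3}))?"                  # optional .222
    r"(?:\s*\.?\s*([A-Za-z])(\d+))?"     # optional .a333
)

def lc_class_term(field):
    """Build an AAA1111.222.a333 term from an LC classification number."""
    match = LC_PATTERN.match(field)
    if match is None:
        return None
    letters, number, decimal, cutter_letter, cutter_number = match.groups()
    # AAA: lower case, right-padded with underscores; 1111: left-padded.
    term = letters.lower().ljust(3, "_") + number.zfill(4)
    if decimal is None and cutter_letter is None:
        return term  # plain AAA1111 input gets nothing appended
    # 222: right-padded with zeros; supplied as 000 when the input lacks it.
    term += "." + (decimal or "").ljust(3, "0")
    if cutter_letter is not None:
        # a333: one letter plus at most three numerals.
        term += "." + cutter_letter.lower() + cutter_number[:3]
    return term
```

The third example shows both rules for the a333 section at once: the missing 222 section is supplied as 000, and the six numerals after "I" are truncated to three.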

     

    Parameters Unique to This Class

    Parameter

    Description

    debugLCClass

    Adds debugging statements generated by LCClass to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to LCClass. To see debugging statements generated by Phrase, set debugPhrase=TRUE as well.

     

    Other Parameters
     

    Parameters inherited from Phrase

    Examples

    See the Pears-Newton Index Routine Comparison for how to use LCClass to emulate the Newton lcclass() index routine.
     

    Notes

    Even though LCClass creates keyword indexes, it extends Phrase rather than Words so that segments of an LC number separated by spaces are not processed as individual index terms.



    MarcBibliographicLevel
     
    Extends   

    Words
     

    Description

    Generates an index term for a record based on the value of the bibliographic level byte (07) in the leader of a MARC record, as follows:

    For this value ...    the index term created is ...    which refers to this bibliographic level ...
    a                     analytic                         analytic monograph
    b                     analytic                         analytic serial
    m                     monograph                        monograph
    s                     serial                           serial
    c                     collection                       collection
    d                     subunit                          subunit

     

    Parameters Unique to This Class

    Parameter

    Description

    debugMarcBibliographicLevel

    Adds debugging statements generated by MarcBibliographicLevel to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to MarcBibliographicLevel. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase




    MarcFormat
     
    Extends   

    Words
     

    Description

    Generates an index term for a record based on the values of the type of record (06) and bibliographic level (07) bytes in the leader of a MARC record, as follows:

    For this record type(s) ...    and this bibliographic level(s) ...    the index term created is ...    which refers to ...
    a, t                           m, c, a, d                             bks                              books
    e, f                           any                                    map                              maps
    p, b                           any                                    mix                              mixed materials
    m                              any                                    com                              computer files
    c, d                           any                                    sco                              scores
    any                            s, b                                   ser                              serials
    i, j                           any                                    rec                              sound recordings
    g, k, o, r                     any                                    vis                              visual materials
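
The table above amounts to a lookup on two leader bytes, which the following Python sketch spells out. The function name is hypothetical and this is not the MarcFormat source. In particular, the table does not say which row wins when a record matches more than one (for example record type e with bibliographic level s), so checking the rows in table order is an assumption.

```python
# Rows follow the table above: (record types, bibliographic levels, term).
# None stands for "any".
FORMAT_RULES = [
    ("at", "mcad", "bks"),
    ("ef", None, "map"),
    ("pb", None, "mix"),
    ("m", None, "com"),
    ("cd", None, "sco"),
    (None, "sb", "ser"),
    ("ij", None, "rec"),
    ("gkor", None, "vis"),
]

def marc_format_term(leader):
    """Map leader bytes 06 and 07 to a format term, or None if no row fits."""
    record_type, bib_level = leader[6], leader[7]
    for types, levels, term in FORMAT_RULES:
        if types is not None and record_type not in types:
            continue
        if levels is not None and bib_level not in levels:
            continue
        return term  # first matching row wins (assumed precedence)
    return None
```

A leader beginning "00000nam" (record type a, bibliographic level m) produces "bks", while "00000nas" (record type a, bibliographic level s) falls through to the serials row and produces "ser".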

    Parameters Unique to This Class

    Parameter

    Description

    debugMarcFormat

    Adds debugging statements generated by MarcFormat to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to MarcFormat. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase




    MarcLanguage
     
    Extends   

    Words
     

    Description

    MarcLanguage generates an index term for a record based on the value of the three-character language code in the 008 field of a MARC record. For example, for the language code "fre", it generates the index term "french".

    See MARC Code List for Languages for a list of the language codes.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugMarcLanguage

    Adds debugging statements generated by MarcLanguage to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to MarcLanguage. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase
     

    Examples

    See the Pears-Newton Index Routine Comparison for how to use MarcLanguage to emulate the Newton marcla() index routine.



    MarcTypeOfMaterial
     
    Extends   

    Words
     

    Description

    Generates an index term for the record based on the value of the type of record byte (06) in the leader of a MARC record, as follows:

    For this code ...    the index term created is ...    which refers to this type of material ...
    a, t                 bks                              books
    e, f                 map                              maps
    p                    mix                              mixed materials
    m                    com                              computer files
    c, d                 sco                              scores
    s                    ser                              serials
    i, j                 rec                              sound recordings
    g, k, o, r           vis                              visual materials

    Note:    Set tagpath = 0 when using this index routine.

    Parameters Unique to This Class

    Parameter

    Description

    debugMarcTypeOfMaterial

    Adds debugging statements generated by MarcTypeOfMaterial to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to MarcTypeOfMaterial. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase




    Numbers
     
    Extends   

    Words
     

    Description

    Numbers removes all but numeric characters (0 through 9) from a field and creates an index term from the remaining numeral.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugNumbers

    Adds debugging statements generated by Numbers to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (default)

    Note:    This parameter only affects debugging statements specific to Numbers. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.
    zeropad

    Specifies a minimum term length. If a generated term has fewer digits than the number specified in the zeropad parameter, Numbers adds enough zeros to the left of the term to reach the required length.

    Example:    zeropad = 2
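
The effect of zeropad is easiest to see in a few lines of Python. This is a sketch of the behavior described above, not the Numbers class itself; the function name is hypothetical, and returning None for a field with no digits is an assumption.

```python
def numbers_term(field, zeropad=None):
    """Keep only the digits 0-9 and left-pad with zeros to `zeropad`."""
    digits = "".join(ch for ch in field if ch in "0123456789")
    if not digits:
        return None  # assumed: a field without digits yields no term
    if zeropad is not None:
        digits = digits.zfill(zeropad)  # pads only when the term is shorter
    return digits
```

With zeropad = 2, the field "v. 7" yields the term "07", while a field that already holds two or more digits is left unchanged.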

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase
     



    PluralWords
     
    Extends   

    Words
     

    Description

    Removes plural endings ('s', 'es', or 'ies') from words before indexing them. By subsequently using this class as a query normalizer in WebZ, your patrons can enter plural forms of a search term and receive results that contain either the singular or plural forms of the term.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugPluralWords

    Adds debugging statements generated by PluralWords to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (Default)

    Notes:    (1) This parameter only affects debugging statements specific to PluralWords. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

    (2) Consider the type of field you are indexing when deciding whether to use this index routine. For example, it is not appropriate for a field that contains names, because it removes endings that need to remain in the index term for a search to succeed, such as changing "Jones" to "Jon".

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase

    Notes

    For all indexes you generate with this index routine, remember to specify PluralWords as the query normalizer in the [index_definition] sections of their WebZ database configuration file as follows:

    filter = ORG.oclc.pears.IndexRoutines.PluralWords
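
The reason the filter matters can be shown with a toy index in Python. Everything here is illustrative: the function names are hypothetical, lowercasing and whitespace splitting are assumptions, and the real PluralWords rules may be more refined than simply dropping the longest matching ending.

```python
PLURAL_ENDINGS = ("ies", "es", "s")  # longest ending is tried first (assumed)

def strip_plural(word):
    """Remove a plural ending from a word, if it has one."""
    for ending in PLURAL_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)]
    return word

def build_index(records):
    """Index every word of every record under its normalized form."""
    index = {}
    for record_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(strip_plural(word), set()).add(record_id)
    return index

def search(index, query, normalize=strip_plural):
    """Look up a one-word query, normalizing it the same way as the index."""
    return index.get(normalize(query.lower()), set())
```

Because the index stores "book" rather than "books", a query for "books" finds records only when it passes through the same normalizer. Searching with an identity function in place of strip_plural models a configuration without the filter line, and that search returns nothing.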




    PublicationDate
     
    Extends   

    Words
     

    Description

    PublicationDate removes non-numeric characters from a date field, such as the MARC 008 fixed field, and creates an index term for the remaining numeral.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugPublicationDate

    Adds debugging statements generated by PublicationDate to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (Default)

    Note:    This parameter only affects debugging statements specific to PublicationDate. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters
     

    Parameters inherited from Words
     

    Example

    See the Pears-Newton Index Routine Comparison for how to use PublicationDate to emulate the Newton pubdate() index routine.



    SmartWords
     
    Extends   

    PluralWords
     

    Description

    SmartWords is a refinement of PluralWords that ensures that words contain at least two characters (after removing plural endings) before creating index terms for them.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugSmartWords

    Adds debugging statements generated by SmartWords to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (Default)

    Note:    This parameter only affects debugging statements specific to SmartWords. To see debugging statements generated by PluralWords, Words and/or Phrase, set debugPluralWords=TRUE, debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from PluralWords
    Parameters inherited from Words
    Parameters inherited from Phrase




    WordsMinusBoundPhrases
     
    Extends   

    Words

    Description

    WordsMinusBoundPhrases indexes all the words in a field except for the words between user-defined bound characters, such as the left and right parentheses.
     

    Parameters Unique to This Class

    Parameter

    Description

    debugWordsMinusBoundPhrases

    Adds debugging statements generated by WordsMinusBoundPhrases to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (Default)

    Note:    This parameter only affects debugging statements specific to WordsMinusBoundPhrases. To see debugging statements generated by Words and/or Phrase, set debugWords=TRUE and/or debugPhrase=TRUE as well.

     

    Other Parameters

    Parameters inherited from Words
    Parameters inherited from Phrase
     

    Notes

    When using this index routine, specify the bound characters with the bounds parameter (from Phrase) in the index definition.

    Example:    bounds =  ()
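As a rough illustration of the behavior described above, the following Python sketch drops everything between the bound characters and returns the remaining words. It is not the shipped Java class. It assumes that the first character of the bounds value opens a bound phrase, that the second closes it, and that the two characters differ. Splitting on whitespace stands in for whatever word-breaking rules Words applies.

```python
def words_minus_bound_phrases(field, bounds="()"):
    """Return the words of field that lie outside the bound characters."""
    open_ch, close_ch = bounds[0], bounds[1]
    kept = []
    depth = 0  # greater than zero while inside a bound phrase
    for ch in field:
        if ch == open_ch:
            depth += 1
            kept.append(" ")  # keep words on either side separate
        elif ch == close_ch and depth > 0:
            depth -= 1
            kept.append(" ")
        elif depth == 0:
            kept.append(ch)
    return "".join(kept).split()
```

For a field such as "0123456789 (pbk.) extra", the sketch returns the words "0123456789" and "extra" and skips "pbk.".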
     
    Example

    Click here to see how to use WordsMinusBoundPhrases to emulate the Newton isbn() index routine.

    Return to Index Routine List   

    Return to Contents



    YearRange
     
    Extends   

    PublicationDate
     

    Description

    YearRange creates a series of numeric index terms based on two numbers within a field. The numbers are the earliest and latest dates in a date range. For example, if the field contains the date range "1962-1966", YearRange creates the index terms "1962", "1963", "1964", "1965", and "1966". Optionally, YearRange creates these terms only if the field also contains the character(s) specified in the mustContain parameter. YearRange always right-pads each number with enough zeros to create a four-digit number.

    Note:   

    YearRange may yield unpredictable results unless both numbers contain four digits.

    For example, if the field contains the date range "1962-66", YearRange pads "66" so that it becomes "6600". Instead of creating index terms for all the years between 1962 and 6600, it uses the maxTerms parameter to limit the number of terms it creates for the date range. If maxTerms = 50, YearRange creates the index terms "1962", "1963", "1964", "1965", "1966", "1967", "1968" ... "2010", "2011". These terms do not reflect the date range specified in the field.
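The padding and maxTerms behavior described above can be traced with a short Python sketch. It is an illustration under stated assumptions, not the shipped Java class. The function and argument names are hypothetical, the two numbers are taken to be the first two digit runs in the field, and the sketch returns no terms when the mustContain character is missing, which the description above does not confirm.

```python
import re


def year_range(field, must_contain=None, max_terms=100):
    """Return year terms for the first two numbers found in field."""
    # Assumption: without the required character, no range terms are created.
    if must_contain is not None and must_contain not in field:
        return []
    numbers = re.findall(r"\d+", field)
    if len(numbers) < 2:
        return []
    # Right-pad each number with zeros to four digits ("66" becomes "6600").
    start, end = (int(n.ljust(4, "0")) for n in numbers[:2])
    years = range(start, end + 1)
    # Cap the number of terms at max_terms.
    return [str(year) for year in years[:max_terms]]
```

Calling `year_range("1962-66", must_contain="-", max_terms=50)` yields 50 terms running from "1962" to "2011", which matches the case in the note. Writing the range as "1962-1966" yields the five expected terms.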
     

     

    Parameters Unique to This Class

    Parameter

    Description

    debugYearRange

    Adds debugging statements generated by YearRange to the standard output. Possible values are:

    TRUE    Turn on debugging.
    FALSE   Do not turn on debugging. (Default)

    Note:    This parameter only affects debugging statements specific to YearRange. To see debugging statements generated by PublicationDate, Words and/or Phrase, set debugPublicationDate=TRUE, debugWords=TRUE and/or debugPhrase=TRUE as well.

    mustContain

    Designates the character(s) that the field must contain for YearRange to generate index terms for the date range specified in the field. A typical value is a hyphen.

    Example:    mustContain = -
    maxTerms

    Designates the maximum number of index terms created for any date range.

    Example:    maxTerms = 100 (default)

     

    Other Parameters

    Parameters inherited from PublicationDate
    Parameters inherited from Words
    Parameters inherited from Phrase
     

    Examples

    Click here to see how to use YearRange to emulate the Newton numrang() index routine.

    Return to Index Routine List   

    Return to Contents


    Other Index Routines


    StopwordEnforcer
     
    Extends   

    ORG.oclc.pears.util.IndexRoutine
     

    Description

    StopwordEnforcer removes stopwords from keyword indexes. Stopwords are words that you want to exclude from an index because they occur too frequently or because they are not significant as search terms. Typical stopwords include words such as "a", "and", "an", "the", and "but".

    For efficiency, StopwordEnforcer removes stopwords from all keyword indexes in one pass after index terms for all the indexes defined in a database description configuration file have been created.
     

    Parameters

    Parameter

    Description

    stopword*

    Repeatable
    Specifies a global stopword that you want to exclude from all indexes.

     

    Notes

    In the [index_definition] section for global stopwords, use these values for the tagpath and index parameters:
     
    index = 0   "0" indicates that no index should be created from this index definition.
    tagpath = none "none" indicates that this index definition should not be executed until all indexes have been created.
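The single pass described above, run after every index has been created, can be pictured with a minimal Python sketch. It is not the shipped Java class. The dictionary layout (index name mapped to a list of terms) and the exact, case-sensitive match are assumptions made for illustration.

```python
def enforce_stopwords(indexes, stopwords):
    """Remove every global stopword from each index's term list."""
    stop = set(stopwords)  # set membership keeps the pass cheap
    return {
        name: [term for term in terms if term not in stop]
        for name, terms in indexes.items()
    }
```

Given a title index holding "the" and "cat" and a subject index holding "cat", "and", and "dog", calling the function with the stopwords "the" and "and" leaves "cat" in the first index and "cat" and "dog" in the second.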

    Return to Index Routine List   

    Return to Contents


    See Also

    Pears System Overview
    Pears-Newton Index Routine Comparison
    Pears Database Description Configuration File
    Creating a New Pears Database
    Converting a Newton Database to a Pears Database

     
