
Pears Database Description Configuration File

Contents

Introduction
Document Conventions
Sections in a Pears Database Description Configuration File
Parameters in Each Section of a Pears Database Description Configuration File
Example Pears Database Description Configuration File


Introduction

In much the same way as a .dsc file is used to construct a Newton database, a Pears database description configuration file supplies the rules for handling records and extracting index terms to build a local Pears database.

Like all configuration files, a Pears database description configuration file is a text file that can be easily viewed and/or modified using a variety of UNIX or Windows-based editors. Unlike other configuration files, however, it contains only a few required section names. Most section names in a Pears database description configuration file are strictly descriptive, allowing you to label a section irrespective of any class. In most cases, you can use a name that is most meaningful to you or to fellow database administrators.

The parameters that you set in a particular section within a Pears database description configuration file depend on your objective for that section. Some sections declare the input type of records or the specific routine that is to be used to process records. Others define global stopwords or indexes. Regardless of their purpose or quantity, the various sections collectively form the blueprint that is used to build a Pears database.

Although a Pears database description configuration file shares some conventions and overall structure with other configuration files, this document concentrates on the sections and parameters that are unique to it.


Document Conventions

General Conventions

  • <WebZ_root> is the location of your WebZ environment.
  • Sections or parameters in italics, such as [index_definition], denote sections or parameters that you name yourself and that will vary.

Required and Optional Sections

This document contains a description of each possible section in a Pears database description configuration file. A section can be either optional or required. The table below shows how each kind is marked.

Example
Explanation
[dB] Required

The [dB] section is required for all Pears databases. It sets general parameters for the database.

[docrulen] Optional

A [docrulen] section defines a restrictor for a database.

Required and Optional Parameters for Each Section

After the table of general information about each of the sections in a Pears database description configuration file, there is a corresponding series of tables that lists every possible parameter that may be declared under each section. Whether a parameter is required or optional depends on what you are trying to accomplish. For instance, an optional section may have some parameters that are required if you choose to include the section in a Pears database description configuration file.

The table that lists the parameters in a section has two columns: Parameter and Description. The Parameter column gives the parameter name, indicates whether the parameter is optional or required, and notes whether it can recur within a section, as in these examples:

Example
Explanation
tagpath

Required
Recurring

The tagpath parameter points to a specific field in the BER record from which terms will be extracted in order to build an index. More than one field is often used to build an index, so more than one tagpath is allowed under an index definition section.

routine

Required

The routine parameter declares the specific Java class that is to be used to create an index.

startoffset

Required for: Indexing routines that handle Marc fixed fields

The startoffset parameter identifies the beginning position within the fixed field string in a Marc record where the index routine is to begin extracting index terms.

The Description column provides a definition of the parameter and notes pertaining to its use. If applicable, it includes a list of allowable values for the parameter. One of these values may be the default.



Sections in a Pears Database Description Configuration File

The following table lists all the possible sections in a Pears database description configuration file. In addition to the name of the section and its description, the table also indicates if the section is required or optional.

Section
Description
[Bartlett] Optional

The [Bartlett] section is used to set specific Bartlett run-time parameters.

Example: badRecordMessageFile=badRecordMessage.txt

[dB] Required

The [dB] section sets general parameters that must be in place before the designated database can be updated with the correct records and indexes.

[LockServer] Optional

The [LockServer] section contains the parameters needed to connect to a server that locks the database during a Bartlett update procedure. This locking mechanism prevents multiple copies of Bartlett from updating the database at the same time.

Note: For more information about Pears database and record locking, see Step 7 under the Update Procedure in Creating a New Pears Database.
[Handleinput_record_type] – Optional for certain input record types

Based on input record type, some record handling classes use additional parameters to define exactly how records are to be extracted from the data input file. This section contains those parameters. When you define a [Handleinput_record_type] section, always form its name from "Handle" followed by the value of the InputRecordType parameter that you declared under the [dB] section.

Example: [HandleUSMARC]

[stopwords] Optional

Use the [stopwords] section to define a global list of stopwords that will be removed from all indexes after they have been built.

[docrulen] Optional

Use a [docrule] section to define a restrictor for a Pears database.

Example:

[docrule2]
index = 49
routine = ORG.oclc.pears.Bartlett.termrest
parameters = english german french

[Partitions] Optional

The [Partitions] section references the individual parts of a partitioned database and it contains the class that enables those parts to work together as a logical unit.

Note: A [Partitions] section exists only in a database description configuration file for a partitioned database.
[indexfilen] Optional

Indexes for a Pears database can be partitioned. In order to partition them, you must define a separate [indexfile] section for each one.

Example:

[indexfile1]
filename=/home/dbbuilder/dbs/marc/marc.index1.pdb
indexID=1

[index_definition] Optional

This section contains the parameters necessary to define an index. You must set up an [index_definition] section for each index that you wish to create for a Pears database.

Example:

[Titles]
index=2
routine=ORG.oclc.pears.IndexRoutines.PluralWords
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
tagpath*=242/01
tagpath*=245/01
tagpath*=246/01

Note: You may name an index definition anything you choose. The best practice is to select a name that describes the content of the index such as Title or Author.

 




Parameters in Each Section of a Pears Database Description Configuration File

[dB] Section - Required

The [dB] section contains general parameters that do things like set the size of the database file, declare the format of the input data, and provide a name for the database.

Parameter Description
blocksize

Optional

The blocksize parameter sets the capacity of the .pdb file in bytes. The default, and the strongly recommended setting, is 16384 bytes.

Example:

blocksize=16384

 InputRecordType

Required

InputRecordType identifies the format of the records in the input data file. Based on its value, the appropriate record handler is selected and executed by the main RecordHandler class to update the database.

Example: InputRecordType=USMARC

Note: Both the InputRecordType parameter and its possible values are case sensitive. Other possible values include "SGML" and "TEXT".

Name

Optional
The Name parameter provides the internal name for the database. If it is not used, the registered name in SSDOT is assumed by default. The internal name is used to refer to the database in other configuration files such as the ObiTopics.ini file.
filename

Optional
Use this parameter to provide the full path to the .pdb file, if it is located somewhere other than in the same directory as the database description configuration file. If you don't use this parameter, SSDOT for Pears or Bartlett assumes that the .pdb file is located in the same directory as the database description configuration file.
RecordIDIndex

Optional

This parameter identifies the index that contains the unique ID for each record in the database.

Note: Although it is optional to create one, a unique record ID must exist for you to replace or delete a record in a Pears database.
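
Taken together, the parameters above might be combined as in the following sketch. The database name, the file path, and the index number are placeholder values chosen for illustration; they are not values that Pears requires.

```ini
[dB]
blocksize = 16384
InputRecordType = USMARC
Name = Marcus
filename = <WebZ_root>/dbbuilder/dbs/Marcus/marcus.pdb
RecordIDIndex = 17
```

In this sketch, RecordIDIndex = 17 assumes that an index definition with index = 17 exists elsewhere in the same file and holds the unique ID of each record.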


[LockServer] Section - Optional

The [LockServer] section contains the parameters necessary to associate the database with a locking mechanism so that multiple copies of Bartlett cannot update the database at the same time.

Parameter Description
Host

Required

This is the domain name or IP address of the host on which the lock server resides. Customarily, the lock server is on the same host as the database.

Example:

Host=cypress.dev.oclc.org

 Port

Required

This is the port at which the lock server is currently running.

 

[Handleinput_record_type] Section - Optional for certain input record types

Under the [Handleinput_record_type] section, you may define how records of a certain type are to be handled by the RecordHandler class. For example, you can use a special filterclass parameter and its value under this section to extract only those records that meet very specific criteria.

Note: Because the parameters that can be declared under the [Handleinput_record_type] section may vary according to the record handler class, only those common to all the classes are listed in the table below. For a more complete list, see the document Pears Record Handlers.

Parameter Description
filterclass

Optional

This parameter references a special class that extends the general Record Handler class to extract only those records that satisfy certain user-defined requirements.

Example:

[handleusmarc]
filterclass=ORG.oclc.RecordHandler.BERCorporateAuthorityOnlyFilter


Note: To find out more about record handlers and their extended filter classes, see Pears Record Handlers.
LocalByteConverter

Optional

This parameter references the character set to be used by the record handler to convert certain characters in a record to Unicode before the record is committed to the database.

Example:  

[handleDB]
LocalByteConverter=USM94
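
Because the section name is built from "Handle" plus the InputRecordType value, the two settings have to agree. The sketch below assumes a USMARC input file and reuses the filter class and converter values from the examples above. Whether a particular record handler needs either parameter should be checked against Pears Record Handlers.

```ini
[dB]
InputRecordType = USMARC

[HandleUSMARC]
filterclass = ORG.oclc.RecordHandler.BERCorporateAuthorityOnlyFilter
LocalByteConverter = USM94
```

If InputRecordType were SGML instead, the same reasoning would give a section named [HandleSGML].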

 

[global_stopwords] Section - Optional

You may create a global list of stopwords to be discarded from each index by listing them under a [global_stopwords] section in the database description configuration file. This section contains most of the same parameters used for an index definition, but the settings for these parameters are unique.

Parameter Description
index

Required

A value of "0" for the index parameter tells StopwordEnforcer that no terms are to be extracted for this index; instead, the listed terms are to be removed from the other indexes after they have been built. "0" is the only recognized value for this parameter when it is used to define a stopwords list.

Example: index=0
tagpath*

Required

The tagpath parameter is not recurring for a stopwords list. Since no terms are created, no BER field paths are required. The tagpath=none setting works in concert with the index=0 setting (see above) to ensure that the designated terms are removed from all indexes. The only recognized value for this parameter in a stopwords list is "none".

Example: tagpath=none
routine

Required

The routine parameter references the specific Java indexing class that searches all the indexes and discards the designated terms. It does not generate any index terms itself.

Example:  
routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
stopwordn

Required
Recurring

The stopword parameter identifies the specific term that is to be removed from an index. Since multiple stopwords may exist, the stopword parameter may occur an unlimited number of times under a [stopwords] section in a database description configuration file.

Example:

stopword*=a
stopword*=and
stopword*=the

 

[docrulen] Section - Optional

As with local Newton databases, you may define restrictors for local Pears databases. To define a restrictor, you need to set parameters under a [docrule] section in the database description configuration file. You must set up a new [docrule] section for each restrictor that you wish to use for a database.

Parameter Description
index

Required

This is the unique identification number of the restrictor index. No other index or restrictor definition may have this same number.

Example: index = 24
routine

Required

The routine parameter references the specific class that will be used to create the restrictor.

Example: routine=ORG.oclc.pears.Bartlett.termrest
parameters

Optional

The parameters parameter identifies the exact terms that the search engine uses to retrieve records from the database. If it is not declared, the terms are assumed to be numbers, which creates a numeric restrictor by default.

Example: parameters=English German French
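
The two kinds of restrictor can be contrasted in a short sketch. The first section lists its terms explicitly; the second omits parameters and so, per the description above, would be treated as a numeric restrictor. The index numbers 49 and 50 are arbitrary illustrative values, and reusing termrest as the routine for the numeric case is an assumption rather than a documented requirement.

```ini
[docrule1]
index = 49
routine = ORG.oclc.pears.Bartlett.termrest
parameters = English German French

[docrule2]
index = 50
routine = ORG.oclc.pears.Bartlett.termrest
```

Each restrictor has its own section and its own index number, in keeping with the rule that no other index or restrictor definition may share that number.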


[Partitions] Section - Optional

A Pears database can be partitioned by record content so that records from a single input stream are automatically divided among multiple Pears databases. To accomplish this, you need to set up a special partitioned database description configuration file and include a [Partitions] section in it. Under this section, you must include references to the individual database description configuration files that make up the parts of the partitioned database. Additionally, the database description configuration file for each partition must include the "filename" parameter and an accompanying file reference in its [dB] section.

Parameter Description
class

Required

This parameter refers to the class that determines into which database a record should be deposited.

Example:  

class=ORG.oclc.pears.Bartlett.\
PartitionByNumericFieldValue

Database*

Required
Recurring

Each instance of the Database parameter references an individual database description configuration file for one part of a partitioned database. There should be as many occurrences of the Database parameter as there are parts to the database.

Example:

Database*=marc4desc.ini
Database*=marc5desc.ini
Database*=marc6desc.ini
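
As a rough sketch of how the pieces fit together, the first fragment below is the [Partitions] section of a partitioned database description configuration file, and the second is the [dB] section of one of the files it references. The file names follow the example above. The .pdb path is a hypothetical placeholder, and any further parameters that the partitioning class itself may need are not shown.

```ini
[Partitions]
class = ORG.oclc.pears.Bartlett.PartitionByNumericFieldValue
Database* = marc4desc.ini
Database* = marc5desc.ini
Database* = marc6desc.ini
```

```ini
[dB]
InputRecordType = USMARC
filename = /home/dbbuilder/dbs/marc/marc4.pdb
```

The second fragment would live in marc4desc.ini. The files for the other partitions would each carry their own filename setting pointing to a different .pdb file.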

 

[indexfilen] Section - Optional

You can break a large Pears database into several files by defining an individual [indexfile] section for each index that is a part of a partition. Under this section, you identify the directory path to a .pdb file where specific index terms are to be stored after they are extracted from records. You then associate an index with a partition by using its unique ID (see [index_definition] Section).

Parameter Description
filename

Required

The filename parameter refers to a dedicated region in the .pdb file where the index terms are to be stored.

Example:  

filename=/home/dbbuilder/dbs/marc/marc.index1.pdb

indexID

Required

The indexID corresponds to a unique index number for an existing index. This parameter links the index to a partition and ensures that the terms from the index are stored at the location designated by the filename parameter.

Example: indexID=1
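
The link between an [indexfile] section and the index it stores can be sketched as follows. The path and ID repeat the examples above. The [Titles] definition, including its tagpath and routine, is a hypothetical index added only to show that indexID matches the index value of an index definition.

```ini
[indexfile1]
filename = /home/dbbuilder/dbs/marc/marc.index1.pdb
indexID = 1

[Titles]
index = 1
tagpath* = 245/01
routine = ORG.oclc.pears.IndexRoutines.Words
```

With this arrangement, the terms extracted for index 1 would be stored in marc.index1.pdb.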

 

[index_definition] Section - Optional

Each index to be created for a Pears database must have its own definition in a database description configuration file. There is no limit on the number of indexes for a Pears database. You may define as few or as many indexes as you like.

Parameter Description
index

Required
This is the unique identifier for the index. No other index may have this same value for its index parameter. When referring to an index in another configuration file, use its index number.
tagpath*

Required
Recurring

The tagpath parameter points to a specific field in the BER record from which terms will be extracted in order to build an index. More than one field is often used to build an index, so more than one tagpath is allowed under an index definition section.

Example:

tagpath*=245/01
tagpath*=246/01
routine

Required

The routine parameter references the specific Java indexing class that will be used to extract terms and build the index.

Example: routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine

Optional

Use this parameter when you define a keyword index that must support term adjacency. Currently, the only acceptable value for this parameter is the class ORG.oclc.pears.Bartlett.wordfield.

Example:
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield

Note: Do not use this parameter for phrase index definitions or for single-term indexes. To do so wastes space in the database.
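
The note above can be illustrated by placing a keyword index next to a phrase index. In this sketch the section names, index numbers, and tagpaths are illustrative only; the Words and Phrase routines are the ones that appear in the example file later in this document. Only the keyword index declares OccurrenceRoutine.

```ini
[titlewords]
index = 1
tagpath* = 245/1
tagpath* = 246/1
routine = ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine = ORG.oclc.pears.Bartlett.wordfield

[titlephrase]
index = 2
tagpath* = 245/1
tagpath* = 246/1
routine = ORG.oclc.pears.IndexRoutines.Phrase
```

Both definitions draw on the same fields, but only the keyword index records the occurrence data needed for adjacency searching, so the phrase index does not use extra space for it.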




Example Pears Database Description Configuration File

What follows is a representative example of a Pears database description configuration file that contains many of the sections and parameters outlined above.

[bartlett]
badRecordMessageFile=<WebZ_root>/dbbuilder/dbs/marcus/badRecordMessage.txt

[dB]
InputRecordType = USMARC
RecordIDIndex = 17
Name = Marcus
filename = <WebZ_root>/dbbuilder/dbs/Marcus/marcus.pdb

[handleusmarc]
filterclass = ORG.oclc.RecordHandler.USMARCCorporateAuthorityOnlyFilter

[lockserver]
Host = orc
Port = 11110

[stopwords]
index = 0
tagpath = none
routine = ORG.oclc.pears.IndexRoutines.StopwordEnforcer
stopword* = a
stopword* = and
stopword* = the

[titlewords]
index = 1
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 130/1
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 246/1
tagpath* = 440/1
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 630/1
tagpath* = 630/4
tagpath* = 630/24
tagpath* = 630/25
tagpath* = 630/26
tagpath* = 650/1
tagpath* = 650/24
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[subjectcategorycodes]
index = 2
tagpath1 = 72/1
routine = ORG.oclc.pears.IndexRoutines.Words

[authorwords]
index = 3

tagpath* = 100/1
tagpath* = 100/2
tagpath* = 100/3
tagpath* = 100/4
tagpath* = 100/5
tagpath* = 100/17
tagpath* = 110/1
tagpath* = 110/2
tagpath* = 110/3
tagpath* = 110/4
tagpath* = 110/14
tagpath* = 700/1
tagpath* = 700/2
tagpath* = 700/3
tagpath* = 700/4
tagpath* = 700/5
tagpath* = 700/17
tagpath* = 710/1
tagpath* = 710/2
tagpath* = 710/3
tagpath* = 710/4
tagpath* = 710/14
tagpath* = 773/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[titlephrasewithoutnonfilingindicators]
index = 4
routine = ORG.oclc.pears.IndexRoutines.Phrase
extratrimchars = /
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 246/1
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 650/1
tagpath* = 650/24
tagpath* = 650/25
tagpath* = 650/26
tagpath* = 651/1
tagpath* = 651/24
tagpath* = 651/25
tagpath* = 651/26
tagpath* = 653/1
tagpath* = 655/1
tagpath* = 700/20
tagpath* = 710/20

titlePhraseWithNonFilingIndicator2
[tpwnf2]
index = 4
routine = ORG.oclc.pears.IndexRoutines.Phrase
nonFilingIndicator2 = true
extratrimchars = /
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 440/1

[format]
index = 5
routine = ORG.oclc.pears.IndexRoutines.MarcFormat
tagpath = 0
startoffset = 1

[typeofmaterial]
index = 5
routine = ORG.oclc.pears.IndexRoutines.MarcTypeOfMaterial
tagpath = 6

[publisherwords]
index = 6
tagpath* = 260/2
tagpath* = 949/3
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[language]
index = 7
tagpath = 8
routine = ORG.oclc.pears.IndexRoutines.MarcLanguage
startoffset = 35
maxlength = 3

[lccardnumber]
index = 11
tagpath* = 10/1
tagpath* = 20/1
tagpath* = 86/1
tagpath* = 440/24
tagpath* = 773/24
tagpath* = 773/26
tagpath* = 949/1
routine = ORG.oclc.pears.IndexRoutines.Words

[othertitle]
index = 12
tagpath* = 773/7
tagpath* = 773/16
tagpath* = 773/20
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[subjectwords]
index = 14
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 630/1
tagpath* = 630/4
tagpath* = 630/24
tagpath* = 630/25
tagpath* = 630/26
tagpath* = 650/1
tagpath* = 650/24
tagpath* = 650/25
tagpath* = 650/26
tagpath* = 651/1
tagpath* = 651/24
tagpath* = 651/25
tagpath* = 651/26
tagpath* = 653/1
tagpath* = 655/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[basicindex]
index = 16
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 130/1
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 246/1
tagpath* = 440/1
tagpath* = 700/20
tagpath* = 710/20
tagpath* = 730/1
tagpath* = 730/7
tagpath* = 740/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurrenceroutine = ORG.oclc.pears.Bartlett.wordfield

[controlnumber]
index = 17
tagpath1 = 1
routine = ORG.oclc.pears.IndexRoutines.Words



See Also

Pears Record Handlers

Pears Indexing Routines
Creating a New Pears Database

