Pears Database Description Configuration File
Contents
Introduction
Document Conventions
Sections in a Pears Database Description Configuration File
Parameters in Each Section of a Pears Database Description Configuration File
Example Pears Database Description Configuration File
Introduction
In much the same way as a .dsc file is used to construct a Newton database, a Pears database description configuration file supplies the rules for handling records and extracting index terms to build a local Pears database.
Like all configuration files, a Pears database description configuration file is a text file that can easily be viewed and modified with a variety of UNIX or Windows-based editors. Unlike other configuration files, however, it contains only a few required section names. Most section names in a Pears database description configuration file are purely descriptive: you can label a section however you like rather than choosing from a fixed set of names. In most cases, use the name that is most meaningful to you or to fellow database administrators.
The parameters that you set in a particular section of a Pears database description configuration file depend on your objective for that section. Some sections declare the input type of records or the specific routine that is to be used to process records. Others define global stopwords or indexes. Whatever their purpose or number, the sections collectively form the blueprint that is used to build a Pears database.
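Because parameters such as tagpath* may appear many times in one section, a general-purpose INI reader is a poor fit for these files: Python's configparser, for example, raises an error on repeated keys by default. The following minimal sketch shows one way a helper tool might read a description file into sections. It is not the parser used by Bartlett or SSDOT; the function name parse_description, the lowercasing of names, and the collection of keys ending in "*" into lists are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Union

# A parameter holds one string, or a list of strings for recurring "*" keys.
Value = Union[str, List[str]]
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]$")


def parse_description(text: str) -> Dict[str, Dict[str, Value]]:
    """Read "[section]" headers and "key = value" lines (hypothetical helper).

    Assumptions: names are compared case-insensitively, so they are
    lowercased; a key ending in "*" may repeat and keeps every value in
    file order; any other key may appear only once per section.
    """
    sections: Dict[str, Dict[str, Value]] = {}
    current: Optional[Dict[str, Value]] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        header = SECTION_RE.match(line)
        if header:
            name = header.group("name").strip().lower()
            current = sections.setdefault(name, {})
            continue
        if current is None or "=" not in line:
            raise ValueError(f"line {lineno}: unexpected text {line!r}")
        # Split at the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key.endswith("*"):
            # Recurring parameter: keep every occurrence, in file order.
            current.setdefault(key, []).append(value)  # type: ignore[union-attr]
        elif key in current:
            raise ValueError(f"line {lineno}: duplicate parameter {key!r}")
        else:
            current[key] = value
    return sections


if __name__ == "__main__":
    sample = """
    [Titles]
    index=2
    tagpath*=242/01
    tagpath*=245/01
    """
    print(parse_description(sample))
    # prints {'titles': {'index': '2', 'tagpath*': ['242/01', '245/01']}}
```

Treating only a trailing "*" as the recurrence marker is a deliberate simplification; the numbered forms seen in this document (such as tagpath1 or stopwordn) are read as ordinary keys by this sketch.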
Although a Pears database description configuration file shares some conventions and much of its overall structure with other configuration files, this document outlines the sections and parameters that are unique to it.
Document Conventions
General Conventions
- <WebZ_root> is the location of your WebZ environment.
- Sections or parameters in italics, such as [index_definition], denote sections or parameters that you name yourself and that will vary.
Required and Optional Sections
This document describes each possible section in a Pears database description configuration file. A section can be either optional or required; the table below shows how each kind is marked.
Example | Explanation
[dB] Required | The [dB] section is required for all Pears databases. It sets general parameters for the database.
[docrulen] Optional | A [docrulen] section defines a restrictor for a database.
Required and Optional Parameters for Each Section
After the table of general information about each section of a Pears database description configuration file, there is a corresponding series of tables that lists every parameter that may be declared under each section. Whether a parameter is required or optional depends on what you are trying to accomplish. For instance, an optional section may have some parameters that are required if you choose to include the section in a Pears database description configuration file.
The table that lists the parameters in a section has two columns: Parameter and Description. The Parameter column includes the parameter name, an indication of whether the parameter is optional or required, and whether it recurs within a section, as in these examples:
Example | Explanation
tagpath (Required, Recurring) | The tagpath parameter points to a specific field in the BER record from which terms are extracted to build an index. Because an index is often built from more than one field, more than one tagpath is allowed under an index definition section.
routine | The routine parameter declares the specific Java class that is used to create an index.
startoffset (Required for: indexing routines that handle MARC fixed fields) | The startoffset parameter identifies the position within the fixed-field string of a MARC record at which the indexing routine begins extracting index terms.
The Description column provides a definition of the parameter and notes on its use. If applicable, it includes a list of allowable values for the parameter, one of which may be the default.
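To make the startoffset convention concrete, here is a small sketch of slicing a MARC fixed-field string. The example file near the end of this document configures its [language] index with tagpath = 8, startoffset = 35 and maxlength = 3; if offsets are counted from 0, as MARC character positions are, that appears to select positions 35-37 of the 008 field, where MARC records the language code. The 0-based counting, the handling of maxlength, and the function name are assumptions of this sketch, not documented Pears behavior.

```python
from typing import Optional


def extract_fixed_field(field: str, startoffset: int, maxlength: Optional[int] = None) -> str:
    """Return the part of a fixed-field string that would be indexed.

    Hypothetical helper. Assumes 0-based offsets and that maxlength, when
    given, caps the number of characters taken from startoffset onward.
    """
    if startoffset < 0 or startoffset >= len(field):
        raise ValueError("startoffset lies outside the fixed field")
    end = len(field) if maxlength is None else startoffset + maxlength
    return field[startoffset:end]
```

For a 40-character 008 string whose positions 35-37 hold "eng", extract_fixed_field(field, 35, 3) returns "eng".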
Sections in a Pears Database Description Configuration File
The following table lists all the possible sections in a Pears database description configuration file. In addition to the name and description of each section, the table indicates whether the section is required or optional.
Section | Description
[Bartlett] (Optional) | The [Bartlett] section sets specific Bartlett run-time parameters.
    Example:
    badRecordMessageFile=badRecordMessage.txt
[dB] (Required) | The [dB] section sets general parameters that must be in place for the designated database to be updated with the correct records and indexes.
[LockServer] (Optional) | The [LockServer] section contains the parameters needed to connect to a server that locks the database during a Bartlett update. This locking mechanism prevents multiple copies of Bartlett from updating the database at the same time.
[Handleinput_record_type] (Optional for certain input record types) | Depending on the input record type, some record-handling classes use additional parameters to define exactly how records are extracted from the input data file. This section contains those parameters. When you define a [Handleinput_record_type] section, always form its name by joining "Handle" to the value of the InputRecordType parameter that you declared in the [dB] section.
[stopwords] (Optional) | Use the [stopwords] section to define a global list of stopwords that are removed from all indexes after the indexes have been built.
[docrulen] (Optional) | Use a [docrulen] section to define a restrictor for a Pears database.
    Example:
    [docrule2]
    index = 49
    routine = ORG.oclc.pears.Bartlett.termrest
    parameters = english german french
[Partitions] (Optional) | The [Partitions] section references the individual parts of a partitioned database and names the class that enables those parts to work together as a logical unit.
    Note: A [Partitions] section exists only in the database description configuration file for a partitioned database.
[indexfilen] (Optional) | Indexes for a Pears database can be partitioned. To partition them, you must define a separate [indexfilen] section for each one.
    Example:
    [indexfile1]
    filename=/home/dbbuilder/dbs/marc/marc.index1.pdb
    indexID=1
[index_definition] (Optional) | This section contains the parameters needed to define an index. You must set up an [index_definition] section for each index that you want to create for a Pears database.
    Example:
    [Titles]
    index=2
    routine=ORG.oclc.pears.IndexRoutines.PluralWords
    OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
    tagpath*=242/01
    tagpath*=245/01
    tagpath*=246/01
    Note: You may name an index definition anything you choose. The best practice is to choose a name that describes the content of the index, such as Title or Author.
Parameters in Each Section of a Pears Database Description Configuration File
[dB] Section - Required
The [dB] section contains general parameters that, for example, set the size of the database file, declare the format of the input data, and provide a name for the database.
Parameter | Description
blocksize | The blocksize parameter sets the capacity of the .pdb file, in bytes. The default, and the highly recommended setting, is 16384 bytes.
InputRecordType | InputRecordType identifies the format of the records in the input data file. Based on its value, the main RecordHandler class selects and runs the appropriate record handler to update the database.
    Example:
    inputRecordType=USMARC
    Note: Both the inputRecordType parameter and its possible values are case sensitive. Other possible values include "SGML" and "TEXT".
Name | The Name parameter provides the internal name for the database. If it is not used, the name registered in SSDOT is assumed by default. The internal name is used to refer to the database in other configuration files, such as the ObiTopics.ini file.
filename | Use this parameter to provide the full path to the .pdb file if it is located somewhere other than the directory that holds the database description configuration file. If you don't use this parameter, SSDOT for Pears or Bartlett assumes that the .pdb file is in the same directory as the database description configuration file.
RecordIDIndex | This parameter identifies the index that contains the unique ID for each record in the database.
    Note: Creating a record ID index is optional, but a unique record ID must exist before you can replace or delete a record in a Pears database.
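Two of the [dB] rules can be checked mechanically: the blocksize default and the requirement that RecordIDIndex name an index that actually exists. The sketch below shows both. The helper name check_db_section and the assumption that parameter names were lowercased when the file was read are hypothetical; SSDOT and Bartlett may apply these rules differently.

```python
from typing import Dict, Iterable

DEFAULT_BLOCKSIZE = 16384  # the default this document gives for blocksize


def check_db_section(db: Dict[str, str], index_numbers: Iterable[int]) -> int:
    """Return the effective blocksize of a [dB] section (hypothetical helper).

    Assumes parameter names were lowercased when the file was read.
    Raises ValueError if RecordIDIndex names an index that is not defined.
    """
    blocksize = int(db.get("blocksize", DEFAULT_BLOCKSIZE))
    record_id = db.get("recordidindex")
    if record_id is not None and int(record_id) not in set(index_numbers):
        raise ValueError(f"RecordIDIndex {record_id} matches no index definition")
    return blocksize
```

In the example file at the end of this document, RecordIDIndex = 17 points at the [controlnumber] section, which is defined with index = 17, so a check like this would pass.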
[LockServer] Section - Optional
The [LockServer] section contains the parameters needed to associate the database with a locking mechanism so that multiple copies of Bartlett cannot update the database at the same time.
Parameter | Description
Host | The domain name or IP address of the host on which the lock server resides. Customarily, the lock server runs on the same host as the database.
    Example:
    Host=cypress.dev.oclc.org
Port | The port on which the lock server is running.
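Before starting an update, an administrator might want to confirm that the host and port configured under [LockServer] accept connections at all. The sketch below does only that. It assumes the lock server listens on TCP and does not speak the lock server's own protocol, which this document does not describe. The function names are hypothetical; socket.create_connection is the Python standard-library call.

```python
import socket
from typing import Dict, Tuple


def lock_server_address(section: Dict[str, str]) -> Tuple[str, int]:
    """Read Host and Port from a parsed [LockServer] section (keys lowercased)."""
    host = section["host"]
    port = int(section["port"])
    if not 0 < port < 65536:
        raise ValueError(f"Port {port} is outside the TCP port range")
    return host, port


def lock_server_reachable(section: Dict[str, str], timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the configured lock server opens.

    This only shows that something is listening at Host:Port; it does not
    acquire or test a lock.
    """
    try:
        with socket.create_connection(lock_server_address(section), timeout=timeout):
            return True
    except OSError:
        return False
```

With the values from the example file, lock_server_address({"host": "orc", "port": "11110"}) returns ("orc", 11110).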
[Handleinput_record_type] Section - Optional for certain input record types
Under the [Handleinput_record_type] section, you may define how records of a certain type are to be handled by the RecordHandler class. For example, you can set the filterclass parameter under this section to extract only those records that meet very specific criteria.
Note: Because the parameters that can be declared under the [Handleinput_record_type] section vary according to the record handler class, only those common to all the classes are listed in the table below. For a more complete list, see the document Pears Record Handlers.
Parameter | Description
filterclass | This parameter references a special class that extends the general RecordHandler class to extract only those records that satisfy certain user-defined requirements.
    Example:
    [handleusmarc]
    filterclass=ORG.oclc.RecordHandler.BERCorporateAuthorityOnlyFilter
    Note: To find out more about record handlers and their extended filter classes, see Pears Record Handlers.
LocalByteConverter (Optional) | This parameter references the character set that the record handler uses to convert certain characters in a record to Unicode before the record is committed to the database.
    Example:
    [handleDB]
    LocalByteConverter=USM94
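The naming rule for record-handler sections can be expressed directly. The sketch below treats section names case-insensitively; that is an assumption drawn from the example file, which pairs InputRecordType = USMARC with a section named [handleusmarc]. The function names are hypothetical.

```python
from typing import Dict, Optional


def handle_section_name(input_record_type: str) -> str:
    """Build the record-handler section name from the InputRecordType value."""
    return "Handle" + input_record_type


def find_handle_section(sections: Dict[str, dict], input_record_type: str) -> Optional[dict]:
    """Return the parameters of the matching [Handle...] section, or None.

    Comparison ignores case (an assumption of this sketch).
    """
    wanted = handle_section_name(input_record_type).lower()
    for name, params in sections.items():
        if name.lower() == wanted:
            return params
    return None
```

A record type with no matching section simply gets None, which fits the section being optional for most input record types.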
[global_stopwords] Section - Optional
You may create a global list of stopwords to be discarded from every index by listing them under a [global_stopwords] section in the database description configuration file. This section contains most of the same parameters used for an index definition, but the settings for these parameters are specific to a stopword list.
Parameter | Description
index | A value of "0" for the index parameter tells StopwordEnforcer that no terms are to be extracted for this index; instead, the listed terms are removed from the other indexes after those indexes have been built. "0" is the only recognized value for this parameter when it is used to define a stopwords list.
tagpath | The tagpath parameter does not recur in a stopwords list. Because no terms are created, no BER field paths are required. The tagpath=none setting works together with the index=0 setting (see above) to ensure that the designated terms are removed from all indexes. "none" is the only recognized value for this parameter when it is used to define a stopwords list.
routine | The routine parameter references the specific Java indexing class that searches all the indexes and discards the designated terms. It does not generate any index terms itself.
    Example:
    routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
stopwordn (Required, Recurring) | The stopword parameter identifies a specific term that is to be removed from the indexes. Because a list usually contains many stopwords, the stopword parameter may occur any number of times under a [stopwords] section in a database description configuration file.
    Example:
    stopword*=a
    stopword*=and
    stopword*=the
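The stopword behavior described above, in which terms are removed from indexes that have already been built rather than filtered out during extraction, can be modeled in a few lines. This is an illustrative model, not StopwordEnforcer itself: indexes are represented as sets of terms keyed by index number, and the case-insensitive comparison is an assumption.

```python
from typing import Dict, Iterable, Set


def enforce_stopwords(indexes: Dict[int, Set[str]], stopwords: Iterable[str]) -> Dict[int, Set[str]]:
    """Drop global stopwords from every built index (illustrative model).

    Index 0 is the stopword list itself and holds no terms, so it is
    skipped; every other index keeps only terms not in the list.
    """
    stop = {word.lower() for word in stopwords}
    return {
        number: {term for term in terms if term.lower() not in stop}
        for number, terms in indexes.items()
        if number != 0
    }
```

Because the list is applied after indexing, changing it does not require re-extracting terms from the BER records in this model.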
[docrulen] Section - Optional
As with local Newton databases, you may define restrictors for local Pears databases. To define a restrictor, set parameters under a [docrulen] section in the database description configuration file. You must set up a separate [docrulen] section for each restrictor that you want to use for a database.
Parameter | Description
index | The unique identification number of the restrictor index. No other index or restrictor definition may use this same number.
routine | The routine parameter references the specific class that is used to create the restrictor.
    Example:
    routine=ORG.oclc.pears.Bartlett.termrest
parameters | The parameters setting identifies the exact terms that the search engine uses to retrieve records from the database. If it is not declared, the terms are assumed to be numbers, which creates a numeric restrictor by default.
    Example:
    parameters=English German French
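The default described for the parameters setting (the declared terms if present, a numeric restrictor otherwise) reduces to a small decision. In this sketch the function name is hypothetical, and splitting the value on whitespace is an assumption based on the example parameters=English German French.

```python
from typing import Dict, List, Union


def restrictor_terms(docrule: Dict[str, str]) -> Union[List[str], str]:
    """Return a restrictor's declared terms, or "numeric" if none are given.

    Assumes a parsed [docrulen] section with lowercased keys and a
    whitespace-separated parameters value.
    """
    declared = docrule.get("parameters", "").split()
    return declared if declared else "numeric"
```

Applied to the [docrule2] example in the section table, this returns ["english", "german", "french"].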
[Partitions] Section - Optional
A Pears database can be partitioned by record content so that records from a single input stream are automatically divided among multiple Pears databases. To accomplish this, set up a special partitioned database description configuration file and include a [Partitions] section in it. Under this section, you must include references to the individual database description configuration files that make up the parts of the partitioned database. In addition, the database description configuration file for each partition must include the "filename" parameter, with its file reference, in its [dB] section.
Parameter | Description
class | This parameter refers to the class that determines into which database a record should be deposited.
    Example:
    class=ORG.oclc.pears.Bartlett.PartitionByNumericFieldValue
Database* (Required, Recurring) | Each instance of the Database parameter references the database description configuration file for one part of a partitioned database. There should be as many occurrences of the Database parameter as there are parts to the database.
    Example:
    Database*=marc4desc.ini
    Database*=marc5desc.ini
    Database*=marc6desc.ini
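The table above says only that the class decides which database receives each record. As a purely hypothetical illustration of that kind of decision, the sketch below routes a record by comparing one numeric field value against ascending thresholds and picking the matching Database* entry. It is not the logic of PartitionByNumericFieldValue, which this document does not describe; the thresholds and the function name are invented for the example.

```python
import bisect
from typing import List, Sequence


def choose_partition(value: int, upper_bounds: Sequence[int], databases: List[str]) -> str:
    """Return the Database* entry that should receive a record (hypothetical rule).

    upper_bounds must be ascending and one shorter than databases. Values
    below upper_bounds[0] go to databases[0]; a value equal to a bound moves
    on to the next database; values at or above the last bound go to the
    last database.
    """
    if len(upper_bounds) != len(databases) - 1:
        raise ValueError("need exactly one fewer bound than databases")
    if list(upper_bounds) != sorted(upper_bounds):
        raise ValueError("upper_bounds must be ascending")
    return databases[bisect.bisect_right(upper_bounds, value)]
```

With the three Database* entries from the example and bounds [1000, 2000], a value of 1500 would be routed to marc5desc.ini.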
[indexfilen] Section - Optional
You can break a large Pears database into several files by defining an individual [indexfilen] section for each index that is part of a partition. Under this section, you identify the directory path to a .pdb file where specific index terms are to be stored after they are extracted from records. You then associate an index with a partition by using its unique ID (see [index_definition] Section).
Parameter | Description
filename | The filename parameter refers to a dedicated region in the .pdb file where the index terms are to be stored.
    Example:
    filename=/home/dbbuilder/dbs/marc/marc.index1.pdb
indexID | The indexID corresponds to the unique index number of an existing index. This parameter links the index to a partition and ensures that the terms from the index are stored at the location designated by the filename parameter.
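The link between [indexfilen] sections and index numbers amounts to a lookup table from indexID to filename. The minimal sketch below builds that table. It assumes parsed sections with lowercased keys, recognizes index-file sections by an "indexfile" name prefix, and rejects an indexID assigned to two files; all three are assumptions of the sketch, and the function name is hypothetical.

```python
from typing import Dict


def index_file_map(sections: Dict[str, Dict[str, str]]) -> Dict[int, str]:
    """Map each indexID to the .pdb file named in its [indexfilen] section."""
    mapping: Dict[int, str] = {}
    for name, params in sections.items():
        if not name.lower().startswith("indexfile"):
            continue  # not an index-file section
        index_id = int(params["indexid"])
        if index_id in mapping:
            raise ValueError(f"indexID {index_id} is assigned to two files")
        mapping[index_id] = params["filename"]
    return mapping
```

Indexes that do not appear in the returned mapping have no [indexfilen] section of their own.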
[index_definition] Section - Optional
Each index to be created for a Pears database must have its own definition in a database description configuration file. There is no limit on the number of indexes for a Pears database; you may define as few or as many indexes as you like.
Parameter | Description
index | The unique identifier for the index. No other index may have this same value for its index parameter. When referring to an index in another configuration file, use its index number.
tagpath* (Required, Recurring) | The tagpath parameter points to a specific field in the BER record from which terms are extracted to build an index. Because an index is often built from more than one field, more than one tagpath is allowed under an index definition section.
    Example:
    tagpath*=245/01
    tagpath*=246/01
routine | The routine parameter references the specific Java indexing class that is used to extract terms and build the index.
    Example:
    routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine (Optional) | Use this parameter to make a keyword index support term adjacency. Currently, the only acceptable value for this parameter is the class ORG.oclc.pears.Bartlett.wordfield.
    Example:
    OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
    Note: Do not use this parameter for phrase index definitions or for single-term indexes. Doing so wastes space in the database.
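Why occurrence data costs space but enables adjacency can be seen in a toy model. In the sketch below, a keyword index without occurrence data records only which field each word came from, so it cannot tell whether two words were next to each other; storing word positions makes that check possible at the price of a larger posting for every term. The record model (a dict from tagpath to field text), the word splitting, and the function names are hypothetical; this does not reproduce ORG.oclc.pears.IndexRoutines.Words or ORG.oclc.pears.Bartlett.wordfield.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WORD_RE = re.compile(r"\w+")


def build_word_index(
    record: Dict[str, str], tagpaths: List[str], with_occurrences: bool
) -> Dict[str, Set[Tuple]]:
    """Build a toy keyword index from the fields named by tagpaths.

    Without occurrence data, a posting records only the field a word came
    from; with it, the posting also stores the word's position in that field.
    """
    index: Dict[str, Set[Tuple]] = defaultdict(set)
    for path in tagpaths:
        words = WORD_RE.findall(record.get(path, "").lower())
        for position, word in enumerate(words):
            posting = (path, position) if with_occurrences else (path,)
            index[word].add(posting)
    return dict(index)


def adjacent(index: Dict[str, Set[Tuple]], first: str, second: str) -> bool:
    """True if `second` immediately follows `first` in the same field.

    Only answerable when the index was built with occurrence data;
    field-only postings never match.
    """
    following = index.get(second, set())
    return any(
        len(p) == 2 and (p[0], p[1] + 1) in following
        for p in index.get(first, set())
    )
```

For a record whose 245/1 field reads "Gone with the wind", adjacent(index, "gone", "with") is True only when the index was built with with_occurrences=True.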
Example Pears Database Description Configuration File
The following is a representative example of a Pears database description configuration file that contains many of the sections and parameters described above.
[bartlett]
badRecordMessageFile=<WebZ_root>/dbbuilder/dbs/marcus/badRecordMessage.txt
[dB]
InputRecordType = USMARC
RecordIDIndex = 17
Name = Marcus
filename = <WebZ_root>/dbbuilder/dbs/Marcus/marcus.pdb
[handleusmarc]
filterclass = ORG.oclc.RecordHandler.USMARCCorporateAuthorityOnlyFilter
[lockserver]
Host = orc
Port = 11110
[stopwords]
index = 0
tagpath = none
routine = ORG.oclc.pears.IndexRoutines.StopwordEnforcer
stopword* = a
stopword* = and
stopword* = the
[titlewords]
index = 1
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 130/1
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 246/1
tagpath* = 440/1
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 630/1
tagpath* = 630/4
tagpath* = 630/24
tagpath* = 630/25
tagpath* = 630/26
tagpath* = 650/1
tagpath* = 650/24
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[subjectcategorycodes]
index = 2
tagpath1 = 72/1
routine = ORG.oclc.pears.IndexRoutines.Words
[authorwords]
index = 3
tagpath* = 100/1
tagpath* = 100/2
tagpath* = 100/3
tagpath* = 100/4
tagpath* = 100/5
tagpath* = 100/17
tagpath* = 110/1
tagpath* = 110/2
tagpath* = 110/3
tagpath* = 110/4
tagpath* = 110/14
tagpath* = 700/1
tagpath* = 700/2
tagpath* = 700/3
tagpath* = 700/4
tagpath* = 700/5
tagpath* = 700/17
tagpath* = 710/1
tagpath* = 710/2
tagpath* = 710/3
tagpath* = 710/4
tagpath* = 710/14
tagpath* = 773/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[titlephrasewithoutnonfilingindicators]
index = 4
routine = ORG.oclc.pears.IndexRoutines.Phrase
extratrimchars = /
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 246/1
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 650/1
tagpath* = 650/24
tagpath* = 650/25
tagpath* = 650/26
tagpath* = 651/1
tagpath* = 651/24
tagpath* = 651/25
tagpath* = 651/26
tagpath* = 653/1
tagpath* = 655/1
tagpath* = 700/20
tagpath* = 710/20
titlePhraseWithNonFilingIndicator2
[tpwnf2]
index = 4
routine = ORG.oclc.pears.IndexRoutines.Phrase
nonFilingIndicator2 = true
extratrimchars = /
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 440/1
[format]
index = 5
routine = ORG.oclc.pears.IndexRoutines.MarcFormat
tagpath = 0
startoffset = 1
[typeofmaterial]
index = 5
routine = ORG.oclc.pears.IndexRoutines.MarcTypeOfMaterial
tagpath = 6
[publisherwords]
index = 6
tagpath* = 260/2
tagpath* = 949/3
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[language]
index = 7
tagpath = 8
routine = ORG.oclc.pears.IndexRoutines.MarcLanguage
startoffset = 35
maxlength = 3
[lccardnumber]
index = 11
tagpath* = 10/1
tagpath* = 20/1
tagpath* = 86/1
tagpath* = 440/24
tagpath* = 773/24
tagpath* = 773/26
tagpath* = 949/1
routine = ORG.oclc.pears.IndexRoutines.Words
[othertitle]
index = 12
tagpath* = 773/7
tagpath* = 773/16
tagpath* = 773/20
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[subjectwords]
index = 14
tagpath* = 600/1
tagpath* = 600/2
tagpath* = 600/3
tagpath* = 600/4
tagpath* = 600/17
tagpath* = 600/20
tagpath* = 600/24
tagpath* = 600/25
tagpath* = 600/26
tagpath* = 610/1
tagpath* = 610/2
tagpath* = 610/3
tagpath* = 610/4
tagpath* = 610/14
tagpath* = 610/20
tagpath* = 610/24
tagpath* = 610/25
tagpath* = 610/26
tagpath* = 630/1
tagpath* = 630/4
tagpath* = 630/24
tagpath* = 630/25
tagpath* = 630/26
tagpath* = 650/1
tagpath* = 650/24
tagpath* = 650/25
tagpath* = 650/26
tagpath* = 651/1
tagpath* = 651/24
tagpath* = 651/25
tagpath* = 651/26
tagpath* = 653/1
tagpath* = 655/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[basicindex]
index = 16
tagpath* = 100/20
tagpath* = 110/20
tagpath* = 130/1
tagpath* = 240/1
tagpath* = 242/1
tagpath* = 245/1
tagpath* = 245/2
tagpath* = 245/3
tagpath* = 245/8
tagpath* = 245/14
tagpath* = 245/16
tagpath* = 246/1
tagpath* = 440/1
tagpath* = 700/20
tagpath* = 710/20
tagpath* = 730/1
tagpath* = 730/7
tagpath* = 740/1
routine = ORG.oclc.pears.IndexRoutines.Words
occurenceroutine = ORG.oclc.pears.Bartlett.wordfield
[controlnumber]
index = 17
tagpath1 = 1
routine = ORG.oclc.pears.IndexRoutines.Words
See Also
Pears Record Handlers
Pears Indexing Routines
Creating a New Pears Database