Pears Record Handlers

Contents

Introduction
Document Conventions
What Is a Record Handler?
Record Handlers and Input Record Types
RecordHandler Class Diagram
Record Handler Reference

Introduction

This document is a reference to the Pears record handlers shipped with SiteSearch 4.2.0. It defines a record handler, explains the relationship between an input record type and a record handler, contains a class diagram for the RecordHandler class package, and lists the required and optional parameters for each record handler.

Document Conventions

<WebZ_root> is the location of your WebZ environment.

What Is a Record Handler?

In Pears, a record handler is a Java class that:

converts input records to BER format when you add them to a Pears database
converts BER records stored in a Pears database to a specified record format and calls internal classes that add the records to an output file

Pears record handlers shipped with SiteSearch 4.2.0 are part of the ORG.oclc.RecordHandler class package. (Except for ORG.oclc.RecordHandler.HandleDB, the record handlers reside in the pears.jar file in <WebZ_root>/classes/lib. HandleDB resides in the <WebZ_root>/classes/lib/SS4_2_0.jar file.)

You can extend these record handlers to work with other input formats or to customize in other ways. If you create custom record handlers, a recommended practice is to put them in your own class package.

Each record handler is associated with an input record type. By convention, record handler class names are Handleinput_record_type, where input_record_type is the record type that the record handler is designed to convert to BER format.

Record Handlers and Input Record Types

inputRecordType Variable in Database Description Configuration File

You specify an input record type with the inputRecordType variable in the [DB] section of the database description configuration file, based on the class name of the associated record handler, like this:

Use this format ...	for record handlers ...
inputRecordType = SGML (or another record type)	in the ORG.oclc.RecordHandler class package.
inputRecordType = fully qualified class name	for record handlers you create and add to a custom class package.

The Pears Bartlett utility looks for a period (.) in the value of the inputRecordType variable. If Bartlett does not find a period in this value, it prepends the string "ORG.oclc.RecordHandler.Handle" to the value and calls the record handler with this name. If Bartlett finds a period, it assumes that the inputRecordType value is a fully qualified class name. In the first example shown above, Bartlett would call ORG.oclc.RecordHandler.HandleSGML. In the second example, Bartlett would call the class indicated.

If you run Bartlett from the command line instead of using SSDOT for Pears, you can also specify the inputRecordType with Bartlett's -c parameter.

Note:

The inputRecordType value is case-sensitive. If you specified Sgml in the first example above, Bartlett would try to call a nonexistent class, ORG.oclc.RecordHandler.HandleSgml.
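
The name resolution described above can be sketched in a few lines of Java. This is only an illustration of the rule as the text states it, not Bartlett's actual code; the class name InputRecordTypeResolver and the method resolve are hypothetical.

```java
// Hypothetical helper: illustrates the rule described in the text,
// not the code that Bartlett itself runs.
final class InputRecordTypeResolver {
    // Prefix that the text says is prepended to short record type names.
    static final String DEFAULT_PREFIX = "ORG.oclc.RecordHandler.Handle";

    // Returns the class name implied by an inputRecordType value.
    static String resolve(String inputRecordType) {
        if (inputRecordType.indexOf('.') >= 0) {
            // A period means the value is already a fully qualified class name.
            return inputRecordType;
        }
        // No period: treat the value as a short record type such as SGML.
        // The value is used as typed, which is why case matters.
        return DEFAULT_PREFIX + inputRecordType;
    }
}
```

Under this sketch, SGML resolves to ORG.oclc.RecordHandler.HandleSGML, while Sgml resolves to the nonexistent ORG.oclc.RecordHandler.HandleSgml, because the value is never case-normalized.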

[Handleinput_record_type] in Database Description Configuration File

By convention, the record handler obtains its input parameters from a section in the database description configuration file named [Handleinput_record_type], which matches its class name. If you create your own record handlers, you may wish to establish your own conventions for naming this section.

See Record Handler Reference for the optional and required variables in this section for each record handler.
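
As a rough illustration of the naming convention, the section name can be derived from the handler's class name by dropping the package prefix. The helper below is a hypothetical sketch; it does not show how the shipped handlers locate their sections.

```java
// Hypothetical helper showing the section naming convention only.
final class HandlerSectionName {
    // Returns the conventional section header for a handler class,
    // for example "[HandleSGML]" for ORG.oclc.RecordHandler.HandleSGML.
    static String forClassName(String className) {
        // lastIndexOf returns -1 when there is no package prefix,
        // so substring(0) keeps the whole name in that case.
        String simpleName = className.substring(className.lastIndexOf('.') + 1);
        return "[" + simpleName + "]";
    }
}
```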

RecordHandler Class Diagram

The following diagram shows the relationships among the RecordHandler classes.

[Figure: Record Handler class diagram]

Record Handler Reference

This section contains reference information for these classes in the ORG.oclc.RecordHandler class package:

RecordHandler
HandleSGML
HandleDelimited
HandleMARC
HandleUSMARC
HandleChinaMarc
HandleUnimarc
HandleDB
HandleBER
HandleTransactionJournal
HandlePDB

For each record handler, it describes the variables (its input parameters) that you specify in its section of the database description configuration file. It indicates whether each variable is optional or required, whether it applies to importing and/or exporting records, and whether the variable has a default value. There are also examples of typical sections for several record handlers.

These tables are quick references for creating a record handler section in a database description configuration file. For more technical information about a record handler, see its Javadoc.

RecordHandler

RecordHandler is an abstract class that contains a main method for batch data conversion. Other record handler classes extend RecordHandler so that they can read records from an input data file and convert them to DataDir trees, which are eventually converted to BER records.

The following parameters are available to classes that extend RecordHandler.

Parameter

Description

filterClass

Optional for:
IMPORT

Fully qualified class name of a record filter that determines which records from the input file are added to the database and which are excluded.

byteConverter

Optional for:
IMPORT

Name of a Java standard ByteToCharConverter. The default is the standard local character set.

charConverter

Optional for:
EXPORT

Name of a Java standard CharToByteConverter. The default is the standard local character set.

localByteConverter

Optional for:
IMPORT

Name of an OCLC localByteConverter. The only localByteConverter defined is USM94 for USMARC records. The default is the standard local character set.

localCharConverter

Optional for:
EXPORT

Name of an OCLC localCharConverter. The default is the standard local character set.

doNotConvert*

Optional for:
IMPORT

Tagpath to a field containing binary data that should not be converted to Unicode. Specify multiple fields with each tagpath on a separate line with a different doNotConvert* variable.
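
To make the two conversion directions concrete, here is a minimal Java sketch of what byteConverter (import) and charConverter (export) accomplish. It uses java.nio.charset.Charset as a stand-in; that choice is an assumption made for illustration, and the record handlers may load their converters differently. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Illustrative stand-in for the converter parameters; not the handlers' own code.
final class ConverterSketch {
    // Picks the named character set, or the platform default when none is given,
    // mirroring the documented default of the standard local character set.
    static Charset charsetFor(String name) {
        return (name == null || name.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(name);
    }

    // Import direction: raw bytes from the input file become Unicode text.
    static String bytesToChars(byte[] input, String converterName) {
        return new String(input, charsetFor(converterName));
    }

    // Export direction: Unicode text becomes bytes in the requested character set.
    static byte[] charsToBytes(String text, String converterName) {
        return text.getBytes(charsetFor(converterName));
    }
}
```

For example, with ISO-8859-1 the single byte 0xE9 decodes to the character U+00E9, and that character encodes back to the same single byte.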

Running RecordHandler for Test Purposes

You can run RecordHandler from the command line to test a sample of records you want to convert before committing them to a database or to test custom record handler classes that you create. You can run the base RecordHandler class and specify a subordinate class to convert a sample of records to BER format, using these parameters:

This parameter ... specifies the ...

-c

RecordHandler class to perform the data conversion. Use the conventions shown for the inputRecordType variable to specify the class name.

-i

name of an input file that contains data records in a specified format.

-o

name of an output file that contains converted records in BER format.

-d

database description configuration file with information needed for the data conversion, if necessary.

-n

number of records to convert.

Example 1 – Convert USMARC records from the file example1.data:

java ORG.oclc.RecordHandler.RecordHandler -cUSMARC -iexample1.data \
-oexample1.ber

Example 2 – Convert only the first 100 USMARC records from the file example1.data:

java ORG.oclc.RecordHandler.RecordHandler -cUSMARC -iexample1.data \
-oexample1.ber -n100

Example 3 – Convert 1000 records from an existing Newton database to Pears database format:

java ORG.oclc.RecordHandler.RecordHandler -cDB -inewthedr.db \
-otest.ber -dtestdesc.ini -n1000

testdesc.ini contains only one section, for the character converter:

[HandleDB]
byteConverter = ISO-8859-1

Note:

In each example, the backslash (\) characters are included only for readability. Do not insert them into the command line.
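
If you script these test runs, note that each flag is joined directly to its value with no space (for example, -cUSMARC). The following Java sketch assembles such a command line as a list of arguments. The helper class RecordHandlerCommand is hypothetical; only the flags and the RecordHandler class name come from this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the documented command line.
final class RecordHandlerCommand {
    // descFile and count are optional; pass null to leave them out.
    static List<String> build(String handler, String input, String output,
                              String descFile, Integer count) {
        List<String> args = new ArrayList<>();
        args.add("java");
        args.add("ORG.oclc.RecordHandler.RecordHandler");
        args.add("-c" + handler);   // record handler class or short record type
        args.add("-i" + input);     // input file
        args.add("-o" + output);    // output file in BER format
        if (descFile != null) {
            args.add("-d" + descFile);  // database description configuration file
        }
        if (count != null) {
            args.add("-n" + count);     // number of records to convert
        }
        return args;
    }
}
```

The resulting list could be passed to java.lang.ProcessBuilder, assuming the record handler classes are on the classpath of the launched process.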

HandleSGML

Extends:

RecordHandler

HandleSGML allows you to import or export SGML or XML records. The records can have a hierarchical structure with multiple levels of tags and/or repeating fields. Each record requires a beginning and ending tag, which you designate in a .tags file. The .tags file maps the fields in the SGML or XML records to the BER tag paths that Bartlett uses to store records in the Pears database.

Parameter

Description

Parameters inherited from RecordHandler

See RecordHandler.

tagsFile

Required for:
IMPORT, EXPORT

Full path name of the .tags file that maps the fields in the input file to their BER tag paths in a physical database record.

AllowNewTags

Optional for:
IMPORT

Indicates whether to process data in fields that are not listed in the .tags file. Possible values are:

TRUE Add the data from fields not listed in the .tags file to the database record; add an entry for the new tag to the .tags file.

FALSE
Ignore any data from any fields not listed in the .tags file. (Default)

Setting AllowNewTags=FALSE allows you to use an input file with data in fields that you don't want to add to the database. This affects only the records added to the database and does not modify the input file.

ignoreRecoverableErrors

Optional for:
IMPORT

Indicates whether to continue to process records from the input file (for importing records) or database (for exporting records) if recoverable error conditions occur. Possible values are:

TRUE Continue processing records if recoverable errors occur.

FALSE
Do not ignore recoverable errors. Throw an exception and abort the process. Write bad records to the file specified in the badRecordMessageFile variable in the [Bartlett] section of the database description configuration file. (Default)

combineFields

Optional for:
EXPORT

Indicates whether to combine duplicate non-terminal tags into a single tag when exporting records.

TRUE
Combine duplicate non-terminal tags into a single tag. For example, change:

<rec>
   <tag1>
     <tag2>
       data_value
     </tag2>
   </tag1>
   <tag1>
     <tag3>
       data_value
     </tag3>
   </tag1>
</rec>

to this:

<rec>
   <tag1>
     <tag2>
       data_value
     </tag2>
     <tag3>
       data_value
     </tag3>
   </tag1>
</rec>

FALSE
Do not combine duplicate non-terminal tags into a single tag. (Default)

newLinesAfterEveryField

Optional for:
EXPORT

Indicates whether to add a newline character after each closing tag when exporting records. This makes the output more readable by staff members, but may decrease export speed somewhat for a large database. Possible values are:

TRUE Add the newline character after each closing tag. This overrides newLinesAroundRecord=TRUE.

FALSE
Do not add newlines after each closing tag. This makes the entire record one long text string unbroken by line breaks. (Default)

newLinesAroundRecord

Optional for:
EXPORT

Indicates whether to add a newline character before and after each record during a record export, using the tag designated as the recordTag in the .tags file to determine the start and end of each record. Possible values are:

TRUE Add a newline character before and after each record when exporting records. (Default)

FALSE
Do not add newline characters around each record when exporting records.
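
The effect of combineFields=TRUE, described earlier in this table, can be sketched as a merge of sibling non-terminal tags that share a name. The code below is an illustrative model only: the Node class and the combine method are hypothetical, and HandleSGML's internal representation and algorithm may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of combining duplicate non-terminal tags.
final class CombineFieldsSketch {
    static final class Node {
        final String tag;
        final String text;                      // non-null only for terminal tags
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) { this.tag = tag; this.text = text; }
        boolean isTerminal() { return text != null; }
        Node add(Node child) { children.add(child); return this; }

        // Renders the subtree as markup on a single line.
        String render() {
            StringBuilder sb = new StringBuilder("<" + tag + ">");
            if (isTerminal()) sb.append(text);
            for (Node child : children) sb.append(child.render());
            return sb.append("</" + tag + ">").toString();
        }
    }

    // Returns a copy of the tree in which sibling non-terminal tags
    // with the same name are merged into the first occurrence.
    static Node combine(Node node) {
        if (node.isTerminal()) return node;
        Node result = new Node(node.tag, null);
        Map<String, Node> merged = new LinkedHashMap<>();
        for (Node child : node.children) {
            if (child.isTerminal()) {
                result.children.add(child);     // terminal tags are left alone
                continue;
            }
            Node target = merged.get(child.tag);
            if (target == null) {
                target = new Node(child.tag, null);
                merged.put(child.tag, target);
                result.children.add(target);
            }
            target.children.addAll(child.children);
        }
        // Apply the same merge one level down.
        for (int i = 0; i < result.children.size(); i++) {
            result.children.set(i, combine(result.children.get(i)));
        }
        return result;
    }
}
```

Applied to the record shown under combineFields, the two tag1 elements become a single tag1 that contains both tag2 and tag3.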

Example

[handlesgml]
tagsFile = <WebZ_root>/dbbuilder/dbs/demosgml/demosgml.tags
newLinesAroundRecord = false
newLinesAfterEveryField = true
ignoreMissingTags = true

HandleDelimited

Extends:

HandleSGML

The HandleDelimited record handler allows you to add data to a Pears database from a delimited text file. Many database and spreadsheet applications allow you to export data to a delimited text file. The default delimiter is a tab, but you can specify another delimiter if necessary, such as a comma.

You can also subsequently export database records imported in delimited text format to an output file in delimited text format.

For importing records, HandleDelimited uses a .tags file to relate fields in the input records to the BER tag paths used to store data from each field in the record. If you do not have a .tags file, HandleDelimited creates a basic .tags file that numbers the fields field001, field002, and so on. The corresponding BER tag paths are assigned sequentially starting at 10.

For exporting records, HandleDelimited requires a .tags file to determine how to create fields in the export file.
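
The default numbering described above for a generated .tags file can be sketched as follows. This models only the label-to-tag pairing; it does not reproduce the .tags file syntax, and it assumes that "sequentially" means the tag increases by one per field. The class name DefaultTagsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative model of the default field labels and tag numbers.
final class DefaultTagsSketch {
    // First BER tag assigned when no .tags file is supplied.
    static final int FIRST_TAG = 10;

    // Maps field001, field002, ... to tags 10, 11, ... in field order.
    static Map<String, Integer> defaultTags(int fieldCount) {
        Map<String, Integer> tags = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            String label = String.format(Locale.ROOT, "field%03d", i + 1);
            tags.put(label, FIRST_TAG + i);
        }
        return tags;
    }
}
```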

Variable

Description

Parameters inherited from RecordHandler

See RecordHandler.

Parameters inherited from HandleSGML

See HandleSGML.

TrimWhitespace

Optional for:
IMPORT

Indicates whether to trim leading white space from fields in input records when importing records. Possible values are:

TRUE Trim leading white space from each field when adding records to the database.

FALSE
Do not trim leading white space. (Default)

Removing the white space may result in records that are easier to format for display. However, if you subsequently export these records, they no longer contain the white space contained in the original records.

EscapeWithBackslashes

Optional for:
IMPORT

Indicates whether the data uses backslashes to enclose fields that themselves contain the delimiter used to separate fields in the data file (such as \John Doe, Jane Doe\, in a comma-delimited file).

TRUE
The backslash (\) is an escape character in the input data file.

FALSE
The backslash is not an escape character in the input data file. (Default)

EscapeWithQuotes

Optional for:
IMPORT

Indicates whether the data uses double quotes (") to enclose fields that themselves contain the delimiter used to separate fields in the data file (such as "John Doe, Jane Doe", in a comma-delimited file).

TRUE		The double quote (") is an escape character in the input data file.
FALSE		The double quote is not an escape character in the input data file. (Default)

Delimiters

Optional for:
IMPORT, EXPORT

Character(s) used to separate fields within a record in the input file (import) or output file (export). The default value is a tab (\t).

repeatDelimiter

Optional for:
IMPORT, EXPORT

Character(s) used to separate repeating data elements in a single field within a record in the input file (import) or output file (export). The default value is a semicolon (;).

firstRecordHasTags

Optional for:
IMPORT

Indicates whether the first record in the input file contains field labels for the data records in the rest of the file.

TRUE The first record contains field labels. HandleDelimited uses these labels to construct field labels in the .tags file if you do not provide your own .tags file.

FALSE The first record does not contain field labels. It contains a data record. (Default)

terminateRecordWithNewLineOnly

Optional for:
EXPORT

Indicates whether to add only a newline character at the end of each record when exporting records to a file. Possible values are:

TRUE Add only a newline character to the end of each record.

FALSE
Add both a carriage return and a newline character to the end of each record. (Default)

CollapseTreeOnExport

Optional for:
EXPORT

Indicates whether to collapse a record's tree structure by combining fields in a record with common tag paths (such as a repeating field) and records with a common base tag (such as 102/1001/1001 and 102/1002/1001) into a single field. Possible values are:

TRUE Collapse the tree structure, as described above. If CollapseTreeOnExport=TRUE, HandleSGML uses the value of the repeatDelimiter variable as the delimiter for multiple values in a single field or a semicolon (;) if you do not specify a value for repeatDelimiter.

FALSE
Do not collapse the tree structure. (Default)

Collapsing the tree occurs only in the records contained in the export output file. It has no effect on the data stored in the database.
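
As a rough picture of how the export-side variables in this table interact, the sketch below joins repeated values with the repeat delimiter, joins fields with the field delimiter, and ends the record according to terminateRecordWithNewLineOnly. It is a simplified model under those assumptions, not HandleDelimited's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of writing one delimited export record.
final class DelimitedExportSketch {
    // Each inner list holds the value or values of one field.
    static String formatRecord(List<List<String>> fields, String delimiter,
                               String repeatDelimiter, boolean newLineOnly) {
        List<String> joined = new ArrayList<>();
        for (List<String> values : fields) {
            // Repeated values share one field, separated by the repeat delimiter.
            joined.add(String.join(repeatDelimiter, values));
        }
        // Default is carriage return plus newline; TRUE means newline only.
        String terminator = newLineOnly ? "\n" : "\r\n";
        return String.join(delimiter, joined) + terminator;
    }
}
```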

Example

[HandleDelimited]
tagsFile = <WebZ_root>/dbbuilder/dbs/demodelimited/demodelimited.tags
delimiters =\t
repeatDelimiter = ;
FirstRecordHasTags=true
EscapeWithQuotes = true
terminateRecordWithNewLineOnly = false
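
To illustrate what EscapeWithQuotes changes on import, the sketch below splits one input line into fields and, when the option is on, ignores delimiters that appear between double quotes. It assumes a single-character delimiter and drops the quote characters from the stored value; both are simplifications, and the names are hypothetical rather than part of HandleDelimited.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of splitting one delimited input line.
final class DelimitedImportSketch {
    static List<String> splitFields(String line, char delimiter,
                                    boolean escapeWithQuotes) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escapeWithQuotes && c == '"') {
                // Toggle quoted state; the quote itself is not stored.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                // An unquoted delimiter ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With a comma delimiter, the line Title,"John Doe, Jane Doe",1999 yields three fields when the option is on and four when it is off.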

HandleMARC, HandleChinaMarc, HandleUSMARC, HandleUnimarc

Extends:

See the class diagram for inheritance among these classes.

These classes allow you to import and export records in standard MARC, ChinaMarc, USMARC, and Unimarc formats, respectively. All four classes take the same input parameters, as shown below.

Parameter

Description

Parameters inherited from RecordHandler

See RecordHandler.

ignoreRecoverableErrors

Optional for:
IMPORT

Indicates whether to continue to process records from the input file (for importing records) or database (for exporting records) if recoverable error conditions occur. For MARC records, errors occur primarily when converting diacritics to Unicode characters. Possible values are:

TRUE Continue processing records if recoverable errors occur.

FALSE
Do not ignore recoverable errors. Throw an exception and abort the process. Write bad records to the file specified in the badRecordMessageFile variable in the [Bartlett] section of the database description configuration file. (Default)

deleteChars

Optional for:
IMPORT, EXPORT

Specifies characters that, if found in the Record Status field of the MARC Leader, indicate that the record should be deleted. The default is to ignore the Record Status field.
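
The deleteChars check can be pictured as a lookup of one leader character. In MARC, Record Status is character position 05 of the Leader (counting from 0). The sketch below is a hypothetical illustration of that test, not the handlers' own code.

```java
// Illustrative model of the deleteChars test against the MARC Leader.
final class DeleteCharsSketch {
    // Record Status occupies character position 05 of the Leader (0-based).
    static final int RECORD_STATUS_POSITION = 5;

    // Returns true when the leader's Record Status is one of deleteChars.
    static boolean shouldDelete(String leader, String deleteChars) {
        if (deleteChars == null || deleteChars.isEmpty()) {
            return false;   // default: the Record Status field is ignored
        }
        if (leader == null || leader.length() <= RECORD_STATUS_POSITION) {
            return false;   // leader too short to contain a Record Status
        }
        return deleteChars.indexOf(leader.charAt(RECORD_STATUS_POSITION)) >= 0;
    }
}
```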

HandleBER

Extends:

RecordHandler

HandleBER adds BER records to a Pears database.

Parameter	Description
Parameters inherited from RecordHandler.	See RecordHandler.

HandleDB

Extends:

HandleBER

The HandleDB record handler converts records from a Newton database (.db) to the format required by a Pears database (.pdb) by reading records from the Newton HEDR database file (dbnamehedr.db). HandleDB expects the Newton HDIR file to be in the same directory as the HEDR file and to have the same dbname prefix in its name (that is, dbnamehdir.db).
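
The file naming rule above can be expressed as a small path calculation, which may be handy for checking that both Newton files are present before a conversion. The helper below is a hypothetical sketch and is not part of HandleDB.

```java
import java.nio.file.Path;

// Illustrative helper that derives the expected HDIR file from a HEDR file.
final class NewtonFileNames {
    static final String HEDR_SUFFIX = "hedr.db";
    static final String HDIR_SUFFIX = "hdir.db";

    // For .../dbnamehedr.db, returns .../dbnamehdir.db in the same directory.
    static Path hdirFor(Path hedrFile) {
        String name = hedrFile.getFileName().toString();
        if (!name.endsWith(HEDR_SUFFIX)) {
            throw new IllegalArgumentException("Not a HEDR file name: " + name);
        }
        String dbname = name.substring(0, name.length() - HEDR_SUFFIX.length());
        return hedrFile.resolveSibling(dbname + HDIR_SUFFIX);
    }
}
```

For the input file newthedr.db used in Example 3, this yields newthdir.db in the same directory.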

Parameter	Description
Parameters inherited from RecordHandler.	See RecordHandler.

HandleTransactionJournal

Extends:

HandleBER

HandleTransactionJournal adds transactions stored in a Pears journal file to a Pears database. You can use this record handler for database recovery, when you restore a database backup and then add transactions from one or more journal files to bring the database back up to date.

Parameter	Description
Parameters inherited from RecordHandler	See RecordHandler.

HandlePDB

Extends:

RecordHandler

HandlePDB extracts records from an existing Pears database. SSDOT for Pears uses HandlePDB during database reorganization. You can also use it to extract records from a Pears database and store them in an external file in BER format.

Parameter	Description
Parameters inherited from RecordHandler	See RecordHandler.
