The Pippin Utility

The pippin utility reads ASN.1/BER encoded records from an input file, extracts terms from the fields of each database record according to rules in the database description for the rome utility program, stores the input records in the HEDR file, and stores data in the HDIR file. The terms extracted from each record are written to the New Index and Postings (NIP) file. The record itself is added to, replaced in, or deleted from the HEDR file, depending on the defined status of the ASN.1/BER record. If errors occur while processing a record, messages are written to the standard output, and the record is written to the error file.

When pippin processes a record, it uses the rules in the database description (.dsc) file to extract words and phrases, called terms, from the primitive data in the record. Each term is written to the NIP file along with information about which input record the term was from and its location in that record. Information in the NIP file is later used by rome to create the POST and INDX files.
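The idea of recording each term together with its source record and its location can be pictured with a small Python sketch. This is only a conceptual illustration: it is not pippin's algorithm and does not reflect the NIP file layout, and the function name extract_terms and the tuple layout are hypothetical. Real extraction rules come from the .dsc file.

```python
import re

def extract_terms(records):
    """Yield (term, record_number, position) for every word in every record.

    Hypothetical illustration only: words are split on non-word characters
    and lowercased, which may differ from the rules in a real .dsc file.
    """
    for record_number, text in enumerate(records, start=1):
        for position, match in enumerate(re.finditer(r"\w+", text), start=1):
            yield (match.group().lower(), record_number, position)

# Two made-up records standing in for the primitive data of input records.
terms = list(extract_terms(["Open SiteSearch", "Database builder"]))
for term in terms:
    print(term)
```

Each tuple pairs a term with the number of the record it came from and its word position in that record, which is the kind of information rome would later need to build postings.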

Pippin reads indexing specs from the first block of the HEDR file, which contains the binary encoded .dsc file (see step 2 in the database build process). To override these specs, use the -c parameter to name a separate .dsc file (see the -cdescname parameter below).

Important Note: This utility completes step 3 of the database build process. It is strongly recommended that you use the SiteSearch Database Operations Tool (SSDOT) to build your databases. SSDOT, a component of the SiteSearch system, has a menu-driven interface that calls the appropriate database utility programs to execute the various build processes, including the tasks performed by the pippin utility.

Syntax

pippin -hheadername -ddirectoryname [-xindexfile] -upostfile -vpdirfile -nnipfile -iinputfile
[-eerrorfile] [-mmax] [-sskipnum] [-ztablefile] [-cdescname] [-prestart_filename] [-gsgmltagfile]
[-fr] [-fo] [-fx] [-fs] [-fv] [-fz] [-h]
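
For scripted builds, the syntax above can be assembled programmatically. The following Python sketch builds an argument list for pippin, assuming (as the syntax line suggests) that each value is appended directly to its flag with no space. The helper name build_pippin_args and every file name are placeholders, not names required by SiteSearch. The -u and -v parameters apply only to SiteSearch 4.1.0 and later.

```python
def build_pippin_args(header, directory, post, pdir, nip, infile,
                      index=None, error=None, verbose=False):
    """Return an argv list for pippin; each value is joined to its flag."""
    args = ["pippin", "-h" + header, "-d" + directory]
    if index:
        # Needed when the input file contains delete or replace records.
        args.append("-x" + index)
    args += ["-u" + post, "-v" + pdir, "-n" + nip, "-i" + infile]
    if error:
        args.append("-e" + error)
    if verbose:
        args.append("-fv")
    return args

# All file names below are hypothetical examples.
args = build_pippin_args("mydb.hedr", "mydb.hdir", "mydb.post", "mydb.pdir",
                         "mydb.nip", "records.ber",
                         error="pippin.err", verbose=True)
print(" ".join(args))
```

The resulting list could be passed to Python's subprocess.run on a system where pippin is installed; the sketch itself only prints the command line.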

Parameter

Description

-hheadername

Specifies the name of the HEDR file to be used, where headername is the file name. This file is generated by the initdb utility.

-ddirectoryname

Specifies the name of the HDIR file to be used, where directoryname is the file name. This file is generated by the initdb utility.

-xindexfile

Locates the delete and replace record numbers using the unique record keys in the headername. The indexfile must be specified if there are delete or replace records in the inputfile.

-upostfile
(beginning with SiteSearch 4.1.0)
Specifies the name of the POST file to be used, where postfile is the file name. This file is generated by the initdb utility.

-vpdirfile
(beginning with SiteSearch 4.1.0)
Specifies the name of the PDIR file to be used, where pdirfile is the file name. This file is generated by the initdb utility.

-nnipfile

Defines the target location for the NIP files generated by pippin, where nipfile is a file name for output.

-iinputfile

Specifies the ASN.1/BER file pippin uses to generate the NIP files, where inputfile is the ASN.1/BER file name. These records can be created by using the sgmlconv or marcconv utility programs to convert your SGML or MARC source data into ASN.1/BER format. If you do not have SGML or MARC source data, you will need to develop a separate conversion program to convert your source data files into ASN.1/BER format.

-eerrorfile

Specifies the file to write any errors to when pippin is executed, where errorfile is the file name for output.

-mmax

Indicates the maximum number of records, or value of max, to process from the inputfile.

-sskipnum

Indicates the number of records to be skipped, or the value of skipnum, in the inputfile before pippin begins processing records.

-ztablefile

Defines characters to be removed from the data, characters to be replaced with other characters, and delimiters used to separate words in an index. For more information about this parameter, refer to the Pippin Tablefile Parameter.

-cdescname

Executes the initdb utility to update the database description (.dsc) file, where descname is the .dsc file name.

Note:

Use caution when modifying the .dsc file of an already existing database. Inconsistent search results can occur because of changed rules for building the database.

-prestart_filename

Creates the restart_filename in the event of an abnormal termination of pippin. The restart_filename can be used to attempt a restart of the pippin utility.

-gsgmltagfile

Specifies the name and path of a file (sgmltagfile) that contains a list of SGML tags NOT to be indexed. A useful convention is to name the sgmltagfile as db_name.tag, where db_name is the database's name.

The sgmltagfile must contain one tag per line, with each line beginning at column 1. Each line should contain only the character(s) between the angle brackets of an SGML tag that should not be indexed. For example, to exclude the italic tags (<I> and </I>) and the bold tags (<B> and </B>) from indexing, the sgmltagfile would contain:

I
/I
B
/B

You must use the -g flag in conjunction with the -fs flag to activate this function. The -fs flag tells pippin to look for an sgmltagfile. The -g flag specifies the name and location of the sgmltagfile.

-fr

Adds only records that are not already in the database.

-fo
(beginning with SiteSearch 4.1.0)
Adds only records that already exist in the database. This is the reverse of the -fr flag.

-fx

Puts pippin in input-only mode, which is helpful for debugging problems with your inputfile.

-fs

Loads SGML tag tables. You must use the -fs flag in conjunction with the -g flag. The -fs flag tells pippin to look for an sgmltagfile. The -g flag specifies the name and location of the sgmltagfile.

-fv

Prints status messages as output while pippin is executing.

-fz
(beginning with SiteSearch 4.1.0)
Indicates that there is no postings directory.

-h

Displays the utility's usage statement.
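
The sgmltagfile format described under -gsgmltagfile (one tag per line, starting at column 1, without the angle brackets) is simple enough to generate from a script. The Python sketch below writes such a file and returns the -g and -fs flags as a pair, since the two must be used together. The helper names write_sgmltagfile and sgml_tag_flags are hypothetical, and "mydb" is a placeholder database name following the db_name.tag convention.

```python
import os
import tempfile

def write_sgmltagfile(path, tags):
    """Write one tag per line, starting at column 1, without angle brackets."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for tag in tags:
            # Accept either "<I>" or "I" and store only the inner characters.
            handle.write(tag.strip().strip("<>") + "\n")

def sgml_tag_flags(path):
    """Return -g and -fs together, because pippin requires both."""
    return ["-g" + path, "-fs"]

# A temporary directory is used here so the sketch leaves no files behind.
tag_path = os.path.join(tempfile.mkdtemp(), "mydb.tag")
write_sgmltagfile(tag_path, ["<I>", "</I>", "B", "/B"])

with open(tag_path, encoding="ascii") as handle:
    contents = handle.read()
flags = sgml_tag_flags(tag_path)
print(contents, end="")
print(" ".join(flags))
```

Returning both flags from one helper makes it harder to pass -g without -fs by accident.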

See Also

Pippin Tablefile Parameter
An Introduction to Database Files
An Explanation of the Database Build Process
The Initdb Utility
The Rome Utility
Creating a Database Description (.dsc) File
Open SiteSearch Database Builder Utility Programs
SiteSearch Database Operations Tool (SSDOT)
ASN.1/BER Input Record Format (available from Open SiteSearch technical support)

