The Pippin Utility
The pippin utility reads ASN.1/BER encoded records from an input file, extracts terms from the various fields of the database records according to rules in the database description for the rome utility program, stores the input records in the HEDR file, and stores data in the HDIR file. The terms extracted from each record are written to the New Index and Postings (NIP) file. The record itself is added to, replaced in, or deleted from the HEDR file, depending upon the defined status of the ASN.1/BER record. If errors occur while a record is being processed and written to or deleted from the HEDR file, messages are written to the standard output and the record is written to the error file.
When pippin processes a record, it uses the rules in the database description (.dsc) file to extract words and phrases, called terms, from the primitive data in the record. Each term is written to the NIP file along with information about which input record the term was from and its location in that record. Information in the NIP file is later used by rome to create the POST and INDX files.
Pippin reads its indexing specifications from the first block of the HEDR file, which contains the binary encoded .dsc file (see step 2 in the database build process). To override these specifications, use the -c parameter with a separate .dsc file (see the -cdescname parameter below).
Important Note: This utility completes step 3 of the database build process. It is strongly recommended that you use the SiteSearch Database Operations Tool (SSDOT) to build your databases. SSDOT, a component of the SiteSearch system, has a menu-driven interface that calls the appropriate database utility programs to execute the various build processes, including the tasks performed by the pippin utility.
Syntax
pippin
-hheadername -ddirectoryname
[-xindexfile] -upostfile -vpdirfile
-nnipfile -iinputfile
[-eerrorfile] [-mmax] [-sskipnum]
[-ztablefile] [-cdescname] [-prestart_filename]
[-gsgmltagfile]
[-fr] [-fo] [-fx] [-fs] [-fv] [-fz]
[-h]
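As a minimal sketch of how the syntax above fits together, the following script assembles a pippin command line from the required parameters plus the optional error file and status-message flag. The database name mydb and every file name and extension shown are hypothetical placeholders, not names taken from the SiteSearch distribution. The script prints the command and runs it only if pippin is found on the PATH.

```shell
#!/bin/sh
# Hypothetical database name; all file names below are placeholders.
DB=mydb

# Required parameters from the syntax summary: -h, -d, -u, -v, -n, -i.
# Each value is written directly after its flag, with no space.
PIPPIN_CMD="pippin -h${DB}.hedr -d${DB}.hdir -u${DB}.post -v${DB}.pdir -n${DB}.nip -i${DB}.ber"

# Optional parameters: an error file (-e) and status messages (-fv).
PIPPIN_CMD="${PIPPIN_CMD} -e${DB}.err -fv"

# Show the assembled command for review.
echo "${PIPPIN_CMD}"

# Run it only when pippin is installed; otherwise just report that.
if command -v pippin >/dev/null 2>&1; then
    ${PIPPIN_CMD}
else
    echo "pippin not found on PATH; command not run" >&2
fi
```

Building the command in a variable first makes it easy to review the parameters before pippin modifies the HEDR and HDIR files.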
| Parameter | Description |
|---|---|
| -hheadername | Specifies the name of the HEDR file to be used, where headername is the file name. This file is generated by the initdb utility. |
| -ddirectoryname | Specifies the name of the HDIR file to be used, where directoryname is the file name. This file is generated by the initdb utility. |
| -xindexfile | Locates the record numbers of records to be deleted or replaced, using the unique record keys in the headername. The indexfile must be specified if there are delete or replace records in the inputfile. |
| -upostfile (beginning with SiteSearch 4.1.0) | Specifies the name of the POST file to be used, where postfile is the file name. This file is generated by the initdb utility. |
| -vpdirfile (beginning with SiteSearch 4.1.0) | Specifies the name of the PDIR file to be used, where pdirfile is the file name. This file is generated by the initdb utility. |
| -nnipfile | Defines the target location for the NIP files generated by pippin, where nipfile is a file name for output. |
| -iinputfile | Specifies the ASN.1/BER file pippin uses to generate the NIP files, where inputfile is the ASN.1/BER file name. These records can be created by using the sgmlconv or marcconv utility programs to convert your SGML or MARC source data into ASN.1/BER format. If you do not have SGML or MARC source data, you will need to develop a separate conversion program to convert your source data files into ASN.1/BER format. |
| -eerrorfile | Specifies the file to which errors are written when pippin is executed, where errorfile is the file name for output. |
| -mmax | Indicates the maximum number of records, or value of max, to process. |
| -sskipnum | Indicates the number of records to be skipped, or the value of skipnum, in the inputfile before pippin begins processing records. |
| -ztablefile | Defines characters to be removed from the data, characters to be replaced with other characters, and delimiters used to separate words in an index. For more information about this parameter, refer to the Pippin Tablefile Parameter. |
| -cdescname | Executes the initdb utility to update the database description (.dsc) file, where descname is the .dsc file name. Note: Use caution when modifying the .dsc file of an already existing database. Inconsistent search results can occur because of changed rules for building the database. |
| -prestart_filename | Creates the restart_filename in the event of an abnormal termination of pippin. The restart_filename can be used to attempt a restart of the pippin utility. |
| -gsgmltagfile | Specifies the name and path of a file (sgmltagfile) that contains a list of SGML tags NOT to be indexed. A useful convention is to name the sgmltagfile db_name.tag, where db_name is the database's name. The sgmltagfile must contain one tag per line, with each line beginning at column 1. Each line should contain the character(s) between the brackets of an SGML tag that should not be indexed. For example, to exclude the italic (<I> and </I>) and bold (<B> and </B>) tags from indexing, list those tags in the sgmltagfile, one per line. You must use the -g flag in conjunction with the -fs flag to activate this function. The -fs flag tells pippin to look for an sgmltagfile. The -g flag specifies the name and location of the sgmltagfile. |
| -fr | Adds only records that are not already in the database. |
| -fo (beginning with SiteSearch 4.1.0) | Adds only records that already exist in the database. This is the reverse of the -fr flag. |
| -fx | Puts pippin in input only mode, which is helpful for debugging problems with your inputfile. |
| -fs | Loads SGML tag tables. You must use the -fs flag in conjunction with the -g flag. The -fs flag tells pippin to look for an sgmltagfile. The -g flag specifies the name and location of the sgmltagfile. |
| -fv | Prints status messages as output while pippin is executing. |
| -fz (beginning with SiteSearch 4.1.0) | Indicates that there is no postings directory. |
| -h | Displays the utility's usage statement. |
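The -g and -fs parameters described above work as a pair, and a short sketch may make the pairing concrete. The script below writes a tag file that follows the db_name.tag convention and then assembles a pippin command that uses it. The four lines written to the file are an assumption derived from the rule that each line holds the characters between the brackets of one tag; confirm the expected content against your SiteSearch release. The database name mydb and all other file names are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical database name; the tag file follows the db_name.tag convention.
DB=mydb
TAGFILE="${DB}.tag"

# One tag per line, starting in column 1.
# Assumed content for excluding <I>, </I>, <B>, and </B> from indexing.
printf '%s\n' 'I' '/I' 'B' '/B' > "${TAGFILE}"

# -fs tells pippin to look for a tag file; -g names the file.
# The remaining file names are placeholders for the required parameters.
PIPPIN_CMD="pippin -h${DB}.hedr -d${DB}.hdir -u${DB}.post -v${DB}.pdir -n${DB}.nip -i${DB}.ber -fs -g${TAGFILE}"

# Print the command for review rather than running it.
echo "${PIPPIN_CMD}"
```

Supplying only one of the two flags would, according to the parameter descriptions, leave the tag exclusion inactive, so the sketch always sets them together.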
See Also
Pippin Tablefile Parameter
An Introduction to Database Files
An Explanation of the Database Build Process
The Initdb Utility
The Rome Utility
Creating a Database Description (.dsc) File
Open SiteSearch Database Builder Utility Programs
SiteSearch Database Operations Tool (SSDOT)
ASN.1/BER Input Record Format (available from Open SiteSearch technical support)