Pears-Newton Comparison

 

Contents

Introduction
Document Conventions
Database Building and Updates
Database Files
Non-Latin Characters, Diacritics, and Technical Symbols
Input Records
Record Indexing
Record Export


Introduction

This document compares the Open SiteSearch Database Builder Pears software, introduced in SiteSearch 4.2.0, with the Newton Database Builder software provided with previous versions of SiteSearch. It describes the similarities and differences between Pears and Newton in the areas listed in the Contents. Newton features not included in this document are also available in Pears.

See the Pears System Overview for more detailed information about the features and elements of Pears.

Note:    Database Builder for SiteSearch 4.2.0 includes both Pears and Newton software.

Document Conventions

  • <WebZ_root> is the location of your WebZ environment.
  • Version 4.0.x refers to any SiteSearch 4 version prior to 4.1.0; that is, 4.0.0, 4.0.0a, 4.0.1, or 4.0.2.
  • Version 4.1.x refers to SiteSearch 4.1.0, 4.1.1, 4.1.2, or 4.1.2a.
  • Class refers to a Java class that performs a specific function in the Pears version of Database Builder.

Database Building and Updates

Item

Pears

Newton

Software language
Pears: Java
Newton: C

Configuration file

Database description configuration (.ini) file


This file contains information about:

Database description (.dsc) file


Data conversion
Pears: Record handler classes (ORG.oclc.RecordHandler package) that convert data from each supported input format
Newton: sgmlconv and marcconv

Database-building utilities
Pears: One primary utility, Bartlett

Bartlett manages Pears database creation and updates. Bartlett reads the database description configuration file and:
  • initializes the database
  • calls the appropriate record handler for converting input records to BER format
  • calls the specified indexing routines
  • stores database records, index terms, and posting lists in the database's physical file

Accessible through user interface or command line

Newton: Four utilities

After input data exists in BER format, these utilities perform the Newton database build process:

Accessible through user interface or command line

User interface (batch operations)
Pears: Java-based SiteSearch Database Operations Tool (SSDOT) for Pears. Menu-driven interface for common database building and administration tasks.
Newton: Perl-based SSDOT for Newton. Menu-driven interface for common database building and administration tasks.

User interface (single-record operations)
Pears: Record Builder application
Newton: Record Builder application

Temporary journal file
Pears: Created during database update. Bartlett maintains a temporary journal file with all the changes to be applied to the database. Bartlett does not write these changes to the database until the end of a successful database update.
Newton: Not available

Reindexing
Pears: Can add or remove indexes without reloading the database
Newton: Must reload the database to add or remove indexes

Error handling
Pears: Errors generate exceptions, which propagate up through the calling Java classes until a class can handle the exception
Newton: Errors during update may corrupt the database

Database recovery
Pears: Possible by reapplying the temporary journal file to the database. Bartlett does not add data to the .pdb file until the end of an update, so the database remains intact if a hardware failure or system crash occurs during the update. If Bartlett successfully makes all changes to the temporary journal file, but the update fails while committing the changes to the .pdb file, you can subsequently reapply the changes stored in the temporary journal file to the database. Alternatively, you can discard the temporary journal file and run the update again.
Newton: Possible by restoring a backup and reapplying updates that occurred after the backup

Journaling
Pears: Optional. Journaling refers to maintaining a record of all transactions (adding, modifying, or deleting records) that occur during a database update. Pears can create a journal file with all the BER records added to, modified in, or deleted from the database during an update. Pears provides a record handler and record filter to apply these transactions to the database using specified filtering criteria.
Newton: Not available
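
The following Java sketch illustrates the temporary journal behavior described in the table above: changes are staged first and reach the database only when the update commits, and after a failed commit the staged changes can be reapplied or discarded. It is not Bartlett's implementation. The JournaledStore class and its method names are hypothetical, and an in-memory map stands in for the .pdb file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an in-memory map stands in for the .pdb file.
class JournaledStore {
    // One staged change; a null value means "delete this record".
    private static final class Change {
        final String key;
        final String value;
        Change(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> database = new LinkedHashMap<>();
    private final List<Change> journal = new ArrayList<>();

    // Stage an add or modify; the database itself is not touched yet.
    void stagePut(String key, String value) { journal.add(new Change(key, value)); }

    // Stage a delete.
    void stageDelete(String key) { journal.add(new Change(key, null)); }

    // Apply every staged change in order, then clear the journal.
    // Replaying the whole journal after a failed commit gives the same result,
    // because put and remove are idempotent.
    void commit() {
        for (Change c : journal) {
            if (c.value == null) {
                database.remove(c.key);
            } else {
                database.put(c.key, c.value);
            }
        }
        journal.clear();
    }

    // Throw the staged changes away; the database keeps its last committed state.
    void discard() { journal.clear(); }

    String get(String key) { return database.get(key); }

    int pendingChanges() { return journal.size(); }

    public static void main(String[] args) {
        JournaledStore store = new JournaledStore();
        store.stagePut("rec1", "first version");
        System.out.println(store.get("rec1")); // prints "null": nothing committed yet
        store.commit();
        System.out.println(store.get("rec1")); // prints "first version"
    }
}
```

Because nothing reaches the map before commit(), an interrupted update in this sketch leaves the committed data unchanged, which mirrors the recovery options listed in the table.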



Database Files

Item

Pears

Newton

Physical database files
Pears: One .pdb file per database (except for partitioned databases). The .pdb file has logical sections for database records, index terms, postings lists, and logical indirection.
Newton: Five physical .db files per database:
  • HDIR (header directory)
  • HEDR (header file)
  • INDX (index file)
  • PDIR (postings directory)
  • POST (postings file)

Database file growth
Pears: Dynamic growth (assuming adequate disk space), with no need to allocate additional disk space manually
Newton: Must allocate space for the database in advance and increase its size before the database exceeds available space

Partitioned (logical) databases distributed across more than one physical file
Pears: Available. Can partition a single set of input records into physical partitions of a fixed size or divide them equally across a specified number of partitions. Can create custom partitioning criteria by customizing partitioning classes. For very large databases, can store index(es) and database records in separate files if desirable for optimal performance.
Newton: Available. Create partitions manually by building and updating separate physical databases for each partition.

Maximum number of records per database
Pears: 2 billion records (using a partitioned database)
Newton: 2 billion records (using a partitioned database)

Record storage format
Pears: BER
Newton: BER
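
The two built-in partitioning strategies mentioned above (partitions of a fixed size, or an equal split across a specified number of partitions) come down to simple arithmetic, sketched here in Java. This is an illustration rather than the Pears partitioning classes: the PartitionSketch class and its methods are hypothetical, and measuring partition size in records rather than bytes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Pears partitioning classes.
class PartitionSketch {
    // Fixed-size strategy: each partition holds at most maxPerPartition records.
    static List<Integer> byFixedSize(int recordCount, int maxPerPartition) {
        if (recordCount < 0 || maxPerPartition <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = recordCount; remaining > 0; remaining -= maxPerPartition) {
            sizes.add(Math.min(remaining, maxPerPartition));
        }
        return sizes;
    }

    // Equal-division strategy: exactly partitionCount partitions; the first
    // (recordCount % partitionCount) partitions take one extra record.
    static List<Integer> byPartitionCount(int recordCount, int partitionCount) {
        if (recordCount < 0 || partitionCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        int base = recordCount / partitionCount;
        int extra = recordCount % partitionCount;
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            sizes.add(base + (i < extra ? 1 : 0));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(byFixedSize(10, 4));      // prints [4, 4, 2]
        System.out.println(byPartitionCount(10, 4)); // prints [3, 3, 2, 2]
    }
}
```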



Non-Latin Characters, Diacritics, and Technical Symbols

Item

Pears

Newton

Unicode support
Pears: Built-in. Can store Unicode characters in database records and in index terms.
Newton: Not available. Replaces some special characters with searchable characters by default; can specify substitutions for other characters in a pippin z-table (tablefile).
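
As a rough illustration of the substitution approach described for Newton, the Java sketch below replaces special characters with searchable ones using a lookup table. It does not reproduce the pippin z-table file format or Newton's default substitutions; the SubstitutionSketch class and the three sample mappings are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the mappings below are examples, not Newton defaults.
class SubstitutionSketch {
    // Replace each character that has a table entry; copy all others unchanged.
    static String normalize(String term, Map<Character, String> table) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            String replacement = table.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> table = new HashMap<>();
        table.put('\u00e9', "e");  // e with acute accent
        table.put('\u00f8', "o");  // o with stroke
        table.put('\u00e6', "ae"); // ae ligature
        // Input is "Cafe Moller" written with an accented e and a stroked o.
        System.out.println(normalize("Caf\u00e9 M\u00f8ller", table)); // prints Cafe Moller
    }
}
```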



Input Records

Item

Pears

Newton

Supported input formats
Pears:
  • SGML/XML
  • Delimited text
  • USMARC
  • ChinaMarc
  • Unimarc
  • Newton databases
Newton:
  • SGML
  • USMARC

Data conversion programs
Pears: Record handler classes for supported input formats. Can extend an existing record handler to handle other input formats. Record handlers can also export records.
Newton: sgmlconv and marcconv

Mapping between fields in input records and BER tag paths
Pears: Defined in Pears .tags file for SGML/XML or delimited text input records
Newton: Defined in Newton .dtd file for SGML input records

Input record filters (to reject or "filter out" records from an input file)
Pears: Available and customizable. The ORG.oclc.RecordHandler.FilterByTagPresence class includes or excludes records from an input file based on existence or nonexistence of a specified field. Can extend FilterByTagPresence and include or exclude records based on the value of a specified field.
Newton: Not available
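
The filtering idea described above (include or exclude records by the presence of a field, then extend the filter to test the field's value) can be sketched in Java as follows. This is not the ORG.oclc.RecordHandler.FilterByTagPresence API, whose actual signatures are not shown in this document. The class names, the accept and filter methods, and the use of a Map to model a record are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not the real FilterByTagPresence class.
// A record is modeled as a map from field tag to field value.
class TagPresenceFilterSketch {
    protected final String tag;
    private final boolean includeWhenMatched;

    TagPresenceFilterSketch(String tag, boolean includeWhenMatched) {
        this.tag = tag;
        this.includeWhenMatched = includeWhenMatched;
    }

    // Base rule: a record matches when the field exists at all.
    protected boolean matches(Map<String, String> record) {
        return record.containsKey(tag);
    }

    // Keep the record when the match result agrees with the include/exclude mode.
    final boolean accept(Map<String, String> record) {
        return matches(record) == includeWhenMatched;
    }

    List<Map<String, String>> filter(List<Map<String, String>> input) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : input) {
            if (accept(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("title", "A", "lang", "eng"),
                Map.of("title", "B"));
        // Include only records that have a "lang" field.
        System.out.println(new TagPresenceFilterSketch("lang", true).filter(records).size()); // prints 1
        // Exclude records whose "lang" value is "fre"; none match, so both are kept.
        System.out.println(new TagValueFilterSketch("lang", "fre", false).filter(records).size()); // prints 2
    }
}

// Extension: match on the field's value rather than on its mere presence.
class TagValueFilterSketch extends TagPresenceFilterSketch {
    private final String expected;

    TagValueFilterSketch(String tag, String expected, boolean includeWhenMatched) {
        super(tag, includeWhenMatched);
        this.expected = expected;
    }

    @Override
    protected boolean matches(Map<String, String> record) {
        return expected.equals(record.get(tag));
    }
}
```

Overriding only the matching rule keeps the include/exclude handling in one place, which is one plausible way to structure the kind of extension the table describes.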



Record Indexing

Item

Pears

Newton

Number of indexes per database
Pears: Unlimited
Newton: 255

Indexing routines
Pears: Flexible and customizable indexing routines. See Pears-Newton Indexing Routine Comparison for information about using Pears indexing routines to duplicate the most commonly used Newton indexing routines.
Note:    Pears indexing routines also function as query normalizers in the WebZ interface.
Newton: Fixed set of indexing routines

Maximum length of index terms
Pears: 2000 characters (a practical, rather than absolute, limit)
Newton: 72 characters

Sparse indexes to improve search performance
Pears: Created without user intervention. Indexes automatically sorted by index number rather than index term.
Newton: Specify in index definition with sparse parameter

Term adjacency for keyword indexes
Pears: Add reference to ORG.oclc.pears.Bartlett.wordfield in index definition in the database description configuration file
Newton: Specify term adjacency definitions in .dsc file. Then invoke these term adjacency definitions for a given index individually.

Stopwords
Pears: Specify database-wide and index-specific stopwords in a section (usually called [stopwords]) in the database description configuration file. Pears has a class (ORG.oclc.pears.IndexRoutines.StopwordEnforcer) that removes global stopwords from all of a database's indexes in one pass.
Newton: Specify database-wide and index-specific stopwords in .dsc file (see Stopwords Definition).

Restrictors
Pears: Specify in database description configuration file. Created by ORG.oclc.pears.Bartlett.termrest, which requires the index number and the restrictor terms as input.
Newton: Specified in restrictors section of .dsc file. Created by restrictor Newton index routines.

Synonyms
Pears: Not supported
Newton: Specify in synonyms section of .dsc file
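
To show how database-wide and index-specific stopwords combine for a keyword index, here is a minimal Java sketch that drops both kinds of stopword while splitting a field into terms. It is not a Pears indexing routine and does not model StopwordEnforcer, which the table says works on existing indexes. The StopwordSketch class, its method, and the sample stopword lists are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not a Pears indexing routine.
class StopwordSketch {
    // Split a field into lower-cased keyword terms and drop any term that is
    // a database-wide or an index-specific stopword.
    // Note: \W treats only ASCII letters, digits, and underscore as word characters.
    static List<String> indexTerms(String field,
                                   Set<String> globalStopwords,
                                   Set<String> indexStopwords) {
        List<String> terms = new ArrayList<>();
        for (String word : field.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split yields an empty first element if the field starts with punctuation
            }
            if (globalStopwords.contains(word) || indexStopwords.contains(word)) {
                continue;
            }
            terms.add(word);
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> global = Set.of("the", "of");      // database-wide stopwords
        Set<String> titleOnly = Set.of("history");     // stopwords for one index only
        System.out.println(indexTerms("The History of the Pear Tree", global, titleOnly));
        // prints [pear, tree]
    }
}
```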



Record Export

Item

Pears

Newton

Batch-record export
Pears: Can export records in original input format with the same record handler used to add them to the database. For delimited text records, you can specify fields to leave out of exported records.
Newton: Can export in BER record format only.

Single-record export
Pears: Available in Record Builder in SiteSearch 4.2.0 only
Newton: Available in Record Builder in SiteSearch 4.2.0 only



See Also

Pears System Overview
Pears-Newton Indexing Routine Comparison
Converting a Newton Database to a Pears Database

 
