Pears-Newton Comparison
Contents
Introduction
Document Conventions
Database Building and Updates
Database Files
Non-Latin Characters, Diacritics, and Technical Symbols
Input Records
Record Indexing
Record Export
Introduction
This document compares the Open SiteSearch Database Builder Pears software, introduced in SiteSearch 4.2.0, with the Newton Database Builder software provided with earlier versions of SiteSearch. It describes the similarities and differences between Pears and Newton in the areas listed in the Contents. Newton features not described in this document are also available in Pears.
See the Pears System Overview for more detailed information about the features and elements of Pears.
Note: Database Builder for SiteSearch 4.2.0 includes both the Pears and the Newton software.
Document Conventions
- <WebZ_root> is the location of your WebZ environment.
- Version 4.0.x refers to any SiteSearch 4 version prior to 4.1.0; that is, 4.0.0, 4.0.0a, 4.0.1, or 4.0.2.
- Version 4.1.x refers to SiteSearch 4.1.0, 4.1.1, 4.1.2, or 4.1.2a.
- Class refers to a Java class that performs a specific function in the Pears version of Database Builder.
Database Building and Updates
| Item | Pears | Newton |
|---|---|---|
| Software language | Java | C |
| Configuration file | Database description configuration file | .dsc file |
| Data conversion | Record handler classes (ORG.oclc.RecordHandler package) that convert data in the supported input formats | sgmlconv, marcconv |
| Database-building utilities | One primary utility, Bartlett, which manages Pears database creation and updates. Bartlett reads the database description configuration file and then initializes the database; calls the appropriate record handler to convert input records to BER format; calls the specified indexing routines; and stores database records, index terms, and postings lists in the database's physical file. Accessible through the user interface or the command line. | Four utilities. After input data exists in BER format, these utilities perform the Newton database build process. Accessible through the user interface or the command line. |
| User interface (batch operations) | | Perl-based SSDOT for Newton: a menu-driven interface for common database building and administration tasks |
| User interface (single-record operations) | Record Builder application | Record Builder application |
| Temporary journal file | Created during a database update. Bartlett maintains a temporary journal file containing all the changes to be applied to the database, and does not write these changes to the database until the end of a successful update. | Not available |
| Reindexing | Can add or remove indexes without reloading the database | Must reload the database to add or remove indexes |
| Error handling | Errors generate exceptions, which propagate up through the calling classes until a class can handle the exception | Errors during an update may corrupt the database |
| Database recovery | Possible by reapplying the temporary journal file to the database. Because Bartlett does not add data to the .pdb file until the end of an update, the database remains intact if a hardware failure or system crash occurs. If Bartlett successfully writes all changes to the temporary journal file but the update fails while committing them to the .pdb file, you can reapply the changes stored in the temporary journal file to the database, or discard the temporary journal file and run the update again. | Possible by restoring a backup and reapplying the updates that occurred after the backup |
| Journaling | Optional. Journaling means keeping a record of all transactions (adding, modifying, or deleting records) that occur during a database update. Pears can create a journal file containing all the BER records added, modified, or deleted during an update, and provides a record handler and a record filter that apply these transactions to the database using specified filtering criteria. | |
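The Bartlett steps and the temporary journal file described in the table above can be illustrated with a short Java sketch. It is a simplified model under stated assumptions, not the Pears API: `BuildSketch`, `RecordHandler`, `IndexRoutine`, `Change`, and `Database` are hypothetical names, records are plain strings rather than BER, and the "database" is an in-memory map. What it shows is the ordering: every input record is converted and indexed into a journal first, and the database changes only after the whole input has been processed, so an exception during conversion leaves the database untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a Bartlett-style update; these are not the Pears classes.
public class BuildSketch {

    /** Stands in for a record handler: converts a raw input record to its stored form. */
    interface RecordHandler {
        String convert(String rawRecord);
    }

    /** Stands in for an indexing routine: extracts index terms from a stored record. */
    interface IndexRoutine {
        List<String> terms(String storedRecord);
    }

    /** One staged change in the temporary journal. */
    static final class Change {
        final int recordId;
        final String storedRecord;
        final List<String> terms;

        Change(int recordId, String storedRecord, List<String> terms) {
            this.recordId = recordId;
            this.storedRecord = storedRecord;
            this.terms = terms;
        }
    }

    /** In-memory stand-in for the database's physical file. */
    static final class Database {
        final Map<Integer, String> records = new TreeMap<>();
        final Map<String, Set<Integer>> postings = new TreeMap<>();

        /** Idempotent (postings are sets), so a journal can be reapplied safely. */
        void apply(Change c) {
            records.put(c.recordId, c.storedRecord);
            for (String t : c.terms) {
                postings.computeIfAbsent(t, k -> new TreeSet<>()).add(c.recordId);
            }
        }
    }

    /** Builds the whole journal first; touches the database only after every record succeeds. */
    static List<Change> update(Database db, List<String> input, int firstId,
                               RecordHandler handler, IndexRoutine indexer) {
        List<Change> journal = new ArrayList<>();
        int id = firstId;
        for (String raw : input) {
            String stored = handler.convert(raw);      // a failure here leaves db unchanged
            journal.add(new Change(id++, stored, indexer.terms(stored)));
        }
        commit(db, journal);
        return journal;                                // retained so it can be reapplied
    }

    /** Commit phase: writes every journaled change to the database. */
    static void commit(Database db, List<Change> journal) {
        for (Change c : journal) {
            db.apply(c);
        }
    }

    public static void main(String[] args) {
        Database db = new Database();
        RecordHandler normalize = raw -> raw.trim().toLowerCase(Locale.ROOT);
        IndexRoutine words = stored -> List.of(stored.split("\\s+"));
        List<Change> journal = update(db, List.of("Open SiteSearch", "Pears database"), 1, normalize, words);
        commit(db, journal);               // reapplying the journal changes nothing
        System.out.println(db.records);    // {1=open sitesearch, 2=pears database}
        System.out.println(db.postings);   // {database=[2], open=[1], pears=[2], sitesearch=[1]}
    }
}
```

Making `apply` idempotent is a design choice of this sketch: it lets the same journal be reapplied after a failed commit without duplicating postings. It is not a documented property of the Pears implementation.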
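Applying a Pears journal with filtering criteria amounts to replaying the journaled add, modify, and delete transactions in order while skipping any that the filter rejects. The sketch below models this with `java.util.function.Predicate`. The `Entry` layout, the `Op` values, and `JournalReplaySketch` are hypothetical, and real journal entries hold BER records rather than strings.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch of filtered journal replay; not the Pears journal format or API.
public class JournalReplaySketch {

    /** The three kinds of transaction the journal records. */
    enum Op { ADD, MODIFY, DELETE }

    /** One journaled transaction; data is ignored for DELETE. */
    static final class Entry {
        final Op op;
        final int recordId;
        final String data;

        Entry(Op op, int recordId, String data) {
            this.op = op;
            this.recordId = recordId;
            this.data = data;
        }
    }

    /** Applies, in journal order, only the entries the filter accepts; returns how many were applied. */
    static int replay(Map<Integer, String> db, List<Entry> journal, Predicate<Entry> filter) {
        int applied = 0;
        for (Entry e : journal) {
            if (!filter.test(e)) {
                continue;                      // rejected by the filtering criteria
            }
            if (e.op == Op.DELETE) {
                db.remove(e.recordId);
            } else {
                db.put(e.recordId, e.data);    // ADD and MODIFY both store the new data
            }
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Entry> journal = List.of(
                new Entry(Op.ADD, 1, "first"),
                new Entry(Op.ADD, 2, "second"),
                new Entry(Op.MODIFY, 1, "first-v2"),
                new Entry(Op.DELETE, 2, null));
        Map<Integer, String> db = new TreeMap<>();
        // Example criterion: replay everything except deletions.
        int n = replay(db, journal, e -> e.op != Op.DELETE);
        System.out.println(n + " " + db);     // 3 {1=first-v2, 2=second}
    }
}
```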
Database Files
| Item | Pears | Newton |
|---|---|---|
| Physical database files | One .pdb file per database (except for partitioned databases). The .pdb file has logical sections for database records, index terms, postings lists, and logical indirection. | Five physical .db files per database: HDIR (header directory), HEDR (header file), INDX (index file), PDIR (postings directory), and POST (postings file) |
| Database file growth | Dynamic growth (assuming adequate disk space), with no need to allocate additional disk space manually | Must allocate space for the database in advance and increase its size before the database exceeds the available space |
| Partitioned (logical) databases distributed across more than one physical file | Available. Can partition a single set of input records into physical partitions of a fixed size, or divide them equally across a specified number of partitions. Can create custom partitioning criteria by customizing partitioning classes. For very large databases, can store indexes and database records in separate files when doing so improves performance. | Available. Create partitions manually by building and updating a separate physical database for each partition. |
| Maximum number of records per database | 2 billion records (using a partitioned database) | 2 billion records (using a partitioned database) |
| Record storage format | BER | BER |
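Pears can split one set of input records into fixed-size partitions or divide it equally across a given number of partitions. The arithmetic behind those two strategies can be sketched as follows. The sketch assumes records are numbered from 0 in input order and that, for the equal split, the total count is known in advance. `PartitionSketch` and its methods are hypothetical and do not reproduce the Pears partitioning classes.

```java
// Hypothetical sketch of two partitioning strategies; not the Pears partitioning classes.
public class PartitionSketch {

    /** Fixed-size partitions: the first partitionSize records go to partition 0, and so on. */
    static int fixedSizePartition(long recordIndex, long partitionSize) {
        if (recordIndex < 0 || partitionSize <= 0) {
            throw new IllegalArgumentException("invalid index or partition size");
        }
        return Math.toIntExact(recordIndex / partitionSize);
    }

    /** Equal division: spreads totalRecords over n partitions whose sizes differ by at most one. */
    static int equalPartition(long recordIndex, long totalRecords, int n) {
        if (n <= 0 || recordIndex < 0 || recordIndex >= totalRecords) {
            throw new IllegalArgumentException("invalid index or partition count");
        }
        long base = totalRecords / n;          // minimum records per partition
        long extra = totalRecords % n;         // the first 'extra' partitions hold one more
        long inLarger = extra * (base + 1);    // records held by those larger partitions
        if (recordIndex < inLarger) {
            return (int) (recordIndex / (base + 1));
        }
        return (int) (extra + (recordIndex - inLarger) / base);
    }

    public static void main(String[] args) {
        StringBuilder fixed = new StringBuilder();
        StringBuilder equal = new StringBuilder();
        for (long i = 0; i < 10; i++) {
            fixed.append(fixedSizePartition(i, 4));
            equal.append(equalPartition(i, 10, 3));
        }
        System.out.println(fixed + " " + equal);   // 0000111122 0000111222
    }
}
```

With 10 records, fixed-size partitions of 4 hold 4, 4, and 2 records. An equal split over 3 partitions holds 4, 3, and 3 records.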
Non-Latin Characters, Diacritics, and Technical Symbols
| Item | Pears | Newton |
|---|---|---|
| Unicode support | Built-in. Can store Unicode characters in database records and index terms. | Not available. Replaces some special characters with searchable characters by default; you can specify substitutions for other characters in a pippin z-table (tablefile). |
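Newton handles special characters by substituting them before indexing, whereas Pears stores the Unicode characters themselves. The general idea of a substitution table, with a set of defaults plus user-specified additions, can be sketched in Java as shown here. The mappings are invented examples. The code does not read or imitate the pippin z-table (tablefile) format, which this document does not describe, and `SubstitutionSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a character substitution table applied before indexing.
// The mappings are examples only, not Newton's defaults and not z-table syntax.
public class SubstitutionSketch {

    /** Example default substitutions, keyed by Unicode code point. */
    static final Map<Integer, String> DEFAULTS = Map.of(
            (int) '\u00e9', "e",    // e with acute accent
            (int) '\u00fc', "u",    // u with diaeresis
            (int) '\u00df', "ss");  // sharp s

    /** Replaces every code point found in the table; all other characters pass through. */
    static String substitute(String term, Map<Integer, String> table) {
        StringBuilder out = new StringBuilder();
        term.codePoints().forEach(cp -> {
            String replacement = table.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>(DEFAULTS);
        table.put((int) '\u00f8', "o");   // an additional, user-specified substitution
        System.out.println(substitute("M\u00fcller S\u00f8ren stra\u00dfe", table));  // Muller Soren strasse
    }
}
```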
Input Records
| Item | Pears | Newton |
|---|---|---|
| Supported input formats | SGML/XML, delimited text, USMARC, ChinaMarc, Unimarc, Newton databases | SGML, USMARC |
| Data conversion programs | Record handler classes for the supported input formats. You can extend an existing record handler to handle other input formats. Record handlers can also export records. | sgmlconv, marcconv |
| Mapping between fields in input records and BER tag paths | | |
| Input record filters (to reject, or "filter out," records from an input file) | Available and customizable. The ORG.oclc.RecordHandler.FilterByTagPresence class includes or excludes records from an input file based on the presence or absence of a specified field. You can extend FilterByTagPresence to include or exclude records based on the value of a specified field. | |
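One common way to extend an existing record handler to support another input format is to override only the format-specific step, an application of the template method pattern. The following minimal sketch assumes a handler that maps delimited text to named fields, and the subclass changes only how a line is split. `HandlerSketch`, `DelimitedTextHandler`, and `PipeDelimitedHandler` are hypothetical and do not reflect the `ORG.oclc.RecordHandler` API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of extending a record handler; not the ORG.oclc.RecordHandler API.
public class HandlerSketch {

    /** Base handler: turns one line of tab-delimited text into named fields. */
    static class DelimitedTextHandler {
        final List<String> fieldNames;

        DelimitedTextHandler(List<String> fieldNames) {
            this.fieldNames = fieldNames;
        }

        /** Format-specific step; subclasses override this to support another layout. */
        protected String[] split(String line) {
            return line.split("\t", -1);
        }

        /** Shared step: pairs field names with values, in order. */
        Map<String, String> parse(String line) {
            String[] values = split(line);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < fieldNames.size() && i < values.length; i++) {
                record.put(fieldNames.get(i), values[i].trim());
            }
            return record;
        }
    }

    /** Extension for a new input format: the same fields, separated by '|' instead of tabs. */
    static class PipeDelimitedHandler extends DelimitedTextHandler {
        PipeDelimitedHandler(List<String> fieldNames) {
            super(fieldNames);
        }

        @Override
        protected String[] split(String line) {
            return line.split("\\|", -1);
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("title", "author");
        System.out.println(new DelimitedTextHandler(names).parse("Pears\tOCLC"));  // {title=Pears, author=OCLC}
        System.out.println(new PipeDelimitedHandler(names).parse("Pears | OCLC")); // {title=Pears, author=OCLC}
    }
}
```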
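The `FilterByTagPresence` behavior, and the value-based extension the table mentions, can be modeled with `java.util.function.Predicate`. In this sketch a record is simply a map from field tag to value. `TagFilterSketch`, `FilterByTag`, and `FilterByTagValue` are hypothetical names, and their constructor parameters are assumptions rather than the real class's interface. The MARC-style tags and values are example data.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch modeled on the FilterByTagPresence description; not the real Pears class.
public class TagFilterSketch {

    /** Keeps (include = true) or drops (include = false) records that contain the tag. */
    static class FilterByTag implements Predicate<Map<String, String>> {
        final String tag;
        final boolean include;

        FilterByTag(String tag, boolean include) {
            this.tag = tag;
            this.include = include;
        }

        @Override
        public boolean test(Map<String, String> record) {
            return record.containsKey(tag) == include;
        }
    }

    /** Extension that tests the field's value instead of mere presence. */
    static class FilterByTagValue extends FilterByTag {
        final String value;

        FilterByTagValue(String tag, String value, boolean include) {
            super(tag, include);
            this.value = value;
        }

        @Override
        public boolean test(Map<String, String> record) {
            boolean matches = value.equals(record.get(tag));
            return matches == include;
        }
    }

    /** Returns the input records that the filter accepts. */
    static List<Map<String, String>> apply(List<Map<String, String>> input,
                                           Predicate<Map<String, String>> filter) {
        return input.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = List.of(
                Map.of("245", "A title", "041", "eng"),
                Map.of("245", "Another title"),
                Map.of("245", "Un titre", "041", "fre"));
        System.out.println(apply(input, new FilterByTag("041", true)).size());              // 2
        System.out.println(apply(input, new FilterByTagValue("041", "fre", false)).size()); // 2
    }
}
```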
Record Indexing
| Item | Pears | Newton |
|---|---|---|
| Number of indexes per database | Unlimited | 255 |
| Indexing routines | Flexible and customizable indexing routines. See Pears-Newton Indexing Routine Comparison for information about using Pears indexing routines to duplicate the most commonly used Newton indexing routines. Note: Pears indexing routines also function as query normalizers in the WebZ interface. | |
| Maximum length of index terms | 2000 characters (a practical, rather than absolute, limit) | 72 characters |
| Sparse indexes to improve search performance | Created without user intervention. Indexes are automatically sorted by index number rather than by index term. | Specify in the index definition with the sparse parameter |
| Term adjacency for keyword indexes | Add a reference to ORG.oclc.pears.Bartlett.wordfield in the index definition in the database description configuration file | Specify term adjacency definitions in the .dsc file, then invoke these definitions individually for each index |
| Stopwords | Specify database-wide and index-specific stopwords in a section (usually called [stopwords]) of the database description configuration file. Pears has a class (ORG.oclc.pears.IndexRoutines.StopwordEnforcer) that removes global stopwords from all of a database's indexes in one pass. | Specify database-wide and index-specific stopwords in the .dsc file (see Stopwords Definition) |
| Restrictors | Specify in the database description configuration file. Created by ORG.oclc.pears.Bartlett.termrest, which requires the index number and the restrictor terms as input. | Specify in the restrictors section of the .dsc file. Created by restrictor Newton index routines. |
| Synonyms | Not supported | Specify in the synonyms section of the .dsc file |
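The table says only that Pears enables term adjacency by referencing `ORG.oclc.pears.Bartlett.wordfield`; how that class works is not described here. To illustrate what term adjacency requires in general, the sketch below substitutes a standard positional index. Each posting records the word positions of a term within a field, so a two-word adjacency query can check whether the second word occurs at the next position. `AdjacencySketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Generic positional-index sketch; not how ORG.oclc.pears.Bartlett.wordfield works.
public class AdjacencySketch {

    /** term -> record ID -> word positions of that term within the indexed field */
    final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes every word of a field together with its position. */
    void addField(int recordId, String fieldText) {
        String[] words = fieldText.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            postings.computeIfAbsent(words[pos], k -> new HashMap<>())
                    .computeIfAbsent(recordId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Record IDs in which 'second' occurs immediately after 'first'. */
    TreeSet<Integer> adjacent(String first, String second) {
        Map<Integer, List<Integer>> a = postings.getOrDefault(first.toLowerCase(Locale.ROOT), Map.of());
        Map<Integer, List<Integer>> b = postings.getOrDefault(second.toLowerCase(Locale.ROOT), Map.of());
        TreeSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> next = b.get(e.getKey());
            if (next == null) {
                continue;                       // second term absent from this record
            }
            for (int p : e.getValue()) {
                if (next.contains(p + 1)) {     // positions p and p + 1 are adjacent
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AdjacencySketch index = new AdjacencySketch();
        index.addField(1, "database builder software");
        index.addField(2, "builder of the database");
        System.out.println(index.adjacent("database", "builder"));  // [1]
    }
}
```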
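Both products combine database-wide and index-specific stopwords. A minimal sketch of that two-level lookup follows. It assumes the stopword lists have already been read from configuration; the `[stopwords]` section syntax is not shown, and the word lists are invented examples. A term is dropped if it appears in the global set or in the set for the index being built. `StopwordSketch` and its members are hypothetical and unrelated to the actual `StopwordEnforcer` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of two-level stopword filtering; not StopwordEnforcer
// and not the [stopwords] section syntax.
public class StopwordSketch {

    /** Example database-wide stopwords. */
    static final Set<String> GLOBAL = Set.of("a", "an", "the", "of");

    /** Example index-specific stopwords, keyed by index name. */
    static final Map<String, Set<String>> PER_INDEX = Map.of(
            "title", Set.of("and"),
            "subject", Set.of("general"));

    /** Drops terms that are stopwords for the whole database or for this index. */
    static List<String> removeStopwords(String index, List<String> terms) {
        Set<String> local = PER_INDEX.getOrDefault(index, Set.of());
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            String t = term.toLowerCase(Locale.ROOT);
            if (!GLOBAL.contains(t) && !local.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("The", "History", "of", "Art", "and", "Design");
        System.out.println(removeStopwords("title", words));   // [history, art, design]
        System.out.println(removeStopwords("author", words));  // [history, art, and, design]
    }
}
```

Removing global stopwords from existing indexes in one pass, as `StopwordEnforcer` does, would correspond to running only the global check over every index's terms. The sketch covers only filtering at build time.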
Record Export
| Item | Pears | Newton |
|---|---|---|
| Batch-record export | Can export records in their original input format, using the same record handler that added them to the database. For delimited text records, you can specify fields to omit from exported records. | Can export in BER record format only |
| Single-record export | Available in Record Builder in SiteSearch 4.2.0 only | Available in Record Builder in SiteSearch 4.2.0 only |
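For delimited text, exporting with some fields left out can be sketched as writing the fields in a fixed order and skipping any name in an omit set. `ExportSketch`, the tab delimiter, and writing an empty value for a missing field are assumptions for illustration, not documented Pears behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch of delimited-text export with omitted fields; not the Pears export API.
public class ExportSketch {

    /** Writes one record as a tab-delimited line, leaving out the named fields. */
    static String export(Map<String, String> record, List<String> fieldOrder, Set<String> omit) {
        StringJoiner line = new StringJoiner("\t");
        for (String name : fieldOrder) {
            if (!omit.contains(name)) {
                line.add(record.getOrDefault(name, ""));   // missing field -> empty value
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rec = new LinkedHashMap<>();
        rec.put("id", "42");
        rec.put("title", "Pears");
        rec.put("note", "internal");
        List<String> order = List.of("id", "title", "note");
        // Tabs shown as commas for readability.
        System.out.println(export(rec, order, Set.of("note")).replace('\t', ','));  // 42,Pears
    }
}
```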
See Also
- Pears System Overview
- Pears-Newton Indexing Routine Comparison
- Converting a Newton Database to a Pears Database