
An Introduction to Physical Database Files

An Open SiteSearch Database Builder database consists of five files, referred to as physical database files. The Newton search engine uses these files together to provide a searchable database. During the database build process, the files are created and indexed according to the definitions in the database description (.dsc) file. The five database files created during the build process are:

  • the Index File (INDX),
  • the Postings Directory (PDIR),
  • the Postings File (POST),
  • the Header Directory (HDIR), and
  • the Header File (HEDR).

The table below describes each of the five database files created during the database build process.

Database File

Description

INDX File

The index file (INDX) contains all of the index terms extracted from your data records. Each term has a logical postings-list number. The INDX file is organized by term, with each term paired with its index entry.

PDIR Directory

Keeps track of the physical arrangement of postings lists in the POST file. To do this, the postings directory (PDIR) uses pointers to map logical postings-list numbers to the lists' physical locations in the POST file. As a result, you can move a postings list within the POST file and keep access to it by changing just one pointer in the PDIR directory.
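The pointer indirection described above can be sketched in Python. This is a toy in-memory model, not Newton's on-disk format: the names `PostingsStore`, `encode_postings`, and `decode_postings`, and the layout of a 4-byte count followed by 4-byte big-endian record numbers, are all hypothetical.

```python
import struct


def encode_postings(record_numbers):
    # Hypothetical layout: a 4-byte big-endian count, then one 4-byte
    # big-endian logical record number per entry.
    return struct.pack(f">I{len(record_numbers)}I", len(record_numbers), *record_numbers)


def decode_postings(buf, offset):
    (count,) = struct.unpack_from(">I", buf, offset)
    return list(struct.unpack_from(f">{count}I", buf, offset + 4))


class PostingsStore:
    """Toy model: POST is a byte buffer; PDIR maps postings-list number -> byte offset."""

    def __init__(self):
        self.post = bytearray()
        self.pdir = {}

    def write(self, list_no, record_numbers):
        # A new or relocated list is appended to the end of POST; the only
        # directory change is the single PDIR pointer for list_no.
        self.pdir[list_no] = len(self.post)
        self.post += encode_postings(record_numbers)

    def read(self, list_no):
        return decode_postings(self.post, self.pdir[list_no])


store = PostingsStore()
store.write(7, [3, 12])
store.write(7, [3, 12, 40])  # the list grew, so it is moved to a new offset
print(store.pdir[7], store.read(7))
```

Rewriting list 7 places it at a new offset and changes only `pdir[7]`, so anything that refers to postings-list number 7 keeps working. In this sketch the old bytes simply remain in the buffer as unused space.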

POST File

Maintains a list of the logical record numbers for each indexed term in your data, including restrictors. Each indexed term and its associated list of record numbers are referred to as a postings list. Each postings list is identified in the postings file (POST) by a unique, logical postings-list number.

HDIR Directory

Keeps track of the physical arrangement of data records in the HEDR file. To do this, the header directory (HDIR) uses pointers to map a logical record number to the physical record stored in the HEDR file. When an updated record is stored in the HEDR file, the Newton search engine can maintain access to it by changing just one pointer in the HDIR directory.
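A similar sketch for the HDIR directory, again with a hypothetical class name (`HeaderStore`) and a hypothetical layout in which HDIR maps a logical record number to an (offset, length) pair in HEDR. It illustrates why an update needs only one pointer change: postings lists hold logical record numbers, not physical offsets, so they stay valid when a record's bytes move.

```python
class HeaderStore:
    """Toy model: HEDR is a byte buffer; HDIR maps logical record number -> (offset, length).

    Hypothetical layout for illustration only; not Newton's actual file format.
    """

    def __init__(self):
        self.hedr = bytearray()
        self.hdir = {}

    def store(self, record_no: int, data: bytes) -> None:
        # Adding or replacing a record appends its bytes to HEDR and
        # changes exactly one HDIR entry; nothing else is rewritten.
        self.hdir[record_no] = (len(self.hedr), len(data))
        self.hedr += data

    def fetch(self, record_no: int) -> bytes:
        offset, length = self.hdir[record_no]
        return bytes(self.hedr[offset:offset + length])


headers = HeaderStore()
headers.store(12, b"original record")
postings = [12]  # a postings list holds logical record numbers, not offsets
headers.store(12, b"updated record")  # only hdir[12] changes; postings is untouched
print([headers.fetch(n) for n in postings])
```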

HEDR File

Stores the actual records that make up your database. Each record is identified by a unique logical record number in the header (HEDR) file. Records are stored in ASN.1/BER encoding, conforming to the following ISO standards:

  • Abstract Syntax Notation One (ASN.1) for describing data structures (ISO 8824).
  • Basic Encoding Rules (BER) for locating data and structure information within a record (ISO 8825).

Because the records follow the structure specified in these standards, they are easier to manipulate for indexing, converting, and exporting.
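To show how BER lets software locate data within a record, here is a minimal tag-length-value (TLV) encoder and decoder. It handles only single-octet identifiers (tag numbers below 31) and definite lengths, and the record layout (a SEQUENCE with context-specific [0] title and [1] author fields) is hypothetical, not the schema Newton actually uses. All function names are likewise illustrative.

```python
def encode_length(n: int) -> bytes:
    if n < 0x80:
        return bytes([n])  # short form: one octet holds the length
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form: 0x80 | octet count, then length


def encode_tlv(identifier: int, contents: bytes) -> bytes:
    # Single-octet identifier only (tag numbers below 31).
    return bytes([identifier]) + encode_length(len(contents)) + contents


def decode_tlv(buf: bytes, pos: int = 0):
    """Return (identifier, contents, next position) for the TLV starting at pos."""
    identifier = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite-length encoding is not handled in this sketch")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return identifier, buf[pos:pos + length], pos + length


def children(contents: bytes):
    # Walk the TLVs inside a constructed value, using each length to skip ahead.
    pos = 0
    while pos < len(contents):
        identifier, value, pos = decode_tlv(contents, pos)
        yield identifier, value


# Hypothetical record: SEQUENCE { [0] title, [1] author }
record = encode_tlv(0x30, encode_tlv(0x80, b"Example title") + encode_tlv(0x81, b"sobell"))
tag, body, _ = decode_tlv(record)
print(hex(tag), dict(children(body)))
```

Because each length field tells the reader how far to skip, software can reach the [1] author field without parsing the contents of the title, which is one reason BER-encoded records are convenient to index, convert, and export.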

During the database build process, the database description (.dsc) file is added to region 0 of the HEDR file.

How the Physical Database Files Form a Database

The following example illustrates how the five physical database files work together to form a searchable Database Builder database. For this example, the user has submitted a search for the term 'sobell' in the author index.

1. The Newton search engine locates the term 'sobell' in the author index in the INDX file and finds the term's logical postings-list number, which refers to a pointer in the PDIR directory.

2. The pointer in the PDIR directory leads to the list of logical record numbers in the POST file.

3. The logical record numbers in the POST file lead to the pointers in the HDIR directory.

4. The pointers in the HDIR directory lead to the records' physical locations in the HEDR file.

5. Records from the HEDR file are displayed for the end user.
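The five steps above can be traced through a toy in-memory model. Python dicts and lists stand in for the five files, and every name here (`ToyDatabase`, its fields, the record strings) is hypothetical; the real files are binary structures on disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    indx: dict = field(default_factory=dict)  # (index name, term) -> postings-list number
    pdir: dict = field(default_factory=dict)  # postings-list number -> position in post
    post: list = field(default_factory=list)  # POST: postings lists of logical record numbers
    hdir: dict = field(default_factory=dict)  # logical record number -> position in hedr
    hedr: list = field(default_factory=list)  # HEDR: stored records in physical order

    def search(self, index: str, term: str) -> list:
        # Step 1: find the term in INDX and get its postings-list number.
        list_no = self.indx.get((index, term))
        if list_no is None:
            return []
        # Step 2: the PDIR pointer leads to the postings list in POST.
        record_numbers = self.post[self.pdir[list_no]]
        # Step 3: each logical record number leads to an HDIR pointer.
        positions = [self.hdir[n] for n in record_numbers]
        # Steps 4 and 5: the HDIR pointers locate the records in HEDR for display.
        return [self.hedr[p] for p in positions]


db = ToyDatabase(
    indx={("author", "sobell"): 5},
    pdir={5: 0},
    post=[[101, 102]],
    hdir={101: 1, 102: 0},
    hedr=["record 102", "record 101"],
)
print(db.search("author", "sobell"))  # prints ['record 101', 'record 102']
```

Because `hdir` maps logical record 101 to physical slot 1, the records come back in the order of the postings list even though HEDR stores them in a different physical order.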

Note:

The HDIR and PDIR directories make record additions, replacements, and deletions more efficient during a database update. When records are moved, the pointers in these directories are the only items the software needs to edit.

[Figure: Interaction of the Database Files]

