Batch Loading Data into a Local DC Database Updated with Record Builder

Contents

Introduction
Document Conventions
Rationale
Data Requirements
Procedure
Automating Data Conversion and Data Loading

Introduction

This document describes how to add data from an external data source to a local DC database that you plan to update with the Record Builder application included with Open SiteSearch Database Builder. With this approach, you and your staff can edit the database with Record Builder while your patrons search it through the WebZ interface. Patrons get Z39.50 access to all of the features you choose to implement in your WebZ interface, such as cross-database searching, custom record formats, and vocabulary-assisted searching. When you create the local database, you can also take advantage of Database Builder's indexing capabilities, such as plurals, restrictors, and sparse indexes.

Document Conventions

<WebZ_root> is the location of the WebZ environment that includes Record Builder.
new_db is the top-level directory of the database to which you want to batch load data.
DC(2) refers to the Dublin Core(2) database framework introduced in SiteSearch 4.1.1. The DC(2) database framework uses Version 1.1 of the Dublin Core Metadata Element Set (DCMES).
DCQ refers to the Dublin Core with Qualifiers database framework introduced in SiteSearch 4.1.2. The DCQ database framework also uses Version 1.1 of the DCMES and qualifiers, as defined by the Dublin Core Metadata Initiative in Dublin Core Qualifiers.
DC database refers to a local SiteSearch database based on a DC database framework (DC(2) or DCQ).

Rationale

See Batch and Individual Record Editing for DC Databases for an explanation of the reasoning behind this procedure.

Data Requirements

You need an .sgml file that is compatible with the .dtd file for the DC(2) or DCQ database framework, and you need that .dtd file itself. You can find the DC(2) .dtd file in <WebZ_root>/dbbuilder/dbs/dc and the DCQ .dtd file in <WebZ_root>/dbbuilder/dbs/dcq.

Each record in the .sgml file should:

Start with a <rec> tag and end with a </rec> tag.
Have an ID field with a unique identifier, surrounded by <ID><PCDATA> and </PCDATA></ID> tags, such as <ID><PCDATA>20516</PCDATA></ID>.
Use <PCDATA></PCDATA> tags to surround actual data values for a field, such as the following:

<DC:Type><PCDATA>Photograph</PCDATA></DC:Type>

where:

<DC:Type> is the opening field tag

<PCDATA>Photograph</PCDATA> indicates that Photograph is the data value for this field

</DC:Type> is the closing field tag.
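
As a rough illustration of these tagging rules, the following Python sketch assembles one record from an identifier and a list of field values. The helper names field and record are hypothetical and are not part of SiteSearch. The sketch also assumes that the data values contain no markup characters of their own; it does no escaping.

```python
def field(tag, value):
    """Wrap one data value in <PCDATA> tags inside its opening and closing field tags."""
    return f"<{tag}><PCDATA>{value}</PCDATA></{tag}>"


def record(record_id, fields):
    """Build one <rec>...</rec> block.

    record_id becomes the ID field; fields is a list of (tag, value) pairs
    that are written in the order given.
    """
    lines = ["<rec>", field("ID", record_id)]
    lines += [field(tag, value) for tag, value in fields]
    lines.append("</rec>")
    return "\n".join(lines)


# Hypothetical usage: one record with an ID and a single DC:Type field.
print(record(20516, [("DC:Type", "Photograph")]))
```

Run as shown, this prints a four-line record: a <rec> line, the ID field, the DC:Type field, and a closing </rec> line.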

Example file

Here is a sample record from an .sgml file for a DC(2) database.

<rec>
<ID><PCDATA>1</PCDATA></ID>
<DC:Type><PCDATA>Photograph</PCDATA></DC:Type>
<DC:Format><PCDATA>Scanned from a photographic print using a
Microtek Scanmaker 9600XL at 120 dpi in JPEG format at compression
rate 3 and resized to 768x512 ppi. 4/1999.</PCDATA></DC:Format>
<DC:Description><PCDATA>Caption on image: "Scales looking South
Chilkoot Pass 1898." Original image in Hegg Album 1, page 28.
</PCDATA></DC:Description>
<DC:Language><PCDATA></PCDATA></DC:Language>
<DC:Rights><PCDATA>None</PCDATA></DC:Rights>
<DC:Identifier><RB:auth><RB:state><PCDATA>controlled</PCDATA>
</RB:state></RB:auth> <RB:Scheme><PCDATA>URL</PCDATA></RB:Scheme> <PCDATA>http://content.lib.washington.edu/hegg/image/4.jpg</PCDATA>
</DC:Identifier>
<DC:Date><PCDATA>1898</PCDATA></DC:Date>
<DC:Title><PCDATA>Klondikers and supplies at "The Scales" looking
south along the Chilkoot Trail, Alaska, 1898.</PCDATA></DC:Title>
<DC:Publisher><PCDATA></PCDATA></DC:Publisher>
<DC:Creator><PCDATA>Hegg, Eric A.</PCDATA></DC:Creator>
<DC:Subject><PCDATA>Chilkoot Trail, Trails--Alaska, Chilkoot Pass
(Alaska), Mountain passes--Alaska, Tents--Alaska--Chilkoot
Pass</PCDATA></DC:Subject>
<DC:Source><PCDATA>Eric A. Hegg Collection no. 274</PCDATA>
</DC:Source>
<DC:Relation><PCDATA></PCDATA></DC:Relation>
<DC:Contributor><PCDATA>University of Washington Libraries.
Manuscripts, Special Collections, University Archives Division
</PCDATA></DC:Contributor>
<DC:Coverage><PCDATA>United States--Alaska--Chilkoot Pass</PCDATA> </DC:Coverage>
</rec>

If your data contains Scheme and Modifier qualifiers, you can include them in the .sgml file (see the <RB:Scheme> tag within the <DC:Identifier> tag in the example above), but they are not required. In SiteSearch 4.1.2, modifiers are called qualifiers, and the scheme and qualifier tags are <DC:Scheme> and <DC:Qualifier>, respectively.
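
Before you convert a large file, it can help to check it against the record requirements listed in this section. The following Python sketch is one possible pre-check and is not a SiteSearch tool. The function name check_records is hypothetical, and the sketch assumes that tags are written exactly as in the sample record (same case, no attributes). It does not validate the file against the .dtd file.

```python
import re

# One record: everything between <rec> and the next </rec>.
REC_RE = re.compile(r"<rec>(.*?)</rec>", re.DOTALL)
# One ID field in the form <ID><PCDATA>value</PCDATA></ID>.
ID_RE = re.compile(r"<ID><PCDATA>(.*?)</PCDATA></ID>", re.DOTALL)


def check_records(sgml_text):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    seen_ids = set()

    if sgml_text.count("<rec>") != sgml_text.count("</rec>"):
        problems.append("unbalanced <rec> and </rec> tags")

    for number, match in enumerate(REC_RE.finditer(sgml_text), start=1):
        body = match.group(1)
        ids = [value.strip() for value in ID_RE.findall(body)]

        if len(ids) != 1:
            problems.append(f"record {number}: expected one ID field, found {len(ids)}")
        elif ids[0] in seen_ids:
            problems.append(f"record {number}: duplicate ID {ids[0]}")
        else:
            seen_ids.add(ids[0])

        # Every <PCDATA> needs a matching </PCDATA>.
        if body.count("<PCDATA>") != body.count("</PCDATA>"):
            problems.append(f"record {number}: unbalanced <PCDATA> tags")

    return problems
```

You could read the .sgml file into a string, pass it to check_records, and review any reported problems before running the conversion. Counting tags is a coarse test; it catches a mistyped closing tag but not a field that the .dtd file does not allow.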

Procedure

Follow these steps to add data from an existing database to a new local DC database.

1. Export your data in .sgml format as described in the Data Requirements section of this document.

2. Examine the .sgml file to find the last unique record identifier used in the records. You can find this identifier in the <ID> field in the .sgml file. Write down this identifier, as you will need it in step 18.

3. Follow the procedure, Cloning a Record Database, to set up an empty Record Builder database for your data.

4. Copy the .sgml file with your data and your .dtd file to <WebZ_root>/dbbuilder/dbs/new_db.

5. Check the database's .dsc file to see whether it contains any sparse indexes.

Note:

In steps 6-15 you convert your data to BER format and then use SSDOT's Advanced Options to load the data into the database.

6. From the directory <WebZ_root>/dbbuilder/dbs/new_db, enter the following command to convert your data from SGML to BER format:

<WebZ_root>/dbbuilder/bin/sgmlconv new_db -inew_db.sgml \
-dnew_db.dtd -p

Note:

Do not include the backslash ("\") on the command line. It is only included for readability.

After sgmlconv executes, it displays messages about the input, output, and .dtd files used, the number of records processed, and the size of the output file, new_db.ber.

7. Move new_db.ber to <WebZ_root>/dbbuilder/dbs/new_db/bers. Rename this file as new_db.ber.vol1.

8. Start the SiteSearch Database Operations Tool (SSDOT).

9. Type "5" and press Enter to select the Advanced Options menu from the SSDOT Main Menu.

Note:

If a job does not complete successfully in steps 10-16, check the appropriate log file for more information before continuing to the next step.

10. Add new records to the database and create index terms for these records (option 3 on the menu):

Type "3" and press Enter.
When prompted, type the database name and press Enter.
Type "1" and press Enter to select the BER volume shown. (This should be new_db.ber.vol1.)
Press Enter to return to the Advanced Options menu.
Type "j" and press Enter to verify that this job completed successfully.

11. Sort the index terms from the new records (option 6 on the menu):

Type "6" and press Enter.
When prompted, type the database name and press Enter.
Press Enter to return to the Advanced Options menu.
Type "j" and press Enter to verify that this job completed successfully.

12. Add the index terms to the database (option 7 on the menu):

Type "7" and press Enter.
When prompted, type the database name and press Enter.
Press Enter to return to the Advanced Options menu.
Type "j" and press Enter to verify that this job completed successfully.

13. Does the database have any sparse indexes? (You checked for this in step 5.)

Yes. Go to step 14.
No. Go to step 16.

14. Sort the sparse index terms from the new records (option 8 on the menu):

Type "8" and press Enter.
When prompted, type the database name and press Enter.
Press Enter to return to the Advanced Options menu.
Type "j" and press Enter to verify that this job completed successfully.

15. Add the sparse index terms to the database (option 9 on the menu):

Type "9" and press Enter.
When prompted, type the database name and press Enter.
Press Enter to return to the Advanced Options menu.
Type "j" and press Enter to verify that this job completed successfully.

16. Validate the database.

17. Exit SSDOT.

18. Refer to the record identifier you recorded in step 2. Edit <WebZ_root>/ini/dbbuilder/dbs/new_db/recordid.txt so that it contains an integer identifier greater than this identifier. (If the identifiers in the data you imported are not integers, you can start with any integer you wish in recordid.txt.)

19. Start a Record Builder session in your Web browser and verify that you can search or browse the database to locate the records you just loaded into the database. Then ensure that you can view and/or edit these records.

20. To make the database available to patrons, see the procedure Configuring Access to a Record Builder Database Through the WebZ Interface.

21. If you have additional data to add to the database in batch mode, or if you have additional local DC databases for which you need to batch load data, see Automating Data Conversion and Data Loading.
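
The file-handling parts of this procedure (steps 2, 6, 7, and 18) can be sketched in Python as follows. This is an illustration under stated assumptions, not a SiteSearch utility. The function names are hypothetical, /opt/webz is a made-up stand-in for <WebZ_root>, and the sketch assumes that the .sgml and .dtd files are named after the database, as in step 6. It only computes values; it does not run sgmlconv or edit recordid.txt.

```python
import re
from pathlib import PurePosixPath

# Matches only ID fields whose value is a plain integer.
INT_ID_RE = re.compile(r"<ID><PCDATA>\s*([0-9]+)\s*</PCDATA></ID>")


def next_record_id(sgml_text, default=1):
    """Steps 2 and 18: return an integer greater than every integer ID in the data.

    If the data contains no integer identifiers, any starting integer is
    acceptable, so the caller-supplied default is returned.
    """
    integer_ids = [int(value) for value in INT_ID_RE.findall(sgml_text)]
    return max(integer_ids) + 1 if integer_ids else default


def sgmlconv_argv(webz_root, db_name):
    """Step 6: the sgmlconv command as an argument list (no backslash involved)."""
    return [
        str(PurePosixPath(webz_root) / "dbbuilder" / "bin" / "sgmlconv"),
        db_name,
        f"-i{db_name}.sgml",
        f"-d{db_name}.dtd",
        "-p",
    ]


def ber_volume_path(webz_root, db_name):
    """Step 7: the location and new name of the converted BER file."""
    bers_dir = PurePosixPath(webz_root) / "dbbuilder" / "dbs" / db_name / "bers"
    return str(bers_dir / f"{db_name}.ber.vol1")


# Hypothetical usage with a made-up installation path.
print(sgmlconv_argv("/opt/webz", "new_db"))
print(ber_volume_path("/opt/webz", "new_db"))
```

If you script the conversion itself, one option is to pass the argument list to subprocess.run with cwd set to <WebZ_root>/dbbuilder/dbs/new_db, because step 6 runs the command from that directory. The value returned by next_record_id is a candidate for recordid.txt in step 18; confirm it against your data before you edit the file.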

Automating Data Conversion and Data Loading

The procedure in this document is the simplest method for a one-time batch load into a single DC database. If you have multiple data files to load into a single DC(2) database or you need to populate several local DC databases in batch mode, there is an alternate method for converting the source data to BER format and loading it into the database, as follows:

1. Follow the procedure, Adding a BER Conversion Program to SSDOT, to add the sgmlconv utility with the -p flag as an additional BER conversion method. Name this method so that you can distinguish it from the default SGML conversion method within SSDOT.

2. Are you batch loading data into a new DC database or an existing DC database?

For an existing database:
- Edit the database registration information and modify item 13, Format of raw data. Specify the conversion method you created in step 1 above.
- Follow steps 1, 2, and 4 in the batch loading procedure above. Then go to step 3 below.
For a new database:
- Follow steps 1-4 in the batch loading procedure above.
- When you register the new database with SSDOT, modify item 13, Format of raw data, and specify the conversion method you created in step 1 above. (Registering the database with SSDOT is one of the steps in the procedure for Cloning a Record Builder Database, which the batch loading procedure references in step 3.)
- Go to step 3 below.

3. Replace steps 6-15 in the batch loading procedure above with steps 4-5 below.

4. Start the SiteSearch Database Operations Tool (SSDOT).

5. Update the database with your source data.

6. Return to step 16 in the batch loading procedure above.
