Batch Loading Data into a Local DC Database Updated with Record Builder
Contents
Introduction
Document Conventions
Rationale
Data Requirements
Procedure
Automating Data Conversion and Data Loading
Introduction
This document
describes how to add data from an external data source to a local DC database
that you plan to update with the Record Builder
application included with Open SiteSearch Database Builder. This allows
you and/or your staff to edit the database with Record Builder while also
allowing your patrons to search the database through the WebZ interface.
It provides patrons with Z39.50 access to all of the features you choose
to implement in your WebZ interface, such as cross-database
searching, custom record formats,
and vocabulary-assisted searching.
When you create the local database, you can take advantage of Database
Builder's sophisticated indexing capabilities,
such as plurals, restrictors,
and sparse indexes.
Document Conventions
- <WebZ_root>
is the location of the WebZ environment that includes Record Builder.
- new_db
is the top-level directory of the database to which you want to batch
load data.
- DC(2)
refers to the Dublin Core(2) database framework introduced in SiteSearch
4.1.1. The DC(2) database framework uses Version
1.1 of the Dublin Core Metadata Element Set (DCMES).
- DCQ
refers to the Dublin Core with Qualifiers database framework introduced
in SiteSearch 4.1.2. The DCQ database framework also uses Version 1.1
of the DCMES and qualifiers, as defined by the Dublin Core Metadata
Initiative in Dublin
Core Qualifiers.
- DC database
refers to a local SiteSearch database based on a DC
database framework (DC(2) or DCQ).
Rationale
See Batch
and Individual Record Editing for DC Databases for an explanation
of the reasoning behind this procedure.
Data Requirements
You need two files: an .sgml file that is compatible with the .dtd file
for the DC(2) or DCQ database framework, and the DC(2) or DCQ .dtd file
itself. You can find the DC(2) .dtd file in <WebZ_root>/dbbuilder/dbs/dc
and the DCQ .dtd file in <WebZ_root>/dbbuilder/dbs/dcq.
Each record in
the .sgml file should:
- Start with
a <rec> tag and end with a </rec> tag.
- Have an ID
field with a unique identifier, surrounded by <ID><PCDATA>
and </PCDATA></ID> tags, such as <ID><PCDATA>20516</PCDATA></ID>.
- Use <PCDATA></PCDATA>
tags to surround actual data values for a field, such as the following:
<DC:Type><PCDATA>Photograph</PCDATA></DC:Type>
where:
- <DC:Type>
is the opening field tag
- <PCDATA>Photograph</PCDATA>
indicates that Photograph is the data value for this field
- </DC:Type>
is the closing field tag.
Example file
Here is a sample record from an .sgml file for a DC(2) database.
<rec>
<ID><PCDATA>1</PCDATA></ID>
<DC:Type><PCDATA>Photograph</PCDATA></DC:Type>
<DC:Format><PCDATA>Scanned from a photographic print
using a
Microtek Scanmaker 9600XL at 120 dpi in JPEG format at compression
rate 3 and resized to 768x512 ppi. 4/1999.</PCDATA></DC:Format>
<DC:Description><PCDATA>Caption on image: "Scales
looking South
Chilkoot Pass 1898." Original image in Hegg Album 1, page 28.
</PCDATA></DC:Description>
<DC:Language><PCDATA></PCDATA></DC:Language>
<DC:Rights><PCDATA>None</PCDATA></DC:Rights>
<DC:Identifier><RB:auth><RB:state><PCDATA>controlled</PCDATA>
</RB:state></RB:auth> <RB:Scheme><PCDATA>URL</PCDATA></RB:Scheme>
<PCDATA>http://content.lib.washington.edu/hegg/image/4.jpg</PCDATA>
</DC:Identifier>
<DC:Date><PCDATA>1898</PCDATA></DC:Date>
<DC:Title><PCDATA>Klondikers and supplies at "The
Scales" looking
south along the Chilkoot Trail, Alaska, 1898.</PCDATA></DC:Title>
<DC:Publisher><PCDATA></PCDATA></DC:Publisher>
<DC:Creator><PCDATA>Hegg, Eric A.</PCDATA></DC:Creator>
<DC:Subject><PCDATA>Chilkoot Trail, Trails--Alaska,
Chilkoot Pass
(Alaska), Mountain passes--Alaska, Tents--Alaska--Chilkoot
Pass</PCDATA></DC:Subject>
<DC:Source><PCDATA>Eric A. Hegg Collection no. 274</PCDATA>
</DC:Source>
<DC:Relation><PCDATA></PCDATA></DC:Relation>
<DC:Contributor><PCDATA>University of Washington Libraries.
Manuscripts, Special Collections, University Archives Division
</PCDATA></DC:Contributor>
<DC:Coverage><PCDATA>United States--Alaska--Chilkoot
Pass</PCDATA> </DC:Coverage>
</rec>
If your data contains
Scheme and Modifier qualifiers,
you can include them in the .sgml file (see the <RB:Scheme> tag
within the <DC:Identifier> tag in the example above), but they are
not required. In SiteSearch 4.1.2, modifiers are called qualifiers, and
the scheme and qualifier tags are <DC:Scheme> and <DC:Qualifier>,
respectively.
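The record requirements above can be checked before you load anything. The following is a minimal sketch, not part of SiteSearch: the function name check_records is hypothetical, and it assumes the whole .sgml file has been read into a single string. It only counts tags and compares <ID> values, so it does not replace validation against the .dtd file.

```python
import re

# Each record is expected to sit between <rec> and </rec>.
REC_RE = re.compile(r"<rec>(.*?)</rec>", re.DOTALL)
# The ID value is expected inside <ID><PCDATA> ... </PCDATA></ID>.
ID_RE = re.compile(r"<ID>\s*<PCDATA>(.*?)</PCDATA>\s*</ID>", re.DOTALL)


def check_records(sgml_text):
    """Return (record_ids, problems) for the records found in sgml_text.

    Hypothetical helper for illustration only; it is not a SiteSearch tool.
    """
    problems = []
    # A missing or mistyped closing tag shows up as a count mismatch.
    if sgml_text.count("<rec>") != sgml_text.count("</rec>"):
        problems.append("unbalanced <rec> and </rec> tags")
    if sgml_text.count("<PCDATA>") != sgml_text.count("</PCDATA>"):
        problems.append("unbalanced <PCDATA> and </PCDATA> tags")

    record_ids = []
    seen = set()
    for number, match in enumerate(REC_RE.finditer(sgml_text), start=1):
        id_match = ID_RE.search(match.group(1))
        if id_match is None:
            problems.append(f"record {number}: no <ID> field")
            continue
        record_id = id_match.group(1).strip()
        if record_id in seen:
            problems.append(f"record {number}: duplicate ID {record_id}")
        seen.add(record_id)
        record_ids.append(record_id)
    return record_ids, problems
```

A typical use would be to read the export file into a string, call check_records on it, and correct any reported problems in the source data before converting it.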
Procedure
Follow these steps
to add data from an existing database to a new local DC database.
1. Export
your data in .sgml format as described in the Data
Requirements section of this document.
2. Examine the .sgml file to find the last unique record identifier used
in the records. You can find this identifier in the <ID> field
in the .sgml file. Write down this identifier, as you will need it
in step 18.
3. Follow the procedure, Cloning a Record Database,
to set up an empty Record Builder database for your data.
4. Copy the .sgml file with your data and your .dtd file to
<WebZ_root>/dbbuilder/dbs/new_db.
5. Check the database's .dsc file to see whether it contains any sparse
indexes.
Note: In steps 6-15 you convert your data to BER format and then use
SSDOT's Advanced Options to load the data into the database.
6. From the directory <WebZ_root>/dbbuilder/dbs/new_db,
enter the following command to convert your data from SGML to BER
format:
<WebZ_root>/dbbuilder/bin/sgmlconv new_db -inew_db.sgml \
-dnew_db.dtd -p
Note: Do not include the backslash ("\") on the command line.
It is only included for readability.
After sgmlconv executes, it displays messages about the input, output,
and .dtd files used, the number of records processed, and the size of the
output file, new_db.ber.
7. Move new_db.ber to <WebZ_root>/dbbuilder/dbs/new_db/bers. Rename
this file as new_db.ber.vol1.
8. Start the SiteSearch Database Operations Tool (SSDOT).
9. Type "5" and press Enter to select the Advanced Options menu from
the SSDOT Main Menu.
Note: If a job does not complete successfully in steps 10-16, check the
appropriate log file for more information before continuing to the
next step.
10. Add new records to the database and create index terms for these
records (option 3 on the menu):
- Type "3" and press Enter.
- When prompted, type the database name and press Enter.
- Type "1" and press Enter to select the BER volume shown.
(This should be new_db.ber.vol1.)
- Press Enter to return to the Advanced Options menu.
- Type "j" and press Enter to verify that this job completed successfully.
11. Sort the index terms from the new records (option 6 on the menu):
- Type "6" and press Enter.
- When prompted, type the database name and press Enter.
- Press Enter to return to the Advanced Options menu.
- Type "j" and press Enter to verify that this job completed successfully.
12. Add the index terms to the database (option 7 on the menu):
- Type "7" and press Enter.
- When prompted, type the database name and press Enter.
- Press Enter to return to the Advanced Options menu.
- Type "j" and press Enter to verify that this job completed successfully.
13. Does the database have any sparse indexes (you checked for this in
step 5)? If it does, continue with step 14. If it does not, skip to
step 16.
14. Sort the sparse index terms from the new records (option 8 on the
menu):
- Type "8" and press Enter.
- When prompted, type the database name and press Enter.
- Press Enter to return to the Advanced Options menu.
- Type "j" and press Enter to verify that this job completed successfully.
15. Add the sparse index terms to the database (option 9 on the menu):
- Type "9" and press Enter.
- When prompted, type the database name and press Enter.
- Press Enter to return to the Advanced Options menu.
- Type "j" and press Enter to verify that this job completed successfully.
16. Validate the database.
17. Exit SSDOT.
18. Refer to the record identifier you recorded in step 2. Edit
<WebZ_root>/ini/dbbuilder/dbs/new_db/recordid.txt
so that it contains an integer identifier greater than this identifier.
(If the identifiers in the data you imported are not integers, you
can start with any integer you wish in recordid.txt.)
19. Start a Record Builder session in your Web browser and verify that
you can search or browse the database to locate the records you just
loaded into the database. Then ensure that you can view and/or edit
these records.
20. To make the database available to patrons, see the procedure
Configuring Access to a Record Builder Database Through the WebZ
Interface.
21. If you have additional data to add to the database in batch mode,
or if you have additional local DC databases for which you need to batch
load data, see Automating Data Conversion and Data Loading.
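The file-handling parts of this procedure (the command in step 6, the destination in step 7, and the identifier in steps 2 and 18) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a SiteSearch utility: the function names are hypothetical, the .sgml and .dtd files are assumed to be named after the database as in the step 6 command, and nothing is executed or written to disk. The sketch only builds the values you would otherwise type by hand.

```python
import re
from pathlib import PurePosixPath

# The ID value is expected inside <ID><PCDATA> ... </PCDATA></ID>.
ID_RE = re.compile(r"<ID>\s*<PCDATA>\s*(.*?)\s*</PCDATA>\s*</ID>", re.DOTALL)


def sgmlconv_command(webz_root, db_name):
    """Build the step 6 command as an argument list (it is not run here).

    Assumes the data and .dtd files are named <db_name>.sgml and
    <db_name>.dtd, as in the command shown in step 6.
    """
    return [
        str(PurePosixPath(webz_root) / "dbbuilder" / "bin" / "sgmlconv"),
        db_name,
        f"-i{db_name}.sgml",
        f"-d{db_name}.dtd",
        "-p",
    ]


def ber_volume_path(webz_root, db_name):
    """Return the step 7 location and name for the converted BER file."""
    bers = PurePosixPath(webz_root) / "dbbuilder" / "dbs" / db_name / "bers"
    return str(bers / f"{db_name}.ber.vol1")


def next_record_id(sgml_text, default=1):
    """Return an integer greater than every integer <ID> in sgml_text.

    If no identifier is an integer, any starting value is acceptable
    (see step 18), so the caller-supplied default is returned.
    """
    integer_ids = [
        int(value)
        for value in ID_RE.findall(sgml_text)
        if value.isascii() and value.isdigit()
    ]
    return max(integer_ids) + 1 if integer_ids else default
```

Taking the maximum of all integer identifiers, rather than the identifier of the last record in the file, is a deliberate choice here: it still satisfies the "greater than" requirement in step 18 when the records are not sorted by identifier.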
Automating Data Conversion and Data Loading
The procedure
in this document is the simplest method for a one-time batch load into
a single DC database. If you have multiple data files to load into a single
DC(2) database or you need to populate several local DC databases in batch
mode, there is an alternate method for converting the source data to BER
format and loading it into the database, as follows:
See Also
Record Builder Database Frameworks
Planning a New DC Database
Record Builder Interface
Performing Online Updates to Local Databases with Record Builder