Configuring Z39.50 Duplicate Detection Service
Successful implementation of the Z39.50 Duplicate Detection Service in the WebZ interface depends on the configuration of combined result set functionality. Only after result sets are combined can they be de-duplicated, sorted, and grouped according to a duplicate key. You define this key when you configure the service in the ini files for the databases that use it.
To configure your interface to use the Z39.50 Duplicate Detection Service, you need to perform two general tasks:
- Define the duplicate detection specification.
- Include the duplicate formats in the individual database configuration files.
1. Defining the Duplicate Detection Specification
Using your text editor, you can configure the Z39.50 Duplicate Detection Service for a single database or a group of databases by defining a duplicate detection specification in the corresponding database ini file or group ini file. A duplicate detection specification contains the following parameters and their values:
- Name of the duplicate detection key
- Maximum number of records in a result set that can be de-duplicated
- Number of duplicate records that are maintained under a representative record
- Sort criterion for ordering the duplicate records
- Indication of whether duplicate detection should occur automatically for results less than or equal to the value for MaxDedupRecords
For example, the following is the duplicate detection specification from the OBI, version 1:
[dedup]
key*=StandardNumberKey
MaxDedupRecords = 500
MaxDedupRecordsToRetain = 100%
SortCriterion = DbnameOrder
preferredDatabaseNames = DRA endeavor III GEAC_Advance Horizon
AutomaticDedupOnEverySearch = true
Note: This example duplicate detection specification is for a group of databases defined in the OPACGroup.ini file for the Virtual Catalog topic area in the OBI, version 1. You can, however, configure this service for an individual database by defining a duplicate detection specification in the appropriate database ini file.
For complete steps on defining a new group database configuration file, see Creating a Database Configuration File.
Duplicate Detection Specification Parameters
Some parameters are required for the duplicate detection specification and others are optional. The following is a breakdown of all the available parameters that you can use to create a duplicate detection specification.
Parameter | Description
maxDedupRecords | Declares the limit on the number of records in a result set that can be de-duplicated. The default is 200. Example: maxDedupRecords = 200. With this value, any result set with 200 or fewer members is de-duplicated; result sets with more than 200 members are not.
maxDedupRecordsToRetain | Declares the maximum number of duplicates associated with a representative record. The value can be an integer such as 100 or a percentage such as 100%. The default is 100%.
sortCriterion | The order in which the duplicates are sorted for a single cluster record. The default is DbNameOrder; other recognized values are LargestRecordsFirst and SmallestRecordsFirst. Example: sortCriterion = DbNameOrder
preferredDatabaseNames | Used only in conjunction with sortCriterion = DbNameOrder. Declares the preferred order in which duplicates are listed underneath a representative record. Possible values are database names separated by spaces. Example: preferredDatabaseNames=DRA Endeavor III
AutomaticDedupOnEverySearch | Indicates whether de-duplication is performed on every search that has a result count less than or equal to the value for maxDedupRecords. The possible values are true and false. The default is false. Note: When AutomaticDedupOnEverySearch=false, the widget dedup=true can be used with the QUERY or FETCH verb to de-duplicate a result set selectively.
key* (optional) | References a section elsewhere in the configuration file that defines the specification for the de-duplication key. Example:

key = standardNumberKey
[standardNumberKey]
name=Standard Number
use=8
parm1 = 022/1, 020/1
class = ORG.oclc.zsorts.StandardNumberKey
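To check a specification outside of WebZ, the parameters and defaults listed above can be modelled in a few lines of Python. This is only a sketch: the function name `load_dedup_spec` and the returned dictionary keys are hypothetical, the sample text mirrors the OBI example with each setting on its own line, and treating parameter names as case-insensitive is an assumption (the examples spell them both MaxDedupRecords and maxDedupRecords).

```python
import configparser

SAMPLE_INI = """\
[dedup]
key*=StandardNumberKey
MaxDedupRecords = 500
MaxDedupRecordsToRetain = 100%
SortCriterion = DbnameOrder
preferredDatabaseNames = DRA endeavor III GEAC_Advance Horizon
AutomaticDedupOnEverySearch = true
"""


def load_dedup_spec(ini_text, section="dedup"):
    """Return the specification as a dict, filling in the documented defaults."""
    # configparser lower-cases option names, so the lookups below are
    # case-insensitive (an assumption about WebZ, not a documented rule).
    # interpolation=None keeps the literal "%" in values such as "100%".
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(ini_text)
    raw = parser[section] if parser.has_section(section) else {}

    retain = raw.get("maxdeduprecordstoretain", "100%").strip()
    automatic = raw.get("automaticdeduponeverysearch", "false").strip().lower()
    return {
        # The key parameter is optional; accept it with or without the asterisk.
        "key": raw.get("key*", raw.get("key")),
        "max_dedup_records": int(raw.get("maxdeduprecords", "200")),
        "retain_is_percent": retain.endswith("%"),
        "retain_value": int(retain.rstrip("%")),
        "sort_criterion": raw.get("sortcriterion", "DbNameOrder"),
        # Database names are separated by spaces.
        "preferred_database_names": raw.get("preferreddatabasenames", "").split(),
        "automatic": automatic == "true",
    }


spec = load_dedup_spec(SAMPLE_INI)
print(spec["max_dedup_records"], spec["preferred_database_names"])
```

Calling `load_dedup_spec("[dedup]\n")` on an empty section returns the defaults only, which is a quick way to see what a minimal specification would mean.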
Parameters of a Duplicate Detection Key Specification
The parameters used to define a de-duplication key specification are identical to those used to define a sort key.
Parameter | Description
name | Name of the sort key as it will appear in the WebZ interface.
use | An integer that identifies the index to be used to perform the sort.
parm<n> | Complete BER tag path of the data field on which to sort. Multiple paths are specified as parm1, parm2, etc. Examples: parmn = 700/1,4 (finds subfield 700/1 or 700/4); parmn = 700/1-4 (finds subfields 700/1 through 700/4); parmn = 700/* (finds all subfields in field 700). Note: You can define only one field in a parm variable.
class | The class that pulls data from the records on which de-duplication is performed. Note: Creation of the duplicate detection key specification is a "plug point" in the system, where you can put your own Java class to pull data from records. (The class ORG.oclc.zsorts.StandardNumberKey is included in SiteSearch as a pre-defined class that uses Stand.Num.)
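The three parm forms shown in the table can be read as a field followed by a list, a range, or a wildcard of subfields. The Python sketch below illustrates that reading. The function name `expand_parm` is hypothetical, subfield numbers are assumed to be integers, and the sketch covers only the three single-field forms listed above; it does not describe how WebZ itself parses the value.

```python
def expand_parm(value):
    """Split a parm value such as '700/1-4' into (field, list of subfields).

    The wildcard form returns ['*'] to mean "all subfields".
    """
    field, sep, subfields = value.strip().partition("/")
    if not sep or not field or not subfields:
        raise ValueError(f"expected FIELD/SUBFIELDS, got {value!r}")
    if subfields == "*":
        return field, ["*"]

    result = []
    for part in subfields.split(","):
        part = part.strip()
        if "/" in part:
            # A second FIELD/SUBFIELD pair is outside the forms handled here.
            raise ValueError(f"only one field per parm is handled, got {value!r}")
        if "-" in part:
            # Range form: 1-4 expands to 1, 2, 3, 4 (inclusive).
            low, high = part.split("-", 1)
            result.extend(str(n) for n in range(int(low), int(high) + 1))
        else:
            result.append(part)
    return field, result


for example in ("700/1,4", "700/1-4", "700/*"):
    print(example, "->", expand_parm(example))
```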
For more information about sort key definitions and their parameters, see Database Configuration Files.
2. Include the Duplicate Formats in the Database Configuration File(s)
To view duplicates, you need to declare values for three parameters under the [Formats] section in each of the database configuration files for the current group:
- duplicates
- briefduplicates
- bookmarkDups
For example, the OBI, version 1 is able to display duplicate record information for the Virtual Catalog databases because the following formats are defined under the [Formats] section in each of the configuration files for the databases comprising the topic area:
Content of MarcCatalogDuplicates.ini
Notice that each of the above formats references the formatting configuration file, MarcCatalogDuplicates.ini, which contains the display and rule specifications that determine how duplicate records are to be displayed in the OBI, version 1.
Composite Records
The FormatRecordsWithDuplicates gadget is ultimately responsible for formatting duplicate records. It retrieves representative records and their duplicates from ZBase and creates composite records for display in the interface.
The following illustration depicts this process:
For more information about defining display specifications, rules specifications, and formatting configuration files, see WebZ Rules-Based Formatting.
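The composite records described in this section can be pictured as one representative record with its duplicates listed underneath. The Python sketch below models that grouping for sortCriterion = DbNameOrder. It is an illustration only, not the logic of the FormatRecordsWithDuplicates gadget. The function name `build_composites` and the record shape `(dedup_key, db_name, record_id)` are hypothetical. The sketch also makes three assumptions: the first record in preferred database order becomes the representative, database names compare case-insensitively, and maxDedupRecordsToRetain is an integer (the percentage form is not modelled).

```python
def build_composites(records, preferred_db_names, max_retain=None,
                     max_dedup_records=200):
    """Group records that share a dedup key into composite records.

    Returns None when the result set exceeds max_dedup_records, because such
    a result set is not de-duplicated at all.
    """
    records = list(records)
    if len(records) > max_dedup_records:
        return None

    # Position of each database in the preferred order; unlisted ones go last.
    rank = {name.lower(): pos for pos, name in enumerate(preferred_db_names)}
    unranked = len(rank)

    groups = {}
    for key, db_name, record_id in records:
        groups.setdefault(key, []).append((db_name, record_id))

    composites = []
    for key, members in groups.items():
        # sorted() is stable, so records from one database keep arrival order.
        ordered = sorted(members, key=lambda m: rank.get(m[0].lower(), unranked))
        duplicates = ordered[1:]
        if max_retain is not None:
            duplicates = duplicates[:max_retain]
        composites.append({
            "key": key,
            "representative": ordered[0],  # assumption: first in preferred order
            "duplicates": duplicates,
        })
    return composites


sample = [
    ("0001", "Horizon", "h1"),
    ("0001", "DRA", "d1"),
    ("0002", "III", "i1"),
    ("0001", "III", "i2"),
]
for composite in build_composites(sample, ["DRA", "endeavor", "III"]):
    print(composite)
```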
Note: The duplicate detection service depends on the activation of SupportsMultiDbQuery and SupportsMergeRead in the ZBase.ini file. If either parameter is not set to "true", duplicate detection is disabled.
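Before troubleshooting a specification, it can help to confirm that both switches named in the note are on. The Python sketch below scans ini text for the two parameters. The function name `missing_zbase_flags` is hypothetical. Because the section layout of ZBase.ini is not described here, the sketch ignores section headers and looks at every `name = value` line, which is an assumption.

```python
def missing_zbase_flags(ini_text):
    """Return the required parameters that are not set to true in ini_text."""
    required = {
        "supportsmultidbquery": "SupportsMultiDbQuery",
        "supportsmergeread": "SupportsMergeRead",
    }
    enabled = set()
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        # Accept true with or without surrounding double quotes.
        if sep and value.strip().strip('"').lower() == "true":
            enabled.add(name.strip().lower())
    return sorted(label for key, label in required.items() if key not in enabled)


sample = "SupportsMultiDbQuery = true\nSupportsMergeRead = false\n"
print(missing_zbase_flags(sample))
```

An empty list means both parameters are set to true in the text that was checked.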
See Also
WebZ Interface
WebZ Configuration Files