Configuring Z39.50 Duplicate Detection Service

Successful implementation of Z39.50 Duplicate Detection Service in the WebZ interface depends on the configuration of combined result set functionality. Only after result sets are combined can they be de-duplicated, sorted, and grouped according to a duplicate key that you define when you configure the service in the ini files for the databases that use it.

To configure your interface to use Z39.50 Duplicate Detection Service, you need to perform two general tasks:

  1. Define the duplicate detection specification.
  2. Include the duplicate formats in the individual database configuration files.


1. Defining the Duplicate Detection Specification

Using your text editor, you can configure the Z39.50 Duplicate Detection Service for a single database or a group of databases by defining a duplicate detection specification in the corresponding database ini file or group ini file. A duplicate detection specification contains the following parameters and their values:

  • Name of the duplicate detection key
  • Maximum number of records in a result set that can be de-duplicated
  • Number of duplicate records that are maintained under a representative record
  • Sort criterion for ordering the duplicate records
  • Indication of whether duplicate detection should occur automatically for result sets whose record count is less than or equal to the value for MaxDedupRecords

For example, the following is the duplicate detection specification from the OBI, version 1:

[dedup]
key*=StandardNumberKey
MaxDedupRecords = 500
MaxDedupRecordsToRetain = 100%
SortCriterion = DbnameOrder
preferredDatabaseNames = DRA endeavor III GEAC_Advance Horizon
AutomaticDedupOnEverySearch = true

Note:

This example duplicate detection specification is for a group of databases defined in the OPACGroup.ini file for the Virtual Catalog topic area in the OBI, version 1. You can, however, configure this service for an individual database by defining a duplicate detection specification in the appropriate database ini file.

For complete steps on defining a new group database configuration file, see Creating a Database Configuration File.


Duplicate Detection Specification Parameters

Some parameters are required for the duplicate detection specification and others are optional. The following is a breakdown of all the available parameters that you can use to create a duplicate detection specification.

Parameter
Description
maxDedupRecords

This parameter declares the limit on the number of records in a result set that can be de-duplicated. The default is 200.

Example: maxDedupRecords = 200

Any result set with 200 or fewer records would be de-duplicated. Result sets with more than 200 records would not be de-duplicated.

maxDedupRecordsToRetain

This parameter declares the maximum number of duplicates associated with a representative record. The value for this parameter can be an integer like 100 or a percentage like 100%. The default is 100%.
sortCriterion

This is the order in which the duplicates are sorted for a single cluster record. The default is DbNameOrder, but other recognized values are LargestRecordsFirst and SmallestRecordsFirst.

Example: sortCriterion = DbNameOrder
 
preferredDatabaseNames

Used only in conjunction with sortCriterion = DbNameOrder, this parameter declares the preferred order in which duplicates are to be listed underneath a representative record. Possible values are database names with a space in between each name.

Example: preferredDatabaseNames=DRA Endeavor III
AutomaticDedupOnEverySearch

This parameter indicates whether de-duplication is performed on every search that has a result count less than or equal to the value for maxDedupRecords. The possible values are true or false. The default is false.

Note: When AutomaticDedupOnEverySearch=false, the widget dedup=true can be used with the QUERY or FETCH verb to de-duplicate a result set selectively.
key* (optional)

The value for this parameter references a section elsewhere in the configuration file that defines the specification for the de-duplication key.

Example: key = standardNumberKey

[standardNumberKey]
name=Standard Number
use=8
parm1 = 022/1, 020/1
class = ORG.oclc.zsorts.StandardNumberKey
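
As a minimal sketch that pulls the parameters above into one place, the following specification writes out each documented default explicitly. The [dedup] section name and the key* spelling follow the OBI example earlier on this page. Lines starting with # are annotations for the reader; that comment syntax is an assumption, so remove those lines if your ini files do not accept it.

```ini
# Sketch only: each value below is the default stated in the parameter table,
# except key*, which points at the [standardNumberKey] section shown above.
[dedup]
key* = standardNumberKey
maxDedupRecords = 200
maxDedupRecordsToRetain = 100%
sortCriterion = DbNameOrder
AutomaticDedupOnEverySearch = false
```

With these values, result sets of more than 200 records are left as they are, and smaller result sets are de-duplicated only when a request asks for it with dedup=true.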

 

Parameters of a Duplicate Detection Key Specification

The parameters used to define a de-duplication key specification are identical to those used to define a sort key.

Parameter

Description

name

Name of the sort key as it will appear in the WebZ interface.

use

This is an integer that identifies the index that is to be used to perform the sort.
parm<n>

Complete BER tag path of the data field on which to sort. Multiple paths are specified as parm1, parm2, etc.

Examples:

parmn = 700/1,4 (finds subfield 700/1 or 700/4)
parmn = 700/1-4 (finds subfields 700/1 through 700/4)
parmn = 700/* (finds all subfields in field 700)

Note: You can define only one field in a parm variable.
 
class

This is the class that pulls data from the records on which de-duplication is performed.

Note: Creation of the duplicate detection key specification is a "plug point" in the system, where you can put your own Java class to pull data from records. (The class ORG.oclc.zsorts.StandardNumberKey is included in SiteSearch as a pre-defined class that uses Stand.Num.)

For more information about sort key definitions and their parameters, see Database Configuration Files.
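
The table above states that a parm variable holds a single field and that additional paths go in parm2, parm3, and so on. As a sketch of that layout, the standard number key could be written with one field per variable. This is an assumption about how the class reads its parameters, not a tested configuration; the example earlier on this page places both paths in parm1. Lines starting with # are reader annotations in an assumed comment syntax.

```ini
# Hypothetical variant of the [standardNumberKey] section: one field per parm.
# Whether ORG.oclc.zsorts.StandardNumberKey consults parm2 is assumed here.
[standardNumberKey]
name = Standard Number
use = 8
parm1 = 022/1
parm2 = 020/1
class = ORG.oclc.zsorts.StandardNumberKey
```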


2. Include the Duplicate Formats in the Database Configuration File(s)

In order to view duplicates, you need to declare values for three parameters under the [Formats] section in each of the database configuration files for the current group.

  • duplicates
  • briefduplicates
  • bookmarkDups

For example, the OBI, version 1 is able to display duplicate record information for the Virtual Catalog databases because the following formats are defined under the [Formats] section in each of the configuration files for the databases comprising the topic area:

Figure: [Formats] variables for de-duplication


Content of MarcCatalogDuplicates.ini

Notice that each of the above formats references the formatting configuration file, MarcCatalogDuplicates.ini, which contains the display and rule specifications that determine how duplicate records are to be displayed in the OBI, version 1.

Figure: Display specifications in MarcCatalogDuplicates.ini

Figure: Rule specifications in MarcCatalogDuplicates.ini


Composite Records

The gadget FormatRecordsWithDuplicates is ultimately responsible for formatting duplicate records: it retrieves representative records and their duplicates from ZBase and creates composite records for display in the interface.

The following illustration depicts this process:

Figure: ZBase creates composite records

For more information about defining display specifications, rules specifications, and formatting configuration files, see WebZ Rules-Based Formatting.

Note: Duplicate detection service depends on the activation of SupportsMultiDbQuery and SupportsMergeRead in the ZBase.ini file. If either parameter is not set to "true", duplicate detection is disabled.
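
As a sketch of the requirement in the note above, the two ZBase.ini entries would read as follows. This page does not say which section of ZBase.ini holds them, so no section header is shown; place the lines in whichever section already contains these parameters in your installation. Lines starting with # are reader annotations in an assumed comment syntax.

```ini
# ZBase.ini fragment; section header intentionally omitted (not given on this page).
# Both parameters must be true for duplicate detection to be active.
SupportsMultiDbQuery = true
SupportsMergeRead = true
```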

See Also

WebZ Interface
WebZ Configuration Files

