
InSite: Technical News for SiteSearch Users

Issue # 4

October 1, 1997


InSite is intended to create discussion about the use and implementation of Open SiteSearch software and explore the technology used by its components. Each issue will be available from the SiteSearch Web pages on a regular basis and is designed as an informational resource for SiteSearch system administrators. It is brought to you by the SiteSearch support team. Any suggestions or comments are appreciated.

Top Story

SiteSearch 4.0

Over the past few months, we have been planning and developing Open SiteSearch version 4.0. This version will be a complete redesign of the software using the Java programming language. Version 4.0 will retain all of the features of version 3.1 and bring many new benefits, such as simultaneous cross-database searching and browsing, support for a wide range of platforms, and the use of Java in place of OCLC's proprietary WebZ entity language and Overcite Formatting Language.

We are excited about the new architecture and want to give you updates frequently. Recently, we sent letters to the library directors and main technical contacts with an overview of Open SiteSearch Version 4.0, important information about support, and a Question and Answer document. Links to these letters are included here. We are interested in working with you to develop conversion plans for implementing the new version and hope to hear from you with any questions or concerns.

Software Enhancements and Fixes for Release 3.1.1

As Open SiteSearch system administrators report bugs or request enhancements, our development team makes changes to the SiteSearch software package. The newest patch release to version 3.1, version 3.1.1, will be released this week. We will be providing each site with a new set of SiteSearch executables.

In this release, we have also made some changes to the Out-of-Box Interface by enhancing the advanced search page. You can see this page on our demonstration site. If you would like to receive the files to implement this version of the advanced search, please contact SiteSearch Support.

Patch release version 3.1.1 includes the following enhancements and bug fixes:

  1. Advanced Search Page: A new advanced search screen was developed to add more pull-down search options. Limits were also added to the search page for language, data, and record format.
  2. Httpgate: Truncated phrase searches are now properly sent to remote Z39.50 servers.
  3. Httpman: ClientHostName information is correctly sent to httpgate on all platforms.
  4. Zdemo: An enhancement was made to enable requests for UNIMARC records.
  5. CGI scripts: The bookmarking files, mark.pl and unmark.pl, now work with the Virtual Catalog, VCat.
  6. FCL files: Marcbrief_webz.fcl now puts the "</A>" tag in the proper place within the brief Marc record display. This change only affects the way information is displayed with the Lynx text browser.
  7. Overcite library: Quadrupled the buffer containing translate text tables from 4k to 16k. Also, fixed a bug which caused Overcite to behave improperly when formatting a record with large numbers of diacritics.
  8. Irpserv: Brief SUTRS records are now being returned properly. Also, irpserv now recognizes requests for UNIMARC records.
  9. New remote Z39.50 servers were added to the SiteSearch Installation packages: Iowa State University (NOTIS), New York University (GEAC), and Lebanon Valley College (Sirsi).

Changes to the Out-of-the-Box Interface can be viewed from the SiteSearch demo: http://cypress.dev.oclc.org:8000

FAQs

In each issue of InSite, we highlight some frequently asked support questions that are of general interest. To report an Open SiteSearch problem, send an email describing the problem and components affected to .

Direct Access to Database Choice Screen with IP Authorization

Q: When using IP and Autho/Password control together, how do we set up the profile server to automatically put the user with a valid IP address from the opening OBI screen to the database choice screen?

A: The profile server and WebZ are designed to check either IP only, autho/password only, or both. It is possible to set up the profile server and WebZ to take the end-user directly to the database choice screen by performing the autho/password login for them. This involves making some changes to your profile database and some html navigation changes.

To set up WebZ to grant direct IP access to your database choice screen, you'll need to do the following:

  1. Set up obihome.html to have only the "Logon as" button. This button performs the IP authorization and, if the address is valid, takes the user to the database selection screen.

  2. Set up the "bad" widget of the "LOGIN" verb to point to an authorization page (e.g., reautho.html) where the user can enter a "Userid/Password" and, if it is valid, be taken to the database selection screen. If the "Userid/Password" is not valid, you can send the user to an authorization failed page (e.g., authofail.html).

  3. In your profile data for the profile database, set up specific passwords for the IP address:

a. Add the following to your profile data information:
aut=ip:132.174.19.*,autho=autoautho,password=autopass,
(any other information for this ip address)
....
aut=autoautho,pwd=autopass
....

b. In pro_labels.ini in the [httpgate] section, add:
var<n>=autho
var<n>=password
where "n" is the next available number in the list

Note: you can change "autoautho" and "autopass" to your own choices, but be sure to follow the naming guidelines defined in the "SiteSearch Operator Guide - Using the SiteSearch Profile Server - Table 9-2 Pre-Defined Label Definitions".

  4. Rebuild your profile database.
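
The effect of the steps above can be pictured with a small Python sketch. It is not how the profile server is implemented; the IP_PROFILES list and both function names are hypothetical, and the wildcard matching simply assumes that a pattern such as 132.174.19.* is meant to cover every address beginning with that prefix.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for the profile data; the real profile
# database uses the aut=... records shown in step 3.
IP_PROFILES = [
    {"pattern": "132.174.19.*", "autho": "autoautho", "password": "autopass"},
]


def auto_login_for(client_ip, profiles=IP_PROFILES):
    """Return (autho, password) for the first matching IP pattern, else None."""
    for profile in profiles:
        if fnmatchcase(client_ip, profile["pattern"]):
            return profile["autho"], profile["password"]
    return None


def next_page(client_ip):
    """Pick the page an arriving user would see under this scheme."""
    if auto_login_for(client_ip) is not None:
        return "database selection screen"
    # No IP match: fall back to the Userid/Password page from step 2.
    return "reautho.html"
```

A user arriving from 132.174.19.7 would be logged in with the stored autho and password, while a user from any other address would be sent to reautho.html.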

Tips & Techniques

Are Web Robots Playing Nice with your OCLC SiteSearch System?

What is a web robot? Robots are programs that act as autonomous agents, retrieving information (web pages) in an algorithmic manner. These robots are used by web search engines or locator services (such as Alta Vista or Lycos) to locate and update the content of their databases.

How do they work? Starting from a URL, a robot retrieves the associated page. From this page, the robot distills an abstract, either from the page's content directly or from <META> tags, and sets a link to the page. The algorithm that determines how deep to follow these links may vary, but it usually goes no more than a level or two below the starting point. As the robot follows a link, it recursively applies the process, creating abstracts and finding more links to follow. This process continues until the robot reaches the desired depth or hits a dead end (a page that contains no additional links).
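
The depth-limited traversal described above can be sketched in a few lines of Python. This is an illustration only, not the algorithm of any particular robot: the LINKS table is a hypothetical stand-in for real pages, and the sketch uses a breadth-first queue rather than recursion to keep the depth bookkeeping explicit.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": [],
    "/a/1": ["/a/1/x"],
    "/a/1/x": [],
}


def crawl(start, links, max_depth):
    """Visit pages breadth-first, following links at most max_depth levels down."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # desired depth reached; do not follow this page's links
        for target in links.get(page, []):  # a page with no links is a dead end
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited
```

With a depth limit of 2, the robot in this sketch would index the home page, its two children, and one grandchild, and would never request /a/1/x.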

How are these robots controlled? Because robots are programs that can request and consume pages at a much faster rate than their human counterparts, they need to be controlled. The harvesting process can consume all (or a large portion) of a site's resources (CPU, throughput) until the process is complete. This can cause long delays or even outages for actual users of the site. Granted, you may receive some benefit from the robot (giving the public a way to locate your information resource), but you may want to control how this is accomplished. Web administrators can determine, with more intelligence than the robot's algorithm, what material is of most value for the index. Information about how to control robots can be found in the Robot Exclusion Standard maintained by Martijn Koster (see http://info.webcrawler.com/mak/projects/robots/robots.html).

How do robots work with WebZ? Robot technology has a shortcoming when visiting sites like WebZ where the content is dynamically generated. This is why we need to control where to direct our automated friends. You can control robots by creating a file named "robots.txt" in your SiteSearch working directory. A well-behaved robot will request this file after retrieving the initial home page. This file tells the robot which robots are allowed to access your site and which directories are available for indexing or harvesting. For example:

User-agent: *    # All robots are allowed
Disallow: /      # Only the homepage please

This example file instructs all robots that the only available page is the home page and that they should not traverse any lower. The home page is usually the only page in a WebZ site that would be informative, because of the stateful, session-based content of our pages. This control also helps the robot to be more effective by preventing unnecessary use of resources.
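
One way to check what a robots.txt file permits is to run it through a parser. The sketch below uses the robots.txt parser from Python's standard library; the is_allowed helper and the webz.example.org URLs in the usage note are hypothetical and serve only as an illustration.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, without the comments.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed(ROBOTS_TXT, "AnyBot", "http://webz.example.org/some/session/page") returns False, so a compliant robot would not follow links into session pages. Note that this parser also reports the root path "/" itself as disallowed; the home page is seen only by a robot that retrieved it before reading robots.txt, as described above.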

FTP Resources

New Repository for Remote Z39.50 Server Configurations

Accessing remote Z39.50 servers with Open SiteSearch WebZ is a major component of many of your projects. We are committed to interoperability with all Z39.50 servers and work at testing and improving compatibility when we can. Through this testing, we have set up many of the vendor specific, WebZ configuration settings required for each server.

We are starting a repository of remote server configurations that you may use with WebZ to set up access to different Z39.50 servers. We will start by adding seven of the servers that have been extensively tested and are currently part of either the OBI package or our interoperability demo. Please note that vendor configuration requirements may change as vendors update and improve their Z39.50 servers. We will update the templates as we find changes. We also ask for your help in adding to the collection and keeping the templates up to date and accurate.

The following Z39.50 server configuration templates are available on the anonymous ftp server: Ameritech Horizon, Ameritech NOTIS, DRA, GEAC Advance, Innovative Interfaces, Library of Congress, and Sirsi.

The following vendors' servers are tested and will soon be added: Ebsco, IAC, Ovid, SilverPlatter, H.W. Wilson, PALS, MultiLIS/DRA.

Currently in testing (issues to resolve): Dynix, Carl, Endeavor.

Each template includes the database index information that we have for each vendor and a server section with the httpgate.ini specific variables required for that server. These templates are not complete until you add the database information, server host, and appropriate port number. Please refer to the SiteSearch Operators Guide, Chapter 7 or the WebZ Operator's Guide, Chapter 4 for more information.

To access the OCLC anonymous FTP server:

  1. ftp://ftp.rsch.oclc.org
  2. Name= anonymous; Password= <your email address>
  3. cd to /pub/SiteSearch/VENDORini
  4. Each vendor will have a single ini file with the index and server information. Some of the template sections contain comments that you should read and remove before installing.
  5. Download the templates that you need (e.g., get DRA.ini)
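
For sites that prefer to script the download, the steps above can be approximated with Python's standard ftplib module. This is a sketch under the assumption that the server accepts anonymous logins exactly as described; the function names are hypothetical, and the host and directory are copied from the steps above.

```python
from ftplib import FTP

HOST = "ftp.rsch.oclc.org"                # from step 1
DIRECTORY = "/pub/SiteSearch/VENDORini"   # from step 3


def retr_command(filename):
    """Build the FTP command that retrieves one template file."""
    return "RETR " + filename


def fetch_template(filename, email, host=HOST, directory=DIRECTORY):
    """Download one vendor template into the current local directory."""
    with FTP(host) as ftp:
        # Anonymous login, with the email address as the password (step 2).
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        with open(filename, "wb") as out:
            ftp.retrbinary(retr_command(filename), out.write)
```

For example, fetch_template("DRA.ini", "you@example.org") would save the DRA template locally, after which the comments in the file should still be read and removed by hand before installing.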

Java Resources

We have compiled a listing of Java resources that our software developers have utilized in the design and implementation of Open SiteSearch 4.0. When viewing these diverse sources, we suggest focusing on understanding the syntax and core of Java. Version 4.0 will not be using applets or the Abstract Windowing Toolkit (AWT) in the new architecture.

Web Sites of Interest:

Digital Focus. How Do I … ? 1996.
http://www.digitalfocus.com/digitalfocus/faq/howdoi.html

Harold, Elliotte Rusty. Café au Lait. 1997.
http://sunsite.unc.edu/javafaq/

Harold, Elliotte Rusty. Brewing Java: A Tutorial. 1997.
http://sunsite.unc.edu/javafaq/javatutorial.html

Java Compiler Compiler. The Java Parser Generator. 1997.
http://www.suntest.com/Jack/

JavaWorld: IDG’s magazine for the Java community. 1997.
http://www.javaworld.com/

Newmarch, Jan. GUI Programming using Java 1.1. 1997.
http://pandonia.canberra.edu.au/java/tut/tut2.html

Pietrowicz, Stephen R. JNN: The Java News Network. 1997.
http://lightyear.ncsa.uiuc.edu/~srp/java/javabooks.html

Reith, Markus. Java Developer’s Page.
http://www.ping.de/sites/maxwell/links/JAVA/java.html

Richmond, Alan & Lucy. The Web Developer’s Virtual Library. Learning to write Java. 1997.
http://WWW.Stars.com/Authoring/Java/

Silieceo, Omar P. The Java programming language. 1996.
http://acm.org/~ops/java.html

Sun Microsystems, Inc. The Java Tutorial: Object-Oriented Programming for the Internet. 1997.
http://java.sun.com:80/docs/books/tutorial/index.html

Sun Microsystems, Inc. The Source for Java. 1997.
http://www.javasoft.com/

Bibliography:

Arnold, Ken. The Java programming language. Reading, Mass.: Addison-Wesley Pub. Co., 1996.

Cornell, Gary. Core Java. 2nd ed. Mountain View, Calif.: SunSoft Press, 1997.

Chan, Patrick. The Java class libraries: an annotated reference. 2nd ed. Reading, Mass.: Addison-Wesley, 1998.

Flanagan, David. Java in a nutshell: a desktop quick reference. 2nd ed. Cambridge ; Sebastopol, CA: O'Reilly & Associates, 1997.

Grand, Mark. Java language reference. 2nd ed., updated for Java 1.1. Cambridge; Sebastopol, Calif.: O'Reilly, 1997.

Lemay, Laura. Teach yourself Java 1.1 in 21 days. 2nd ed. Indianapolis, Ind.: Sams.net, 1997.

SiteSearch training class schedule

SiteSearch WebZ, ZSS, & ISP - November 10-14, 1997 - Dublin, Ohio

Note: Training class schedules are subject to change. For more information or to register for upcoming classes, please contact .

Questions or comments regarding InSite should be sent to with 'InSite' in the subject line.
