Issue #4, October 1, 1997
InSite is intended to create discussion about the use and implementation of Open SiteSearch software and to explore the technology used by its components. Each issue will be available from the SiteSearch Web pages on a regular basis and is designed as an informational resource for SiteSearch system administrators. It is brought to you by the SiteSearch support team. Any suggestions or comments are appreciated.

Top Story

SiteSearch 4.0

Over the past few months, we have been planning and developing Open SiteSearch version 4.0. This version will be a complete redesign of the software using the Java programming language. Version 4.0 will retain all of the features of version 3.1 and bring many new benefits, such as simultaneous cross-database searching and browsing, support for a wide range of platforms, and the use of Java in place of OCLC's proprietary WebZ entity language and Overcite Formatting Language. We are excited about the new architecture and want to give you frequent updates.

Recently, we sent letters to the library directors and main technical contacts with an overview of Open SiteSearch Version 4.0, important information about support, and a Question and Answer document. Links to these letters are included here. We are interested in working with you to develop conversion plans for implementing the new version and hope to hear from you with any questions or concerns.

Software Enhancements and Fixes for Release 3.1.1

As Open SiteSearch system administrators report bugs or request enhancements, our development team makes changes to the SiteSearch software package. The newest patch release to version 3.1, version 3.1.1, will be released this week. We will be providing each site with a new set of SiteSearch executables. In this release, we have also made some changes to the Out-of-Box Interface by enhancing the advanced search page. You can see this page on our demonstration site.
If you would like to receive the files to implement this version of the advanced search, please contact SiteSearch Support. Patch release version 3.1.1 includes the following enhancements and bug fixes:
Changes to the Out-of-the-Box Interface can be viewed from the SiteSearch demo: http://cypress.dev.oclc.org:8000

FAQs

In each issue of InSite, we highlight some frequently asked support questions that are of general interest. To report an Open SiteSearch problem, send an email describing the problem and components affected to .

Direct Access to Database Choice Screen with IP Authorization

Q: When using IP and Autho/Password control together, how do we set up the profile server to automatically take a user with a valid IP address from the opening OBI screen to the database choice screen?

A: The profile server and WebZ are designed to check IP only, autho/password only, or both. It is possible to set up the profile server and WebZ to take the end user directly to the database choice screen by performing the autho/password login for them. This involves making some changes to your profile database and some HTML navigation changes. To set up WebZ to grant direct IP access to your database choice screen, you'll need to do the following:
Tips & Techniques

Are Web Robots Playing Nice with your OCLC SiteSearch System?

What is a web robot?
Robots are programs that act as autonomous agents, retrieving information (web pages) in an algorithmic manner. These robots are used by web search engines or locator services (such as Alta Vista or Lycos) to locate and update the content of their databases.

How do they work?
Starting from a URL, they retrieve the associated page. From this page, the robot distills an abstract, either from the page's content directly or from <META> tags, and sets a link to the page. The algorithm that determines how deep to follow the links on a page may vary, but it usually goes no more than a level or two below the starting point. As the robot follows a link, it recursively applies the process, locating abstracts and creating more links to follow. This process continues until the robot reaches the desired depth or hits a dead end (a page that contains no additional links).

How are these robots controlled?
Because robots are programs that can request and consume pages at a much faster rate than their human counterparts, they need to be controlled. The harvesting process can consume all (or a large portion) of a site's resources (CPU, throughput) until the process is complete. This can cause long delays or even outages for actual users of the site. Granted, you may receive some benefit from the robot (it gives the public a way to locate your information resource), but you may want to control how it is accomplished. Web administrators can determine with more intelligence than the robot's algorithm what material is of most value for the index. Information about how to control robots can be found in the Robot Exclusion Standard maintained by Martijn Koster (see http://info.webcrawler.com/mak/projects/robots/robots.html).

How do robots work with WebZ?
Robot technology has a shortcoming when visiting sites like WebZ, where the content is dynamically generated.
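The traversal described under "How do they work?" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SITE` dictionary is a hypothetical in-memory stand-in for a web site (a real robot would fetch pages over HTTP and extract links from the HTML), and the sketch uses a queue instead of recursion so that the depth limit is applied consistently to every page.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# The paths are made up for illustration and do not reflect a real WebZ site.
SITE = {
    "/": ["/about", "/databases"],
    "/about": [],
    "/databases": ["/databases/db1", "/databases/db2"],
    "/databases/db1": ["/databases/db1/record1"],
    "/databases/db2": [],
    "/databases/db1/record1": [],
}


def crawl(start, max_depth, site=SITE):
    """Return the pages reachable from start within max_depth links, in visit order."""
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # a real robot would build its abstract here
        if depth == max_depth:
            continue  # desired depth reached: do not follow this page's links
        for link in site.get(url, []):  # a page with no links is a dead end
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    print(crawl("/", max_depth=2))
```

With a depth limit of 2, the sketch visits the start page and everything up to two links below it, and never requests `/databases/db1/record1`. Note that nothing in the sketch limits how fast pages are requested or which parts of the site are visited.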
This is why we need to control where we direct our automated friends. You can control robots by creating a file named "robots.txt" in your SiteSearch working directory. A well-behaved robot will request this file after retrieving the initial home page. This file tells the robot which robots are allowed to access your site and which directories are available for indexing or harvesting. For example:

User-agent: * # All robots are allowed

This example file would instruct all robots that the only available page is the home page and that they should not traverse any lower. The home page is usually the only page in a WebZ site that would be informative because of the stateful, session-based content of our pages. This control also helps the robot to be more effective by preventing unnecessary use of resources.

FTP Resources

New Repository for Remote Z39.50 Server Configurations

Accessing remote Z39.50 servers with Open SiteSearch WebZ is a major component of many of your projects. We are committed to interoperability with all Z39.50 servers and work at testing and improving compatibility when we can. Through this testing, we have set up many of the vendor-specific WebZ configuration settings required for each server. We are starting a repository of remote server configurations that you may use with WebZ to set up access to different Z39.50 servers. We will start by adding 7 of the servers that have been extensively tested and are currently part of either the OBI package or our interoperability demo.

Please note that vendor configuration requirements may change as vendors update and improve their Z39.50 servers. We will update the templates as we find changes. We also ask for your help in adding to the collection and keeping the templates up to date and accurate.

The following Z39.50 server configuration templates are available on the anonymous FTP server: Ameritech Horizon, Ameritech NOTIS, DRA, GEAC Advance, Innovative Interfaces, Library of Congress, and Sirsi.

The following vendors' servers are tested and will soon be added: Ebsco, IAC, Ovid, SilverPlatter, H.W. Wilson, PALS, MultiLIS/DRA.

Currently in testing (issues to resolve): Dynix, Carl, Endeavor.

Each template includes the database index information that we have for each vendor and a server section with the httpgate.ini-specific variables required for that server.
These templates are not complete until you add the database information, server host, and appropriate port number. Please refer to the SiteSearch Operators Guide, Chapter 7 or the WebZ Operator's Guide, Chapter 4 for more information. To access the OCLC anonymous FTP server:
Java Resources

We have compiled a listing of Java resources that our software developers have used in the design and implementation of Open SiteSearch 4.0. When viewing these diverse sources, we suggest focusing on understanding the syntax and core of Java. Version 4.0 will not be using applets or the Abstract Windowing Toolkit (AWT) in the new architecture.

Web Sites of Interest:
Digital Focus. How Do I...? 1996.
Harold, Elliotte Rusty. Café au Lait. 1997.
Harold, Elliotte Rusty. Brewing Java: A Tutorial. 1997.
Java Compiler Compiler. The Java Parser Generator. 1997.
JavaWorld: IDG's magazine for the Java community. 1997.
Newmarch, Jan. GUI Programming using Java 1.1. 1997.
Pietrowicz, Stephen R. JNN: The Java News Network. 1997.
Reith, Markus. Java Developers Page.
Richmond, Alan & Lucy. The Web Developers Virtual Library. Learning to write Java. 1997.
Silieceo, Omar P. The Java programming language. 1996.
Sun Microsystems, Inc. The Java Tutorial: Object-Oriented Programming for the Internet. 1997.
Sun Microsystems, Inc. The Source for Java. 1997.

Bibliography:
Arnold, Ken. The Java programming language. Reading, Mass.: Addison-Wesley Pub. Co., 1996.
Cornell, Gary. Core Java. 2nd ed. Mountain View, Calif.: SunSoft Press, 1997.
Chan, Patrick. The Java class libraries: an annotated reference. 2nd ed. Reading, Mass.: Addison-Wesley, 1998.
Flanagan, David. Java in a nutshell: a desktop quick reference. 2nd ed. Cambridge; Sebastopol, Calif.: O'Reilly & Associates, 1997.
Grand, Mark. Java language reference. 2nd ed., updated for Java 1.1. Cambridge; Sebastopol, Calif.: O'Reilly, 1997.
Lemay, Laura. Teach yourself Java 1.1 in 21 days. 2nd ed. Indianapolis, Ind.: Sams.net, 1997.

SiteSearch training class schedule

SiteSearch WebZ, ZSS, & ISP - November 10-14, 1997 - Dublin, Ohio

Note: Training class schedules are subject to change. For more information or to register for upcoming classes, please contact .

Questions or comments regarding InSite should be sent to with 'InSite' in the subject line.