Restrictor Definitions
Restrictors allow searches to limit, or restrict, a result set without performing a second search. They are stored as a bitmap in the postings (POST) physical database file and are included in each postings entry to specify, for every indexed term, whether a document meets a particular criterion. The Newton search engine uses this bitmap to identify the documents in a result set that match the criterion. For example, a publication year restrictor adds an offset identifying the publication year to each postings entry. This allows Newton to restrict a search result to documents published in a single year or a range of years without performing another search and then ANDing the results, which improves efficiency and response time.

Restrictors are defined before your database indexes in the database description (.dsc) file. When placing restrictions on your database indexes, keep in mind that each restrictor should focus on a meaningful category with a finite number of values. The following restrictor variables are discussed in this section: restrictsize, restrict, and restrictwords.
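As a rough illustration of why a restrictor avoids a second search, the sketch below filters an existing result set by testing bits already carried with each posting. It is not Newton's actual POST file layout: the Posting class, the BASE_YEAR constant, the YEAR_MASK value, and the restrict_by_year function are all hypothetical names chosen for this example, and the assumption that the year offset sits in the low 8 bits is made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical layout: the low 8 bits of the restrictor value hold
# (publication year - BASE_YEAR). Newton's real layout is defined by
# the mask bit values in the .dsc file, not by these constants.
BASE_YEAR = 1900
YEAR_MASK = 0b1111_1111


@dataclass(frozen=True)
class Posting:
    doc_id: int
    restrict_bits: int  # restrictor bitmap stored with the posting


def restrict_by_year(postings, first_year, last_year):
    """Keep only the documents whose year offset falls in the range.

    No second search and no AND of two result sets is needed; each
    posting already carries the information required for the test.
    """
    matches = []
    for posting in postings:
        year = BASE_YEAR + (posting.restrict_bits & YEAR_MASK)
        if first_year <= year <= last_year:
            matches.append(posting.doc_id)
    return matches


result_set = [Posting(1, 95), Posting(2, 99), Posting(3, 101)]
print(restrict_by_year(result_set, 1996, 2001))  # prints [2, 3]
```

The point of the sketch is only that the restriction is a per-posting bit test applied to a result set that already exists.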
Restrictsize Variable

Restrictors are not required to create a database, but a field specifying the restriction value size in bytes is required. Each byte of memory consists of eight bits, and each bit acts as a switch that holds a value of either "1" or "0". Bits cannot be shared among restrictors; at least one bit is allocated to each restrictor. Refer to the mask bit value in the Restrict Variable section below for further explanation.

When setting restriction values, decide how many bits each restriction on the database needs to store the necessary information. Remember that restrictsize is specified in bytes. If you need 12 bits to store your restrictor information, set the restrictsize variable to 2 bytes (16 bits). Refer to the Syntax and Example sections below.
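The bit arithmetic described above can be sketched in a few lines of Python. This is a planning aid under stated assumptions, not part of Database Builder: the function names restrict_size_bytes and allocate_masks are hypothetical, the valid sizes (0, 1, 2, or 4 bytes) are the ones this section lists for restrictsize, and the choice to hand out bits starting from the lowest position is an assumption, since the bit ordering Newton uses is set by the mask bit values in your own restrict definitions.

```python
# Valid restrictsize values in bytes, as listed for restrictsize.
VALID_SIZES = (0, 1, 2, 4)


def restrict_size_bytes(total_bits):
    """Return the smallest valid restrictsize that holds total_bits."""
    if total_bits < 0:
        raise ValueError("bit count cannot be negative")
    for size in VALID_SIZES:
        if size * 8 >= total_bits:
            return size
    raise ValueError("restrictors are limited to 32 bits (4 bytes)")


def allocate_masks(bits_needed):
    """Give each restrictor its own, non-overlapping run of bits.

    bits_needed maps a restrictor name to the number of bits it needs.
    Assumption: bits are assigned from the lowest position upward.
    """
    masks = {}
    offset = 0
    for name, nbits in bits_needed.items():
        if nbits < 1:
            raise ValueError("each restrictor needs at least one bit")
        masks[name] = ((1 << nbits) - 1) << offset
        offset += nbits
    return masks


needs = {"language": 2, "year": 10}   # hypothetical restrictors
masks = allocate_masks(needs)
print(restrict_size_bytes(sum(needs.values())))  # 12 bits -> prints 2
```

Because every restrictor receives a distinct run of bits, the bitwise AND of any two masks is zero, which is the "bits cannot be shared" rule expressed in code.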
If you do not use restrictions to limit a database, the system defaults restrictsize to 0. The valid values for the number of bytes (up to 32 bits) are 0, 1, 2, or 4.

Syntax

restrictsize: number of bytes
Example

A common restrictor that users implement is a language restrictor. Suppose that every record in a database contains a field that describes the language. The majority of records in this database are in English, with several other languages represented by smaller sets of records, and the majority of patrons search English records only. With only two possible values in this category, the restriction values can be set as "english" and "nonenglish" to accommodate this collection. Every record that contains the term "english" in the specified restrictor field will have its restrictor set to "english," and records with another language in the field will have the restrictor value "nonenglish."

The language restrictor needs only 2 bits of memory to store the necessary information about each indexed term. The software would "switch on" one bit to represent "english" by changing its value from 0 to 1, while the other bit would be used to represent "nonenglish." No other restrictions will be placed upon the indexed terms in this database. Because restrictsize is specified in bytes and not bits, a restrictsize of 1 byte (8 bits) is defined to allow for the 2 bits. The restrictsize variable would be written as follows:

restrictsize: 1
Restrict Variable

You can define multiple restrict variables to anticipate the searching needs of your patrons. The restrict variable is the primary definition for your restrictors; it in turn relies on the values in the restrictsize and restrictwords variables. Restrict variable specifications follow the format described below.

Syntax

restrict(indexid): restriction routine name([parms]) from(fieldlist)\

Example

Refer to the following four .dsc file restrict definitions as examples of the types of information that restrictions are placed upon and how the different definitions relate to one another.
In order, the restrictors above are placing limits on the following indexed information: language, publication type, example, and year. Notice in the mask bit value variables in the examples above that each restrictor has designated bit positions that it may use to store values. For instance, the first example uses the first two positions in the bit pattern, as denoted by the "1" value in the following syntax:
The first two positions are not used by any of the other restrictors because that memory has been allocated to the first restrictor. As you look through the rest of the examples, you will notice that this rule holds throughout the definitions. The unique bit pattern of each restrict variable above serves as a thumbprint for that particular database restriction.

Restrictwords Variable

The restrictwords variable is used in conjunction with the stroccur_strct and strmatch_strct restrict routines to set the restriction value. The terms in restrictwords are not case sensitive; all comparisons are performed in lowercase.

Syntax

restrictwords(restrictword id number):\

Example

The following is an example of how the restrictwords definition can be used in the .dsc file.
Notice that the restrictwords(1) definition above works directly with the restrict(3) definition example in the Restrict Variable section. The restrictwords id number of 1 matches the parameter value specified in the restrict strmatch_strct routine. The two examples are shown together below for your comparison.
Additional Information about Restrictwords

The positional delimiter for the restrictwords variable is the semicolon (;). As stated earlier, the position of a restrictword term corresponds to the value of the restrictor: terms in position 1 correspond to a restriction value of 1, and so on. Multiple terms can appear in each position; they are separated by a plus sign (+), which effectively forms an "or" statement. The restriction value is set if any one of the terms in the position is found. Each text string must be delimited by double quotes ("). The question mark (?) allows for truncated matching: if the text matches the restrictword term up to the question mark, it is considered a match. Negation is enabled by using the exclamation mark (!).

See Also

Creating a Database Description (.dsc) File
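The matching rules for restrictwords can be modeled in a short Python sketch. It works on an already parsed structure (a list of positions, each holding its "or" terms) rather than on .dsc text, and it is only an approximation under stated assumptions: the function names term_matches and restriction_value are hypothetical, whole-field comparison is assumed (the precise difference between stroccur_strct and strmatch_strct is not covered here), returning 0 when nothing matches is an assumption, and negation with "!" is left out because its exact semantics are not described in this section.

```python
def term_matches(text, term):
    """Compare one field value with one restrictword term.

    Comparison is done in lowercase. A "?" in the term means
    truncated matching: only the part before the "?" must match.
    """
    text = text.lower()
    term = term.lower()
    prefix, mark, _ = term.partition("?")
    if mark:
        return text.startswith(prefix)
    return text == term


def restriction_value(text, positions):
    """Return the 1-based position whose terms match, or 0 if none do.

    positions models a restrictwords list: one inner list per
    semicolon-delimited position, whose items are the "+" alternatives.
    """
    for value, terms in enumerate(positions, start=1):
        if any(term_matches(text, term) for term in terms):
            return value
    return 0


# Hypothetical word list: position 1 = English, position 2 = others.
positions = [["english", "eng?"], ["fre?", "german"]]
print(restriction_value("French", positions))  # prints 2
```

Keeping the positions in order matters in this model, because the position index is what becomes the restriction value.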