
Restrictor Definitions

Restrictors allow searches to limit, or restrict, a result set without performing a second search. They are stored as a bitmap in the postings (POST) physical database file and are included in each postings entry to specify, for every indexed term, whether a document meets a particular criterion. The Newton search engine uses this bitmap to identify the documents in a result set that match the criterion. For example, a publication year restrictor adds an offset identifying the publication year to each postings entry in the POST file. This allows Newton to restrict a search result to documents published in a single year or a range of years without performing another search and then ANDing the results, which improves efficiency and response time.
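The idea of filtering a result set by restrictor bits can be sketched in a few lines of Python. This is an illustration only, not Newton's code: the posting layout (a document id paired with an integer bitmap), the names YEAR_MASK and YEAR_BASE, and the choice to store the year as an offset from 1900 in the low 9 bits are all assumptions made for the example.

```python
# Hypothetical model: each posting is (document id, restrictor bitmap).
# The bit layout below is an assumption chosen for illustration.
YEAR_MASK = 0b0000000111111111   # low 9 bits hold (year - YEAR_BASE)
YEAR_BASE = 1900

def year_of(restrictor_bits):
    """Extract the publication year stored under YEAR_MASK."""
    return (restrictor_bits & YEAR_MASK) + YEAR_BASE

def restrict_by_year(postings, first_year, last_year):
    """Keep document ids whose stored year lies in [first_year, last_year]."""
    return [doc_id for doc_id, bits in postings
            if first_year <= year_of(bits) <= last_year]

# Three postings for one indexed term: years 1995, 1987, and 1999.
postings = [(101, 95), (102, 87), (103, 99)]
print(restrict_by_year(postings, 1990, 1999))   # [101, 103]
```

The filter reads only the bits already attached to each posting, which is why no second search and no AND of two result sets is needed.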

Restrictors are defined before your database indexes in the database description (.dsc) file. When placing restrictions upon your database indexes, keep in mind that each restrictor should be focused on a meaningful category with a finite number of values.

The following restrictor variables are discussed in this section:

  • Restrictsize Variable - specifies the total size of the bitmaps that hold all restrictor values.
  • Restrict Variable - names the restriction routine, such as lang_strct, ifthere_strct, or strmatch_strct, and its settings for limiting a search.
  • Restrictwords Variable - works in conjunction with two of the restrict routines to set the restriction value.

Note:

The restrictsize variable should be included in every .dsc file, regardless of whether or not you plan to define limits for searching. The restrict and restrictwords variables are only used if you include restrictions for your database.

Restrictsize Variable

Restrictors are not required to create a database, but a field specifying the restriction value size in bytes is required. Each byte of memory is made of eight bits. The bits are used as switches that hold a value of either "1" or "0." Bits cannot be shared among several restrictors; at least one bit is allocated to each restrictor. Refer to the mask bit value in the Restrict Variable section below for further explanation.

When you are setting restriction values, you should decide how many bits each restriction you place on the database will need to store the necessary information. Remember that restrictsize is based on byte values. If you need 12 bits to store your restrictor information, you will set the restrictsize variable equal to 2 bytes, or 16 bits. Refer to the Syntax and Example sections below.

Note:

The restrictsize variable is used only to state how much memory you will need to store your restrictors. The bit slots of memory are actually allocated for a specific restrictor by using the mask bit value in the restrict variable.

If you do not use restrictions to limit a database, the system will default restrictsize to a value of 0. The valid values for the number of bytes (up to 32 bits) are: 0, 1, 2, or 4.
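The rounding from bits to a valid restrictsize can be sketched as follows. The function name restrictsize_for is hypothetical; the only facts taken from this section are that the value is expressed in bytes and that the valid values are 0, 1, 2, and 4.

```python
VALID_RESTRICTSIZES = (0, 1, 2, 4)   # valid byte counts listed in this section

def restrictsize_for(bits_needed):
    """Return the smallest valid restrictsize (in bytes) that holds bits_needed bits."""
    if bits_needed < 0:
        raise ValueError("bits_needed must not be negative")
    for size in VALID_RESTRICTSIZES:
        if size * 8 >= bits_needed:
            return size
    raise ValueError("restrictors cannot use more than 32 bits")

print(restrictsize_for(12))   # 2
print(restrictsize_for(17))   # 4
```

Note that 17 bits round up to 4 bytes rather than 3, because 3 is not among the valid values.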

Syntax

restrictsize: number of bytes

Element            Description
restrictsize:      Begins the restrictsize definitions in the .dsc file.
number of bytes    Sets the size (in bytes) available for the restriction values.

Example

A common restrictor is a language restrictor. Suppose that every record in a database contains a field that describes the language. The majority of records in this database are in English, with several other languages represented by smaller sets of records, and the majority of patrons search English records only. With only two possible values in this category, the restriction values can be set as "english" and "nonenglish" to accommodate this collection. Every record that contains the term "english" in the specified restrictor field will have its restrictor set to "english," and records with another language in the field will have the restrictor value "nonenglish."

The language restrictor needs only 2 bits of memory to store the necessary information about each indexed term. The software "switches on" one bit to represent "english" by changing its value from 0 to 1, while the other bit represents "nonenglish." No other restrictions are placed upon the indexed terms in this database. Because restrictsize is expressed in bytes rather than bits, a value of 1 byte (8 bits) is needed to hold the 2 bits. The restrictsize variable is written as follows:

restrictsize: 1

Restrict Variable

You can define multiple restrict variables to anticipate the searching needs of your patrons. The restrict variable is the primary definition of a restrictor; it relies in turn on the values in the restrictsize and restrictwords variables. Restrict variable specifications follow the format described below.

Syntax

restrict(indexid): restriction routine name([parms]) from(fieldlist)\
mask(mask bit value) [terms(termlist)] [norm(normalization value)]\
[init(init restrictor value)]

Element

Description

indexid

Specifies the index number assigned to the restrictor. The index number must be numeric and between 1 and 254.

This variable is required.

restriction routine name

Invokes a routine, with any necessary parameters, in the Database Builder software to set the restriction value. See Restrict Routines.

This variable is required.

from(fieldlist)

Contains a list of fields that the restriction function uses to compute values. The format of the fieldlist is an ASN.1/BER tag path. Field 245 subfield "1" is specified as "245/1." Multiple ASN.1/BER tag paths can be specified; use a comma-space to separate them. For example, "245/1, 650/1, 100/1" specifies multiple fields.

This variable is required.

mask(mask bit value)

Contains a bit string with as many bits as declared in the restrictsize variable. For example, if restrictsize equals 1 (byte), the bit string would contain 8 positions. For each restriction, the bits are turned "on" for the number of positions needed to hold that particular restriction value.

For example, if a restriction can have 3 possible values and the restrictsize is 2 (bytes), the mask bit value is "1110000000000000." The first 3 bit positions cannot be used by any other restriction specified in this database. The next available bits in this example start at bit position 4.

This variable is required.

terms(termlist)

Contains the text that the system will search on to retrieve sets of restricted records. The positional placement of the text corresponds to the restriction value. For example, the text in the second position of the termlist corresponds to a restriction value of 2. The syntax of the termlist is as follows:

terms("term1","term2","term3")

This variable is optional.

norm(normalization value)

Represents a numeric base value that is used to compute the restriction. It is subtracted from the numeric restriction value, and the result is stored as the restriction. Using a normalization value reduces the number of bit positions needed to store the restriction.

init(init restrictor value)

Allows the user to pre-define the restriction value. The system defaults the result to a zero value, but if the user wants to initialize the restriction to another value before processing, this option allows it. This is useful in conjunction with the ifthere_strct function. The result can be initialized to one value, and the restriction only gets changed if the specified field occurs in the record.
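The saving from a normalization value can be shown with a little arithmetic. The sketch below assumes the restriction is stored as a plain unsigned binary number, which this section does not state explicitly; the function names are hypothetical.

```python
def normalized_value(raw_value, norm):
    """Subtract the normalization base, as described for norm()."""
    value = raw_value - norm
    if value < 0:
        raise ValueError("raw value is below the normalization base")
    return value

def bits_needed(value):
    """Bit positions needed to hold value as an unsigned binary number (assumption)."""
    return max(1, value.bit_length())

year = 1995
print(bits_needed(year))                           # 11 bits without norm
print(bits_needed(normalized_value(year, 1900)))   # 7 bits with norm(1900)
print(1900 + (2 ** 9 - 1))                         # 2411, last year a 9-bit field can hold
```

Under that assumption, a publication year with norm(1900) fits comfortably in the 9 bit positions of a mask such as 0000000111111111, where the raw year would need 11.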

Example

Refer to the following four .dsc file restrict definitions as examples of the types of information that restrictions are placed upon and how the different definitions relate to one another.

restrict(1): lang_strct(2) from(245/1) mask(1100000000000000)\
terms("english","nonenglish")
restrict(2): strmatch_strct(1) from(100/1) mask(0011100000000000)\
terms("bks","ser","med","mss","rec","mrf","map")
restrict(3): ifthere_strct(1) from(200/2) mask(0000011000000000)\
terms("datathere","nodata") init(2)
restrict(4): year_strct(3) from(011/1) mask(0000000111111111)\
norm(1900)

In order, the restrictors above place limits on the following indexed information: language, publication type, the presence of data in a field, and year. Notice in the mask bit values above that each restrictor has designated bit positions that it may use to store values. For instance, the first restrictor uses the first two positions in the bit pattern, as denoted by the "1" values in the following syntax:

mask(1100000000000000)

The first two positions are not used by any of the other restrictors because the memory has now been allocated to the first restrictor. As you look through the rest of the examples, you will notice that this rule is true throughout the definitions. This unique bit pattern for each of the restrict variables above serves as a thumbprint for each particular database restriction.
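The rule that no two restrictors may share a bit position can be checked mechanically. The sketch below is a hypothetical helper, not part of Database Builder; it takes the four masks from the example above and reports any restrictor whose mask reuses a bit already claimed by an earlier one.

```python
def find_overlaps(masks, restrictsize):
    """Return the indexids whose mask reuses bits claimed by an earlier restrictor.

    masks maps indexid -> mask bit string; restrictsize is in bytes.
    """
    width = restrictsize * 8
    claimed = 0
    overlapping = []
    for indexid, mask in masks.items():
        if len(mask) != width:
            raise ValueError(f"mask for restrict({indexid}) must have {width} bits")
        bits = int(mask, 2)          # raises ValueError on characters other than 0/1
        if bits & claimed:
            overlapping.append(indexid)
        claimed |= bits
    return overlapping

example_masks = {
    1: "1100000000000000",
    2: "0011100000000000",
    3: "0000011000000000",
    4: "0000000111111111",
}
print(find_overlaps(example_masks, 2))   # []
```

An empty list means every bit position belongs to at most one restrictor, as in the example definitions.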

Restrictwords Variable

The restrictwords variable is used in conjunction with the stroccur_strct and strmatch_strct restrict routines to set the restriction value. The terms in restrictwords are not case sensitive; all comparisons are performed in lower case.

Syntax

restrictwords(restrictword id number):\
"text1"[+]"text2";"text3 and text4";"[!]text5";"text[?]";

Element                   Description
restrictword id number    Assigns an id number to the entry that corresponds to the parameter in the stroccur_strct or strmatch_strct function.
text n                    Refers to the text strings defined within the restrict variable. This variable may be defined in any of the methods shown in the syntax above to manipulate the text strings appropriately for searching.

Example

The following is an example of how the restrictwords definition can be used in the .dsc file.

restrictwords(1): "textstring1";"!textstring1";
restrictwords(2): "textstring1"+"textstring2";"textstring3?";

Notice that the restrictwords(1) definition above works directly with the restrict(2) definition example in the Restrict Variable section. The restrictwords id number of 1 matches the parameter value passed to the strmatch_strct routine in that restrict definition. The two examples are shown together below for comparison.

restrict(2): strmatch_strct(1) from(100/1) mask(0011100000000000)\
terms("bks","ser","med","mss","rec","mrf","map")
restrictwords(1): "textstring1";"!textstring1";

Additional Information about Restrictwords

The positional delimiter for the restrictwords variable is the semicolon (;). As stated earlier, the positional placement of the restrictword terms corresponds to the value of the restrictor: restrictword terms in position 1 correspond to a restriction value of 1. Multiple terms can appear in each position; they are separated by a plus (+) sign, which acts as an "or" statement. The restriction value is set if any one of the terms in the position is found. Each text string must be delimited by double quotes ("). The question mark symbol (?) allows for truncated matching: if the text matches the restrictword term up to the question mark, it is considered a match. Negation is enabled by using the exclamation symbol (!).
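The matching rules above can be sketched in Python. This is a reading of the rules under several assumptions, not the Database Builder implementation: a term without a question mark is compared to the whole field text, the first matching position wins, a value of 0 means no position matched, and no semicolon appears inside a quoted string. All function names are hypothetical.

```python
import re

def parse_restrictwords(definition):
    """Split a restrictwords value into positions, each a list of lower-cased terms."""
    positions = []
    for chunk in definition.split(";"):          # semicolon separates positions
        terms = re.findall(r'"([^"]*)"', chunk)  # terms are double-quoted; + joins them
        if terms:
            positions.append([term.lower() for term in terms])
    return positions

def term_matches(term, text):
    """Apply ! (negation) and ? (truncation) to one term; text is already lower case."""
    negate = term.startswith("!")
    if negate:
        term = term[1:]
    if term.endswith("?"):
        hit = text.startswith(term[:-1])   # match up to the question mark
    else:
        hit = text == term                 # assumption: whole-string comparison
    return hit != negate

def restriction_value(positions, text):
    """Return the 1-based position of the first matching term list, or 0 if none match."""
    text = text.lower()
    for value, terms in enumerate(positions, start=1):
        if any(term_matches(term, text) for term in terms):
            return value
    return 0

words = parse_restrictwords('"textstring1"+"textstring2";"textstring3?";')
print(restriction_value(words, "TextString2"))      # 1
print(restriction_value(words, "textstring3abc"))   # 2
```

With the restrictwords(1) example, "textstring1" yields value 1 and, under the negation assumption, any other text yields value 2.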

See Also

Creating a Database Description (.dsc) File
Database Description (.dsc) File: Structure and Syntax
Database Description (.dsc) File Example

