INV 1
NAME
inv - make an inverted index from the output of mkey
SYNOPSIS
inv [-adnpv] [-hn] [-i [u] name] [outfile]
DESCRIPTION
The inv program computes the hash codes and writes the inverted files. It reads the output of mkey and writes the set of files described earlier in this section. It expects one argument, which is used as the base name for the three (or four) files to be written. Assuming an argument of Index (the default), the entry file is named Index.ia, the posting file Index.ib, the tag file Index.ic, and the key file (if present) Index.id.
The inv program recognizes the following options:
- -a: Append the new keys to a previous set of inverted files, making new files if there is no old set using the same base name.
- -d: Write the optional key file. This is needed when you cannot check for false drops by looking for the keys in the original inputs, that is, when the key derivation procedure is complicated and the output keys are not words from the input files.
- -hn: The hash table size is n (default 997); n should be prime. Making n bigger saves search time and spends disk space.
- -i [u] name: Take input from the file name instead of the standard input; if u is present, name is unlinked when the sort is started. Using this option permits the sort scratch space to overlap the disk space used for input keys.
- -n: Make a completely new set of inverted files, ignoring previous files.
- -p: Pipe into the sort program, rather than writing a temporary input file. This saves disk space and spends processor time.
- -v: Verbose mode; print a summary of the number of keys which finished indexing.
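Since -hn asks for a prime table size, a small helper can round a desired size up to the next prime before inv is invoked. The following Python sketch is only an illustration: the function names are hypothetical and not part of inv, and the option is formatted as -h immediately followed by the number, on the assumption that this is what the synopsis form -hn means.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for hash table sizes of this scale."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n: int) -> int:
    """Return the smallest prime greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def hash_option(wanted_size: int) -> str:
    """Format the -hn option with n rounded up to a prime (assumed syntax)."""
    return "-h%d" % next_prime(wanted_size)


if __name__ == "__main__":
    print(hash_option(997))   # the documented default is already prime
    print(hash_option(2000))  # rounded up to the next prime
```

A larger value trades disk space for search time, as noted under -hn, so the desired size passed to the helper is a tuning choice rather than something inv dictates.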
About half the time used in inv is in the contained sort. Assuming the sort is roughly linear, however, a guess at the total timing for inv is 250 keys per second. The space used is usually of more importance: the entry file uses four bytes per possible hash (note the -h option), and the tag file around 15-20 bytes per item indexed. Roughly, the posting file contains one item for each key instance and one item for each possible hash code; the items are two bytes long if the tag file is less than 65536 bytes long, and four bytes long if the tag file is greater than 65536 bytes long. To minimize storage, the hash tables should be over-full; for most of the files indexed in this way, there is no other real choice, since the entry file must fit in memory.
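The rules of thumb above can be turned into a rough estimator. The Python sketch below is an approximation under stated assumptions, not a description of what inv actually computes: the function and parameter names are hypothetical, the tag file is assumed to cost 20 bytes per item (the upper end of the quoted range), and a tag file of exactly 65536 bytes is assumed to use four-byte posting items, a case the text does not settle.

```python
def estimate_inv(key_instances, items_indexed, hash_size=997,
                 tag_bytes_per_item=20, keys_per_second=250):
    """Rough size and time estimate from the figures in this page."""
    # Entry file: four bytes per possible hash code.
    entry_bytes = 4 * hash_size
    # Tag file: about 15-20 bytes per item indexed (20 assumed by default).
    tag_bytes = tag_bytes_per_item * items_indexed
    # Posting items are two bytes while the tag file stays under 65536 bytes.
    item_width = 2 if tag_bytes < 65536 else 4
    # Posting file: one item per key instance plus one per hash code.
    posting_bytes = item_width * (key_instances + hash_size)
    return {
        "entry_bytes": entry_bytes,
        "tag_bytes": tag_bytes,
        "posting_bytes": posting_bytes,
        "seconds": key_instances / keys_per_second,
    }


if __name__ == "__main__":
    # Hypothetical workload: 1000 items producing 20000 key instances.
    for name, value in estimate_inv(20000, 1000).items():
        print(name, value)
```

With the default hash size of 997, the entry file stays under 4 KB in this estimate, so the posting and tag files account for most of the space.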
FILES
- @BINDIR@/inv: Executable.
Assuming an argument of Index (the default):
- Index.ia: Entry file.
- Index.ib: Posting file.
- Index.ic: Tag file.
- Index.id: Key file.
LICENSE
The text of this manual page comes from Some Applications of Inverted Indexes on the UNIX System by M. E. Lesk, which is distributed under the BSD4 license. The inv software is distributed under the CDDL license.
SEE ALSO
refer(1), referformat(7), mkey(1), hunt(1), and Some Applications of Inverted Indexes on the UNIX System by M. E. Lesk.
AUTHORS
M. E. Lesk. Modified by Pierre-Jean Fichet.