Unix Power Tools

16.5. Adding Words to ispell's Dictionary

ispell (Section 16.2) uses two lists for spelling verification: a master word list and a supplemental personal word list.

The master word list for ispell is normally the file /usr/local/lib/ispell/ispell.hash, though the location of the file can vary on your system. This is a "hashed" dictionary file. That is, it has been converted to a condensed, program-readable form using the buildhash program (which comes with ispell) to speed the spell-checking process.

The personal word list is normally a file called .ispell_english or .ispell_words in your home directory. (You can override this default with either the -p command-line option or the WORDLIST environment variable (Section 35.3).) This file is simply a list of words, one per line, so you can readily edit it to add, alter, or remove entries. The personal word list is normally used in addition to the master word list, so a word that appears in either list is not flagged by ispell.
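As a rough sketch of the two overrides just mentioned, the shell functions below pick a personal word list either for a single run (with -p) or for the rest of the session (with WORDLIST). The function names and the file names in the comments are made up for illustration; only the -p option and the WORDLIST variable come from ispell itself.

```shell
# Sketch: choosing a personal word list for ispell.
# spell_with_list, set_wordlist, and the file names are hypothetical.

# Check one document against an explicitly named personal word list.
spell_with_list() {
    wordlist=$1
    document=$2
    ispell -p "$wordlist" "$document"
}

# Make later ispell runs in this shell use the named personal
# word list without having to repeat -p each time.
set_wordlist() {
    WORDLIST=$1
    export WORDLIST
}

# Example (not run here):
#   spell_with_list "$HOME/.ispell_words.unix" chapter16.txt
#   set_wordlist "$HOME/.ispell_words.unix"; ispell chapter16.txt
```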

Custom personal word lists are particularly useful for checking documents that use jargon or special technical words that are not in the master word list, and for personal needs such as holding the names of your correspondents. You may choose to keep more than one custom word list to meet various special requirements.

You can add to your personal word list any time you use ispell: simply use the I command to tell ispell that the word it offered as a misspelling is actually correct and should be added to the dictionary. You can also add a list of words from a file using the ispell -a (Section 16.3) option. The words must be one to a line, but need not be sorted. Each word to be added must be preceded by an asterisk. (Why? Because ispell -a has other functions as well.) So, for example, we could have added a list of Unix utility names to our personal dictionaries all at once, rather than one by one as they were encountered during spell checking.
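The bulk-add step can be sketched as a small filter that puts the asterisk in front of each word before handing the list to ispell -a. The function names are hypothetical. The trailing # line is an assumption drawn from the ispell manual's description of -a mode, where it asks ispell to save the personal dictionary; check the manual page for your version before relying on it.

```shell
# Sketch: add every word in a plain list to the personal word list.
# star_words and add_words are hypothetical helper names.

# Turn a plain word list (one word per line) into ispell -a input:
# blank lines are dropped and each word gets a leading asterisk.
# The final "#" line is assumed to mean "save the personal dictionary".
star_words() {
    sed -e '/^[[:space:]]*$/d' -e 's/^/*/' "$@"
    echo '#'
}

# Feed the prepared list to ispell -a and discard its replies.
add_words() {
    star_words "$@" | ispell -a > /dev/null
}

# Example (not run here):
#   add_words unix-utilities.txt
```

Keeping the asterisk step in its own function makes it easy to inspect exactly what ispell -a will receive before anything is written to your word list.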

Obviously, though, in an environment where many people are working with the same set of technical terms, it doesn't make sense for each individual to add the same word list to his own private .ispell_words file. It would make far more sense for a group to agree on a common dictionary for specialized terms and always to set WORDLIST to point to that common dictionary.
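One way a group might arrange this is with a few lines in each member's login script. The sketch below assumes a shared file at a path the group has agreed on; the default path and the function name are hypothetical.

```shell
# Sketch for a login script such as ~/.profile.
# use_shared_wordlist and the default path are hypothetical.
use_shared_wordlist() {
    shared=${1:-/usr/local/share/ispell/group.words}
    # Switch only if the shared list exists and is readable;
    # otherwise leave ispell's default personal list in effect.
    if [ -r "$shared" ]; then
        WORDLIST=$shared
        export WORDLIST
    else
        return 1
    fi
}

# Example (not run here):
#   use_shared_wordlist /proj/docs/common.words
```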

If the personal word list gets too long, you can create a "munched" word list. The munchlist script that comes with ispell reduces the words in a word list to a set of word roots and permitted suffixes according to rules described in the ispell(4) reference page that will be installed with ispell from the CD-ROM [see http://examples.oreilly.com/upt3]. This creates a more compact but still editable word list.
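A minimal sketch of munching a list follows. It assumes that munchlist reads the word list named on its command line and writes the munched result to standard output; the function name and the .munched suffix are made up for illustration. Your installation may also need options (such as one naming the affix file) that are not shown here.

```shell
# Sketch: compress a raw word list into roots plus suffix flags.
# munch_words and the ".munched" suffix are hypothetical.
munch_words() {
    raw=$1
    munched=${2:-$raw.munched}
    # Write to a temporary name first so that a failed run does not
    # leave a truncated word list behind; the raw list is not modified.
    munchlist "$raw" > "$munched.tmp" && mv "$munched.tmp" "$munched"
}

# Example (not run here):
#   munch_words "$HOME/.ispell_words"
```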

Another option is to provide an alternative master spelling list using the -d option. This has two problems, though:

  1. The master spelling list should include spellings that are always valid, regardless of context. You do not want to overload your master word list with terms that might be misspellings in a different context. For example, perl is a powerful programming language, but in other contexts, perl might be a misspelling of pearl. You may want to place perl in a supplemental word list when documenting Unix utilities, but you probably wouldn't want it in the master word list unless you were documenting Unix utilities most of the time that you use ispell.

  2. The -d option must point to a hashed dictionary file. What's more, you cannot edit a hashed dictionary directly; you will have to edit a plain master word list and then use (or have the system administrator use) buildhash to hash the edited list into the condensed form that ispell reads.

To build a new hashed word list, provide buildhash with a complete list of the words you want included, one per line. (The buildhash utility can only process a raw word list, not a munched word list.) The standard system word list, /usr/dict/words on many systems, can provide a good starting point. This file is writable only by the system administrator and probably shouldn't be changed in any case. So make a copy of this file, and edit or add to the copy. After processing the file with buildhash, you can either replace the default ispell.hash file or point to your new hashed file with the -d option.
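The copy-then-hash procedure might look something like the sketch below. It assumes that buildhash takes three arguments (the raw word list, an affix file, and the name of the hash file to write), which is how the ispell distribution documents it. The function name, the affix file name, and the output paths are hypothetical, and the affix file's location varies between installations.

```shell
# Sketch: build a private hashed dictionary from a copy of the
# system word list plus your own additions.
# build_master and all file names in the example are hypothetical.
build_master() {
    system_words=$1   # e.g. /usr/dict/words; only read, never changed
    extra_words=$2    # your additions, one word per line
    affix_file=$3     # affix file shipped with ispell
    out_base=$4       # output name without extension

    # Work on a copy: merge both lists and drop duplicate lines.
    LC_ALL=C sort -u "$system_words" "$extra_words" > "$out_base.words" || return 1

    # Hash the merged raw list into the form ispell reads.
    buildhash "$out_base.words" "$affix_file" "$out_base.hash"
}

# Example (not run here):
#   build_master /usr/dict/words unix-terms english.aff "$HOME/lib/unix"
#   ispell -d "$HOME/lib/unix.hash" chapter16.txt
```

Pointing -d at a hash file in your own directory leaves the default ispell.hash alone, so other users are unaffected.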

--TOR and LK



Copyright © 2003 O'Reilly & Associates. All rights reserved.