Wintertree Spelling Server: Detecting profanity
This article describes some techniques for using Spelling Server to detect the presence of specific words in text, such as profanity.
Usually, Spelling Server is used to detect words which are not present in a dictionary of words. Words not in the dictionary are deemed to be misspelled and are reported so they can be corrected. Detecting the presence of specific words is, in certain respects, the reverse of this.
Each word in a dictionary used by Spelling Server has an action code associated with it. When a word in the text being checked matches a word in a dictionary, Spelling Server examines the action code associated with the word and performs the indicated action. The most common action tells Spelling Server to ignore or skip over the word, usually because the word is correctly spelled and needs no further action.
Other action codes supported by Spelling Server cause the word to be automatically or conditionally replaced with another word. These actions are typically used to "auto correct" certain frequently misspelled words, such as replacing "recieve" with "receive." Automatic or conditional replacements are not made by Spelling Server directly. Instead, Spelling Server reports to the calling application (i.e., your application) that the word should be replaced with another word. Normally, the calling application then makes the replacement by calling certain methods in Spelling Server's API, perhaps after confirming the replacement with the user. The key thing to note here is that certain words can be assigned an action code which causes Spelling Server to report to your application when those words are encountered. This is exactly what is needed to detect the presence of those words.
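The idea of action codes can be illustrated with a small toy model. This is only a sketch in Python, not Spelling Server's implementation or API: the names Action, Entry, Report, TOY_DICTIONARY, and check_words are hypothetical, and the model assumes words are separated by whitespace and matched exactly.

```python
from enum import Enum
from typing import Dict, List, NamedTuple


class Action(Enum):
    """Hypothetical stand-ins for the kinds of action described above."""
    IGNORE = "ignore"                   # word is fine; skip over it
    AUTO_CHANGE = "auto_change"         # report an automatic replacement
    CONDITIONAL_CHANGE = "conditional"  # report a replacement to confirm


class Entry(NamedTuple):
    action: Action
    replacement: str = ""


class Report(NamedTuple):
    word: str
    action: Action
    replacement: str


# A toy dictionary: each word carries an action and an optional replacement.
TOY_DICTIONARY: Dict[str, Entry] = {
    "receive": Entry(Action.IGNORE),
    "recieve": Entry(Action.AUTO_CHANGE, "receive"),
    "dog": Entry(Action.CONDITIONAL_CHANGE, "XXX"),
}


def check_words(text: str, dictionary: Dict[str, Entry]) -> List[Report]:
    """Report every word whose dictionary action is something other than IGNORE."""
    reports = []
    for word in text.split():
        entry = dictionary.get(word)
        if entry is None:
            # A real spelling checker would flag this as misspelled;
            # the toy model only cares about words that carry an action.
            continue
        if entry.action is not Action.IGNORE:
            reports.append(Report(word, entry.action, entry.replacement))
    return reports
```

The point of the sketch is that nothing is changed in the text itself: the checker only reports the word, its action, and its replacement, and the caller decides what to do with that report.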
Entries in text dictionaries contain a word, an action, and a replacement word. Suppose you want to detect the presence of the words "dog," "cat," or "pig" in the text (we'll pretend these words are profanity). You could add three entries to a new text dictionary, one for each word you want to detect. Actions which can be associated with words in lexicons are listed in the Spelling Server user guide under "How to create custom dictionaries." We'll use WSS_CONDITIONAL_CHANGE_ACTION as the action. The replacement word will be an encoded string that tells our application that the word is profanity. To keep things simple, we'll just use "XXX" as the replacement word. The three entries can be created by calling the AddToUserDict method once for each word:
object.AddToUserDict("profanity.tlx", "dog", WSS_CONDITIONAL_CHANGE_ACTION, "XXX");
object.AddToUserDict("profanity.tlx", "cat", WSS_CONDITIONAL_CHANGE_ACTION, "XXX");
object.AddToUserDict("profanity.tlx", "pig", WSS_CONDITIONAL_CHANGE_ACTION, "XXX");
The next step is to add profanity.tlx to the set of dictionaries used by Spelling Server. This is done by changing the Dictionaries string value in the system registry for whatever language id you are using in your application. If you are using American English, you would change the Dictionaries value under HKEY_LOCAL_MACHINE\Software\Wintertree\SpellingServer\Languages\24941. By default, the Dictionaries value for this language would contain "ssceam.tlx;ssceam2.clx" (and possibly one or two other dictionary files as well). You can add the profanity.tlx file by changing the Dictionaries value to:
"ssceam.tlx;ssceam2.clx;profanity.tlx"
The idea is to add profanity.tlx to the end of the list, separated from the last existing dictionary file name by a semicolon (;). Note that you will also need to move profanity.tlx to the folder indicated by the DictionaryPath value, which is "\Program Files\Wintertree Spelling Server" by default. After changing the registry, it is necessary to restart Spelling Server, as it reads the registry only on startup.
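If you prefer to build the new Dictionaries value in a script rather than by hand, the string handling might look like the following Python sketch. The function name add_dictionary is hypothetical, and the sketch only produces the new string; it does not touch the registry. It assumes file names should be compared without regard to case.

```python
def add_dictionary(dictionaries_value: str, file_name: str) -> str:
    """Append file_name to a semicolon-separated dictionary list if it is absent."""
    # Split on semicolons and drop empty pieces (e.g. from a trailing ';').
    names = [name.strip() for name in dictionaries_value.split(";") if name.strip()]
    # Assumption: dictionary file names are compared case-insensitively.
    if file_name.lower() not in (name.lower() for name in names):
        names.append(file_name)
    return ";".join(names)


print(add_dictionary("ssceam.tlx;ssceam2.clx", "profanity.tlx"))
# prints: ssceam.tlx;ssceam2.clx;profanity.tlx
```

Calling the function a second time with the same file name leaves the value unchanged, so it is safe to run more than once.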
With profanity.tlx in use, the CheckText method will return WSS_CONDITIONAL_CHANGE_WORD_RSLT whenever "dog," "cat," or "pig" is encountered in the text. Your application can examine CheckText's otherWord parameter to determine if the word is profanity: if otherWord is "XXX", a match has been found.
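' Check the text; otherWord receives the replacement word for a matched entry.
rv = object.CheckText(text, lang, options, userDictFileName, ignoreChangeList, cursor, word, otherWord)
If rv = WSS_CONDITIONAL_CHANGE_WORD_RSLT Then
    ' "XXX" is the replacement word stored with each entry in profanity.tlx.
    If otherWord = "XXX" Then
        Response.Write("Tsk, tsk, tsk!<BR>")
    End If
End If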
The replacement word can be any string, so additional information can be encoded in it. For example, you might want to follow "XXX" with a digit indicating the "offensiveness," with "1" meaning "mildly inappropriate" and "9" being reserved for words that would make Tony Soprano blush.
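One way to decode such a replacement word is sketched below in Python. The function name offensiveness and the constant PROFANITY_MARKER are hypothetical, and the sketch assumes the encoding is exactly "XXX" optionally followed by a single digit from 1 to 9; a bare "XXX" is treated as a match with no level given.

```python
from typing import Optional

PROFANITY_MARKER = "XXX"  # the replacement word used in profanity.tlx


def offensiveness(other_word: str) -> Optional[int]:
    """Decode a replacement word.

    Returns None if the word is not a profanity marker, 0 if it is a bare
    marker with no level, or the level 1-9 encoded after the marker.
    """
    if not other_word.startswith(PROFANITY_MARKER):
        return None
    suffix = other_word[len(PROFANITY_MARKER):]
    if suffix == "":
        return 0  # flagged, but no level was encoded
    if len(suffix) == 1 and suffix in "123456789":
        return int(suffix)
    return None  # something else that merely starts with the marker
```

Your application could then compare the returned level against a threshold, for example rejecting text only when the level is 5 or higher.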
The same approach can be used in other circumstances where you want to detect the presence of certain words: Categorizing e-mail into folders, filtering spam, detecting part numbers, etc.
Copyright © 2005 Wintertree Software Inc.