Wintertree Software Inc.

Sentry Spelling Checker Engine - Support


Profanity detection

Product: Sentry Spelling Checker Engine Windows SDK, Sentry Spelling Checker Engine Source SDK

This article describes some techniques for using Sentry Spelling Checker Engine to detect the presence of specific words in text, such as profanity.

Usually, the Sentry spelling engine is used to detect words which are not present in a dictionary or lexicon of words. Words not in the dictionary are deemed to be misspelled and are reported so they can be corrected. Detecting the presence of specific words is, in certain respects, the reverse of this.

Each word in a dictionary used by the Sentry spelling engine has an action code associated with it. When a word in the text being checked matches a word in a dictionary, the Sentry engine examines the action code associated with the word and performs the indicated action. The most common action tells the Sentry engine to ignore (skip over) the word, usually because the word is correctly spelled and needs no further handling.

Other action codes supported by the Sentry engine cause the word to be automatically or conditionally replaced with another word. These actions are typically used to "auto correct" certain frequently misspelled words, such as replacing "recieve" with "receive." Automatic or conditional replacements are not made by the Sentry engine directly. Instead, the Sentry engine reports to the calling application (i.e., your application) that the word should be replaced with another word. Normally, the calling application then makes the replacement by calling certain methods in the Sentry API, perhaps after confirming the replacement with the user. The key thing to note here is that certain words can be assigned an action code which causes the Sentry engine to report to your application when those words are encountered. This is exactly what is needed to detect the presence of those words.
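The relationship between words, action codes, and what gets reported back to the caller can be illustrated with a toy model. The sketch below is not the Sentry API: the toy_ names, the enum values, and the three-entry table are all hypothetical and exist only to show how a per-word action code determines the result the calling application sees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action codes for this toy model; NOT the Sentry constants. */
enum toy_action { TOY_IGNORE, TOY_AUTO_CHANGE, TOY_CONDITIONAL_CHANGE };

/* Hypothetical result codes reported back to the caller. */
enum toy_result {
    TOY_OK,
    TOY_MISSPELLED,
    TOY_AUTO_CHANGE_WORD,
    TOY_CONDITIONAL_CHANGE_WORD
};

/* One lexicon entry: a word, an action, and an optional replacement. */
struct toy_entry {
    const char *word;
    enum toy_action action;
    const char *replacement;   /* NULL when the action is TOY_IGNORE */
};

static const struct toy_entry toy_lexicon[] = {
    { "receive", TOY_IGNORE,             NULL      },
    { "recieve", TOY_AUTO_CHANGE,        "receive" },
    { "teh",     TOY_CONDITIONAL_CHANGE, "the"     },
};

/* Look up one word. For a change action, *other_word is set to the
 * replacement; the caller decides whether to apply it. */
enum toy_result toy_check_word(const char *word, const char **other_word)
{
    size_t i;

    assert(word != NULL && other_word != NULL);
    *other_word = NULL;
    for (i = 0; i < sizeof toy_lexicon / sizeof toy_lexicon[0]; i++) {
        if (strcmp(word, toy_lexicon[i].word) != 0)
            continue;
        switch (toy_lexicon[i].action) {
        case TOY_IGNORE:
            return TOY_OK;
        case TOY_AUTO_CHANGE:
            *other_word = toy_lexicon[i].replacement;
            return TOY_AUTO_CHANGE_WORD;
        case TOY_CONDITIONAL_CHANGE:
            *other_word = toy_lexicon[i].replacement;
            return TOY_CONDITIONAL_CHANGE_WORD;
        }
    }
    return TOY_MISSPELLED;   /* not in the lexicon at all */
}
```

In this model, as in the description above, the lookup never edits the text itself. It only reports a result and a replacement string, and the caller decides what to do with them.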

Entries in text lexicons contain a word, an action, and a replacement word. Suppose you want to detect the presence of the words "dog," "cat," or "pig" in the text (we'll pretend these words are profanity). You could create a new text lexicon (by calling SSCE_CreateLex) and add three entries to it, one for each word you want to detect. Actions which can be associated with words in lexicons are listed in the Sentry programmer's guide under "Action codes." We'll use SSCE_CONDITIONAL_CHANGE_ACTION as the action. The replacement word will be an encoded string that tells our application the word is profanity. To keep things simple, we'll just use "XXX" as the replacement word. The three entries can be created by calling the SSCE_AddToLex function:

SSCE_AddToLex(sid, lexId, "dog", SSCE_CONDITIONAL_CHANGE_ACTION, "XXX");
SSCE_AddToLex(sid, lexId, "cat", SSCE_CONDITIONAL_CHANGE_ACTION, "XXX");
SSCE_AddToLex(sid, lexId, "pig", SSCE_CONDITIONAL_CHANGE_ACTION, "XXX");

With this lexicon open, the SSCE_CheckString and SSCE_CheckBlock functions will return SSCE_CONDITIONAL_CHANGE_WORD_RSLT whenever "dog," "cat," or "pig" is encountered in the text. Your application can examine the function's otherWord parameter to determine whether the word is profanity: if otherWord is "XXX", a match has been found.

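rv = SSCE_CheckString(sid, text, cursor, errWord, sizeof(errWord), otherWord, sizeof(otherWord));
if (rv == SSCE_CONDITIONAL_CHANGE_WORD_RSLT)
{
    /* otherWord holds the replacement string stored in the lexicon entry. */
    if (strcmp(otherWord, "XXX") == 0)
    {
        /* errWord is one of the words flagged as profanity. */
        printf("Tsk, tsk tsk!\n");
    }
}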

The replacement word can be any string, so additional information can be encoded in it. For example, you might want to follow "XXX" with a digit indicating the "offensiveness," with "1" meaning "mildly inappropriate" and "9" being reserved for words that would make Tony Soprano blush.
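One way to decode such a replacement word is sketched below. It assumes the encoding is exactly "XXX" optionally followed by a single digit from 1 to 9; the function name profanity_level and its return convention are illustrative choices, not part of the Sentry API.

```c
#include <assert.h>
#include <string.h>

/* Assumed encoding: the marker "XXX", optionally followed by one digit 1..9. */
#define PROFANITY_MARKER "XXX"

/* Decode a replacement word reported in otherWord.
 * Returns 1..9 for "XXX1".."XXX9", 0 for a bare "XXX" (flagged, but no
 * level given), and -1 if the string is not a profanity marker. */
int profanity_level(const char *other_word)
{
    size_t marker_len = strlen(PROFANITY_MARKER);
    const char *suffix;

    assert(other_word != NULL);
    if (strncmp(other_word, PROFANITY_MARKER, marker_len) != 0)
        return -1;                       /* does not start with the marker */
    suffix = other_word + marker_len;
    if (suffix[0] == '\0')
        return 0;                        /* bare marker, level unspecified */
    if (suffix[0] >= '1' && suffix[0] <= '9' && suffix[1] == '\0')
        return suffix[0] - '0';          /* single-digit level */
    return -1;                           /* unrecognized suffix */
}
```

With this convention, a lexicon entry added with the replacement word "XXX3" would decode to level 3, and the application could, for instance, only warn about words at or above a chosen threshold.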

The same approach can be used in other circumstances where you want to detect the presence of certain words: Categorizing e-mail into folders, filtering spam, detecting part numbers, etc.
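For these other uses, the replacement word can act as a category tag that the application maps to its own handling. The sketch below assumes three tag strings ("XXX", "SPAM", "PART") and a small enum of categories; all of these names are hypothetical and would be chosen by your application when it adds entries to the lexicon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories an application might sort text into. */
enum text_category { CAT_NONE, CAT_PROFANITY, CAT_SPAM, CAT_PART_NUMBER };

/* Assumed tag strings stored as replacement words in the lexicon. */
static const struct {
    const char *tag;
    enum text_category category;
} tag_table[] = {
    { "XXX",  CAT_PROFANITY   },
    { "SPAM", CAT_SPAM        },
    { "PART", CAT_PART_NUMBER },
};

/* Map a replacement word reported in otherWord to a category.
 * Returns CAT_NONE when the string is an ordinary replacement word. */
enum text_category category_for(const char *other_word)
{
    size_t i;

    assert(other_word != NULL);
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
        if (strcmp(other_word, tag_table[i].tag) == 0)
            return tag_table[i].category;
    }
    return CAT_NONE;
}
```

Keeping the tags in one table means a new category needs only a new table row and matching lexicon entries, with no change to the checking loop.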



Copyright © 2004 Wintertree Software Inc. Last modified