This document describes how to use the Sentry Spelling Checker Engine Java SDK to create a Java applet that can check text entered in Web pages. The technique presented here:
Works in all browsers that support Java 1.1 applets
Runs entirely on the client side and requires no server side code, servlets, or CGI scripts
Works for both signed and unsigned applets
Source code for a working example applet is included with the Sentry Spelling Checker Engine Java SDK.
This document assumes familiarity with the Java language, developing Java applets, adding applets to Web pages, etc. See http://java.sun.com for a good source of information on Java and applets.
Spelling checkers work by comparing words being checked against a dictionary of words known to be spelled correctly. Any words not found in the dictionary are reported as misspelled. To avoid annoying the user with spurious error reports, the dictionary should contain most common words in a given language (see How many words should be in the spell checker's dictionary? for a discussion of dictionary size). This requirement means the dictionary must contain a large number of words -- typically, 100,000 or more. For efficiency, the dictionary should be compressed to reduce memory and disk space usage and indexed for fast access. As a result, dictionaries are implemented as large, complex data structures that are typically stored in disk files.
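The lookup principle described above can be illustrated with a toy checker. This is only a sketch: it uses a plain in-memory HashSet rather than a compressed, indexed dictionary, and the class name TinySpellChecker and its methods are invented for this illustration and are not part of the Sentry SDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy illustration of dictionary-based checking; not the Sentry engine. */
public class TinySpellChecker {
    // Words known to be spelled correctly, stored in lower case.
    private final Set<String> dictionary = new HashSet<>();

    public TinySpellChecker(String... knownWords) {
        for (String w : knownWords) {
            dictionary.add(w.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the words in the text that are not in the dictionary, in order. */
    public List<String> findMisspelled(String text) {
        List<String> misspelled = new ArrayList<>();
        // Split on anything that is not a letter or an apostrophe.
        for (String word : text.split("[^A-Za-z']+")) {
            if (!word.isEmpty() && !dictionary.contains(word.toLowerCase(Locale.ROOT))) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }
}
```

With only three known words, any other word is reported, which shows why a small dictionary produces spurious error reports.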
Most Web browsers prevent unsigned applets from accessing disk files on the local computer for security reasons. One solution to this restriction is to digitally sign the applet. However, this approach introduces complications (see "Creating signed, persistent Java applets," Dr. Dobb's Journal, Feb. 1999 for more information):
Internet Explorer and Netscape require different signature types
Netscape requires the browser user to explicitly grant permissions to the signed applet, resulting in additional complexity and confusion that discourages casual use of the applet
The dictionary files must be downloaded to the client computer and placed in a known location, or the location of the files must be configured in the applet in some way.
Two alternative approaches for accessing dictionary files exist which do not have these complications:
Store the dictionary files in the archive (JAR or ZIP file) containing the applet, and access them as resources
Store the dictionary files on the same Web server as the applet and access them as URLs.
Both of these approaches require that the dictionary files be accessed as InputStreams. Beginning with version 5.7, Wintertree Software's Sentry Spelling Checker Engine Java SDK allows lexicons (dictionaries) to be constructed from InputStreams.
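As a rough sketch of the two approaches, the following helper obtains an InputStream either from a resource packaged with the classes or from a URL on the server. It uses only standard java.lang.Class and java.net.URL calls. The class name LexiconStreams and its method names are hypothetical, and the step of handing the stream to the Sentry lexicon classes is deliberately left out because that API is not shown in this document.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper: two ways to obtain an InputStream for a dictionary file. */
public class LexiconStreams {
    /** Approach 1: open a file packaged in the applet's archive as a resource. */
    public static InputStream openResource(Class<?> owner, String name) throws IOException {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            // getResourceAsStream returns null rather than throwing when nothing is found.
            throw new FileNotFoundException("Resource not found: " + name);
        }
        return in;
    }

    /** Resolves a file name against a base URL such as an applet's getDocumentBase(). */
    public static URL resolve(URL base, String fileName) throws MalformedURLException {
        return new URL(base, fileName);
    }

    /** Approach 2: open a file stored on the Web server as a URL stream. */
    public static InputStream openUrl(URL base, String fileName) throws IOException {
        return resolve(base, fileName).openStream();
    }
}
```

In an applet, the base URL would typically come from getDocumentBase() or getCodeBase(), so the same code works wherever the page is hosted.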
One further complication exists: Netscape allows applets to access file resources in JAR or ZIP archives only if the file has an extension included in a list of acceptable extensions (see http://developer.netscape.com/docs/technote/java/getresource/getresource.html for more information). Wintertree Software's dictionaries use "clx" for compressed lexicons and "tlx" for text lexicons, neither of which is included in Netscape's list of allowed extensions. New extensions can be added to the list, but this requires Netscape-specific code, which contradicts the design goal of a single solution for all browsers. A simpler solution is to rename the dictionary files to use an allowed extension, such as "t" in place of "clx" and "txt" in place of "tlx".
We will assume that the applet spell-checks text contained in a Java TextArea component, and that it has a button or some other event source to start the spelling check. We will use the SpellingDialog class from Sentry's AWTDemo to interact with the user when spelling errors are detected. (We used an AWT-based applet, but JFC/Swing could be used just as well.) We'll also use the PropSpellingSession class, which is part of the Sentry class library, to construct a spelling session and initialize it from settings contained within a properties (java.util.Properties) file. A spelling session is an instance of the spell-check engine. It contains methods for checking the spelling of text, looking up suggestions for misspelled words, etc. When the applet is deployed, we will store its classes and the properties file in a JAR file.
The properties file lists the spelling options (e.g., "ignore capitalized words" or "report doubled words") and the dictionaries used by the spelling checker. More importantly, it specifies the location of the dictionaries, and the method used to access them. In this design, the dictionary files will be located on the Web server in the same directory as the Web page containing the applet. They could also be located in sub-directories, but cannot be located in higher-level directories because some browsers won't allow this. The dictionary files will be accessed through URL streams for reasons that will be given shortly. The properties-file lines that specify the location and access method for dictionaries (lexicons) might look like the following:
The properties file lines specify the name of the dictionary file (e.g., correct.tlx), the method of accessing the file ("url", meaning the files are accessed as URL streams), and the format of the dictionary ("t" for text lexicons and "c" for compressed). Note that Netscape's restriction on file extensions does not apply when files are accessed as URL streams.
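To make the three-part entry concrete, here is a minimal sketch that splits a value of the form "file,method,format" into its parts. It assumes the parts are comma-separated, which is a guess about the layout and not something confirmed above. The class LexiconSpec and any property key passed to it (for example "Lexicon1") are hypothetical; the actual key names are defined by PropSpellingSession and the SDK documentation.

```java
import java.util.Properties;

/** Illustrative parser for an assumed "file,method,format" dictionary entry. */
public class LexiconSpec {
    public final String fileName;     // e.g. "correct.tlx"
    public final String accessMethod; // e.g. "url"
    public final boolean compressed;  // true for format "c", false for format "t"

    private LexiconSpec(String fileName, String accessMethod, boolean compressed) {
        this.fileName = fileName;
        this.accessMethod = accessMethod;
        this.compressed = compressed;
    }

    /** Parses one entry value; the comma-separated layout is an assumption. */
    public static LexiconSpec parse(String value) {
        String[] parts = value.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected file,method,format: " + value);
        }
        String format = parts[2].trim();
        if (!format.equals("t") && !format.equals("c")) {
            throw new IllegalArgumentException("Unknown lexicon format: " + format);
        }
        return new LexiconSpec(parts[0].trim(), parts[1].trim(), format.equals("c"));
    }

    /** Looks up one entry by key; the key naming scheme is hypothetical. */
    public static LexiconSpec fromProperties(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return parse(value);
    }
}
```

For example, parsing "ssceam2.clx,url,c" would yield a compressed lexicon named ssceam2.clx that is accessed as a URL stream.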
We could have elected to store the dictionary files in the JAR file containing the applet. The PropSpellingSession class supports this, and the JAR-file approach does have the advantage of keeping the applet and its files together in one place. However, compressed main dictionary files tend to be large (ssceam2.clx, the American English dictionary, is over 300K). If the applet's JAR file is large, the Web page containing the applet will take a long time to load on computers with slow Internet connections. If the dictionaries are accessed as URL streams, their loading can be deferred until the spelling check starts.
The user enters some text in the applet's TextArea, then clicks the button to start the spelling check. In response to the button press, the applet creates a PropSpellingSession object, which initializes the spelling-checker engine by setting options and opening dictionaries specified in the properties file. Because the properties file is stored in the applet's JAR file, we use getResourceAsStream, which is a method of java.lang.Class. The getResourceAsStream method locates a file in the applet's code base (the JAR file), opens it, and returns an InputStream object. The InputStream is used to load properties into the java.util.Properties object. PropSpellingSession takes care of the details required to load the dictionary files as URLs. See http://java.sun.com/products//jdk/1.1/docs/guide/misc/resources.html for more information on accessing files as resources.
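The resource-loading step can be sketched with standard library calls only. The helper below reads a java.util.Properties object from an InputStream, such as the one returned by getResourceAsStream. The class name PropertiesLoader and the resource name used in the note that follows are assumptions; passing the loaded properties to PropSpellingSession is omitted because its constructor is not shown in this document.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper that loads settings from a stream or a class resource. */
public class PropertiesLoader {
    /** Reads properties from the stream and closes it afterwards. */
    public static Properties load(InputStream in) throws IOException {
        if (in == null) {
            // getResourceAsStream signals a missing resource by returning null.
            throw new FileNotFoundException("Properties stream is null");
        }
        try {
            Properties props = new Properties();
            props.load(in);
            return props;
        } finally {
            in.close();
        }
    }

    /** Loads a properties file stored alongside the given class, e.g. in the same JAR. */
    public static Properties loadResource(Class<?> owner, String name) throws IOException {
        // A name without a leading "/" is resolved relative to the class's package.
        return load(owner.getResourceAsStream(name));
    }
}
```

Inside the applet, a call such as PropertiesLoader.loadResource(getClass(), "spelling.properties") would read the file from the applet's JAR, where "spelling.properties" is a made-up file name.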
Because we will be checking the contents of a TextArea component, we can use the TextAreaWordParser class which is part of Sentry's AWTDemo program. This class implements Sentry's WordParser interface, which is used by the engine to enumerate individual words in a text source. WordParser-derived classes like TextAreaWordParser also allow misspelled words to be corrected.
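To show what a word parser does in general terms, the sketch below enumerates the words in a block of text and replaces the current word in place. It is not Sentry's WordParser interface or the TextAreaWordParser class: the name SimpleWordParser and its methods are invented for illustration, and it works on a StringBuilder instead of a TextArea.

```java
/** Simplified stand-in for a word parser; not Sentry's WordParser interface. */
public class SimpleWordParser {
    private final StringBuilder text;
    private int start = 0; // index of the first character of the current word
    private int end = 0;   // index just past the last character of the current word

    public SimpleWordParser(String initialText) {
        this.text = new StringBuilder(initialText);
    }

    /** Advances to the next word and returns it, or returns null when none remain. */
    public String nextWord() {
        int i = end;
        while (i < text.length() && !isWordChar(text.charAt(i))) {
            i++; // skip spaces and punctuation between words
        }
        if (i >= text.length()) {
            start = end = text.length();
            return null;
        }
        start = i;
        while (i < text.length() && isWordChar(text.charAt(i))) {
            i++;
        }
        end = i;
        return text.substring(start, end);
    }

    /** Replaces the current word and continues scanning after the replacement. */
    public void replaceWord(String replacement) {
        if (start == end) {
            throw new IllegalStateException("No current word to replace");
        }
        text.replace(start, end, replacement);
        end = start + replacement.length();
    }

    public String getText() {
        return text.toString();
    }

    private static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }
}
```

Tracking the start and end offsets of the current word is what allows a replacement of a different length without losing the parser's position, and the same offsets could be used to highlight the word in a text component.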
The next and final step for the applet is to construct a SpellingDialog object. SpellingDialog takes over from this point. It calls on the TextAreaWordParser object to obtain words from the TextArea one by one and passes them to the spelling-checker engine for checking. When it encounters a misspelled word, it displays the word and asks the engine for a set of suggested replacements, which it also displays. SpellingDialog also asks TextAreaWordParser to highlight the misspelled word in the TextArea so the user can see the word in context. The user can dispose of misspelled words by ignoring them or replacing them. Any replacements are made directly in the TextArea. When all words have been checked, the SpellingDialog closes. The TextArea contains the checked and possibly corrected text at this point.
Once the applet has been compiled and tested locally (using AppletViewer), it is ready for deployment. The applet doesn't have to be signed to support the spell-check features; of course, you can sign the applet if necessary for other purposes. The following steps are required to deploy the applet in a Web page:
Create a JAR file containing the applet's classes and properties file.
Upload the JAR file to the Web site directory where the Web page which uses the applet will reside.
Upload any dictionary files to the same directory on the Web site as the JAR file.
Upload the ssce.jar file (the Sentry class library) to the same directory on the Web site as the JAR file.
Create a Web page with an APPLET tag similar to the following:
Upload the Web page to the same directory on the Web site as the JAR file.
Open the Web page in a browser, and you should be able to enter text in the TextArea and check its spelling.
At this point, we've described how to create and deploy an applet that can check the spelling of some text, but not much else. If you need to check spelling of text entered into an existing applet, or a new applet you plan to develop, then the technique described so far will be useful to you. Presumably your applet does something useful with the text entered by the user.
Many Web pages accept text entry from the user in HTML forms. These forms typically contain "Submit" buttons that send data entered in the form via a POST operation to a CGI script on the Web server. An applet can implement the entire form as AWT (or JFC) components, and submit the text to the CGI script on the server from within the applet. This is a general Java programming technique, so we will let the Java experts explain it: See http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html.
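For orientation, a form submission from Java might look roughly like the following. This is a sketch built on the standard java.net classes, not code from the Sentry SDK; the class name FormPoster, the field name in the note that follows, and the script URL supplied by the caller are all assumptions. Keep in mind that an unsigned applet can normally connect only to the host it was loaded from.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Hypothetical helper that posts one form field to a script on the server. */
public class FormPoster {
    /** Encodes a field as name=value in application/x-www-form-urlencoded form. */
    public static String encodeField(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Sends the encoded body with a POST request and returns the HTTP status code. */
    public static int post(URL script, String encodedBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) script.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we intend to write a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] bytes = encodedBody.getBytes("UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(bytes);
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }
}
```

An applet would call encodeField("body", textArea.getText()) and pass the result to post, where "body" stands in for whatever field name the CGI script expects.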
document.emailForm.body.value = document.spellingApplet.getText();
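This code would be invoked from the "onclick" attribute of a button in the form. It copies the text returned by the applet's getText method into the form field named "body", so that the checked text is submitted along with the rest of the form.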
Copyright © 2015 Wintertree Software Inc.