ThesDB Thesaurus Engine
About the thesaurus structure and file format
In ThesDB, a thesaurus is a collection of word categories. A category contains words with the same specific meaning. A category also has a descriptive name, a word class (part-of-speech designation), and optionally the name of an antonym category. For example, a category named "happy" might contain words of class "adjective" such as "joyful," "delighted," "elated," etc. It might have a category named "sad" as its antonym.
Given a key word (a word for which you want to locate synonyms), ThesDB can determine which categories contain synonyms for it. Given a category name, ThesDB can produce the set of synonyms it contains, its word class, and the name of its antonym category (if one exists). This organization is needed because a key word might have several meanings. Each meaning is represented by its own category.
The following table shows some possible categories and synonyms related to the key word "set."
Category | Synonyms
place (verb) | locate, place, post, situate, stand
collection (noun) | assemblage, assembly, assortment, band, bloc, body, bunch, collection, collage, corps...
permanent (adjective) | abiding, constant, enduring, everlasting, fixed, immutable, lasting, permanent, perpetual, persistent, unchangeable...
In ThesDB, categories are groups of words having the same meaning. The words in each category are of the same class or part of speech: nouns, verbs, adjectives, or adverbs. Each category has a unique name that identifies it.
Each category name actually consists of two parts: a description and a word class. The two parts are separated by a period, with the word class at the end. The word class is represented by the following abbreviations:
Word Class | Abbreviation
Noun | n
Verb | v
Adjective | adj
Adverb | adv
The descriptive part is a word or short phrase that best represents the meaning of the synonyms contained within the category. The word or phrase is in the same word class as the category itself, so it serves as a representative of the synonyms in the category. In almost all cases, the descriptive part of the category name also appears as a synonym in the category (e.g., category happy.adj contains the synonym "happy").
The following examples show how these rules are applied:
Category Name | Description
happiness.n | Category containing nouns synonymous with happiness.
happy.adj | Category containing adjectives synonymous with happy.
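As a rough illustration of the naming rule, the Python sketch below splits a category name at its period into the descriptive part and the word-class abbreviation. It then maps the four predefined abbreviations to class names. `split_category_name`, `describe`, and `WORD_CLASSES` are hypothetical helpers written for this example and are not part of the ThesDB API.

```python
# Hypothetical helpers (not the ThesDB API): split a category name such as
# "happy.adj" into its descriptive part and its word-class abbreviation.

# The four predefined word-class abbreviations.
WORD_CLASSES = {"n": "Noun", "v": "Verb", "adj": "Adjective", "adv": "Adverb"}


def split_category_name(name):
    """Return (description, word_class) for a name like 'happy.adj'."""
    # The word class follows the period at the end of the name.
    description, sep, word_class = name.rpartition(".")
    if not sep or not description or not word_class:
        raise ValueError(f"not a category name: {name!r}")
    # Case is not significant, so normalize to lowercase.
    return description.lower(), word_class.lower()


def describe(name):
    """Return a short label such as 'happy (Adjective)'."""
    description, word_class = split_category_name(name)
    # Fall back to the raw abbreviation for classes that are not predefined.
    label = WORD_CLASSES.get(word_class, f"class '{word_class}'")
    return f"{description} ({label})"


print(describe("happiness.n"))  # happiness (Noun)
print(describe("happy.adj"))    # happy (Adjective)
```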
ThesDB searches for key words, categories, and antonyms in open thesaurus files. All of the open thesaurus files are treated like one large thesaurus (except when searching for antonyms, which is described below). When your application searches for a key word, all open thesauri are searched. When your application requests the synonyms contained within a category, and that category is defined several times in several open thesauri, then the synonyms from each instance of the category are merged.
The following table shows two thesaurus files and the categories and synonyms they contain.
File: MYTHES.TTH
Category | Antonym Category | Synonyms
happy.adj | sad.adj | happy, joyful
sad.adj | (None) | lachrymose, pensive, sad, sullen
File: MAIN.CTH
Category | Antonym Category | Synonyms
happy.adj | melancholy.adj | bubbly, delighted, ecstatic, happy
indifferent.adj | (None) | indifferent, nonchalant, unemotional
melancholy.adj | happy.adj | lachrymose, melancholy, pensive
sad.adj | happy.adj | depressed, melancholy, sad
With both files open, example requests produce the following results:
Action | Result | Notes
Look up key word "pensive" | Found in categories melancholy.adj, sad.adj | Every open thesaurus is searched: sad.adj comes from MYTHES.TTH and melancholy.adj from MAIN.CTH.
Get synonyms for category sad.adj | depressed, lachrymose, melancholy, pensive, sullen, sad | Synonyms merged from sad.adj in MYTHES.TTH and MAIN.CTH. "Sad" appears only once, even though it is defined in both thesaurus files.
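The Python sketch below models the two example files as plain dictionaries. It illustrates two behaviors: a key-word lookup spans all open thesauri, and synonyms from several instances of one category are merged without duplicates. The dictionaries and the functions `find_categories` and `get_synonyms` are hypothetical and are not the ThesDB API. Antonyms are left out because this lookup does not use them. The results are sorted alphabetically for convenience; this page does not specify the order in which ThesDB itself returns them.

```python
# Hypothetical in-memory model (not the ThesDB API): each open thesaurus maps
# a category name to its synonyms, using the example files MYTHES.TTH and MAIN.CTH.
MYTHES_TTH = {
    "happy.adj": ["happy", "joyful"],
    "sad.adj": ["lachrymose", "pensive", "sad", "sullen"],
}
MAIN_CTH = {
    "happy.adj": ["bubbly", "delighted", "ecstatic", "happy"],
    "indifferent.adj": ["indifferent", "nonchalant", "unemotional"],
    "melancholy.adj": ["lachrymose", "melancholy", "pensive"],
    "sad.adj": ["depressed", "melancholy", "sad"],
}
OPEN_THESAURI = [MYTHES_TTH, MAIN_CTH]


def find_categories(key_word, thesauri):
    """Names of all categories, in any open thesaurus, that contain key_word."""
    key = key_word.lower()  # case is not significant
    found = set()
    for thesaurus in thesauri:  # all open thesauri act as one large thesaurus
        for name, synonyms in thesaurus.items():
            if any(s.lower() == key for s in synonyms):
                found.add(name)
    return sorted(found)


def get_synonyms(category, thesauri):
    """Merge synonyms from every instance of category; duplicates appear once."""
    merged = set()
    for thesaurus in thesauri:
        merged.update(s.lower() for s in thesaurus.get(category, []))
    return sorted(merged)


print(find_categories("pensive", OPEN_THESAURI))
# ['melancholy.adj', 'sad.adj']
print(get_synonyms("sad.adj", OPEN_THESAURI))
# ['depressed', 'lachrymose', 'melancholy', 'pensive', 'sad', 'sullen']
```

This version uses a linear scan for clarity. A real engine would more likely keep an index from each synonym to its categories.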
A category may optionally contain the name of an antonym category. Your application can ask ThesDB to locate the name of the antonym category associated with a regular (non-antonym) category. ThesDB searches the open thesauri, in the order in which they were opened, for a category with the specified name. When it finds one, it checks whether that category has an antonym name defined. If so, ThesDB stops searching and returns that antonym name; if not, the search continues. As a result, if a category with the same name exists in two different thesauri and each instance has a different antonym name, the antonym name from the thesaurus opened first is returned.
The following table shows how this works, using the previous example thesaurus files opened in the order MYTHES.TTH, then MAIN.CTH:
Action | Result | Notes
Find antonym for happy.adj | sad.adj | First occurrence of happy.adj (in MYTHES.TTH) has antonym sad.adj defined.
Find antonym for sad.adj | happy.adj | First occurrence of sad.adj (in MYTHES.TTH) contains no antonym, so searching continues until sad.adj is found in MAIN.CTH.
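The antonym rule differs from the merging rule because the search stops at the first instance that defines an antonym. In the hypothetical sketch below (not the ThesDB API), each open thesaurus is reduced to a dictionary from category name to antonym name, listed in opening order. To reproduce the second result in the table, the sketch assumes that sad.adj in MAIN.CTH defines happy.adj as its antonym, as the note for that row implies.

```python
# Hypothetical model (not the ThesDB API): each open thesaurus maps a category
# name to its antonym category name, or None when no antonym is defined.
MYTHES_TTH = {"happy.adj": "sad.adj", "sad.adj": None}
MAIN_CTH = {
    "happy.adj": "melancholy.adj",
    "indifferent.adj": None,
    "melancholy.adj": "happy.adj",
    # Assumption: happy.adj, as implied by the note for "Find antonym for sad.adj".
    "sad.adj": "happy.adj",
}
OPEN_THESAURI = [MYTHES_TTH, MAIN_CTH]  # MYTHES.TTH was opened first


def find_antonym(category, thesauri):
    """Return the first antonym name defined for category, or None."""
    key = category.lower()  # case is not significant
    for thesaurus in thesauri:  # searched in the order they were opened
        antonym = thesaurus.get(key)
        # A missing category and a category without an antonym both mean
        # "keep searching"; the first defined antonym ends the search.
        if antonym is not None:
            return antonym
    return None


print(find_antonym("happy.adj", OPEN_THESAURI))  # sad.adj
print(find_antonym("sad.adj", OPEN_THESAURI))    # happy.adj
```

Reversing the opening order changes the answer for happy.adj. `find_antonym("happy.adj", [MAIN_CTH, MYTHES_TTH])` returns `"melancholy.adj"`, because the MAIN.CTH instance is then found first.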
ThesDB thesaurus files come in two formats: text and compressed. Text thesaurus files are ASCII files that can be modified at run time. Compressed thesaurus files contain binary data and are read-only at run time. We use the extension "TTH" for text thesaurus files and "CTH" for compressed ones. These are conventions only; feel free to use whatever extensions you prefer.
Text-format thesaurus files are stored in a special layout defined by ThesDB. Text files submitted to the thesaurus compression utility program must also be stored in this layout. This section defines the layout of text-format thesaurus files.
A thesaurus file contains zero or more categories. Each category contains a category name, an optional antonym category name, and a set of zero or more synonyms. The categories within the thesaurus file must be in alphabetical order by category name.
A category is represented in the file by a category definition line followed by zero or more lines containing comma-separated synonyms. A category definition line starts with a colon (":") in column one. The category name starts in column two.
The antonym category name, if defined, follows the category name and is preceded by a forward slash ("/"). The antonym category name immediately follows the slash. The slash may be separated from the end of the category name by zero or more spaces. The following are valid category definition lines:
:happy.adj
:happy.adj/sad.adj
:happy.adj /sad.adj
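Because names cannot contain slashes, a category definition line can be split with a single partition on the first slash. The hypothetical Python function below (not part of ThesDB) returns the category name and the optional antonym name. It assumes that trailing whitespace at the end of the line is not significant.

```python
def parse_definition_line(line):
    """Split a category definition line into (category, antonym or None).

    Hypothetical helper; assumes trailing whitespace is not significant.
    """
    if not line.startswith(":"):
        raise ValueError("a category definition line starts with ':' in column one")
    body = line[1:].rstrip()  # the category name starts in column two
    # Names cannot contain '/', so the first slash (if any) starts the antonym.
    category, slash, antonym = body.partition("/")
    # Zero or more spaces may separate the category name from the slash.
    category = category.rstrip(" ")
    return category, (antonym if slash else None)


for example in (":happy.adj", ":happy.adj/sad.adj", ":happy.adj /sad.adj"):
    print(parse_definition_line(example))
```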
The category name and antonym name can each contain up to 31 characters. The names can contain any printable ASCII character (including spaces) except forward slashes. The names can contain only one period ("."), which is used to separate the descriptive part of the category name from the word class. Case is not significant.
The word class part of a category name can contain one to four alphabetic characters. Although any combination can be used, the following class abbreviations are predefined:
adj: Adjective
adv: Adverb
n: Noun
v: Verb
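These naming rules can be checked mechanically. The hypothetical validator below is not part of ThesDB and makes two interpretive assumptions: it reads "only one period" as exactly one period, and it requires the descriptive part to be non-empty. It accepts any word class of one to four letters, not only the predefined ones.

```python
MAX_NAME_LEN = 31  # category and antonym names: up to 31 characters


def is_valid_category_name(name):
    """Check a category or antonym name against the rules described above."""
    if not 1 <= len(name) <= MAX_NAME_LEN:
        return False
    # Printable ASCII (space through '~') is allowed, except the forward slash.
    if any(not (" " <= ch <= "~") or ch == "/" for ch in name):
        return False
    # Assumption: exactly one period separates description from word class.
    if name.count(".") != 1:
        return False
    description, word_class = name.split(".")
    # Assumption: the description is non-empty.
    # Rule: the word class has one to four alphabetic characters.
    return bool(description) and 1 <= len(word_class) <= 4 and word_class.isalpha()


print(is_valid_category_name("happy.adj"))        # True
print(is_valid_category_name("happy/sad.adj"))    # False: contains '/'
print(is_valid_category_name("very.happy.adj"))   # False: two periods
print(is_valid_category_name("happy.adjective"))  # False: class too long
```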
The set of synonyms contained by the category immediately follows the category definition line. Each synonym is terminated by a comma (the terminating comma is optional for the last synonym in a category). More than one synonym can appear on a line. The set of synonyms can span multiple lines, but a multi-word synonym should appear entirely on one line. On platforms with linear addressing, the number of synonyms per category is limited only by available memory; under MS-DOS and 16-bit Windows, each category can hold up to 2048 synonyms.
Each synonym can contain up to 31 characters. The synonym can contain any printable ASCII character (including spaces) except commas. The synonyms can appear in any order. Case is not significant.
The following example shows the contents of a small text thesaurus containing three categories:
:obedient.adj /disobedient.adj
acquiescent, compliant, devoted, faithful, loyal, meek, obedient, servile, submissive
:permanent.adj /transient.adj
abiding, constant, enduring, everlasting, fixed, immutable, lasting, permanent, perpetual, persistent, unchangeable
:qualify.v
allow, except, limit, mitigate, modify, qualify, reserve, stipulate, temper
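Putting the format rules together, the following hypothetical parser reads text-format contents into a dictionary and checks that categories are in alphabetical order. It is neither the ThesDB loader nor its compression utility. It is deliberately lenient: it does not require a comma after the last synonym on a line that is not the last line of its category. It also assumes that blank lines and spaces around commas are insignificant. The input is the three-category example, with the permanent.adj synonyms split over two lines to show a synonym list spanning lines.

```python
MAX_SYNONYM_LEN = 31  # each synonym: up to 31 characters


def parse_text_thesaurus(text):
    """Parse text-format thesaurus contents (hypothetical, lenient sketch).

    Returns a dict mapping category name -> (antonym name or None, synonyms).
    """
    categories = {}
    order = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(":"):
            # Category definition line: ":name" or ":name /antonym".
            name, slash, antonym = line[1:].partition("/")
            name = name.rstrip().lower()  # case is not significant
            current = (antonym.lower() if slash else None, [])
            categories[name] = current
            order.append(name)
        elif current is None:
            raise ValueError(f"line {lineno}: synonyms before any category")
        else:
            # Synonyms are comma-terminated; the final comma is optional,
            # so empty pieces left over after splitting are skipped.
            for piece in line.split(","):
                synonym = piece.strip().lower()
                if len(synonym) > MAX_SYNONYM_LEN:
                    raise ValueError(f"line {lineno}: synonym too long")
                if synonym:
                    current[1].append(synonym)
    if order != sorted(order):
        raise ValueError("categories must be in alphabetical order")
    return categories


EXAMPLE = """\
:obedient.adj /disobedient.adj
acquiescent, compliant, devoted, faithful, loyal, meek, obedient, servile, submissive
:permanent.adj /transient.adj
abiding, constant, enduring, everlasting, fixed, immutable, lasting,
permanent, perpetual, persistent, unchangeable
:qualify.v
allow, except, limit, mitigate, modify, qualify, reserve, stipulate, temper
"""

thesaurus = parse_text_thesaurus(EXAMPLE)
print(thesaurus["permanent.adj"][0])       # transient.adj
print(len(thesaurus["permanent.adj"][1]))  # 11
print(thesaurus["qualify.v"][0])           # None
```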
Copyright © 2015 Wintertree Software Inc.