Search Options

The TextExtractor offers several ways to find data. One of the most powerful is the ability to search on synonyms. The CCAT has a built-in standard thesaurus as well as a customized thesaurus, and synonym searching is a standard option when searching with the TextExtractor. The standard thesaurus includes a wide array of synonyms covering most of the English language. For example, suppose a search is issued for “hit and head”. Using the standard thesaurus, the search would find not only documents containing both hit and head but also documents containing synonyms of hit, such as strike, smack, and punch, as well as synonyms of head.

The custom thesaurus offers the ability to define and search on uncommon synonyms that may not be defined in the standard thesaurus, such as slang words, gang names, nicknames, aliases, and drug names. For example, suppose a local gang called the “Hit Man Posse” had known members with a combination of names and nicknames: “Frank Brown, Franky, FB, Shawn Evans, Shawn Boy…” Searching on “Hit Man Posse” would find and highlight any documents containing the names or nicknames listed above. Conversely, searching on “Franky” would find all of the other names listed above, including the gang name “Hit Man Posse”. Custom synonyms are created by entering them in an easy-to-use web-based interface.
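The two-way behavior described above can be sketched as a set of synonym groups, where searching on any member of a group expands to every other member. This is only an illustration: the names SYNONYM_GROUPS, expand_term, and find_matches are hypothetical, and the way the CCAT actually stores and applies its custom thesaurus is not described here.

```python
import re

# Hypothetical synonym group built from the example above.
SYNONYM_GROUPS = [
    {"hit man posse", "frank brown", "franky", "fb", "shawn evans", "shawn boy"},
]


def expand_term(term, groups=SYNONYM_GROUPS):
    """Return the term plus every synonym that shares a group with it."""
    key = term.strip().lower()
    expanded = {key}
    for group in groups:
        if key in group:
            expanded |= group
    return expanded


def find_matches(term, document_text):
    """Return, in sorted order, the expanded terms that occur in the text."""
    text = document_text.lower()
    hits = []
    for synonym in expand_term(term):
        # Word boundaries keep a short alias such as "fb" from matching
        # inside a longer word.
        if re.search(r"\b" + re.escape(synonym) + r"\b", text):
            hits.append(synonym)
    return sorted(hits)


print(find_matches("Hit Man Posse", "Witness saw Franky and Shawn Boy leave."))
```

Because every member of a group expands to the whole group, searching on the gang name and searching on a nickname reach the same set of documents.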

Command Line Search Options

Command line options are very powerful because they can typically apply to a single word or to an entire search. A few of the command line search options are listed below, each with an example of how to use it:

Wildcards: Use * to match any number of characters and ? to match any single character. Examples: victim* will find victims and victimology; victim? will find only victims.
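As a rough illustration of these two wildcards, Python's standard fnmatch module uses the same * and ? characters. The wildcard_match helper below is hypothetical and is not the TextExtractor's own matcher. Note that in fnmatch the * also matches zero characters, and square brackets have a special meaning that the TextExtractor may not share.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, words):
    """Return the words that match a pattern containing * and ? wildcards."""
    pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


words = ["victim", "victims", "victimology"]
print(wildcard_match("victim*", words))  # any number of trailing characters
print(wildcard_match("victim?", words))  # exactly one trailing character
```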

Stemming: Use ~ to find other grammatical forms of the word in your search request. To invoke a stemming search please add the stemming character to the end of the word you want stemmed. Example: slashing~ would also find slashed, slasher or slash.

Phonic Search: Use # at the front of a word, such as #Smith, to find other words that sound like Smith and begin with the same letter. Example: #Smith will find both Smith and Smythe.
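The phonetic algorithm behind the # operator is not stated here. As one possible illustration, the classic Soundex code keeps the first letter of a word and encodes the consonant sounds that follow, so words that sound alike and share a first letter receive the same code. Treat the soundex and phonic_match functions below as a stand-in sketch, not as the TextExtractor's actual method.

```python
# Soundex digit for each consonant; vowels, h, w, and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        # h and w do not separate repeated codes; vowels do.
        if letter not in "hw":
            previous = code
    return (result + "000")[:4]


def phonic_match(term, words):
    """Return the words whose Soundex code equals that of a #term."""
    target = soundex(term.lstrip("#"))
    return [word for word in words if soundex(word) == target]


print(phonic_match("#Smith", ["Smith", "Smythe", "Jones", "Stone"]))
```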

Fuzzy Search: The % character helps users sift through scanning (OCR) and typographical errors. Fuzziness adjusts from 1 to 10 depending on the degree of misspelling. A search for alphabet with a fuzziness of 1 would find alphaqet; with a fuzziness of 3, it would find both alphaqet and alpkaqet. The level of fuzziness is specified by the number of % symbols embedded in the search word, and the letters before the first % must match exactly. Examples: she%rry specifies that the word must begin with she and have at most one difference between it and sherry; stile%%tto specifies that the word must begin with stile and have at most two differences between it and stiletto.
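A fuzzy term such as she%rry can be read as three pieces: the prefix before the first % (which must match exactly), the number of % symbols (the allowed number of differences), and the target word with the % symbols removed. The sketch below assumes that a "difference" is one inserted, deleted, or substituted character (Levenshtein distance). That definition and the function names are assumptions made for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def parse_fuzzy(term):
    """Split a term like 'she%rry' into (prefix, fuzziness, target)."""
    prefix = term.split("%", 1)[0]
    return prefix, term.count("%"), term.replace("%", "")


def fuzzy_match(term, words):
    """Return words that keep the prefix and stay within the fuzziness."""
    prefix, fuzziness, target = parse_fuzzy(term.lower())
    return [word for word in words
            if word.lower().startswith(prefix)
            and edit_distance(word.lower(), target) <= fuzziness]


print(fuzzy_match("she%rry", ["sherry", "sherri", "cherry", "shelly"]))
```

In this sketch, cherry is rejected because it does not begin with she, and shelly is rejected because it differs from sherry in two positions.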

Any words: Use quotation marks around phrases such as "due process of law". You can also put + (plus) in front of any word or phrase that is required, and - (minus) in front of a word or phrase to exclude it. Examples: knife +"white handle" and "sherry" -bob +"suzi".

All words: Like an "any words" search, except that all of the words in the search request must be present for a document to be retrieved.

Boolean search: A group of words, phrases, or macros linked by connectors such as AND, OR, and NOT that indicate the relationship between them. Examples:
pistol and rifle (both words must be present)
blood or spatter (either word can be present)
blood w/5 spatter (spatter must occur within 5 words of blood)
crack not w/5 cocaine (cocaine must occur, but not within 5 words of crack)
pistol and not rifle (pistol must be present and rifle must not be)
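The w/5 proximity connector can be pictured as a comparison of word positions. The sketch below assumes that "within 5 words" means the two word positions differ by at most 5 and that the order of the two words does not matter. Both points, along with the helper names, are assumptions rather than documented behavior.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text, first, second, distance):
    """True if some occurrence of `first` lies within `distance` words of `second`."""
    words = tokenize(text)
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(a - b) <= distance
               for a in first_positions
               for b in second_positions)


print(within("Blood was found near the spatter.", "blood", "spatter", 5))
```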

Weighted search: You can use variable term weighting in a search request to weight some words more heavily than others in ranking search results. Example: knife:5 and shank:3
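The ranking formula used with weighted terms is not given here. One simple way to picture the effect is to add up the weight of every matching word in a document, so that documents mentioning heavily weighted words rank higher. The parse_weights and score functions below are a hypothetical sketch under that assumption.

```python
import re

CONNECTORS = {"and", "or", "not"}


def parse_weights(request):
    """Turn 'knife:5 and shank:3' into {'knife': 5, 'shank': 3}."""
    weights = {}
    for token in request.lower().split():
        if token in CONNECTORS:
            continue
        word, _, weight = token.partition(":")
        weights[word] = int(weight) if weight else 1  # unweighted words count as 1
    return weights


def score(text, weights):
    """Sum the weight of every occurrence of a weighted word in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return sum(weights.get(word, 0) for word in words)


weights = parse_weights("knife:5 and shank:3")
print(score("The knife and a second knife were found with a shank.", weights))
```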