Envion Software’s Line of Proprietary Natural Language Processing Products

Challenge

As one of the world’s leading providers of Natural Language Processing (NLP) software, Envion offers a line of premium-quality proprietary products that was started by Envion’s parent company, Circle Noetic Services, back in 1984. Envion’s flagship NLP product, WordFan Natural Language ToolBox, has been continually enhanced ever since the release of its first version in the early 1990s.

All our NLP software products have been acclaimed internationally and have been used for many years by some of the world’s leading corporations and institutions of higher learning.

Envion’s NLP product line consists of three main products: Password Spellchecker, Dashes Hyphenator, and WordFan Natural Language ToolBox.

 

Password Spellchecker

 

Password Spellchecker is a multilingual application intended for language learners. The application can also be efficiently used in search engines and to ensure correct spelling during typing.

The application’s compression algorithm, invented by Envion’s experts, makes it possible to analyze up to 40,000 words per second in more than 20 languages (English, French, German, Italian, Arabic, Russian, Spanish, and more), including several of their regional varieties.

The solution uses exhaustive word lists meticulously created by Envion’s linguistic experts. It suggests appropriate corrections and ranks them by their degree of structural and phonetic similarity to the misspelled word.
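
The ranking step described above can be illustrated with a minimal C++ sketch. It is not Envion's proprietary method: it assumes plain Levenshtein edit distance as a stand-in for "structural similarity" and ignores phonetic similarity altogether. The names edit_distance and rank_corrections, and the max_distance cutoff, are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance (insert, delete, substitute),
// computed with two rolling rows of the dynamic-programming table.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the word-list entries within max_distance
// edits of the misspelled word, closest first. Ties keep word-list order.
std::vector<std::string> rank_corrections(const std::string& misspelled,
                                          const std::vector<std::string>& word_list,
                                          std::size_t max_distance = 2) {
    typedef std::pair<std::size_t, std::string> Scored;
    std::vector<Scored> scored;
    for (const std::string& w : word_list) {
        std::size_t d = edit_distance(misspelled, w);
        if (d <= max_distance) scored.push_back(Scored(d, w));
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& x, const Scored& y) { return x.first < y.first; });
    std::vector<std::string> out;
    for (const Scored& s : scored) out.push_back(s.second);
    return out;
}
```

Calling rank_corrections("adress", {"address", "dress", "actress", "adroit"}) keeps the three candidates within two edits and drops "adroit". A production spellchecker would avoid scanning the whole word list per query and would also weigh phonetic similarity, which this sketch leaves out.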

 

Dashes Hyphenator

 

Envion’s Dashes Hyphenator takes into account the often complicated hyphenation rules of more than 30 languages to ensure proper hyphenation in each of them.

The novel approach to hyphenation adopted by Envion led to a breakthrough proprietary algorithm that overturned the widely held idea that lexical stress is unpredictable. Initially, we managed to identify and formalize as algorithms the convoluted language patterns that govern lexical stress in English, and then gradually extended this expertise to cover 31 more languages.

While performing hyphenation, the library ranks the suggested hyphenation points according to their stylistic value. Non-standard lexical units are hyphenated using a custom set of rules specifically designed by Envion’s linguistic experts.

Currently, Dashes is capable of processing more than 100,000 words per second with 99.9% hyphenation accuracy.

Importantly, the library is also able to properly hyphenate Germanic compound words by breaking them down into constituent elements.
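
To illustrate the idea of breaking a compound into constituent elements, here is a minimal C++ sketch. It is an assumption-laden illustration, not the library's algorithm: it uses a simple backtracking search over a hypothetical lexicon of known constituents, prefers longer constituents, and ignores linking elements such as the German "s". The names decompound and hyphenate_compound are invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Backtracking helper: tries to cover word[start..] with lexicon entries,
// trying the longest possible constituent first at each position.
static bool split_from(const std::string& word, std::size_t start,
                       const std::unordered_set<std::string>& lexicon,
                       std::vector<std::string>& parts) {
    if (start == word.size()) return true;
    for (std::size_t len = word.size() - start; len >= 1; --len) {
        std::string piece = word.substr(start, len);
        if (lexicon.count(piece)) {
            parts.push_back(piece);
            if (split_from(word, start + len, lexicon, parts)) return true;
            parts.pop_back();  // dead end: undo and try a shorter piece
        }
    }
    return false;
}

// Returns the constituents of a compound, or an empty vector when the
// word cannot be fully covered by lexicon entries.
std::vector<std::string> decompound(const std::string& word,
                                    const std::unordered_set<std::string>& lexicon) {
    std::vector<std::string> parts;
    if (word.empty() || !split_from(word, 0, lexicon, parts)) parts.clear();
    return parts;
}

// Marks constituent boundaries with hyphens; unknown words are returned as-is.
std::string hyphenate_compound(const std::string& word,
                               const std::unordered_set<std::string>& lexicon) {
    std::vector<std::string> parts = decompound(word, lexicon);
    if (parts.empty()) return word;
    std::string out = parts[0];
    for (std::size_t i = 1; i < parts.size(); ++i) out += "-" + parts[i];
    return out;
}
```

With a lexicon containing "dampf", "schiff", and "fahrt", hyphenate_compound("dampfschifffahrt", lexicon) yields "dampf-schiff-fahrt". The plain backtracking search can be slow on long words with many overlapping entries, so a real implementation would memoize failed positions and hyphenate inside each constituent as well.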

 

WordFan Natural Language ToolBox

 

WordFan Natural Language ToolBox is an easy-to-use but extremely knowledge-intensive multilingual stemming library.

The software allows exploring in depth the entire range of paradigmatic relations of a lexical unit.

There are several types of lookup supported by the library to allow the user to trace the linguistic relations they are interested in. Depending on the type of lookup used, the library can show any of the following types of results:

 

  • All the forms of the input word with their grammatical meaning (conjugation).
  • Base forms of the input word with their grammatical meaning (normalization).
  • All forms similar to the input (approximate lookup).
  • All the exact matches of the input word (exact lookup).
  • Constituent elements of the input compound word (decompounding).

The library provides a detailed grammatical description for each of the displayed derivatives. This description includes information that ranges from the word’s part of speech to its category of case, thus allowing a comprehensive view of the base entry.

The library’s processing speed varies depending on the language and the request type, the slowest being around 2,000 words per second for approximate lookup and the fastest being more than 100,000 words per second for exact lookup.

Currently, the library supports 7 languages: Arabic, Danish, English (AmE, BrE, AusE), German (both modern and pre-reform orthography), French, Russian, and Polish. The library’s lexical coverage of these languages is very broad, and we have meticulously optimized it to avoid ambiguity and misleading overlaps.

 

Solution

The development of Dashes Hyphenator, the first product in Envion’s NLP suite, started in July 1984. The first fully functional version of the library was released in early 1985. Identifying the language patterns that determine the position of the lexical stress took our project team a great deal of effort and skill. As a result, we succeeded in solving a problem that had previously been considered unsolvable.

The development of Password Spellchecker started in 1986, and our project team spent 4 years building the software. The project was implemented by a team of 2 software developers and 3 in-house linguistic experts. The Envion NLP project team is now engaged in improving and expanding the application.

The development of WordFan Natural Language ToolBox, the third product in Envion’s NLP line, began in the early 1990s and is still ongoing.

In addition to the vast amount of diverse linguistic data to be processed, implementing the WordFan NLP ToolBox project posed a number of significant technical challenges. Several proprietary techniques had to be developed and applied to ensure efficient dictionary storage, and additional effort went into implementing the required data structure. Significant volumes of data (up to 80,000 base entries) made it difficult to achieve the desired processing speed of 40,000 words per second. To reach that speed, Envion’s experts started looking for a suitable compaction technology and ended up inventing a superior compaction algorithm that became a valuable addition to the existing range of NLP tools and techniques. This proprietary algorithm achieves a compaction coefficient of 93.7%.

One of the project’s greatest challenges was defining and implementing the rules that govern the use of affixes, approximate search parameters, and more. Envion responded to this challenge with a single, highly innovative, and flexible mechanism capable of describing the many morphological changes that occur in different languages.
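
As a rough illustration of how affix rules can be expressed as data rather than code, consider the following C++ sketch. It is only a guess at the general shape of such a mechanism, not Envion's actual design: it covers suffix rules only, and the SuffixRule and Analysis types, the normalize function, and the tag strings are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// One declarative rule: remove `strip` from the end of an inflected form
// and add `append` to recover a candidate base form.
struct SuffixRule {
    std::string strip;   // suffix carried by the inflected form
    std::string append;  // text restored on the base form
    std::string tag;     // grammatical description of the inflected form
};

struct Analysis {
    std::string base;  // base form found in the lexicon
    std::string tag;   // description taken from the matching rule
};

// Applies every rule whose suffix matches and keeps only the candidates
// that are attested in the base-form lexicon.
std::vector<Analysis> normalize(const std::string& word,
                                const std::vector<SuffixRule>& rules,
                                const std::unordered_set<std::string>& base_forms) {
    std::vector<Analysis> out;
    for (const SuffixRule& r : rules) {
        if (word.size() < r.strip.size()) continue;
        std::size_t stem_len = word.size() - r.strip.size();
        if (word.compare(stem_len, r.strip.size(), r.strip) != 0) continue;
        std::string candidate = word.substr(0, stem_len) + r.append;
        if (base_forms.count(candidate)) out.push_back({candidate, r.tag});
    }
    return out;
}
```

Given the rules {"ies", "y", "noun, plural"} and {"ed", "", "verb, past"} and a lexicon containing "city" and "walk", normalize maps "cities" to "city" and "walked" to "walk". Keeping the rules in a table means a new language can be supported by supplying new data, which is one plausible reading of the "single, flexible mechanism" described above.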

The solution has also been customized several times to meet the specific requirements of Envion’s B2B clients.

Currently, Envion’s NLP project team is engaged in adding more dictionaries to the software and providing overall technical support for it.

 

Technology Stack

The Envion project team used the following technologies to implement the products that make up the company’s NLP Product Suite:

 

Password Spellchecker

  • C, C++

Dashes Hyphenator

  • C, C++

WordFan Natural Language ToolBox

  • C, C++ 

 

Result

Envion has been able to earn the reputation of a global leader in the development of advanced NLP software. The NLP product suite has allowed us to attract a large number of clients that include several world-leading companies from different industry sectors, as well as a large number of US, UK, Canadian, and Australian universities and colleges. 

The consummate NLP expertise and a wealth of related experience gained while developing our NLP product line often help us attract clients interested in Envion’s outsourcing service offering.
