Accurate hyphenation is essential for all printed material, be it a newspaper, magazine, book, brochure, technical document or scientific paper, and particularly so for languages with long inflected or compound words. What can you do when your page layout application produces incorrect hyphenation? Does your staff spend extra hours correcting hyphenation errors and ugly justification by hand? Is there a solution that maintains readability and an impression of quality with much less effort? Yes, there is.
- Lingsoft Lexical Algorithm for accurate recognition of compound and inflection boundaries of millions, if not billions, of word forms.
- A separate strict algorithm for the usually small number of unknown words. The two combined produce a stunning accuracy of 99.9%.
- Three-level ranking of hyphenation points, which can be used e.g. for selective hyphenation of dynamic web content. Ranking levels are also available in hyphenation exception dictionaries.
- Extremely compact and fast due to robust and reliable finite-state technology.
- Available as plugins for market-leading page layout applications, and easy to integrate into a variety of application environments.
Lingsoft's hyphenation components adhere to common spelling norms based on the best available resources and references for each language. Millions of Microsoft Office users have appreciated their rigorously tested quality and robustness for more than a decade. Lingsoft's hyphenation components are included in Proofreader plugins for the two market-leading page layout applications, Adobe InDesign and QuarkXPress, and consequently many publishers already enjoy the benefits of using them for improved quality and savings in manual labor.
Lingsoft knows how to hyphenate compounds and inflections
Lingsoft created a unique approach by applying two different hyphenation algorithms per language, one for words recognized by its language model, and one for unrecognized words. Morphology-based hyphenation is able to handle some typical features of compounding and inflecting languages. A good example is the triple consonant of Swedish compounds:
The Swedish word for "allow" is "tillåta", a composition of "till" and "låta" (to let). The third consonant is hidden, except when the word is hyphenated: "till-låta". However, non-compound words with double consonants, like "kallaste" (the coldest) should not be triple-hyphenated "kall-laste". A hyphenation algorithm that is unaware of the word structures cannot handle this and many other similar features consistently. That's why default hyphenation applications even in the two leading page layout suites are insufficient for languages like Swedish.
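The triple-consonant behavior can be illustrated with a minimal Python sketch. The tiny `COMPOUNDS` lexicon and both function names are assumptions made for this illustration only; they are not part of Lingsoft's components, which derive the segmentation from a full morphology model.

```python
# Toy lexicon (hypothetical): written form -> compound parts.
COMPOUNDS = {"tillåta": ("till", "låta")}

VOWELS = set("aeiouyåäö")


def join_compound(first, second):
    """Build the written form: three identical consonants collapse to two."""
    if (
        len(first) >= 2
        and first[-1] not in VOWELS
        and first[-1] == first[-2] == second[0]
    ):
        return first + second[1:]
    return first + second


def hyphenate_at_compound_boundary(word):
    """Restore the hidden consonant when a known compound is split."""
    parts = COMPOUNDS.get(word)
    if parts is None:
        # Not a known compound: leave the word alone at this stage.
        return word
    return "-".join(parts)


print(join_compound("till", "låta"))
print(hyphenate_at_compound_boundary("tillåta"))
print(hyphenate_at_compound_boundary("kallaste"))
```

Because "kallaste" is not listed as a compound, the sketch never produces the incorrect "kall-laste"; only a word whose structure is known gets the restored third consonant.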
The morphology-based hyphenation uses compound segment boundaries and exceptional hyphenation boundaries given by the morphology model, and adds the remaining hyphenation points based on a set of rather generous hyphenation rules. If conflicting boundaries are found, no hyphenation point is given for that syllable boundary in order to prevent incorrect hyphenation.
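The merging step described above might look roughly like the following Python sketch. It rests on one loudly stated assumption: a "conflict" is taken to mean that alternative analyses of the same word disagree about a boundary. The text does not define the term, so this is only one possible reading, and the function name and offset representation are hypothetical.

```python
def merge_hyphenation_points(analyses, rule_points):
    """Combine morphology boundaries with rule-based points.

    analyses:    list of sets of character offsets, one set per
                 morphological analysis of the word (hypothetical format)
    rule_points: set of offsets proposed by the generous rules
    """
    if analyses:
        agreed = set.intersection(*analyses)
        # ASSUMPTION: boundaries proposed by some analyses but not all
        # are treated as conflicting and suppressed entirely.
        disputed = set.union(*analyses) - agreed
    else:
        agreed, disputed = set(), set()
    # Rule-based points fill in the rest, except at disputed offsets.
    return sorted(agreed | (set(rule_points) - disputed))


# Two analyses agree on offset 8 but disagree on 5 versus 6.
print(merge_hyphenation_points([{5, 8}, {6, 8}], {2, 5, 8}))
```

The design choice mirrors the stated goal: where the evidence is contradictory, it is safer to give no hyphenation point than a possibly wrong one.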
Each model covers the core vocabulary of the language. For some languages that means recognizing millions if not billions of word forms, including inflections, compounds and derivations. Therefore a great majority of syllable boundaries are covered by this extremely accurate algorithm.
Words not recognized by the model are hyphenated with a second set of rules, which are more conservative than the rules for recognized words, in order to suggest only correct hyphenation points. In most languages this dual approach results in a stunning accuracy of 99.9% or more.
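As a rough illustration of the dual approach, the Python sketch below dispatches between a generous and a strict rule set depending on whether a word is recognized. The single toy rule (break before a consonant that is followed by a vowel), the margin values and the lexicon format are all assumptions for this example; real rule sets are language-specific and far richer.

```python
VOWELS = set("aeiouyåäö")


def rule_points(word, min_left, min_right):
    """Toy rule: allow a break before a consonant followed by a vowel."""
    points = []
    for i in range(1, len(word) - 1):
        if word[i] not in VOWELS and word[i + 1] in VOWELS:
            # Keep at least min_left / min_right characters on each side.
            if i >= min_left and len(word) - i >= min_right:
                points.append(i)
    return points


def hyphenation_points(word, lexicon):
    """lexicon maps recognized words to their morphology boundaries."""
    if word in lexicon:
        # Recognized word: morphology boundaries plus generous rules.
        return sorted(set(lexicon[word]) | set(rule_points(word, 1, 2)))
    # Unrecognized word: stricter margins, fewer but safer points.
    return rule_points(word, 3, 3)


print(hyphenation_points("kallaste", {"kallaste": []}))
print(hyphenation_points("kallaste", {}))
```

With these toy settings the recognized word receives more break points than the same string treated as unknown, which is the trade-off the dual approach is built on.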
Selective hyphenation enhances typography and readability
When you hyphenate a narrow newspaper column, you certainly need practically all acceptable hyphenation points in order to create beautiful justification. The hyphenation algorithm is designed to be consistent in cases where multiple hyphenation points at a given syllable boundary are permitted. Lingsoft's hyphenation component can usually produce enough consistent hyphenation points for smooth justification.
The hyphenation points are categorized and ranked by type. You can use this feature for selective hyphenation. Unhyphenated left-aligned text is quite often ragged and ugly in languages with many long compound words. You can improve the appearance and readability of such text by allowing hyphens only at compound boundaries. The text then flows evenly, but is not aggressively split into short syllables. Why not apply automated hyphenation to your dynamic web content, as on this Finnish or this Swedish page?
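Selective hyphenation for web content could be sketched as follows in Python. The `(offset, rank)` representation, the convention that rank 1 marks a compound boundary, and the function name are assumptions made for this example, not the actual output format of the component. Soft hyphens (U+00AD) are used because browsers break lines at them only when needed.

```python
SOFT_HYPHEN = "\u00ad"


def insert_soft_hyphens(word, ranked_points, max_rank):
    """Insert soft hyphens at points whose rank is max_rank or better.

    ranked_points: iterable of (offset, rank) pairs; rank 1 is assumed
    to be the best level (e.g. a compound boundary).
    """
    pieces = []
    prev = 0
    for offset, rank in sorted(ranked_points):
        if rank <= max_rank:
            pieces.append(word[prev:offset])
            prev = offset
    pieces.append(word[prev:])
    return SOFT_HYPHEN.join(pieces)


# Hypothetical ranking for Swedish "sjukhuset": compound boundary at 4,
# an ordinary syllable boundary at 6.
points = [(4, 1), (6, 3)]
compound_only = insert_soft_hyphens("sjukhuset", points, max_rank=1)
all_points = insert_soft_hyphens("sjukhuset", points, max_rank=3)
print(compound_only.replace(SOFT_HYPHEN, "-"))
print(all_points.replace(SOFT_HYPHEN, "-"))
```

A left-aligned web page would typically use the compound-only setting, while a narrow justified column would accept all levels.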
Lingsoft's hyphenators are compact, robust and superfast
Lingsoft's finite-state morphology models are the core source of linguistic intelligence for Lingsoft's spelling and grammar checkers, thesauri and hyphenators, as well as for taggers and parsers used in search and text analysis applications for the same language. The regular maintenance of the models benefits all these application areas.
The lexical content and rules of the corresponding language model are compiled into a compact and fast finite-state transducer, which together with the hyphenation algorithms and additional data is included in the hyphenation component for that language. This platform-independent binary file is only a few megabytes in size, yet it hyphenates thousands of words per second. A Lingsoft hyphenation component is a light load for almost any type of application, including network-based services.
The hyphenation components can be used with Lingsoft's proprietary hyphenation application programming interface LSHYPH-API, available for Windows, Linux, Mac OS X, and others on demand. A Java wrapper is also available. The character set used with LSHYPH-API is Unicode.
Easy customization with exceptions and language model upgrades
Even though Lingsoft's hyphenation components offer unmatched accuracy out of the box, you may want to permanently override a particular hyphenation suggestion, or in some instances apply a different hyphenation norm than the one Lingsoft adheres to. You probably also want some company names never to be hyphenated.
LSHYPH provides all you need for defining and maintaining a user dictionary. You can define words with exceptional hyphenation points with up to three quality levels, as well as words you don't want hyphenated, such as some company names. A well-organized language management platform and workflow takes care of validating and distributing the corporate hyphenation exceptions throughout your organization.
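To make the idea of an exception dictionary concrete, here is a small Python sketch. The entry notation (a digit 1 to 3 inside the word marks a hyphenation point with that quality level, and an entry without digits means "never hyphenate") is invented for this illustration and is not the LSHYPH user dictionary format; the function names are likewise hypothetical.

```python
def parse_exception(entry):
    """Parse a hypothetical entry such as 'data1ba3se'.

    Returns (word, points), where points is a list of (offset, level).
    Note: this toy notation cannot represent words containing 1, 2 or 3.
    """
    letters = []
    points = []
    for ch in entry:
        if ch in "123":
            points.append((len(letters), int(ch)))
        else:
            letters.append(ch)
    return "".join(letters), points


def build_exceptions(entries):
    """Turn a list of entries into a word -> points lookup table."""
    return dict(parse_exception(e) for e in entries)


def lookup(word, exceptions, default_points):
    """User exceptions override whatever the hyphenator suggests."""
    return exceptions.get(word, default_points)


exceptions = build_exceptions(["data1ba3se", "Lingsoft"])
print(lookup("database", exceptions, [(2, 2)]))
print(lookup("Lingsoft", exceptions, [(4, 1)]))
print(lookup("other", exceptions, [(2, 2)]))
```

An entry with an empty point list, such as the company name here, overrides the default suggestions and keeps the word unbroken.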