For French, Labbe (1990) covers some of the lemmatisation issues in connection with the "agglutinating hyphens", which can similarly be observed in English in constructions such as the what's-his-name, the what do you call it, the might-have-been.
Keywords: web services, workflows, morphosyntactic tagging, lemmatisation, definition extraction
The reliability of lemmatic bundles, on the other hand, depends on the consistency and procedure of lemmatisation in the corpus.
, the TEP may be meaningful and ordered for the terminographer, since he/she can gather under the same lemma the spelling variants and morphological forms of the same word.
Once the second taste has been performed, lemmatisation
Its main shortcoming (as put forward by Kohnen 2007) is the lack of lemmatisation and/or tagging, which may be taken as a significant difficulty when dealing with OE and Middle English (ME) texts due to the morphological and orthographical variation existing in those periods.
MALEX can thus do what a stemmer does, and a lot more besides, and lemmatisation, morphological analysis and stemming all turn out to be different aspects of exactly the same lexical problem.
Lemmatisation allows the transformation of a term to its canonical form or lemma.
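As a rough illustration of mapping a term to its canonical form, the sketch below uses a plain dictionary lookup. The `LEXICON` table and the function names are hypothetical and chosen only for this example; real lemmatisers rely on much larger lexica and morphological rules, and this is not the method of any of the works cited here.

```python
# Hypothetical toy lexicon mapping inflected forms to lemmas (illustrative only).
LEXICON = {
    "went": "go",
    "goes": "go",
    "going": "go",
    "mice": "mouse",
    "better": "good",
}


def lemmatise(token):
    """Return the lemma of a token, or the lowercased token if it is not in the lexicon."""
    key = token.lower()
    return LEXICON.get(key, key)


def lemmatise_text(text):
    """Lemmatise each whitespace-separated token of a text."""
    return [lemmatise(tok) for tok in text.split()]
```

Unknown forms are passed through in lowercase, so the quality of the result depends entirely on the coverage of the lexicon.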
Maniez (6), "however, seems more difficult to automate, particularly in the case of non-fixed phrasemes containing verbs, which requires on the one hand a prior process of lemmatisation and part-of-speech tagging, and on the other hand the identification of the constituent elements of the phraseme in context, the latter being without doubt the main difficulty of the automatic extraction approach".
The section on types of corpora and other text collections is a useful overview, in which the author discusses types of corpora, their design and compilation (with appropriate mention of the problems of representativeness), and their annotation to include textual and extra-textual information, as well as various types of linguistic information (morphosyntactic, as in corpora tagged for parts of speech; syntactic, in parsed corpora, or "treebanks"; lemmatisation of word forms; prosodic annotation; semantic tagging; incorporation of pragmatic and discoursal information, and software available for these purposes).
This project sets out to discover and develop techniques for the lemmatisation of a historical corpus of the Cornish language in order that a lemmatised dictionary macrostructure can be generated from the corpus.
Lemmatisation is a general normalisation procedure in text processing, where all inflected forms of a lexical word are normalised to a single lemma (i.