lemmatization


Translations

lemmatization

[ˌlemətaɪˈzeɪʃən] N lematización f

lemmatization

n (Ling) → Lemmatisierung f
References in periodicals archive
Next, module performs pre-processing [16] that include URLS removal, hash-tags, username & special characters, performing spelling correction with the aid of a dictionary, abbreviation substitutions, performing lemmatization and stop words removal.
Table 1 - Dates of lemmatization of rare diseases
YEAR OF PUBLICATION    PATHOLOGY
1732                   Dengue
1734                   Leprosy
1884                   Albinism, Diphtheria
1899                   Brachycephaly
1925                   Hydrocephalus
1927                   Scleroderma, Ichthyosis, Acromegaly
1936 [1939]            Hemophilia, Microcephaly
1970                   Achondroplasia, Botulism
1984                   Thalassemia
1989                   Brucellosis, Phenylketonuria, Glioma
2001                   Legionellosis, Narcolepsy, Nevus
Source: Own elaboration.
Although the compilers of the corpus claim that it is equipped with various types of monolingual annotation, (41) such as tokenization, sentence splitting, lemmatization, word sense annotation, and so on, a manual check showed that the frequency results correspond only to the particular token in the search field.
The workflow, or automated set of procedures, might perform what linguists refer to as lemmatization on the string of words, which is to say, the trimming of each word into its smallest meaningful components, as well as removing plurals, capitalization, punctuation, and tense.
The first experiments in Croatian include [Tadic and Sojat, 2003] who use PoS filtering, lemmatization and mutual information to identify candidate terms as a preprocessing step for terminological work, [Delac et al.
TreeTagger is used to classify extracted terms (concepts/relations) using the annotation and lemmatization information [15].
When a search term is preceded by one of these operators, the automatic synonymization and lemmatization (finding grammatical variants) of search terms is turned off, and only exact matches for the query term should be retrieved.
This article deals with the lemmatization of Old English and, more specifically, with the lemmas of verbs of the second weak class.
The stopword, the stemming, and the lemmatization are representative pre-processing techniques in text mining.
A preliminary lemmatization of the transcribed corpus (329,837 words) led to a final list of 150 keywords, each with a minimum of 99 occurrences.
TreeTagger is a part-of-speech tagger and a lemmatization tool that is written in C++ [10].
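
Several of the citations above list lemmatization alongside URL, hashtag and username removal, stop-word removal, and stemming as text pre-processing steps. The following is a minimal sketch of such a pipeline, not the method of any cited work. The lemma dictionary, the stop-word list, and the function names (`lemmatize`, `crude_stem`, `preprocess`) are all hypothetical and chosen only for illustration; a real system would rely on a full lexicon or a tool such as TreeTagger.

```python
import re

# Hypothetical toy lexicon mapping inflected forms to their lemma.
LEMMAS = {
    "was": "be", "were": "be", "is": "be", "are": "be",
    "mice": "mouse", "running": "run", "ran": "run",
    "studies": "study",
}

# Hypothetical stop-word list, deliberately tiny.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "be"}

URL_RE = re.compile(r"https?://\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def lemmatize(token):
    """Return the dictionary form of a token, or the token itself if unknown."""
    return LEMMAS.get(token, token)


def crude_stem(token):
    """Strip a few common suffixes; shown only for contrast with lemmatize."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, strip URLs/mentions/hashtags/special characters,
    lemmatize each token, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    lemmas = [lemmatize(token) for token in text.split()]
    return [lemma for lemma in lemmas if lemma not in STOP_WORDS]


print(preprocess("The mice were running to https://example.com #fun @bob!"))
print(crude_stem("studies"), lemmatize("studies"))
```

In this sketch the stemmer turns "studies" into the non-word "studi", while the dictionary lookup returns the lemma "study"; that difference is the usual reason for distinguishing the two techniques.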