In this post I will discuss A Structured Self-Attentive Sentence Embedding (published at ICLR 2017), an interesting paper that introduces a new sentence representation different from the conventional sentence embedding vector. To follow this post and the paper, readers need a basic understanding of latent-space representations of words (word embeddings), recurrent architectures like LSTMs and BiLSTMs, and non-linear functions like softmax and tanh. I highly recommend reading the original paper after this post to learn about the proposed model in more detail.

Introduction

This paper proposes a new model for extracting a sentence embedding by using self-attention. Instead of using a traditional 1-D vector, the authors propose to use a 2-D matrix to represent the sentence, with each row of the matrix attending to a different part of the sentence. They also propose a self-attention mechanism and a special regularization term...
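To make the attention step concrete, here is a minimal PyTorch sketch of the paper's formulation: A = softmax(W_s2 tanh(W_s1 H^T)) applied to the BiLSTM hidden-state matrix H, followed by M = AH. The class name, the batch shapes, and the example values of d_a and r below are my own illustrative choices, not code from the paper.

```python
import torch
import torch.nn as nn

class StructuredSelfAttention(nn.Module):
    """Computes A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H.

    H holds the BiLSTM hidden states (one row per token), d_a is the
    attention hidden size, and r is the number of attention rows, so
    M is an r x hidden_dim matrix embedding of the sentence.
    """
    def __init__(self, hidden_dim, d_a, r):
        super().__init__()
        self.W_s1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.W_s2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):
        # H: (batch, seq_len, hidden_dim), e.g. concatenated BiLSTM states
        scores = self.W_s2(torch.tanh(self.W_s1(H)))  # (batch, seq_len, r)
        A = torch.softmax(scores, dim=1)  # each attention row sums to 1 over tokens
        A = A.transpose(1, 2)             # (batch, r, seq_len)
        M = A @ H                         # (batch, r, hidden_dim): the 2-D embedding
        return M, A

# Illustrative usage with assumed dimensions:
H = torch.randn(4, 20, 600)  # 4 sentences, 20 tokens each, hidden size 600
attn = StructuredSelfAttention(hidden_dim=600, d_a=350, r=30)
M, A = attn(H)               # M has shape (4, 30, 600)
```

The regularization term mentioned above penalizes redundancy between the r attention rows; in the paper it is the Frobenius-norm penalty ||AA^T - I||_F^2.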
In this post we will look at a simple Python library called Morfessor, which uses unsupervised training to split a given word into its constituent morphemes. Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for words of a natural language based solely on raw text data. So what are morphemes and why are they useful? Morphemes are the smallest meaningful elements of a word. For example, chair, dog, bird, table and compute are all morphemes: they express a direct meaning and cannot be separated into smaller parts. A single word can also carry several morphemes; consider the word 'unsegmented', which consists of three morphemes: 'un', 'segment' and 'ed'. Morphemes are used in a variety of linguistic tasks, as they help in understanding word structure and word formation. In Natural Language Processing, morphology is used in text preprocessing tasks (word stemming and lemmatization) and genera...
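As a quick sketch of how Morfessor can be used in practice (this assumes the morfessor 2.0 package is installed, e.g. via pip install morfessor, and training_corpus.txt is a placeholder for your own raw-text corpus):

```python
import morfessor

# Read a plain-text training corpus.
io = morfessor.MorfessorIO()
train_data = list(io.read_corpus_file('training_corpus.txt'))  # placeholder path

# Train an unsupervised Morfessor Baseline model on the raw text.
model = morfessor.BaselineModel()
model.load_data(train_data)
model.train_batch()

# Segment a word into its most probable morphs.
segments, cost = model.viterbi_segment('unsegmented')
print(segments)  # with enough English data, ideally ['un', 'segment', 'ed']
```

The exact segmentation depends entirely on the training corpus; small corpora often produce coarser or noisier splits than the 'un' + 'segment' + 'ed' example above.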