28th Annual Conference on Current
Trends in Theory and Practice of Informatics
November 24 - December 1, 2001
Lemmatizer for Document Information Retrieval Systems in JAVA
by Leo Galambos
Abstract:
Stemming is a widely accepted practice in Document Information Retrieval
Systems (DIRs) because it is more beneficial than harmful, and it also
improves retrieval efficiency by reducing the size of the term index. We
present a semi-automatic stemming technique that is tailored to the JAVA
environment. Unlike the well-known Porter's algorithm, the method works
without deep knowledge of the grammar rules of a language.