Extracting Concepts from Dynamic Legislative Text Collections
Jan 2005
Selecting discriminating terms to represent the contents of texts is a critical problem for many applications in Information Retrieval. Most Information Retrieval systems index documents by individual words, which are not specific enough to capture the contents of texts. As a consequence, there has been growing interest in techniques for automatic term extraction. In this context, we propose a new architecture for retrieving relevant documents in a dynamic text collection. It combines the SINO search engine with the SENTA software, which is designed for the automatic extraction of multiword lexemes. In this paper, we focus on the SENTA module, which has recently been added to the global architecture.
Keywords: Multiword Lexical Unit Extraction, Information Retrieval, Web Interface.
Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora