Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors
Keywords are single-word and multiword terms that describe the semantic content of documents. They are useful in many applications, such as document search and indexing, and can also be read directly by humans. Keywords can be explicit, when they occur in a document, or implicit, when they are not written in the document but are semantically related to its content. This paper presents a statistical approach for building document descriptors from explicit and implicit keywords that are automatically extracted from the documents. Our approach is language-independent, and we show comparative results for three different European languages.