Overcoming lexical information incompleteness (Superando a incompletude da informação lexical)
Incompleteness or miscoding of lexicons is a major problem faced by current parsing systems: it makes a complete parse of real text impossible. Typically, this situation is handled by robust parsing, which simply ignores the parts of the text that cannot be parsed. In this paper we address a complementary problem, developing a methodological approach for diagnosing the causes that led to partial parses of the input text. The subsequent steps, namely statistical validation of the hypotheses found, learning new information from corpora, and updating the lexicons, are processes we do not address here. We claim that robust parsing should incorporate both a correction phase and a learning phase, guided by the computational linguistics theory underlying the existing systems. Accordingly, we show how to extend chart parsing to explain previously obtained partial parses, diagnosing possible causes which, once assumed, enable the diagnostic chart parser to obtain better parses (partial or complete). We do not elaborate on the proposed architecture, but it will be clear how distributed diagnosis, hypothesis validation and learning can cooperate to enable the construction of evolving parsers that learn by parsing real text.
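To make the diagnostic idea concrete, the following is a minimal illustrative sketch (not the paper's actual system): a toy CYK chart parser over a tiny hand-written CNF grammar, plus a diagnosis step that, when a sentence fails to parse because of words missing from the lexicon, hypothesises a lexical category for each unknown word and keeps exactly those assignments that let the chart reach a complete parse. All grammar rules, words, and category names below are invented for the example.

```python
from itertools import product

# Toy CNF grammar and lexicon (all names are illustrative assumptions).
RULES = {            # (B, C) -> A  encodes the binary rule A -> B C
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {          # word -> set of lexical categories
    "the": {"Det"},
    "cat": {"N"},
    "dog": {"N"},
    "saw": {"V"},
}
CATEGORIES = {"Det", "N", "V"}   # candidate categories for unknown words

def cyk(cats_per_word):
    """Standard CYK over pre-assigned lexical categories.
    Returns the set of categories spanning the whole input."""
    n = len(cats_per_word)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, cats in enumerate(cats_per_word):
        chart[i][i + 1] = set(cats)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    if (b, c) in RULES:
                        chart[i][j].add(RULES[(b, c)])
    return chart[0][n]

def diagnose(sentence):
    """If the sentence fails to parse because of unknown words, try every
    category assignment for them and return those that yield a full 'S'."""
    words = sentence.split()
    unknown = [i for i, w in enumerate(words) if w not in LEXICON]
    base = [LEXICON.get(w, set()) for w in words]
    if "S" in cyk(base):
        return []                       # complete parse, nothing to diagnose
    hypotheses = []
    for combo in product(CATEGORIES, repeat=len(unknown)):
        cats = list(base)
        for idx, cat in zip(unknown, combo):
            cats[idx] = {cat}
        if "S" in cyk(cats):
            hypotheses.append({words[i]: c for i, c in zip(unknown, combo)})
    return hypotheses
```

For instance, `diagnose("the cat saw the wug")` returns the single hypothesis that "wug" is a noun, since that is the only category assignment under which the chart completes; a real system would then pass such hypotheses on to the statistical validation and lexicon-update phases mentioned above.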
Linguística Computacional: Investigação Fundamental e Aplicações