Graduation details

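[PhD] - Infra-estrutura sintáctica para a deteção e superação de falhas em sistemas de processamento computacional de línguas naturais (Syntactic infrastructure for detecting and overcoming faults in systems for the computational processing of natural languages)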

Oct 1996 - Oct 2002

Abstract: Automatic detection and correction of errors and missing information in natural language processing systems is a fundamental stage in the evolution and adaptation of these systems, contributing to the learning of new linguistic knowledge. This thesis proposes a parsing infrastructure that supports not only mechanisms for partial parsing but also, in a later phase, mechanisms for fault diagnosis. Faults are signalled by the fact that the previously obtained parses are partial. In both cases, a new approach based on partial parsing and tabling is used. Partial parses supply useful information even when the text under analysis contains errors, when the available lexical information is wrong or incomplete, or when the information supplied by statistical or empirical methods (such as part-of-speech tagging) is not 100% precise. The proposed infrastructure partially parses a text and can, a posteriori, improve the obtained parse by diagnosing the causes that prevented a full parse. The flexibility, modularity and generality of this infrastructure allow several instances of it, built for different tasks (normal parsing with several syntactic levels, or diagnosis of various types of faults), to be easily combined in a multi-agent architecture whose goal is to acquire lexical and grammatical knowledge in order to advance the evolution and/or adaptation of the system. The thesis starts by presenting the state of the art in partial parsing and by situating the work described here within it. Next, a cascaded architecture for partial parsing is proposed. The proposed infrastructure is specified abstractly in the framework of deductive parsing [SSP95], which gives it the flexibility needed for the range of applications mentioned. The infrastructure’s implementation in the DyALog system [Cle93, CL94], which incorporates tabling mechanisms, is described, and experimental results confirm its efficiency, which is on a par with the fastest systems of this kind.
The mechanisms used for fault diagnosis are described in detail, and results are presented from experiments on the diagnosis of incomplete verbal and nominal subcategorization, wrong part-of-speech tagging, and missing noun gender and number information. We emphasize the knowledge gains obtained and the precision of the results.

Start Date: 1 Oct 1996

End Date: 25 Oct 2002

Post-Graduation by: Vitor Rocio

Post-Graduation Supervisor(s): Gabriel Pereira Lopes, Éric Villemont de la Clergerie

Post-Graduation Jury(s): Pedro Barahona, António Porto, Irene Pimenta Rodrigues, Pedro Rangel Henriques, Manuel Villares Ferro