Graduation details

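  • [PhD] - Aquisição Automática de Subcategorização Sintáctico-Semântica e sua utilização em Sistemas de Processamento de Língua Natural (Automatic Acquisition of Syntactic-Semantic Subcategorization and Its Use in Natural Language Processing Systems)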
  • Nov 2001 - Nov 2006
  • Developing robust syntactic parsers for natural language text requires resolving syntactic ambiguity. Most modern natural language processing techniques rely on a subcategorization lexicon to restrict the possible parses. Words combine under specific linguistic constraints; the constraints that a particular word imposes on the words it can combine with are known as subcategorization restrictions. Subcategorization is expressed at two levels of abstraction: syntactic (subcategorization frames) and semantic (selection restrictions). Syntactic frames are constraints on morphosyntactic categories and syntactic contexts. Selection restrictions, on the other hand, require arguments to belong to a specific semantic class. A parser needs both kinds of information to prefer some parses over the other grammatically possible ones. The purpose of this work is to investigate the automatic acquisition of subcategorization information from data. To that end, an unsupervised strategy is proposed for acquiring the syntactic-semantic requirements of nouns, verbs, and adjectives from partially parsed text corpora. The main aim of the learning strategy presented in this thesis is to cluster similar contexts by identifying the words that extensionally define the requirements of those contexts. This strategy makes it possible to learn the syntactic and semantic requirements of words in different contexts. This information is used to build a subcategorization lexicon and to resolve parsing attachment ambiguities. The results show that the learning strategy is robust both to noise in the input data and to the sparseness of that data.
  • 16 Nov 2001
  • 23 Nov 2006
  • Alexandre Agustini
  • Gabriel Pereira Lopes, Pablo Gamallo
  • Joaquim Ferreira da Silva, Gabriel Pereira Lopes, Pablo Gamallo, João Tiago Mexia, Irene Pimenta Rodrigues, Aline Villavicencio, António Branco, Pavel Brazdil