In proceedings details

  • Improving LocalMaxs Multiword Expression Statistical Extractor
  • Jul 2023
  • The LocalMaxs algorithm extracts relevant Multiword Expressions from text corpora using a statistical approach. However, statistical extractors have a harder time obtaining good practical results than linguistic approaches, which benefit from language-specific syntactic and/or semantic knowledge. First, this paper contributes an improvement to the LocalMaxs algorithm, based on a more selective evaluation of the cohesion of each Multiword Expression candidate with respect to its neighbourhood, and on a filtering criterion guided by the location of stopwords within each candidate. Second, a new language-independent method is presented for the automatic self-identification of stopwords in corpora, requiring no external stopword lists or linguistic tools. The results obtained for LocalMaxs reach Precision values of about 80% for English, French, German and Portuguese, an increase of around 12-13% over the previous LocalMaxs version. The self-identification of stopwords reaches high Precision for top-ranked stopword candidates.
  • Springer LNCS
  • Springer
  • Joaquim Ferreira da Silva, José Cardoso e Cunha
  • Proceedings of the International Conference on Computer Science 2023 (to be published)
  • 3 Jul 2023