Translation lexicons are known to improve the quality of sub-sentence alignment of parallel corpora and of newly extracted translations and, as a consequence, the quality of machine translation and cross-language information retrieval. The bilingual pairs (entries) in such lexicons must be correct if they are to improve the quality of these applications. This paper proposes a method for classifying bilingual entries automatically extracted from aligned parallel corpora as correct or incorrect, using a classifier based on Support Vector Machines (SVMs). Experimental results show that this classification approach achieves a micro-averaged F-measure above 85% for the English-Portuguese language pair.
Keywords: Translation Equivalents, Translation Lexicon, Translation Tables, Bilingual Translation Pairs, Phrase Table Filtering, Classification, Support Vector Machine (SVM)
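As an illustration only, the sketch below shows the general shape of the classification step described in the abstract. A linear SVM labels each entry as correct (+1) or incorrect (-1), and the predictions are scored with a micro-averaged F-measure. The SVM here is trained with Pegasos-style stochastic sub-gradient descent on the hinge loss. This solver is a stand-in, because the abstract names neither the kernel, the solver, nor the features. The helper names `train_linear_svm`, `predict` and `micro_f1`, the two-dimensional feature vectors and the toy data are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(data: Sequence[Tuple[Vector, int]], lam: float = 0.01,
                     iterations: int = 2000, seed: int = 0) -> Vector:
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    Each x is assumed to end with a constant 1.0 so the last weight acts as a bias.
    Labels are +1 (correct entry) or -1 (incorrect entry).
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)                      # decreasing step size
        violated = y * dot(w, x) < 1.0             # hinge loss active for this sample
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink step from the regulariser
        if violated:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


def micro_f1(gold: Sequence[int], pred: Sequence[int], labels=(1, -1)) -> float:
    """Micro-averaged F-measure: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical feature vectors for extracted bilingual entries (last value is the bias term).
data = [
    ([3.0, 3.0, 1.0], 1), ([4.0, 3.0, 1.0], 1), ([3.0, 4.0, 1.0], 1),
    ([-3.0, -3.0, 1.0], -1), ([-4.0, -3.0, 1.0], -1), ([-3.0, -4.0, 1.0], -1),
]
w = train_linear_svm(data)
gold = [y for _, y in data]
pred = [predict(w, x) for x, _ in data]
print(micro_f1(gold, pred))  # 1.0 on this separable toy set
```

When every entry receives exactly one of the two labels, micro-averaging pools true positives, false positives and false negatives over both classes. Micro precision, micro recall and micro F-measure then all reduce to accuracy.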