Publication date: 1 June 2021

Predicting the Sentence-Level Quality of Machine Translation Systems
The notion of “quality” in Machine Translation (MT) can have different interpretations depending on the intended use of the translations (e.g., fluency and adequacy, post-editing time, etc.). Nonetheless, assessing the quality of a translation is in general done by the user, who needs to read the translation and the source text in order to judge whether the translation is good or not. This is a time-consuming task, and it may not even be possible if the user has no knowledge of the source language. Automatically assessing the quality of translations is therefore a crucial problem, either to filter out low-quality translations (e.g., so that professional translators do not spend time reading and post-editing bad translations) or to present translations in such a way that end-users are aware of their quality. This task, referred to as Confidence Estimation (CE), is concerned with predicting the quality of a system’s output for a given input, without any information about the expected output.
In this talk I will present the work we have been doing at Xerox Research Centre Europe, in collaboration with Bristol University, on predicting the quality of sentences produced by MT systems when reference translations are not available. The problem is addressed as regression and classification tasks, and we propose a method that takes into account the contribution of different sources of information (i.e., features) to the problem. I will present results from experiments with this method, using a large set of features and translations produced by various MT systems for different language pairs, annotated with quality scores both automatically and manually. I will also discuss potential uses of CE estimates for several multilingual applications, including multilingual text mining.
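The regression framing can be sketched as follows: each (source, translation) pair is mapped to a feature vector, and a model is fit to map features to a quality score. This is a toy illustration only, assuming trivial surface features (sentence lengths and their ratio) and an ordinary least-squares fit; it is not the feature set, data, or method from the talk.

```python
# Toy sketch of sentence-level quality estimation as regression.
# All features and any quality scores used with it are hypothetical
# illustrations, not the actual system described in the talk.

def extract_features(source, translation):
    """Map a (source, MT output) pair to a small feature vector."""
    s = len(source.split())
    t = len(translation.split())
    # bias term, source length, target length, length ratio
    return [1.0, float(s), float(t), t / max(s, 1)]

def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting (pure Python, no dependencies)."""
    d = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]

def predict(w, features):
    """Predicted quality score for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, features))
```

The classification variant mentioned in the abstract could then, for example, threshold the predicted score to separate translations worth post-editing from those better retranslated from scratch; any such threshold would again be application-specific.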
Date | 05/02/2009
---|---
State | Concluded