Irina Temnikova, Silvia Gargova, Tsvetelina Stefanova, Iva Marinova, Ruslana Margova, Nevena Grigorova, Alexander Komarov, Dan Sultanescu, and Kalina Bontcheva
While cross-lingual manual and automatic fact-checking are important, and Machine Translation (MT) is among the methods used for them, there are no guidelines or evaluation metrics to help determine which MT engine is appropriate for such a task. This article presents an evaluation approach that fills this gap by providing a numerical estimate of an MT engine's suitability for translating texts for cross-lingual claim matching and fact-checking. The approach emphasizes elements important for the task, such as the correct translation of Named Entities (NEs), while down-weighting others, such as the style and fluency of the translations. Our contributions include an MT error classification, evaluation guidelines, formulas for obtaining a normalized numerical score, and a Python script for calculating it. The numerical weights of the score's components can be modified, providing flexibility to reflect what is feasible at the subsequent processing stage. We also present the results of two experiments in which we apply our approach (with a particular choice of weights) to determine the most suitable freely accessible MT tools for translating Bulgarian and Romanian news articles and social media texts into English. Our results show that eTranslation is the best MT tool for the Romanian-English translation direction, while the HuggingFace Helsinki Opus MT model is best for Bulgarian-English.
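A minimal sketch of how such a weighted, normalized suitability score might be computed is shown below. The error categories, weights, and normalization are illustrative assumptions for exposition only; they are not the paper's actual formulas or script.

```python
# Illustrative sketch only: categories, weights, and the normalization
# below are assumptions, not the paper's actual formulas.

# Hypothetical per-category error counts for one translated text.
error_counts = {
    "named_entity": 2,   # NE mistranslations weigh heavily for claim matching
    "meaning": 1,        # errors that distort factual content
    "fluency": 4,        # style/fluency errors matter less for this task
}

# Modifiable weights reflecting task importance (higher = more harmful).
weights = {
    "named_entity": 5.0,
    "meaning": 3.0,
    "fluency": 0.5,
}

def suitability_score(counts, weights, n_words):
    """Return a score in [0, 1]; 1.0 means no task-relevant errors."""
    penalty = sum(weights[c] * counts.get(c, 0) for c in weights)
    max_penalty = sum(weights.values()) * n_words  # crude upper bound
    return 1.0 - penalty / max_penalty

print(f"Score: {suitability_score(error_counts, weights, n_words=100):.3f}")
```

Because the weights are plain parameters, an evaluator can raise the NE weight when downstream claim matching depends on entity fidelity, or lower the fluency weight when only factual content matters.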
Keywords: Evaluation of machine translation · Bulgarian · Romanian
Second International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024)