Irina Temnikova, Silvia Gargova, Tsvetelina Stefanova, Iva Marinova, Ruslana Margova, Nevena Grigorova, Alexander Komarov, Dan Sultanescu, and Kalina Bontcheva
While cross-lingual manual and automatic fact-checking are important, and Machine Translation (MT) is among the methods used for them, there are no guidelines or evaluation metrics to help determine which MT engine is appropriate for such a task. This article presents an evaluation approach that fills this gap by providing a numerical estimate of an MT engine's suitability for translating texts for cross-lingual claim matching and fact-checking. The approach emphasizes elements important for the task, such as the correct translation of Named Entities (NEs), while down-weighting others, such as the style and fluency of the translations. Our contributions include an MT error classification, evaluation guidelines, formulas for obtaining a normalized numerical score, and a Python script for calculating it. The numerical weights of the score's components can be modified, providing flexibility to reflect what is feasible at the subsequent processing stage. We also present the results of two experiments in which we apply our approach (with a particular choice of weights) to determine the most suitable freely accessible MT tools for translating Bulgarian and Romanian news articles and social media texts into English. Our results show that eTranslation is the best MT tool for the Romanian-English translation direction, while the HuggingFace Helsinki Opus MT model is best for Bulgarian-English.
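A minimal sketch of how such a weighted, normalized suitability score might be computed is shown below. The error categories, weights, and normalization are illustrative assumptions for exposition only; they are not the paper's actual formulas or script.

```python
# Illustrative sketch only: categories, weights, and the normalization
# below are assumptions, not the paper's actual formulas.

# Hypothetical per-category error counts for one translated text.
error_counts = {
    "named_entity": 2,   # NE mistranslations weigh heavily for claim matching
    "meaning": 1,        # errors that distort factual content
    "fluency": 4,        # style/fluency errors matter less for this task
}

# Modifiable weights reflecting task importance (higher = more harmful).
weights = {
    "named_entity": 5.0,
    "meaning": 3.0,
    "fluency": 0.5,
}

def suitability_score(counts, weights, n_words):
    """Return a score in [0, 1]; 1.0 means no task-relevant errors."""
    penalty = sum(weights[c] * counts.get(c, 0) for c in weights)
    max_penalty = sum(weights.values()) * n_words  # crude upper bound
    return 1.0 - penalty / max_penalty

print(f"Score: {suitability_score(error_counts, weights, n_words=100):.3f}")
```

Because the weights are plain parameters, an evaluator can raise the NE weight when downstream claim matching depends on entity fidelity, or lower the fluency weight when only factual content matters.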
Keywords: Evaluation of machine translation · Bulgarian · Romanian
Second International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024)