TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation

Assistant Professor and Canada Research Chair Dr. Muhammad Abdul-Mageed, along with Postdoctoral Fellows Dr. El Moatez Billah Nagoudi and Dr. AbdelRahim Elmadany, has received the Best Paper Award for the work “TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation”. The paper was published at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5), held at the 13th Language Resources and Evaluation Conference (LREC 2022).

Turjuman (“translator” or “interpreter” in Arabic) is a neural machine translation toolkit that translates from 20 languages into Arabic. Supported languages range from those with sizeable amounts of data, such as English, French, and Spanish, to lower-resource languages such as Cebuano, Tamashek, and Yoruba. The toolkit is built on AraT5, a Transformer-based sequence-to-sequence model developed by the Deep Learning and NLP Group in previous work.

The toolkit supports a number of diverse decoding methods, which also makes it well suited for generating paraphrases of the Arabic translations as an added value. The team has also created a demo based on the software, available here, and encourages those interested to try it.
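To give a concrete sense of what diverse decoding methods can mean in practice, the sketch below loads an AraT5-style sequence-to-sequence model with the Hugging Face transformers library and contrasts beam search (a single high-likelihood translation) with nucleus sampling (several varied outputs, usable as paraphrases). This is a minimal illustration, not the Turjuman toolkit's own API: the checkpoint name and the task prefix are assumptions made for the example; see the Turjuman repository for the released models and commands.

```python
# Minimal sketch: comparing decoding strategies with a seq2seq translation model.
# The checkpoint name and task prefix below are assumptions for illustration only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "UBC-NLP/turjuman"  # hypothetical identifier; check the repo for the real one
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "translate English to Arabic: The weather is beautiful today."
inputs = tokenizer(text, return_tensors="pt")

# Beam search: one high-likelihood Arabic translation.
beam_ids = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))

# Nucleus sampling: multiple diverse outputs, usable as paraphrased translations.
sample_ids = model.generate(
    **inputs, do_sample=True, top_p=0.95, num_return_sequences=3, max_length=128
)
for ids in sample_ids:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```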

Explore this article on the UBC Language Sciences website to learn more about Turjuman and its applications.