Language Technology to Improve Data Quality and Engagement
THE CHALLENGE
Needs assessments are essential for developing humanitarian strategy, but in multilingual contexts they are often inaccurate and difficult to conduct. Data collectors frequently do not speak the languages of conflict-affected people, do not understand humanitarian terminology, and have limited training. As a result, important information may be translated several times, which makes the process slow and error-prone and leads to aid efforts designed on incomplete and inaccurate data.
THE SOLUTION
Recognizing the gap in effective translation services available to humanitarian actors, particularly when conducting critical assessments in contexts where minority languages are spoken, Translators Without Borders built cutting-edge language technology that provides text- and voice-based translation to capture, understand, and analyze information quickly and accurately. Under CHIC funding, Translators Without Borders built several foundational datasets for automating humanitarian data collection in Hausa, a language widely spoken in the conflict-affected zones of northeast Nigeria for which few automated translation options are available. The team used the collected data to create two prototype tools: a bidirectional Hausa-English translator and a Hausa speech recognizer. While both prototypes performed significantly better than Google's comparable tools, further quality improvements are needed before they can be integrated into humanitarian field operations. Nonetheless, the team succeeded in introducing Hausa into the open-source voice data collection platform Common Voice, and the work was selected to be featured at the prestigious speech technology conference Interspeech.