Systematic Inequalities in Language Technology Performance across the World's Languages

Antonios Anastasopoulos

George Mason University

Abstract: Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development. While the performance of NLP methods has grown enormously over the last decade, this progress has been restricted to a minuscule subset of the world's roughly 6,500 languages. We introduce a framework for estimating the global utility of language technologies as revealed in a comprehensive snapshot of recent publications in NLP. Our analyses cover the field at large, along with more in-depth studies of both user-facing technologies (machine translation, language understanding, question answering, text-to-speech synthesis) and more linguistic NLP tasks (dependency parsing, morphological inflection). In the process, we (1) quantify disparities in the current state of NLP research, (2) explore some of the associated societal and academic factors, and (3) produce tailored recommendations for evidence-based policy making aimed at promoting more global and equitable language technologies.
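To give a sense of what "estimating the global utility of language technologies" can mean in practice, the sketch below shows one minimal way such an estimate could be computed: per-language utility as performance normalized by an achievable ceiling, aggregated either uniformly across languages or weighted by speaker populations. This is an illustrative assumption, not the talk's actual formulation; all function names, scores, and speaker counts are made up for the example.

    # Minimal sketch (assumptions, not the speaker's exact framework):
    # utility in a language = score normalized by a performance ceiling,
    # then aggregated per-language (uniform) or per-speaker (weighted).

    def per_language_utility(score: float, ceiling: float) -> float:
        """Normalized performance of a technology in one language, in [0, 1]."""
        return score / ceiling if ceiling > 0 else 0.0

    def linguistic_utility(utilities: dict[str, float]) -> float:
        """Average utility with every language counted equally."""
        return sum(utilities.values()) / len(utilities)

    def demographic_utility(utilities: dict[str, float],
                            speakers: dict[str, int]) -> float:
        """Average utility weighted by how many people speak each language."""
        total = sum(speakers[lang] for lang in utilities)
        return sum(utilities[lang] * speakers[lang] / total for lang in utilities)

    # Illustrative numbers only (e.g., BLEU-like scores against a nominal ceiling of 50).
    scores = {"English": 45.0, "Swahili": 20.0, "Quechua": 5.0}
    speakers = {"English": 1_450_000_000, "Swahili": 80_000_000, "Quechua": 7_000_000}

    utilities = {lang: per_language_utility(s, 50.0) for lang, s in scores.items()}
    print(f"Linguistic utility:  {linguistic_utility(utilities):.2f}")
    print(f"Demographic utility: {demographic_utility(utilities, speakers):.2f}")

The gap between the two aggregates illustrates the kind of disparity the talk quantifies: a technology can look reasonable when averaged over speakers yet serve most individual languages poorly, or vice versa.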

Bio: Antonios Anastasopoulos is an Assistant Professor in Computer Science at George Mason University. He received his PhD in Computer Science from the University of Notre Dame, advised by David Chiang, and then did a postdoc at the Language Technologies Institute at Carnegie Mellon University. His research is on natural language processing with a focus on low-resource settings, endangered languages, and cross-lingual learning, and is currently funded by the National Science Foundation, the National Endowment for the Humanities, Google, Amazon, Meta, and the Virginia Research Investment Fund.

Week 14 2021/2022

Thursday 10th February 2022
3:00-4:00pm

Microsoft Teams