Automatic processing of languages

Research Achievements

IGERT Trainee Jennifer Gillenwater, in collaboration with Computer Science researchers Kuzman Ganchev and Ben Taskar, has been exploring how the written texts of a language can be syntactically analyzed even when that language lacks a large human-annotated corpus on which to train computer algorithms. Their approach uses "parallel" texts, in which English has been translated into another language. Applying generative and discriminative models of grammar induction, the team has begun to successfully extract the grammars of the translated texts by drawing on the syntactic analyses of the English side. They evaluated their approach on Bulgarian and Spanish "shared task data" and showed that their system consistently outperforms unsupervised methods and can sometimes outperform supervised learning when training data are limited. This work has important implications for developing automatic processing of languages from countries where limited resources preclude large-scale human annotation.