Syntactic patterns in Pite Saami: A corpus-based exploration of 130 years of variation and change
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Applied Linguistics, Computational Linguistics
Final Report Abstract
This project explored linguistic structures in Pite Saami, a highly endangered Uralic language spoken by around 30 individuals from the Arjeplog municipality in Swedish Lapland. Its goal was to understand syntactic patterns in spoken-language texts spanning more than a century of documented Pite Saami language use. Specifically, the project examined syntactic structures, that is, how individual words can be combined to form phrases, and how phrases in turn can be combined to form clauses and sentences. Aside from adding to our general knowledge about how human language can work, the project also explored how to formalize this understanding of Pite Saami syntactic structures in a way that a computer can process. As a result, computational tools were developed that automatically analyze Pite Saami texts: given a Pite Saami sentence as input, they output which words are involved (lemmatization) and which linguistic categories are present (morphology and part of speech), and even provide a rough word-for-word English translation. Beyond enabling the automatic analysis of large amounts of Pite Saami text for further research, these tools can also be used to develop spell-checkers and grammar-checkers, which can be especially valuable for a language community as small as that of Pite Saami. Lastly, the Pite Saami collection at the Endangered Languages Archive grew in size and quality as a result of this project; this collection will serve as a dataset for future investigations into Pite Saami language and culture.
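The input/output behaviour described in the abstract can be illustrated with a minimal sketch. According to the publications list, the project's actual tools are a finite state transducer and a Constraint Grammar disambiguator; the sketch below replaces both with a plain dictionary lookup and is not the project's implementation. All names (`Analysis`, `TOY_LEXICON`, `analyze`, `gloss_line`) are hypothetical, and the word forms are placeholders, not real Pite Saami data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Analysis:
    """One analysis of a word form: lemma, part of speech, morphology, gloss."""
    lemma: str
    pos: str
    morph: Tuple[str, ...]
    gloss: str


# Hypothetical toy lexicon. The forms are placeholders, NOT real Pite Saami
# words; a real analyzer would derive these analyses with a transducer.
TOY_LEXICON = {
    "formA": Analysis("lemmaA", "N", ("Sg", "Nom"), "dog"),
    "formB": Analysis("lemmaB", "V", ("Prs", "3Sg"), "runs"),
}


def analyze(sentence: str) -> List[Tuple[str, Optional[Analysis]]]:
    """Split on whitespace and look each token up; unknown tokens map to None."""
    return [(token, TOY_LEXICON.get(token)) for token in sentence.split()]


def gloss_line(sentence: str) -> str:
    """Build a rough word-for-word gloss, marking unknown tokens with '?'."""
    return " ".join(a.gloss if a else "?" for _, a in analyze(sentence))
```

Calling `analyze("formA formB")` returns each token paired with its lemma, part of speech, and morphological tags, and `gloss_line` joins the glosses into a rough word-for-word translation. A lookup table cannot handle unseen inflected forms or ambiguous analyses, which is the kind of work a transducer and a disambiguator are designed for.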
Publications
- 2017. “Instant Annotations: Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora”. In Proceedings of the Third International Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. ACL Anthology. St. Petersburg, Russia: Association for Computational Linguistics. 25-36
Gerstenberger, Ciprian, Niko Partanen, Michael Rießler, J. Wilbur
(See online at https://doi.org/10.18653/v1/w17-0604)
- 2018. Pite Saami Finite State Transducer (morphological parser) and Disambiguator (Constraint Grammar syntactic disambiguator)
Wilbur, J.
- 2019. “ELAN as a search engine for hierarchically structured, tagged corpora”. In Proceedings of the 5th International Workshop for Computational Linguistics for Uralic Languages (IWCLUL 2019). Tartu: Association for Computational Linguistics. 90-103
Wilbur, J.
(See online at https://dx.doi.org/10.18653/v1/W19-0308)
- 2019. “Using computational approaches to integrate endangered language legacy data into documentation corpora. Past experiences and challenges ahead”. In Proceedings of the Workshop on Computational Methods for Endangered Languages. Vol. 2. Honolulu: Association for Computational Linguistics. 24-30
Blokland, Rogier, Niko Partanen, Michael Rießler, J. Wilbur
(See online at https://doi.org/10.33011/computel.v2i.451)
- 2021. “Envisioning digital methods for fieldwork in the Arctic.” In M. Lehtimäki, A. Rosenholm & V. Strukov (eds.), Visual Representations of the Arctic: Imagining Shimmering Worlds in Culture, Literature and Politics. Routledge Interdisciplinary Perspectives on Literature. London: Routledge. 313-339
Partanen, Niko, Michael Rießler, J. Wilbur
(See online at https://doi.org/10.4324/9781003158295-22)