Project Details

Computational Models of the Emergence and Diachronic Change of Multi-Word Expression Meanings

Subject Area: Applied Linguistics, Computational Linguistics
Term: since 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG), Project number 462212526
 
In Natural Language Processing (NLP), combinations of words are considered multi-word expressions (MWEs) if they are semantically idiosyncratic to some degree, i.e., if the meaning of the combination is not entirely (or not at all) predictable from the meanings of its constituents. MWEs span multiple morpho-syntactic types, including noun compounds (such as "flea market") and particle verbs (such as "give up"). They have been explored extensively across research disciplines from a synchronic perspective, but empirical, large-scale approaches to diachronic models of MWE meaning are still lacking.
Our project SemChangeMWE goes beyond the restricted synchronic concept of MWE meaning and offers a novel perspective on MWE emergence, MWE meaning change, and MWE compositionality (i.e., meaning transparency) by computationally modelling their diachronic properties and how these properties change. We selected two multifaceted MWE types, noun compounds and particle verbs, and explore them cross-linguistically for German and English.
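A common way to operationalise compositionality in distributional work is to compare the vector of the whole MWE with the vectors of its constituents: the more similar they are, the more transparent the expression is with respect to that constituent. The following is a minimal illustrative sketch, not the project's model. The vectors and their context dimensions ("buy", "sell", "insect", "jump") are hypothetical toy counts, and averaging the per-constituent scores is only one assumed way to combine them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no distributional evidence
    return dot / (norm_u * norm_v)

def compositionality(mwe_vec, constituent_vecs):
    """Per-constituent cosine scores plus their mean as an overall score."""
    scores = [cosine(mwe_vec, c) for c in constituent_vecs]
    return scores, sum(scores) / len(scores)

# Hypothetical toy co-occurrence counts over the contexts
# ["buy", "sell", "insect", "jump"].
flea_market = [8, 7, 0, 0]
flea = [0, 1, 9, 6]
market = [9, 8, 0, 0]

scores, overall = compositionality(flea_market, [flea, market])
for name, score in zip(["flea", "market"], scores):
    print(f"{name}: {score:.2f}")
print(f"overall: {overall:.2f}")
```

In this toy setting, "flea market" comes out as transparent with respect to its head "market" but opaque with respect to its modifier "flea", which is the kind of graded, constituent-specific behaviour that makes noun compounds interesting for compositionality research.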
The project brings together our expertise in (a) computational models of MWE compositionality and meaning analogy, (b) computational models of diachronic meaning change and meaning divergence in language variation, and (c) datasets of meaning components and meaning relatedness, in order to address the interdisciplinary lack of computational diachronic models of MWE meaning.
Methodologically, we will exploit qualitative and quantitative approaches (such as statistical measures of productivity; distributional, information-theoretic, and topic- and graph-based probabilistic models; visualisation of collocational strength) and enhance vector representations and computational algorithms to shed light on (i) synchronically salient empirical characteristics of MWEs at the time of emergence (such as frequency, generality, and grammatical variation), (ii) diachronic changes in MWE meaning, (iii) the role of synchronic and diachronic polysemy in MWE sense innovation and reduction, and (iv) analogical developments of MWE meanings with regard to their present-day compositionality. To enable extensive interdisciplinary assessment and validation for theoretical and computational research, we will evaluate our empirical findings and computational models not only on general semantic-change benchmarks and novel MWE-specific change datasets, but also (i) by validating them against theory-driven categorisations of MWEs, (ii) by applying them to further language-variation tasks (i.e., domain-, register-, and dialect-specific sense divergences), and (iii) by integrating them into statistical machine translation as an external NLP application.
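Diachronic meaning change is often approximated by comparing an expression's distributional profile across time slices. The following is a minimal sketch under stated assumptions, not the project's method: the particle verb is assumed to be pre-joined into a single token (the hypothetical `give_up`), and the two "periods" are invented toy sentences. It uses plain co-occurrence counts over a shared vocabulary, so no alignment of separately trained embedding spaces is required.

```python
from collections import Counter
import math

def context_vector(sentences, target, window=2):
    """Count the words within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def cosine_distance(c1, c2):
    """1 - cosine similarity between two sparse count vectors (Counters)."""
    shared = set(c1) & set(c2)
    dot = sum(c1[w] * c2[w] for w in shared)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0  # assumption: no evidence in one period counts as maximal change
    return 1.0 - dot / (n1 * n2)

# Hypothetical toy time slices; "give_up" is the pre-joined particle verb.
period_a = [["soldiers", "give_up", "the", "fort"],
            ["they", "give_up", "the", "city"]]
period_b = [["we", "never", "give_up", "on", "hope"],
            ["do", "not", "give_up", "on", "dreams"]]

vec_a = context_vector(period_a, "give_up")
vec_b = context_vector(period_b, "give_up")
print(f"change score: {cosine_distance(vec_a, vec_b):.2f}")
```

Counting over a shared vocabulary keeps both periods directly comparable. With embeddings trained separately per period, the spaces would first have to be aligned (for example, with orthogonal Procrustes) before such distances become meaningful.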
DFG Programme: Research Grants