Distributional Approaches to Semantic Relatedness: Generalisation, Evaluation, Visualisation
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Final Report Abstract
The project Distributional Approaches to Semantic Relatedness (SemRel) explored the potential and the limits of distributional approaches to lexical semantics. Phase 1 distinguished three types of semantic relatedness in order to shed light on distributional modelling from different perspectives. Phase 2 studied semantic relatedness from a meta-level perspective, across relatedness types, word classes, word senses and feature types. Our work was carried out within an interdisciplinary framework spanning theoretical, cognitive and computational linguistics: each type of relatedness, concerning paradigmatic relations, preposition senses and compound compositionality, received input and feedback from human judgements, and was applied to statistical machine translation (SMT).
Our main contributions from the project include
- an extensive collection of distributional information and an interface for German subcategorisation information;
- a substantial collection of human judgements regarding paradigmatic semantic relations, compositionality ratings, and association and feature norms;
- a novel framework based on hard as well as soft clustering to identify ambiguous words, and a visualisation tool to explore their features;
- an assessment of evaluation measures for soft clustering, and the development of our own measures;
- various soft-clustering and/or multi-modal approaches to predict relatedness and identify salient features;
- various approaches to predict the compositionality of German multi-word expressions, with specific attention to their linguistic and empirical properties;
- various approaches to distinguish between paradigmatic relations, using both count and neural predict models;
- a hierarchical SMT system integrating syntactico-semantic subcategorisation information;
- a phrase-based SMT system making use of synthetic phrases to model noun phrase and prepositional phrase complements;
- a phrase-based SMT system combining approaches to model morphology, syntax and lexical choice;
- a phrase-based SMT system integrating compositionality ratings;
- an SMT system with support-verb constructions.
Overall, we demonstrated a large potential for distributional information to model the various types of semantic relatedness, and to integrate them into an SMT model. We also showed that default features may serve as a first approximation of salient distributional properties, but are outperformed by phenomenon-related features. Contrasting text-based with multi-modal variants provided initial insights into the strengths and complementary properties of the modalities. Most surprising to us was the difficulty of capturing all three semantic relatedness types within one framework, as the two types of multi-word expressions already behaved very differently across various models. Distinguishing between the salient features and relating them to human judgements in a reasonable way remains a major challenge. Concerning ambiguity in vector spaces, our work, as well as discussions in a reading group and in a workshop, showed that in addition to defining appropriate techniques, the underlying gold standard ratings also need to take ambiguity into account to allow a reasonable assessment of the models.
Publications
- A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1146–1157, Seattle, WA, 2013
Stephen Roller and Sabine Schulte im Walde
- Using Subcategorization Knowledge to improve Case Prediction for Translation to German. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pages 593–603, Sofia, Bulgaria, 2013
Marion Weller, Alexander Fraser, and Sabine Schulte im Walde
- Chasing Hypernyms in Vector Spaces with Entropy. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 38–42, Gothenburg, Sweden, 2014
Enrico Santus, Alessandro Lenci, Qin Lu, and Sabine Schulte im Walde
- Combining Word Patterns and Discourse Markers for Paradigmatic Relation Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 524–530, Baltimore, MD, 2014
Michael Roth and Sabine Schulte im Walde
- Association Norms for German Noun Compounds and their Constituents. Behavior Research Methods, 47(4):1199–1221, 2015
Sabine Schulte im Walde and Susanne Borgwaldt
(See online at https://doi.org/10.3758/s13428-014-0539-y)
- How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation. In Proceedings of the 11th Workshop on Multiword Expressions (MWE), pages 19–28, Denver, Colorado, USA, 2015
Fabienne Cap, Manju Nirmal, Marion Weller, and Sabine Schulte im Walde
- Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 454–459, Berlin, Germany, 2016
Kim-Anh Nguyen, Sabine Schulte im Walde, and Thang Vu
- The Role of Modifier and Head Properties in Predicting the Compositionality of English and German Noun-Noun Compounds: A Vector-Space Perspective. In Proceedings of the 5th Joint Conference on Lexical and Computational Semantics (*SEM), pages 148–158, Berlin, Germany, 2016
Sabine Schulte im Walde, Anna Hätty, and Stefan Bott
- Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 625–630, Valencia, Spain, 2017
Marion Weller-Di Marco, Alexander Fraser, and Sabine Schulte im Walde
- Factoring Ambiguity out of the Prediction of Compositionality for German Multi-Word Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE), pages 66–72, Valencia, Spain, 2017
Stefan Bott and Sabine Schulte im Walde
(See online at https://dx.doi.org/10.18653/v1/W17-1708)