Project Details
IREM: Interpretability of Retrieval Models
Applicant
Professor Dr. Avishek Anand
Subject Area
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
since 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 440551765
Complex machine learning (ML) models trained on large amounts of data have vastly improved performance in many application domains. On the other hand, they tend to be opaque and less interpretable than earlier classical models. Consequently, the interpretability of complex models has been studied in various domains (image classification and captioning, sequence-to-sequence modeling, etc.) to better understand the decisions they make. However, there has been limited work on interpreting retrieval or ranking models, which are used to rank documents given a user-specified keyword query. The objective of this proposal is to understand and address key challenges in interpreting retrieval models, which are considered central in information retrieval.
One of the key challenges in information retrieval is dealing with intent under-specification. To counter this, ranking models employ extensive query modeling, exploit contextual information, and learn from user behaviour using a large feature space. As a consequence, modern ranking approaches such as learning to rank and neural models have become effective but, with their increasing complexity, also more opaque. However, unlike regression and classification models, for which there have been many interpretability proposals, there is limited understanding of ranking models in the context of problems specific to information retrieval. We identify three central problems, among others, that determine the success of a retrieval model: the problem of intent, the problem of context and the problem of learning.
The problem of intent refers to how well a retrieval model understands the user's information need, especially in cases where the queries are under-specified and ambiguous. The problem of context refers to how well a retrieval model leverages the search context (user profile, historical user modeling, etc.) for ranking results.
Finally, the problem of learning refers to the ability of a retrieval model to learn from various signals (queries, documents, user behaviour, clicks, etc.) in order to determine which documents are more relevant to a given query. When using ML models to rank results, the training data (clicks, human annotations) informs how signals/features should be combined or how latent feature representations can be constructed from raw inputs (non-linearities over plain text). To this end, various models have recently been proposed, ranging from linear regression and decision trees to complex deep neural networks.
In this proposal, we aim to investigate and propose approaches for post-hoc interpretability of complex ranking models that are learned from data. Our aim is to interpret already trained ranking models in terms of what query intents and contextual information they actually infer and which features really matter for a given ranking.
DFG Programme
Research Grants
International Connection
Netherlands