Project Details
Online Preference Learning with Bandit Algorithms
Applicant
Professor Dr. Eyke Hüllermeier, since 3/2017
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2017 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 317046553
In machine learning, the notion of multi-armed bandit (MAB) refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. Combining theoretical challenge with practical usefulness, MABs have received considerable attention in machine learning research in recent years. This project is devoted to a variant of standard MABs that we refer to as the preference-based multi-armed bandit (PB-MAB) problem. Instead of learning from stochastic feedback in the form of real-valued rewards for the choice of single alternatives, a PB-MAB agent is allowed to compare pairs of alternatives in a qualitative manner. The goal of this project is twofold. First, by consolidating existing work and addressing a number of open theoretical questions and algorithmic problems, we wish to provide a complete and coherent understanding of the PB-MAB setting. Second, we shall develop methods for practically motivated extensions of this setting, namely, contextual PB-MABs that allow preferences between alternatives to depend on a decision context, and PB-MABs with generalized feedback that go beyond pairwise comparisons and permit preference information of different kinds.
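To illustrate the setting, here is a minimal sketch of the PB-MAB interaction loop. It is not one of the project's algorithms; the preference matrix `P`, the `duel` simulator, and the naive round-robin estimator are all hypothetical, chosen only to show how qualitative pairwise feedback (rather than a numeric reward) drives learning and how a Condorcet winner might be identified from empirical win rates.

```python
import random

# Hypothetical preference matrix for illustration: P[i][j] is the probability
# that alternative i wins a duel against alternative j (P[i][j] + P[j][i] = 1).
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
K = len(P)


def duel(i, j, rng):
    """Simulate one qualitative pairwise comparison: True iff arm i beats arm j."""
    return rng.random() < P[i][j]


def estimate_condorcet_winner(num_rounds=20000, seed=0):
    """Naive baseline: duel uniformly random pairs, record empirical win rates,
    and return the arm that beats the most other arms with rate > 1/2."""
    rng = random.Random(seed)
    wins = [[0] * K for _ in range(K)]
    plays = [[0] * K for _ in range(K)]
    for _ in range(num_rounds):
        i, j = rng.randrange(K), rng.randrange(K)
        if i == j:
            continue
        if duel(i, j, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        plays[i][j] += 1
        plays[j][i] += 1
    # Empirical probability that arm i beats arm j.
    phat = [[wins[i][j] / max(plays[i][j], 1) for j in range(K)] for i in range(K)]
    # A Condorcet winner beats every other arm with probability > 1/2.
    score = [sum(phat[i][j] > 0.5 for j in range(K) if j != i) for i in range(K)]
    return max(range(K), key=lambda i: score[i])
```

This uniform-exploration baseline makes no attempt to exploit; the regret-efficient PB-MAB algorithms studied in this line of research instead balance exploration and exploitation when choosing which pair to duel next.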
DFG Programme
Research Grants
Former Applicant
Dr. Robert Busa-Fekete, until 2/2017