
Preference-Based Monte-Carlo Tree Search

Subject area Image and Language Processing, Computer Graphics and Visualization, Human Computer Interaction, Ubiquitous and Wearable Computing
Funding Funded from 2015 to 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 280805726
 
Year of creation 2020

Summary of project results

The main outcome of this project is a set of algorithms that make it possible to apply Monte-Carlo tree search (MCTS), a very successful strategy for solving sequential decision problems, in scenarios where numerical feedback is not naturally available or has to be carefully tuned. Instead, we only assume the availability of pairwise comparisons between different lines of action (in the general case) or, more specifically, an ordering of the available reward signals, without using numerical distances between them.

We developed a proof-of-concept implementation that works for the general case but was, as expected, too inefficient for practical purposes, the main problem being the quadratic number of comparisons that have to be performed. To tackle this problem, we developed an algorithm for the case of ordinal preference signals, an important subclass of the problem with many practical applications. For this setting, we worked out an efficient base algorithm, which we subsequently improved. As the computational complexity of this algorithm depends crucially on the number of distinct ordinal feedback levels, a key idea for the improvement was an online algorithm that dynamically bins feedback levels into a set of ordinal buckets, thereby simplifying the decision problem via abstraction. We showed that our ordinal MCTS algorithm obtains competitive results in various games of the General Video Game AI (GVGAI) competition, for which we have developed one of the leading systems.

We also worked on employing Monte-Carlo tree search in sequential recommendation problems, but developed a slightly different solution for this scenario, where iterative exploration is limited by the data points available off-line. The key idea of this approach was to first abstract the available preferences into a numerical utility function, which a conventional MCTS variant can then use to efficiently select the best sequence, and hence a product that maximizes the learned utility.

Finally, we transferred our framework for handling ordinal feedback values to the related problem of reinforcement learning, which has recently received increased attention in the context of deep learning. The main result of this line of work is the first algorithm for deep ordinal reinforcement learning, which we showed to be competitive with its numerical counterparts in many cases, even though it uses only the order information of the received feedback signals and is thus much more robust and less sensitive to careful tuning of a numerical reward signal.
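To make the ordinal setting described above more concrete, the following minimal Python sketch shows one way an MCTS node can aggregate ordinal feedback without numerical averaging: each action keeps counts of the observed feedback levels, and two actions are compared by the estimated probability that one yields a higher level than the other. The names OrdinalStats and prob_beats are illustrative assumptions, not the project's actual implementation.

    # Minimal sketch (assumption, not the project's exact algorithm): node
    # statistics for ordinal feedback in MCTS. Each action keeps a count of
    # the ordinal levels it has received; two actions are compared by the
    # estimated probability that one outcome beats the other, so only the
    # order of the levels matters, never their numerical distance.
    from collections import Counter

    class OrdinalStats:
        def __init__(self):
            self.counts = Counter()   # ordinal level -> number of observations
            self.visits = 0

        def update(self, level):
            """Record one ordinal feedback value (any comparable label)."""
            self.counts[level] += 1
            self.visits += 1

    def prob_beats(a, b):
        """Estimate P(outcome of a > outcome of b), counting ties as one half."""
        if a.visits == 0 or b.visits == 0:
            return 0.5
        wins = ties = 0
        for la, ca in a.counts.items():
            for lb, cb in b.counts.items():
                if la > lb:
                    wins += ca * cb
                elif la == lb:
                    ties += ca * cb
        return (wins + 0.5 * ties) / (a.visits * b.visits)

    # Example: rank two candidate actions purely from ordered feedback.
    left, right = OrdinalStats(), OrdinalStats()
    for lvl in [2, 3, 3, 1]:
        left.update(lvl)
    for lvl in [1, 2, 2, 2]:
        right.update(lvl)
    print(prob_beats(left, right))   # > 0.5 means 'left' is preferred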

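The dynamic binning of feedback levels mentioned in the summary can be pictured roughly as follows. This is a simplified sketch assuming equal-frequency buckets; the class OnlineBinner and its interface are illustrative and not the project's actual procedure.

    # Simplified sketch (illustrative assumption): merge raw ordinal feedback
    # levels into a fixed number of buckets by observed frequency, so that the
    # number of distinct levels the search has to distinguish stays small.
    from bisect import bisect_left
    from collections import Counter

    class OnlineBinner:
        def __init__(self, n_buckets):
            self.n_buckets = n_buckets
            self.counts = Counter()   # raw ordinal level -> frequency
            self.boundaries = []      # inclusive upper bounds of the lower buckets

        def observe(self, level):
            """Record a raw feedback level and refresh the bucket boundaries."""
            self.counts[level] += 1
            total = sum(self.counts.values())
            share = total / self.n_buckets
            self.boundaries, cum = [], 0
            for lvl in sorted(self.counts):
                cum += self.counts[lvl]
                if (len(self.boundaries) < self.n_buckets - 1
                        and cum >= share * (len(self.boundaries) + 1)):
                    self.boundaries.append(lvl)

        def bucket(self, level):
            """Map a raw level to its current bucket index (0 .. n_buckets-1)."""
            return bisect_left(self.boundaries, level)

    # Example: six raw levels collapsed into three ordinal buckets.
    binner = OnlineBinner(3)
    for lvl in [1, 2, 3, 4, 5, 6]:
        binner.observe(lvl)
    print([binner.bucket(lvl) for lvl in [1, 2, 3, 4, 5, 6]])   # [0, 0, 1, 1, 2, 2]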
