Towards a Synthesis of Local and Global Pattern Discovery
Final Report Abstract
Local pattern mining is a key technique in descriptive data mining, where the task is to find descriptions of parts of the data that deviate from the overall distribution of the items in a given dataset. In contrast, many applications in machine learning and data mining aim at learning a predictive, global model that can make predictions for new, unseen data. The goal of this project was to investigate whether local patterns can serve as a description of a given dataset on which a subsequent global modeling phase can be based. To this end, we developed the LeGo framework, which consists of three phases: local pattern discovery, pattern set selection, and global modeling. As rules are the most commonly used representation for local patterns, the work on this project increased our general understanding of inductive rule learning as a global modeling technique. Much of the work in the project was therefore devoted to studying the components of commonly used rule learning algorithms from the point of view of how they combine local patterns into global models. From this we gained several insights into the search algorithms of rule learners, the popular covering strategy, and, most notably, rule learning heuristics. For example, somewhat surprisingly, the goal of global modeling imposes a stronger bias towards consistency than frequently used local pattern evaluation measures such as weighted relative accuracy provide. These results also led to improved rule learning algorithms. In particular, we found that a clear separation between rule refinement heuristics and rule evaluation measures, which are typically treated uniformly in conventional algorithms, leads to improved predictive performance. Moreover, we discovered that the use of inverted heuristics, which clearly model the top-down search strategy that is most common in subgroup discovery, also leads to longer but, interestingly, no less general rules.
Other results include rule stacking, a technique for compressing multiple sets of local patterns into a single comprehensible model, and a novel algorithm for rule-based regression. Based on the framework of preference learning, which we developed in a different project, we are now able to provide rule-based solutions to a wide variety of global modeling tasks such as label and instance ranking, hierarchical and ordinal classification, or multipartite ranking and multilabel classification.
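As an illustration of the remark above that weighted relative accuracy is less biased towards consistency than global modeling requires, the following minimal Python sketch compares it with precision on two hypothetical rules. The formula for weighted relative accuracy is the standard one from the subgroup discovery literature; the function names, the class distribution, and the coverage counts are assumptions made for illustration and are not taken from the project.

```python
def precision(p, n):
    """Fraction of covered examples that are positive.

    p, n: positive and negative examples covered by the rule.
    A consistency-oriented measure: it ignores how many examples are covered.
    """
    return p / (p + n)


def wracc(p, n, P, N):
    """Weighted relative accuracy of a rule.

    P, N: total positive and negative examples in the dataset.
    Coverage of the rule times its gain in precision over the class prior.
    """
    coverage = (p + n) / (P + N)
    return coverage * (p / (p + n) - P / (P + N))


# Hypothetical dataset with 100 positive and 100 negative examples.
P, N = 100, 100

# Hypothetical rule A: small but perfectly consistent (no negatives covered).
rule_a = (20, 0)
# Hypothetical rule B: large coverage, but covers 40 negatives.
rule_b = (80, 40)

for name, (p, n) in (("A", rule_a), ("B", rule_b)):
    print(f"rule {name}: precision={precision(p, n):.3f}, "
          f"WRAcc={wracc(p, n, P, N):.3f}")
```

With these assumed counts, precision ranks the consistent rule A first, whereas weighted relative accuracy ranks the larger but inconsistent rule B first. This is the kind of trade-off between coverage and consistency that the abstract refers to.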
Publications
- On meta-learning rule learning heuristics. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM-07), pages 529–534, Omaha, NE, 2007
Frederik Janssen and Johannes Fürnkranz
- From local patterns to global models: The LeGo approach to data mining. In Arno J. Knobbe, editor, From Local Patterns to Global Models: Proceedings of the ECML/PKDD-08 Workshop (LeGo-08), pages 1–16, Antwerp, Belgium, 2008
Arno J. Knobbe, Bruno Cremilleux, Johannes Fürnkranz, and Martin Scholz
- A re-evaluation of the over-searching phenomenon in inductive rule learning. In Haesun Park, Srinivasan Parthasarathy, Huan Liu, and Zoran Obradovic, editors, Proceedings of the SIAM International Conference on Data Mining (SDM-09), pages 329–340, Sparks, Nevada, 2009
Frederik Janssen and Johannes Fürnkranz
- An empirical comparison of probability estimation techniques for probabilistic rules. In João Gama, Vítor Santos Costa, A. Jorge, and Pavel B. Brazdil, editors, Proceedings of the 12th International Conference on Discovery Science (DS-09), pages 317–331. Springer-Verlag, 2009
Jan-Nikolas Sulzmann and Johannes Fürnkranz
- On the quest for optimal rule learning heuristics. Machine Learning, 78(3):343–379, March 2010
Frederik Janssen and Johannes Fürnkranz
- Heuristic rule-based regression via dynamic reduction to classification. In Toby Walsh, editor, Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), pages 1330–1335, 2011
Frederik Janssen and Johannes Fürnkranz
- Rule stacking: An approach for compressing an ensemble of rule sets into a single classifier. In Tapio Elomaa, Jaakko Hollmén, and Heikki Mannila, editors, Proceedings of the 14th International Conference on Discovery Science (DS-11), pages 323–334, Espoo, Finland, 2011. Springer
Jan-Nikolas Sulzmann and Johannes Fürnkranz
- Foundations of Rule Learning. Springer-Verlag, 2012
Johannes Fürnkranz, Dragan Gamberger, and Nada Lavrač
- Multi-label LeGo – Enhancing multi-label classifiers with local patterns. In Jaakko Hollmén, Frank Klawonn, and Allan Tucker, editors, Advances in Intelligent Data Analysis XI – Proceedings of the 11th International Symposium on Data Analysis (IDA-11), pages 114–125. Springer, October 2012
Wouter Duivesteijn, Eneldo Loza Mencía, Johannes Fürnkranz, and Arno J. Knobbe
- Unsupervised generation of data mining features from linked open data. In International Conference on Web Intelligence and Semantics (WIMS’12), 2012
Heiko Paulheim and Johannes Fürnkranz
- Separating rule refinement and rule selection heuristics in inductive rule learning. In Toon Calders, Floriana Esposito, Eyke Hüllermeier, and Rosa Meo, editors, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD-14), Part III, pages 114–129, Nancy, France, 2014. Springer
Julius Stecher, Frederik Janssen, and Johannes Fürnkranz
(See online at https://doi.org/10.1007/978-3-662-44845-8_8)
- Stacking label features for learning multilabel rules. In Proceedings of the 17th International Conference on Discovery Science (DS-14), pages 192–203, Bled, Slovenia, 2014
Eneldo Loza Mencía and Frederik Janssen
(See online at https://doi.org/10.1007/978-3-319-11812-3_17)