Project Details

Statistical Foundations of Semi-supervised Learning with Graph Neural Networks

Subject Area Theoretical Computer Science, Mathematics
Term since 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 463402401
 
The theory of deep learning has been an active area of research in recent years and has provided a rigorous understanding of the performance of supervised models trained on labelled data. However, recent practical developments in foundation models rely heavily on the availability of massive amounts of unlabelled data. Hence, it is equally important to understand how better models are learned in practice by exploiting unlabelled data through semi-supervised or unsupervised deep learning. This was the goal of our project in the first funding phase of the Priority Programme and has led to results on both unsupervised representation learning and semi-supervised deep learning on graphs.

The broad goal of the continuation project is to understand two important questions of modern machine learning: (i) How does unlabelled data improve or reduce the accuracy of predictions? (ii) Why do the attention mechanism and the transformer architecture improve statistical performance? We address both questions in the context of semi-supervised deep learning on graphs, specifically through the study of graph neural networks for node classification and link prediction problems.

The main technical contributions of the project are: (i) derivation of the infinite-width neural tangent kernel and Gaussian process limits of graph attention networks and graph transformers; (ii) computation of the exact statistical risk of kernel approximations of graph neural networks, including both convolutional and attention-based architectures; (iii) statistical guarantees for graph neural networks under a class of random graphs (partially aligned contextual stochastic block models).

These contributions will make it possible to compare precisely the statistical performance of convolutional and attention-based models, thereby answering why deep attention models are superior for learning long-range interactions in data. Furthermore, the analysis on random graphs will characterize when side information (here, in the form of graph edges) can help or hurt predictive power. The analysis will also help to identify the limitations of current graph neural networks and to explore alternative architectures that are near-optimal on contextual stochastic block models.
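To make the random-graph model named above concrete, the following short Python sketch samples from a simple two-block contextual stochastic block model: community labels set the within- and between-block edge probabilities, and node features are drawn from a label-dependent Gaussian mixture. All parameter names and values (n, p_in, p_out, mu, sigma, d) are illustrative assumptions for exposition, not the project's own specification, and the "partial alignment" between graph and features studied in the project is not modelled here.

import numpy as np

def sample_csbm(n=200, p_in=0.10, p_out=0.02, mu=1.0, sigma=1.0, d=5, seed=0):
    # Illustrative two-block contextual SBM; all parameters are assumptions
    # made for this sketch, not values taken from the project.
    rng = np.random.default_rng(seed)
    labels = 2 * rng.integers(0, 2, size=n) - 1         # +/-1 community labels
    same = labels[:, None] == labels[None, :]           # same-community mask
    probs = np.where(same, p_in, p_out)                 # edge probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)    # sample upper triangle
    adj = (upper | upper.T).astype(int)                 # symmetric, no self-loops
    u = rng.normal(size=d) / np.sqrt(d)                 # shared feature direction
    feats = mu * labels[:, None] * u[None, :] + sigma * rng.normal(size=(n, d))
    return adj, feats, labels

adj, X, y = sample_csbm()                               # graph, features, labels

In this setting, the graph (adj) and the features (X) each carry noisy information about the labels (y); varying p_in - p_out against mu illustrates the regimes in which the graph side information helps or hurts prediction.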
DFG Programme Priority Programmes
 
 
