Project Details
Coded Distributed Computation with Application to Machine Learning
Applicant
Professor Dr.-Ing. Antonia Wachter-Zeh
Subject Area
Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 470027923
Machine learning algorithms deal with massive amounts of data on which intensive calculations have to be performed. The idea of distributed computation is to divide the task into smaller subtasks (which are easy to solve) and distribute these subtasks to several workers. Each worker solves a subproblem and returns the result to the master node, which then assembles the partial results to obtain the overall result. A severe problem in distributed computation is slow workers (called stragglers): waiting for their responses causes a tremendous delay for the machine learning algorithm. Therefore, error-correcting codes are used so that the overall result can be reconstructed from the responses of many (but not all) workers.

This project develops new concepts for reliable and private coded computation, including coded distributed matrix multiplication, stochastic gradient descent, and convolution products. For distributed matrix multiplication, the main goal is to mitigate the problem of ill-conditioned real-valued Vandermonde matrices and to apply new coding strategies based on interleaved codes, rank-metric codes, and LDPC codes. The distributed convolution product must first be split into suitable subtasks in order to design tailored coding schemes that deal with stragglers and adversarial workers. For the distributed computation of stochastic gradient descent, we will investigate the convergence of accelerated gradient descent, the minimization of the communication load, and new coding techniques.

A crucial point is to guarantee the privacy of the involved data. We will therefore propose schemes that guarantee the privacy of the users and analyze how the encoding/decoding complexity and the privacy constraints influence the overall latency of the system and the computation/communication trade-off. Due to the increasing interest in Federated Learning, we plan to incorporate coding techniques that can speed up learning and guarantee the privacy of the involved data. In all approaches, the possible heterogeneity of the network should be leveraged by using rateless codes.
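To illustrate the core idea of straggler-tolerant coded computation, the following is a minimal sketch (not the project's actual coding scheme): a matrix-vector product A x is split into two row blocks and encoded with one parity worker, so the master can recover the full result from the responses of any two of the three workers. All names and the (3, 2) parity construction here are illustrative assumptions.

```python
# Minimal sketch of coded distributed matrix-vector multiplication with one
# parity worker (illustrative only, not the project's actual scheme).
# A is split into row blocks A1, A2; the three workers compute A1 @ x,
# A2 @ x, and (A1 + A2) @ x. Any two responses suffice to recover A @ x,
# so a single straggler does not delay the computation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

A1, A2 = A[:3], A[3:]
tasks = {0: A1, 1: A2, 2: A1 + A2}              # encoded subtasks sent to workers
results = {i: M @ x for i, M in tasks.items()}  # each worker returns its partial product

def decode(responses):
    """Reconstruct A @ x from the results of any two workers."""
    if 0 in responses and 1 in responses:
        return np.concatenate([responses[0], responses[1]])
    if 0 in responses:                            # worker 1 straggles
        return np.concatenate([responses[0], responses[2] - responses[0]])
    return np.concatenate([responses[2] - responses[1], responses[1]])

# Simulate worker 1 being a straggler: decode from workers 0 and 2 only.
fast = {i: results[i] for i in (0, 2)}
assert np.allclose(decode(fast), A @ x)
```

This toy construction behaves like a (3, 2) MDS code over the workers; the project targets more general and numerically robust constructions (e.g., avoiding ill-conditioned real-valued Vandermonde encodings) as well as privacy guarantees, which this sketch does not address.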
DFG Programme
Research Grants