Format-aware Detection of Malicious Documents (FORMAD)
Applicant
Professor Pavel Laskov, Ph.D.
Subject Area
Software Engineering and Programming Languages
Funding
Funded from 2013 to 2015
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 217981196
The project addresses the problem of detecting malicious content in formatted documents. Embedding malicious code in documents is a frequent technique in modern attacks against computer systems. Successful detection of document-based attacks is only possible if detection methods are fully aware of format-specific syntax and semantics. In previous work, format-aware analysis was carried out only for special cases, for example, embedded JavaScript code. The goal of the proposed project is to develop a general methodology for format-aware analysis that can be used to detect malicious documents. The main idea is to use an intermediate document representation in the form of hierarchical key/value pairs (HKV) for the essential processing steps. Such a representation decouples analysis from format peculiarities while retaining the general semantics of the document content. Adapting the proposed methodology to a new document format would then require only a conversion to the HKV format instead of a complete redesign of the detection methods. The main scientific challenge of the project is to develop analysis techniques suitable for the HKV representation. Only limited prior work has addressed such a representation, and previous methods lack the scalability required for complex document formats. This challenge will be addressed by applying machine learning methods suited to the analysis of large amounts of high-dimensional data. New methods will be developed for assessing the plausibility of the values of specific keys as well as the overall risk associated with a document.
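To make the idea of a hierarchical key/value representation more concrete, the following is a minimal Python sketch, not the project's actual implementation. It assumes that a document has already been parsed into nested dictionaries and lists, and it flattens that structure into pairs of a slash-separated key path and a leaf value. The function name `flatten_hkv`, the path separator, the choice to let list items share their parent's path, and the PDF-like sample structure are all illustrative assumptions.

```python
from typing import Any, Iterator, Tuple


def flatten_hkv(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Any]]:
    """Yield (key_path, value) pairs for every leaf of a nested structure.

    Hypothetical helper: the HKV encoding used in the project may differ.
    """
    if isinstance(node, dict):
        # Descend into each child and extend the key path with its key.
        for key, child in node.items():
            yield from flatten_hkv(child, path + (str(key),))
    elif isinstance(node, (list, tuple)):
        # Assumption: list items share the parent's path, so repeated
        # structures (e.g. many pages) map to the same key.
        for child in node:
            yield from flatten_hkv(child, path)
    else:
        # Leaf value: emit the accumulated path as a single string key.
        yield "/".join(path), node


# Invented, PDF-like sample structure used only for illustration.
sample_document = {
    "Catalog": {
        "Pages": {"Count": 2, "Kids": [{"Type": "Page"}, {"Type": "Page"}]},
        "OpenAction": {"S": "JavaScript", "JS": "app.alert(1)"},
    }
}

hkv_pairs = list(flatten_hkv(sample_document))
for key_path, value in hkv_pairs:
    print(key_path, "=", value)
```

Under these assumptions, a detector would only see key paths such as `Catalog/OpenAction/JS` and their values, regardless of which parser produced the nested structure. Supporting another format would mean writing a new converter while leaving the downstream analysis unchanged.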
DFG Programme
Research Grants