Project Details
Reasoning over Large Amounts of Data in Ontologies via Abstraction and Refinement
Applicant
Professorin Dr. Birte Glimm
Subject Area
Theoretical Computer Science
Term
from 2015 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 266736200
Ontology-based data access (OBDA) is an increasingly popular paradigm in the area of knowledge representation and information systems. An ontology in this context is a combination of a TBox, which captures background domain knowledge, and an ABox, which contains facts about elements of the application domain. The TBox is used to enrich and integrate large, incomplete, and possibly semi-structured data, which users can then access via queries. For example, a large part of Wikipedia is available in machine-processable form, which, together with an ontological TBox, is an important information source for many applications. To handle large ABoxes efficiently, OBDA approaches assume that the data is stored in a database. Nevertheless, the assumption of complete data that is typically made in databases (closed-world assumption) does not hold, and reasoning is required to answer queries. A standard reasoning approach is materialization, i.e., all entailed consequences are added to the ABox before the system accepts queries. For large ABoxes, however, the materialization can take several hours.
Within this project extension, we suggest a novel approach to materialization, in which we do not compute the materialization directly on the (usually large) ABox but instead work on a smaller "abstraction" of the data. For the abstraction, we define criteria under which individuals from the ABox are considered equivalent. Such indistinguishable individuals are then represented just once in the abstraction. For TBoxes that are small compared to the ABox, the abstraction is usually significantly smaller than the original ABox and, hence, the entailed consequences can be computed efficiently in main memory. The entailed consequences may make individuals distinguishable that were previously indistinguishable. To account for that, the initial abstraction is iteratively refined until a fixpoint is reached.
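The abstraction and refinement loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the project: the three toy rule forms ("sub", "all", "some"), the equivalence criterion (two individuals count as indistinguishable if they share the same asserted concepts and the same incoming and outgoing roles), and all function names. The actual project targets more expressive ontology languages and a database backend.

```python
from collections import defaultdict

def as_store(assertions):
    """Copy concept assertions into a defaultdict(set) keyed by individual."""
    return defaultdict(set, {x: set(cs) for x, cs in assertions.items()})

def saturate(concepts, roles, tbox):
    """Naively apply the toy rules until nothing changes (in-memory reasoning)."""
    changed = True
    while changed:
        changed = False
        for rule in tbox:
            if rule[0] == "sub":      # A ⊑ B
                _, a, b = rule
                targets = [x for x in list(concepts) if a in concepts[x]]
            elif rule[0] == "all":    # A ⊑ ∀r.B
                _, a, r, b = rule
                targets = [o for (s, p, o) in roles if p == r and a in concepts[s]]
            else:                     # "some": ∃r.A ⊑ B
                _, r, a, b = rule
                targets = [s for (s, p, o) in roles if p == r and a in concepts[o]]
            for x in targets:
                if b not in concepts[x]:
                    concepts[x].add(b)
                    changed = True
    return concepts

def individual_type(x, concepts, roles):
    """Assumed equivalence criterion: concepts plus outgoing and incoming roles."""
    out = frozenset(p for (s, p, o) in roles if s == x)
    inc = frozenset(p for (s, p, o) in roles if o == x)
    return (frozenset(concepts[x]), out, inc)

def materialize_via_abstraction(assertions, roles, tbox):
    concepts = as_store(assertions)
    individuals = set(concepts) | {s for s, _, _ in roles} | {o for _, _, o in roles}
    rounds = 0
    while True:
        rounds += 1
        # Group indistinguishable individuals by their current type.
        groups = defaultdict(set)
        for x in individuals:
            groups[individual_type(x, concepts, roles)].add(x)
        # Build the abstraction: one representative per type, plus one fresh
        # successor/predecessor node per outgoing/incoming role of that type.
        abs_concepts, abs_roles = defaultdict(set), set()
        for tp in groups:
            cs, out, inc = tp
            abs_concepts[("x", tp)] = set(cs)
            abs_roles |= {(("x", tp), r, ("succ", tp, r)) for r in out}
            abs_roles |= {(("pred", tp, r), r, ("x", tp)) for r in inc}
        saturate(abs_concepts, abs_roles, tbox)
        # Transfer consequences from the abstraction back to the original data.
        updates = defaultdict(set)
        for tp, members in groups.items():
            for x in members:
                updates[x] |= abs_concepts[("x", tp)]
            for (s, p, o) in roles:
                if s in members:
                    updates[o] |= abs_concepts[("succ", tp, p)]
                if o in members:
                    updates[s] |= abs_concepts[("pred", tp, p)]
        changed = False
        for x, cs in updates.items():
            if not cs <= concepts[x]:
                concepts[x] |= cs
                changed = True
        if not changed:  # fixpoint: refinement yields no new facts
            return {x: cs for x, cs in concepts.items() if cs}, rounds

# Hypothetical toy data: three structurally identical chains a_i -r-> b_i -r-> c_i.
tbox = [("all", "A", "r", "B"), ("all", "B", "r", "C"), ("some", "r", "C", "D")]
assertions = {f"a{i}": {"A"} for i in range(3)}
roles = ({(f"a{i}", "r", f"b{i}") for i in range(3)}
         | {(f"b{i}", "r", f"c{i}") for i in range(3)})
result, rounds = materialize_via_abstraction(assertions, roles, tbox)
direct = {x: cs for x, cs in saturate(as_store(assertions), roles, tbox).items() if cs}
```

In this toy data set the nine individuals fall into only three types, so each round reasons over three representatives and their fresh neighbours rather than over every individual. On this example the result agrees with running `saturate` directly on the full ABox; the sketch makes no claim about soundness or completeness beyond the toy rules shown.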
The results obtained so far are to be extended in several directions: 1) The developed technique for handling disjunctions is to be extended to more expressive ontology languages (while still guaranteeing soundness and completeness). 2) The parts of the abstraction that must be refined are to be identified and processed incrementally in order to minimize communication with the database backend. 3) Based on the incremental refinements, we plan to develop techniques for handling updates to the ontology. 4) The abstraction approach seems well suited to improving the ontology debugging process, in particular for large ABoxes that are learned from text, by generating explanations directly from the abstraction. The proposed project supports the efficient use of the ever-growing sources of structured data by combining well-established database technologies with in-memory reasoning techniques in a novel way.
DFG Programme
Research Grants