Project Details
Learning Table Similarity Measures
Applicant
Professor Dr. Ulf Leser
Subject Area
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
from 2017 to 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 388146305
Tables are an efficient and popular means of embedding structured information in unstructured texts such as reports, publications, or web pages. However, typical retrieval methods disregard the particular properties of tables (e.g., their two-dimensional structure, headers, and semantic homogeneity within columns or rows). Yet directly finding tables that match a given search criterion would offer fast access to a wealth of structured information. One way to achieve this is table similarity search: given a query table, find the most similar tables in a given table corpus. In this project, we will research methods to learn high-quality table similarity measures, which are fundamental building blocks not only of table similarity search but also of other applications such as table information extraction, table clustering, and table fusion. In particular, we will study deep learning methods for designing supervised table similarity measures with the objectives of 1) automatically identifying table orientation, 2) learning appropriate table representations at multiple levels of abstraction, and 3) merging these representations into a single table similarity score. All methods will be evaluated on a gold-standard annotated table corpus and compared against various state-of-the-art methods. All corpora and software will be published under a permissive open-access license.
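To make the notion of table similarity search concrete, the following is a minimal, hand-crafted sketch, not the learned measures this project studies. It assumes a hypothetical representation in which every table is horizontally oriented with its header in the first row (so objective 1, orientation detection, is skipped). Two views of a table (header cells and body cells) are each compared with Jaccard overlap, and the two scores are merged with a fixed weight `w_header` into one similarity score. A supervised deep learning model would replace both the per-level comparisons and this hand-set combination. All function names (`header_similarity`, `content_similarity`, `table_similarity`, `top_k_similar`) and the weight are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a table is a list of rows of cell strings,
# assumed horizontally oriented with the header in the first row.
Table = List[List[str]]


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def header_similarity(q: Table, t: Table) -> float:
    # Schema-level view: compare the lower-cased header cells.
    return jaccard({c.lower() for c in q[0]}, {c.lower() for c in t[0]})


def content_similarity(q: Table, t: Table) -> float:
    # Cell-level view: compare the sets of lower-cased body cells,
    # ignoring their row and column positions.
    return jaccard({c.lower() for row in q[1:] for c in row},
                   {c.lower() for row in t[1:] for c in row})


def table_similarity(q: Table, t: Table, w_header: float = 0.5) -> float:
    # Merge the per-level scores into a single score with a fixed weight;
    # a supervised measure would learn this combination instead.
    return (w_header * header_similarity(q, t)
            + (1 - w_header) * content_similarity(q, t))


def top_k_similar(query: Table, corpus: Sequence[Table],
                  k: int = 3) -> List[Tuple[int, float]]:
    # Table similarity search: score every corpus table against the query
    # and return (index, score) pairs of the k highest-scoring tables.
    scored = [(i, table_similarity(query, t)) for i, t in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [["City", "Country"], ["Berlin", "Germany"], ["Paris", "France"]]
    corpus = [
        [["city", "country"], ["Berlin", "Germany"], ["Rome", "Italy"]],
        [["Gene", "Organism"], ["TP53", "Human"]],
        [["City", "Population"], ["Paris", "2.1M"]],
    ]
    print(top_k_similar(query, corpus, k=2))
```

With this toy corpus, table 0 ranks first (identical header, half of the query's body cells shared, score 2/3) and table 2 second (score about 0.27), while the unrelated gene table scores 0. The example also illustrates why purely lexical overlap is a weak baseline: it ignores column semantics and orientation, which is what learned, multi-level table representations are meant to capture.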
DFG Programme
Research Grants