
Visual Fine-Grained Recognition of Objects (Visuelle fein-granulare Erkennung von Objekten)

Applicant: Professor Dr.-Ing. Joachim Denzler, since 1/2017
Subject area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Funding: Funded from 2015 to 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - project number 275610656
 
Year of creation: 2019

Summary of Project Results

During the funding period, we addressed three major work packages.

The first work package covered the unsupervised discovery of part constellations and representations for these parts. In fine-grained recognition especially, finding and exploiting subtle differences at specific locations, which differ from class to class, is essential for successful classification. Although these locations can be learned in a supervised way, having experts annotate them is very time-consuming and expensive. To address this problem, we developed an unsupervised part constellation model. It first generates a large set of part proposals and then identifies the relevant parts by checking for consistent constellations of their detections, constrained by their relative positions. Furthermore, we developed an attention-based pooling technique, which we later generalized, to learn and fine-tune part and object representations. Computing local feature descriptions and localizing attention improved the performance of low- and medium-complexity models on few-shot fine-grained recognition tasks, where only a very limited number of samples per class is available. Although we found that training the whole pipeline of the first method can be problematic, the latter can be integrated into an end-to-end learnable framework.

In the second work package, we introduced methods for exemplar-specific model estimation. We identified two ways to influence the locality of predictions. On the one hand, the straightforward approach of fine-tuning a network for a specific domain already increases region locality significantly. On the other hand, the presented α-pooling approach directly manipulates the aggregation of local features and therefore also the locality of predictions. Importantly, this aggregation can also be learned from data alone.
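
To illustrate how a single exponent can steer the aggregation of local features, the following minimal sketch assumes one common formulation of α-pooling: each local descriptor x is combined with its signed power sgn(x)|x|^(α-1) through an outer product, and the results are averaged over all locations. The formula, the function name `alpha_pool`, and the toy descriptors are assumptions made for illustration; this summary does not state them. Under this assumption, α = 1 corresponds to average pooling for non-negative features and α = 2 to bilinear pooling.

```python
import numpy as np


def alpha_pool(features, alpha):
    """Pool local descriptors with an assumed alpha-pooling formulation.

    features: array of shape (n, d), one d-dimensional descriptor per location.
    alpha:    pooling exponent, assumed to be >= 1 in this sketch.
    Returns a flattened (d * d,) vector.
    """
    if alpha < 1:
        raise ValueError("this sketch only handles alpha >= 1")
    features = np.asarray(features, dtype=float)
    # Signed power sgn(x) * |x|^(alpha - 1), applied element-wise.
    powered = np.sign(features) * np.abs(features) ** (alpha - 1.0)
    # Average over locations of the outer products powered_i x_i^T.
    pooled = powered.T @ features / features.shape[0]
    return pooled.reshape(-1)


# Two hypothetical non-negative local descriptors (e.g. post-ReLU activations).
local_features = np.array([[1.0, 2.0], [3.0, 4.0]])
average_like = alpha_pool(local_features, alpha=1.0)
bilinear_like = alpha_pool(local_features, alpha=2.0)
```

Because the exponent enters through a differentiable power function, it could in principle be treated as a trainable parameter, which is in line with the remark that the aggregation can be learned from data.
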
The presented visualization method helps to understand the region locality of a decision for a test sample by showing the most influential regions from the training data. Additionally, we investigated how to improve local representations by fine-tuning the CNN model on a subset of the most relevant training images, chosen with new selection schemes. While local learning is beneficial for small architectures, we found that it does not yield improvements for more complex architectures. Furthermore, the locality of representations appears to increase with model complexity. We therefore conclude that, owing to their complexity, these models already focus on very few training examples even without additional fine-tuning.

The third work package addressed domain adaptation for part detectors and part feature representations. Instead of performing domain adaptation directly, we studied the influence of domain shifts by analyzing noise patterns. We examined a variety of noise types and found that, at the time of publication, all state-of-the-art models were strongly affected by every noise type that was not present during training. Additionally, we showed that noise sensitivity can be estimated efficiently by computing a first-order approximation of the output change for a given image.
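
The idea of a first-order sensitivity estimate can be sketched on a toy model. The example below assumes a linear layer followed by a softmax as a stand-in for a CNN, so that the Jacobian of the output with respect to the input can be written down analytically. The model, the names `predict`, `jacobian`, and `first_order_change`, and the random data are hypothetical and not taken from the project. The estimate J(x)·δ approximates the output change f(x + δ) − f(x) for a small noise pattern δ without re-evaluating the model on the noisy input.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def predict(W, x):
    """Toy stand-in for a network: class probabilities softmax(W x)."""
    return softmax(W @ x)


def jacobian(W, x):
    """Jacobian of predict(W, x) with respect to the input x."""
    p = predict(W, x)
    return (np.diag(p) - np.outer(p, p)) @ W


def first_order_change(W, x, noise):
    """First-order estimate of predict(W, x + noise) - predict(W, x)."""
    return jacobian(W, x) @ noise


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))            # hypothetical weights: 3 classes, 8 inputs
x = rng.normal(size=8)                 # hypothetical "image" as a flat vector
noise = 1e-3 * rng.normal(size=8)      # small additive noise pattern

approx = first_order_change(W, x, noise)
exact = predict(W, x + noise) - predict(W, x)
```

For a real CNN the Jacobian-vector product would typically come from automatic differentiation rather than a closed form, and the approximation is only expected to hold for sufficiently small perturbations.
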

