Visual Attention for Assistive Robot Systems (AVRAM) - A Method of Early Clustering for Artificial Visual Attention to Increase the Performance of Active Vision Systems
Summary of Project Results
Mobile vision systems face special challenges in visual perception and recognition because of the diversity of environments in which they must operate. A high degree of robustness and flexibility is required of the machine vision and cognition algorithms used for visual identification, self-localization, obstacle avoidance, and path planning. Inspiration from biology can lead to significant advances in artificial cognitive systems, since nature provides examples of highly efficient cognitive systems that perform these tasks with ease. Visual attention is one of the important components of natural vision: it selects the relevant and important information from the environment. The development of computational models of this phenomenon has been a focus of research over the last two decades, with theories and discoveries from psychophysics, ophthalmology, and neurobiology serving as guidelines for progress in attention modeling.

One of the major goals of the project was to investigate a foundation for attention modeling that brings the process of saliency selection into harmony with other vision-related tasks. The subgoals towards this objective included the computation of high-level, shape-based features for visual attention; making the output of basic image-processing steps reusable both in the attention procedures and in machine vision; and the incorporation of further feature channels that exist in nature into artificial attention. Another target was to take a step towards extending the scope of visual attention from 2D digital images (or videos) to actual 3D space, in line with the capabilities of natural vision systems. As an increment towards this goal, the project aimed to include saliency maps based on depth from stereo and on the contrast between known and unknown objects. To give the system the capability of detecting temporal saliency, a module for computing motion saliency was included in the goals. In order to measure the achievements of the work, the developed algorithms were to be tested extensively on simulated virtual robots as well as on real robots. Since standardized metrics and measurement methods for the quantitative evaluation of attention models did not exist at the time of the proposal, it was also proposed to design such metrics and to establish the basic infrastructure of a test bed on which attention models can be assessed objectively by comparison against a human benchmark.

The region-based foundation for saliency computation provided many advantages towards achieving these goals. First, it reduces the amount of data that the attention routines have to process, which increases overall efficiency. Second, it links the processes of attentional selection and machine vision, because the features of the regions selected for attention remain intact throughout the visual pathway. The pixel groupings produced by early segmentation also allowed the computation of additional shape-based feature channels, the handling of global contrast, and the precise localization of salient objects. The shape features further allowed the construction of target-dependent top-down feature maps from the once-segmented regions. Finally, the top-down and bottom-up pathways were integrated into a single architecture that includes behavior control.
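As an illustration of the region-based principle described above, the following minimal Python sketch shows how a once-segmented image can feed both pathways: per-region features serve a bottom-up global-contrast map as well as a target-dependent top-down map, and a behavior configuration reduces to a weight setting over the two. All names, features, and weights here are hypothetical and merely sketch the idea, not the project's actual implementation.

```python
# A minimal sketch of region-based saliency, assuming the image has already
# been segmented into regions; feature choices, weights, and names are
# hypothetical and only illustrate the principle described above.
from dataclasses import dataclass

import numpy as np


@dataclass
class Region:
    label: int
    mean_color: np.ndarray   # e.g., mean Lab color over the region's pixels
    centroid: np.ndarray     # (x, y) center; reusable for precise localization
    eccentricity: float      # a shape-based feature enabled by early grouping
    size: float              # pixel count normalized by image area


def bottom_up_saliency(regions: list[Region]) -> np.ndarray:
    """Global contrast: a region is salient if its features differ from
    those of all other regions (weighted by the other regions' sizes)."""
    s = np.zeros(len(regions))
    for i, r in enumerate(regions):
        for j, q in enumerate(regions):
            if i == j:
                continue
            color_diff = np.linalg.norm(r.mean_color - q.mean_color)
            shape_diff = abs(r.eccentricity - q.eccentricity)
            s[i] += q.size * (color_diff + shape_diff)
    return s / s.max()


def top_down_bias(regions: list[Region], target: Region) -> np.ndarray:
    """Target-dependent map: similarity of each region to a known target,
    built from the same once-segmented regions."""
    b = np.array([np.exp(-np.linalg.norm(r.mean_color - target.mean_color))
                  for r in regions])
    return b / b.max()


def combined_saliency(regions, target=None, w_bu=1.0, w_td=0.0):
    """A behavior configuration amounts to a weight setting, e.g. pure
    exploration (w_td = 0) versus visual search (w_td > 0)."""
    s = w_bu * bottom_up_saliency(regions)
    if target is not None and w_td > 0.0:
        s = s + w_td * top_down_bias(regions, target)
    return s
```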
The same system can thus perform different visual behaviors simply by activating the required configuration, which is an important milestone in visual attention modeling. The saliency computation was extended with channels for depth from stereo, for the temporal saliency of motion, and for saliency due to the contrast between known and unknown objects. The memory mechanism introduced for this purpose will be helpful for future advances in attention-based learning and knowledge-based vision. Another contribution is the development of measurement methods and quantitative metrics for evaluating the output of visual attention models, together with a web-based software system that enables the research community to share results and observe the actual progress in this field.

The depth saliency from stereo, as realized in this project, can be extended with a laser range sensor to reach the depth resolution and accuracy needed by mobile systems; this will also open further research in multi-modal attention. The motion saliency computation, on the other hand, leaves room for a reduction of computational complexity: further work is needed to obtain the same or better quality of results in less run time, so that the method becomes feasible for robotic applications. The work on knowledge-driven saliency can be extended with autonomous learning by mobile systems, enabling spatial inhibition of return in 3D and reducing the search space by ignoring the static environment. The web-based evaluation system designed in this project still needs to be populated with human benchmark data, not only for still images but also for dynamic scenes recorded with mobile vision systems. A large amount of input data in the form of images and videos has to be created and categorized into different levels of visual complexity and attentional behaviors; human responses to such data will be useful for the objective evaluation of computational attention models. So far, no such categorized benchmark database is available to the research community.
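To illustrate the kind of objective assessment such a test bed supports, the sketch below scores a model saliency map against recorded human fixation points using normalized scanpath saliency (NSS), a standard metric from the eye-tracking literature; the function, data layout, and values are illustrative assumptions, not the specific metrics developed in the project.

```python
# A sketch of objective evaluation against a human benchmark: z-score the
# model's saliency map and average it at recorded human fixation points
# (normalized scanpath saliency). A score near 0 is chance level; clearly
# positive scores mean the model predicts human gaze above chance.
import numpy as np


def normalized_scanpath_saliency(saliency_map: np.ndarray,
                                 fixations: list[tuple[int, int]]) -> float:
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([s[y, x] for x, y in fixations]))


# Usage with synthetic data (a real test bed would store many images,
# model maps, and eye-tracking data, categorized by visual complexity):
rng = np.random.default_rng(0)
model_map = rng.random((480, 640))          # saliency map for one image
human_fixations = [(320, 240), (96, 400)]   # (x, y) fixation coordinates
print(normalized_scanpath_saliency(model_map, human_fixations))
```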
Project-Related Publications (Selection)
- Enhanced Motion Parameters Estimation for an Active Vision System. Pattern Recognition and Image Analysis, Vol. 18, No. 3, 2008, pp. 373-375.
M. Shafik, B. Mertsching
(Available online at https://dx.doi.org/10.1134/S1054661808030024)
- Behavior Adaptive and Real-Time Model of Integrated Bottom-Up and Top-Down Visual Attention. PhD Dissertation, Universität Paderborn, 2009.
Z. Aziz
- Occlusion as a Monocular Depth Cue Derived from Illusory Contour Perception. In: KI 2009: Advances in Artificial Intelligence, 32nd Annual German Conference on AI, Paderborn, Germany, September 15-18, 2009. Lecture Notes in Computer Science, Vol. 5803, 2009, pp. 97-105.
M. Hund, B. Mertsching
(Available online at https://dx.doi.org/10.1007/978-3-642-04617-9_13)
- Perzeptuelle Organisation von Objektgrenzen unter Verwendung anisotroper Regularisierungsmethoden. PhD Dissertation, Universität Paderborn, 2009.
M. Hund
- Real-Time Scan-Line Segment Based Stereo Vision for the Estimation of Biologically Motivated Classifier Cells. In: KI 2009: Advances in Artificial Intelligence, Paderborn, Germany, 2009. Lecture Notes in Computer Science, Vol. 5803, 2009, pp. 89-96.
M. Shafik, B. Mertsching
(Available online at https://dx.doi.org/10.1007/978-3-642-04617-9_12)
- Survivor Search With Autonomous UGVs Using Multimodal Overt Attention. In: Safety, Security and Rescue Robotics (SSRR), 2010 IEEE International Workshop, 26-30 July 2010, Bremen, Germany, pp. 1-6.
M. Z. Aziz, B. Mertsching
(Available online at https://dx.doi.org/10.1109/SSRR.2010.5981566)
- Knowledge Driven Saliency: Attention to the Unseen. In: Advanced Concepts for Intelligent Vision Systems (ACIVS 2011), Ghent, Belgium. Lecture Notes in Computer Science, Vol. 6915, 2011, pp. 34-45.
M. Z. Aziz, M. Knopf, B. Mertsching
(Available online at https://dx.doi.org/10.1007/978-3-642-23687-7_4)
- The Cognitive Architecture Based on Biologically Inspired Memory. In: Industrial Electronics and Applications (ICIEA), 2011 6th IEEE Conference, 21-23 June 2011, pp. 936-941.
L. Kleinmann, B. Mertsching
(Available online at https://dx.doi.org/10.1109/ICIEA.2011.5975721)
- 3D Motion Analysis for Mobile Robots. PhD Dissertation, Universität Paderborn, 2012.
M. Shafik
- Continuous Region-Based Processing of Spatiotemporal Saliency. In: International Conference on Computer Vision Theory and Applications (VISAPP 2012), Rome, Italy, 2012, pp. 230-239.
J. Tünnermann, B. Mertsching