Unsupervised Learning of Hierarchical Features for Visual Self-Localization and Navigation in Seasonally Changing Outdoor Environments

Principal Investigator(s)


Artificial Intelligence


Sensor-driven self-localization within a self-acquired map is a key capability for efficient task-driven navigation on autonomous mobile robots. Lidar-based simultaneous localization and mapping (SLAM) is the established state-of-the-art approach for indoor robots and for outdoor applications such as autonomous cars, but it requires expensive laser-based sensors as well as manual editing and correction of sensor traces in the acquired maps. Visual SLAM (V-SLAM) denotes both the problem and a class of implementations that use camera data to build a map incrementally while the robot's own position is uncertain and the map is incomplete. The main scientific goal of this project is to make vision-based autonomous self-localization robust to strong temporal variation of the environment (e.g. seasonal change) and to develop flexible, incremental, and efficient learning mechanisms that can cope with this challenging requirement. We generally assume no prior knowledge of the environment and concentrate on algorithms that can run on low-end computing hardware without special-purpose sensors. Our approach to mid-term temporal environment variation is self-supervised incremental learning of invariant representations, using loop closures as training signals. We will investigate this concept for dealing with the effects of seasonal change, studied empirically through a long-term data collection in a real garden environment. Using mapping and localization based on the gradients of slow features, we learn hierarchical feature representations with minimal assumptions about the environment. Slow features represent the similarity relation of states experienced during training: states experienced in temporal proximity during training are encoded with similar features, so the change between these states is small, i.e. slow. Navigation along the gradients of these features can therefore replicate route preferences of the learning phase. If, however, the representation fails, e.g. due to drastic changes never seen before, the features change quickly, which can be used as a surprise or low-confidence signal. The method has shown the potential to run on a low-end DSP and may be orders of magnitude more efficient than state-of-the-art V-SLAM systems.
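
The slowness principle described above can be illustrated with a minimal sketch of plain linear slow feature analysis in NumPy. This is not the project's hierarchical, incrementally trained system. The function names linear_sfa and change_rate, the toy sine-wave data, and the use of a shuffled sequence as a stand-in for unfamiliar input are all assumptions made for illustration. The sketch whitens the input, then picks the directions in which the whitened signal changes least from frame to frame. It also computes the rate of change of the learned feature, which the abstract proposes as a surprise or low-confidence signal.

```python
import numpy as np


def linear_sfa(x, n_features):
    """Fit linear slow feature analysis on a time series x of shape (T, D).

    Returns the training mean, a projection matrix of shape (D, n_features),
    and the slowness value of each feature, slowest first.
    """
    mean = x.mean(axis=0)
    xc = x - mean
    # Step 1: whiten the input so that every direction has unit variance.
    cov = xc.T @ xc / (len(xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()  # drop numerically degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    z = xc @ whiten
    # Step 2: find the directions in which the whitened signal changes least
    # between consecutive frames (smallest eigenvalues of the difference covariance).
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / (len(dz) - 1)
    d_evals, d_evecs = np.linalg.eigh(dcov)  # eigenvalues in ascending order
    w = whiten @ d_evecs[:, :n_features]
    return mean, w, d_evals[:n_features]


def change_rate(y):
    """Mean squared frame-to-frame change of feature outputs y of shape (T, k)."""
    return float(np.mean(np.diff(y, axis=0) ** 2))


# Toy data (hypothetical): one slowly and one quickly varying source, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(37.0 * t)])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
x_train = sources @ mixing.T

mean, w, slowness = linear_sfa(x_train, n_features=1)
y_train = (x_train - mean) @ w

# Stand-in for unfamiliar input (an assumption): the same frames in random order,
# which destroys the temporal structure the feature was trained on.
rng = np.random.default_rng(0)
x_novel = x_train[rng.permutation(len(x_train))]
y_novel = (x_novel - mean) @ w

print("change rate, familiar sequence:", change_rate(y_train))
print("change rate, shuffled sequence:", change_rate(y_novel))
```

In this toy setting the learned feature follows the slowly varying source, and its rate of change is much higher on the shuffled sequence than on the training sequence. A threshold on that rate is one simple way such a low-confidence signal could be derived; the threshold itself would have to be chosen empirically.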

Last updated on 2019-03-13 at 11:05