How can computers perceive and understand their environment?

On September 22, 2023, the Lower Austrian Research Festival attracted over 5,000 visitors to the Palais Niederösterreich in Vienna's city center. Numerous exhibitors from Lower Austrian research institutions offered interested individuals and families the opportunity to experience science and research firsthand and explore it through hands-on experiments.

At one of the FOTEC booths, visitors learned how robots use cameras and deep learning to perceive and understand their environment, with object recognition serving as the example.

Object Recognition

A robot perceives its surroundings using cameras. Objects are first located and recorded, and then analyzed using algorithms. The identification of objects based on their association with a previously known or comparable object is called object recognition.

Modern object recognition typically relies on deep learning, a machine learning method in which the computer processes information in several successive layers.

This example illustrates how object recognition works in practice:

[Image: example of object recognition. Credit: FOTEC]

The value displayed with each detected object indicates how confident the program is in its identification, i.e., the estimated probability that the result is correct. The higher the value, the more confident the program is. Because these values are statistical estimates rather than guarantees, detections with a low value can be discarded, which helps to reduce false positives.
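How a confidence value can be used to reduce false positives can be sketched in a few lines of Python. This is a minimal illustration, not the software shown at the booth: the `Detection` class, the `filter_detections` function, the threshold of 0.5, and all labels and scores are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name predicted by the detector (hypothetical)
    confidence: float  # score between 0.0 and 1.0


def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up detector output for a single camera image.
detections = [
    Detection("cup", 0.92),
    Detection("bottle", 0.48),
    Detection("keyboard", 0.75),
    Detection("cat", 0.12),
]

kept = filter_detections(detections, threshold=0.5)
for d in kept:
    print(f"{d.label}: {d.confidence:.0%}")
```

With these sample values, only the cup and the keyboard pass the threshold. A higher threshold removes more false positives but also risks discarding objects that really are in the image, so the value is usually tuned for the application.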

Deep Learning

Deep learning is a subfield of artificial intelligence that relies on artificial neural networks (ANNs), algorithms loosely modeled on the structure of the human brain. The underlying idea is not new: the first mathematical models of artificial neurons date back to the early 1940s. With advancing digitalization, however, the method is being used more and more frequently and is attracting growing attention.

Artificial neural networks are trained on data. A deep learning system consists of several layers arranged one after another: an input layer, one or more hidden layers, and an output layer. Input data is processed step by step, with the result of each layer passed on to the next. During training, errors are minimized by adjusting the weights of the connections between the artificial neurons. The more hidden layers a network has, the "deeper" it is.
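The layer-by-layer processing and the weight adjustment can be illustrated with a very small network written in plain Python. This is only a sketch under stated assumptions: the article does not say which training method is used, so the example assumes sigmoid neurons and gradient descent with backpropagation, and the network size, starting weights, learning rate, and training example are all made up.

```python
import math


def sigmoid(x):
    """Squash any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_output):
    """Pass the inputs through one hidden layer and then the output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)))
              for weights in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_output, hidden)))
    return hidden, output


def train_step(inputs, target, w_hidden, w_output, learning_rate=0.5):
    """Adjust all weights slightly so the squared error gets smaller."""
    hidden, output = forward(inputs, w_hidden, w_output)
    # Error signal at the output neuron for the loss 0.5 * (output - target)**2.
    delta_out = (output - target) * output * (1.0 - output)
    # Pass the error signal back to each hidden neuron (using the old weights).
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_output, hidden)]
    new_w_output = [w - learning_rate * delta_out * h
                    for w, h in zip(w_output, hidden)]
    new_w_hidden = [[w - learning_rate * d * x for w, x in zip(weights, inputs)]
                    for weights, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_output


# One made-up training example and arbitrary starting weights.
inputs, target = [1.0, 0.0], 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]  # two hidden neurons, two inputs each
w_output = [0.2, -0.1]                # one output neuron, two hidden inputs

_, before = forward(inputs, w_hidden, w_output)
for _ in range(100):
    w_hidden, w_output = train_step(inputs, target, w_hidden, w_output)
_, after = forward(inputs, w_hidden, w_output)

print(f"error before training: {abs(target - before):.3f}")
print(f"error after training:  {abs(target - after):.3f}")
```

After the training loop, the output lies closer to the target than before. Real networks work on the same principle but with many more neurons, layers, and training examples.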

YOLO (You Only Look Once) is a neural network architecture that locates and identifies the objects in an image in a single pass. Compared to earlier methods, which typically needed several separate processing stages, this makes YOLO particularly fast.
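What "a single pass" produces can be sketched with a toy example that loosely follows the original YOLO formulation: the image is divided into a grid, and every grid cell reports a box, an objectness score, and class probabilities at the same time. The grid size, class names, threshold, and all numbers below are made up for illustration; this is not actual YOLO code, and later YOLO versions differ in the details.

```python
GRID_SIZE = 2                # the image is divided into a 2 x 2 grid (toy value)
CLASSES = ["cup", "bottle"]  # hypothetical class names

# One made-up prediction per grid cell (row, col), as a network might emit
# them in one pass: (x, y, w, h, objectness, class probabilities).
# x, y: box centre relative to the cell; w, h: box size relative to the image.
raw_output = {
    (0, 0): (0.5, 0.5, 0.4, 0.6, 0.90, [0.95, 0.05]),
    (0, 1): (0.2, 0.7, 0.1, 0.1, 0.05, [0.50, 0.50]),
    (1, 0): (0.8, 0.3, 0.2, 0.2, 0.10, [0.30, 0.70]),
    (1, 1): (0.4, 0.6, 0.3, 0.5, 0.80, [0.10, 0.90]),
}


def decode(raw_output, threshold=0.5):
    """Turn the per-cell predictions into a list of labelled boxes."""
    results = []
    for (row, col), prediction in sorted(raw_output.items()):
        x, y, w, h, objectness, class_probs = prediction
        best = max(range(len(CLASSES)), key=lambda i: class_probs[i])
        # Combined score: "is there an object?" times "which class is it?"
        score = objectness * class_probs[best]
        if score < threshold:
            continue
        # Convert the cell-relative centre into image-relative coordinates.
        cx = (col + x) / GRID_SIZE
        cy = (row + y) / GRID_SIZE
        results.append((CLASSES[best], round(score, 3), (cx, cy, w, h)))
    return results


for label, score, box in decode(raw_output):
    print(label, score, box)
```

With these sample values, two of the four cells yield a detection (a cup and a bottle), while the other two fall below the threshold. Location and class come out of the same set of numbers, which is why no separate classification stage is needed.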

Object Recognition in Practice

Object recognition is increasingly finding its way into many fields of application, such as manufacturing, the Internet of Things, and Industry 4.0/5.0. Typical real-world examples include license plate recognition and facial recognition.

FOTEC is exploring the possibilities of object recognition in its project "Development of a Semi-Automated Optical Profile Identification System."

Further information about the research festival can be found here.

More information about FOTEC's projects can be found on this website.