Data-efficient Reinforcement Learning


Reinforcement learning (RL) is one of the three fundamental paradigms of machine learning, alongside supervised and unsupervised learning. In RL, an agent learns autonomously, through interaction with its environment, which action to take in which state in order to maximize a cumulative reward. Unlike supervised learning, this approach requires no pre-existing dataset; the necessary data is generated during training via interaction with the environment. A reward function encourages desirable behavior (e.g., a successful assembly) and penalizes undesirable behavior accordingly.
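The interaction loop described above can be sketched in a few lines. The following is a minimal, illustrative example (not from the original text): tabular Q-learning on a hypothetical toy environment, a one-dimensional corridor in which the agent is rewarded only for reaching the goal cell.

```python
import random

# Toy environment (hypothetical): a corridor of 5 cells; the agent starts at
# cell 0 and receives a reward of 1 only when it reaches the goal cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    """Environment transition: clamp to the corridor, reward only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def greedy(q, state, rng):
    """Pick an action with the highest Q-value, breaking ties randomly."""
    best = max(q[(state, a)] for a in range(len(ACTIONS)))
    return rng.choice([a for a in range(len(ACTIONS)) if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the current estimate, sometimes explore
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[action])
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action
            best_next = max(q[(next_state, a)] for a in range(len(ACTIONS)))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# Extract the learned greedy policy; it should prefer "move right" (index 1)
# in every non-goal state.
policy = [max(range(len(ACTIONS)), key=lambda a: q[(s, a)]) for s in range(N_STATES)]
```

Note that the agent never sees the transition rules of `step`; it learns the policy purely from the generated interaction data, which is exactly the point made above.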

Classical reinforcement learning has made great progress in recent years, especially in combination with deep learning methods (e.g., AlphaGo, Pluribus). However, these algorithms require a great deal of interaction with the environment and are therefore often impractical for real-world applications, where the action space in which an RL agent operates quickly becomes very large. The more actions the RL agent can choose between, the more time-consuming and data-intensive its training becomes.
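How quickly the action space grows can be made concrete with a back-of-the-envelope calculation. The numbers below are hypothetical, not from the original text: a robot arm whose joints are each discretized into a handful of target positions, where every combination of joint settings is one joint action.

```python
# Hypothetical illustration of combinatorial action-space growth:
# a robot arm with n_joints joints, each discretized into
# positions_per_joint target positions. Each combination is one action.
def joint_action_count(n_joints, positions_per_joint):
    return positions_per_joint ** n_joints

# The action space grows exponentially with the number of joints:
two_joints = joint_action_count(2, 10)   # 10^2 = 100 actions
six_joints = joint_action_count(6, 10)   # 10^6 = 1,000,000 actions
```

Even this coarse discretization yields a million distinct actions for a typical six-axis arm, which is why naive exploration on real hardware is rarely feasible.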

Data-efficient RL is a research field that aims to make these data-hungry algorithms more data-efficient through the use of expert knowledge, physical laws, abstraction, or a digital twin. With expert knowledge, physical laws, or abstraction, exploring the action space and planning subsequent steps becomes easier and less data-intensive. Learning in simulation on a digital twin can significantly reduce the interaction time required on the real system, e.g., a cobot. Coupling reinforcement learning with knowledge, laws, and methods from other fields in this way makes the learning process practical and thus enables RL to be used in industry.
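The digital-twin idea can be illustrated with a simple interaction-budget sketch. All names and numbers below are hypothetical assumptions, not from the original text: experience gathered in simulation is cheap, while every step on the real cobot costs real wall-clock time, so most training is shifted to the twin and only a short fine-tuning phase runs on hardware.

```python
# Hypothetical cost model for the digital-twin approach: compare the
# wall-clock time of gathering experience in simulation vs. on hardware.

class SimulatedTwin:
    """Stand-in for a physics simulation of the real system."""
    cost_per_step = 0.001  # seconds per simulated interaction (assumed)

class RealCobot:
    """Stand-in for the real robot; every step takes real wall-clock time."""
    cost_per_step = 2.0    # seconds per real interaction (assumed)

def interaction_time(sim_steps, real_steps):
    """Total time spent gathering experience under a given sim/real split."""
    return (sim_steps * SimulatedTwin.cost_per_step
            + real_steps * RealCobot.cost_per_step)

# Pre-train on 1,000,000 simulated steps, then fine-tune with 500 real steps ...
pretrain_then_finetune = interaction_time(1_000_000, 500)
# ... versus gathering all 1,000,000 steps directly on the hardware.
real_only = interaction_time(0, 1_000_000)
```

Under these assumed costs, the simulation-first split needs on the order of half an hour of interaction time instead of weeks, which is the practical effect the paragraph above describes.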