Image created by DALL-E at the request of ChatGPT: “A photorealistic image of a large automated warehouse, captured as if taken with a high-quality DSLR camera. The warehouse features autonomous robots resembling rolling shelves with forklift-style arms, actively picking, lifting, and transporting goods. Real humans in high-visibility vests are working in the warehouse, supervising the robots. The environment is brightly lit, clean, and highly organized, showcasing sharp details and realism.”
Reinforcement Learning
Learning with structure
Deep reinforcement learning (RL) is the branch of machine learning concerned with AI systems that learn to make sequential decisions by interacting with the world. RL has already achieved remarkable success in some areas: from complex strategies in games such as Go, to multi-step action sequences in simulated robotics, to fine-tuning large language models. In the real world, however, its use remains limited by challenges such as inefficient use of data, a lack of safety guarantees and limited generalisability. A study by the L3S Research Centre and the University of Texas at Austin shows how embedding problem-specific structural information can fundamentally improve the performance and scalability of RL systems.
Overcoming fundamental challenges
“Some of the biggest challenges for RL stem from the unpredictability of real-world scenarios,” says Aditya Mohan, lead author of the study. RL algorithms often fail in dynamic environments or with noisy reward signals. Traditional RL models typically learn by trial and error to maximise extrinsic rewards. This process is not only data-intensive, but also severely limits the transferability of the models to new tasks. For example, a robot trained in a simulation to pick up a blue cup might fail if the colour of the cup changes.
This limitation is in stark contrast to human learning. Children, in essence, develop a general understanding of their environment that they can then apply to specific tasks. RL algorithms, by contrast, are trained to implicitly learn just enough about the world to optimise the extrinsic reward provided by the human designer. To adapt such algorithms to changing conditions, specific rewards would have to be defined for each problem variant.
Incorporating structural information
The authors argue for the integration of additional structural information into the models. For example, an RL agent driving a taxi in a city would have to learn the entire road network, traffic behaviour and passenger movements through interaction alone – an almost impossible task. With structural information, such as the separation of traffic and passenger patterns, the learning process can be made more efficient and targeted.
The approach takes advantage of the ability to break down complex problems into manageable sub-components. The authors investigated the extent to which different RL methods assume such decomposability and then developed a framework to categorise these assumptions. The study identifies four basic archetypes for decomposing complex problems in RL models: latent, factorised, relational and modular.
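To make the idea of decomposability concrete, the following minimal Python sketch contrasts a flat observation with a factorised state for the taxi example above. The class and field names are illustrative assumptions for this article, not taken from the survey.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative sketch: a factorised state for the taxi example versus a
# flat, entangled observation. All names here are hypothetical.

@dataclass(frozen=True)
class FactoredTaxiState:
    taxi_position: Tuple[int, int]       # where the taxi is on the city grid
    passenger_location: Tuple[int, int]  # where the passenger is waiting
    destination: Tuple[int, int]         # where the passenger wants to go
    traffic_level: float                 # current state of the road network

    def passenger_features(self) -> tuple:
        # An agent that knows this factorisation can keep what it has learned
        # about traffic when only the passenger pattern changes.
        return (self.passenger_location, self.destination)

    def traffic_features(self) -> tuple:
        return (self.taxi_position, self.traffic_level)


def flat_observation(state: FactoredTaxiState) -> list:
    # Without structural information, the agent only sees one mixed vector
    # and has to disentangle traffic and passenger patterns on its own.
    return [*state.taxi_position, *state.passenger_location,
            *state.destination, state.traffic_level]
```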
From design choices to design patterns
RL algorithms often differ from the standard RL pipeline only through minor modifications, and algorithms that make structural assumptions tend to do so in recurring, recognisable ways. Based on this insight, Mohan and his co-authors present a framework that describes design patterns for embedding structure in RL algorithms, including abstract states, factorised models, relational architectures, and modular designs. Analysing a wide range of RL work through the lens of these design patterns reveals which combinations have proven effective for which applications – from generalisation to interpretability. For example, a robot equipped with relational representations can sort parcels in a warehouse efficiently because it understands the relationships between objects. Similarly, reward models allow RL agents to learn efficiently even in environments with sparse reward signals.
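As a rough illustration of the reward-model pattern mentioned above, the sketch below wraps a sparse-reward environment step with an auxiliary reward model that supplies denser feedback. The interface (predict, update, shaped_step) and the blending scheme are assumptions made for this illustration, not the paper's method or API.

```python
# Hypothetical sketch of the reward-model design pattern: an auxiliary model
# learns to estimate reward and provides dense feedback where the true
# extrinsic reward is sparse. Interface and weighting are illustrative.

class RewardModel:
    """Keeps a running reward estimate for each (state, action) pair."""
    def __init__(self):
        self.estimates = {}  # (state, action) -> estimated reward

    def predict(self, state, action) -> float:
        return self.estimates.get((state, action), 0.0)

    def update(self, state, action, observed_reward: float, lr: float = 0.1):
        old = self.predict(state, action)
        self.estimates[(state, action)] = old + lr * (observed_reward - old)


def shaped_step(env_step, reward_model, state, action, bonus_weight=0.5):
    """Take one environment step and blend the sparse extrinsic reward with
    the reward model's prediction to give the agent denser feedback."""
    next_state, extrinsic_reward, done = env_step(state, action)
    reward_model.update(state, action, extrinsic_reward)
    shaped_reward = extrinsic_reward + bonus_weight * reward_model.predict(state, action)
    return next_state, shaped_reward, done
```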
This structured approach not only makes learning more data-efficient but also improves the generalisability of RL agents. The work opens up new areas of research, such as identifying optimal design patterns – or combinations thereof – for different applications, depending on whether generalisability, efficiency, safety or interpretability is the primary goal. “We hope that our framework will serve as a guide for the further development of RL methods,” says Mohan. “The use of structure could be the key to finally extending RL to the complex real world.”
Aditya Mohan, Amy Zhang, Marius Lindauer: Structure in Deep Reinforcement Learning: A Survey and Open Problems. J. Artif. Intell. Res. 79: 1167-1236 (2024) jair.org/index.php/jair/article/view/15703/27028
Contact
Aditya Mohan, M. Sc.
Aditya Mohan is a researcher at the L3S Research Center and the Institute for Information Processing, Department of Automatic Image Interpretation, Leibniz Universität Hannover.