Oxford Robotics Institute
Compete and Compose: Learning Independent Mechanisms for Modular World Models
PUBLISHED 17 SEPT 2024
The Oxford Robotics Institute (ORI) is built from groups of researchers, engineers and students all driven to change what robots can do for us. Industrial collaboration lies at the heart of their research agenda and ORI membership is the vehicle they use to achieve this, accelerating knowledge transfer from the ORI to its industrial members, currently including BP, Oxa, Accenture, Honda, L3 Harris, Navtech Radar and Scan Computers.
The ORI’s current research interests are diverse, from flying to grasping - inspection to running - haptics to driving, and exploring to planning. This spectrum of interests leads to researching a broad span of technical topics, including machine learning and AI, computer vision, fabrication, multispectral sensing, perception and systems engineering.
Project Background
Humans can interact with the complex world around them because they learn efficiently and adapt flexibly, without prior knowledge of the scenarios presented to them. Designing and building an artificial agent with these attributes, using minimal training data, remains a significant challenge. A key difference between humans and machines lies in how each learns: it is supposed that humans build knowledge in a structured and modular way, distilling past experience into general principles about the world, which can then be applied or selectively updated in novel settings. In contrast, current machine world models are mostly based on monolithic architectures, where the resulting entangled representations of the world limit the selective reuse of prior knowledge in new environments.
Project Approach
Recent studies on object-centric world models [1, 2] served as an initial step for the ORI team towards a structured and compositional understanding of the world. By decomposing the observed scene into discrete object slots, these methods modelled the interaction between entities in the scene and achieved state-of-the-art results. The ORI team then proposed that, just as the state representation of a scene could be factorised into object slots, the dynamics of the environment could also be factorised into discrete and independent mechanisms. This hypothesis was based on the observation that, while the overall dynamics can change across different environments, only a small set of simple interactions, called primitives, actually takes place, such as "A rests on top of B" or "C collides with D". If the team could learn these interactions, there was potential to apply them in unseen settings.
However, acquiring such a set of versatile mechanisms from observations without supervision presented challenges both in terms of model architecture and learning algorithm. As current learning methods excel at learning to predict dynamics from independent and identically distributed environments, learning shared modules that capture diverse environments could not be achieved by directly applying updates to the entire model. To this end, the ORI team argued that the ability to selectively update the model during learning, i.e. to recognise parts of the model that are relevant to the observed data and perform modular updates, was instrumental to the emergence of discrete independent mechanisms.
To tackle this challenge the team designed a two-phase training procedure: first, the ‘Competition’ phase, in which the model learns a set of independently parameterised modules that encode interaction primitives from diverse environments; and second, the ‘Composition’ phase, in which the model is trained to apply these learnt interaction primitives in novel environments.
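The idea behind the Competition phase can be illustrated with a deliberately tiny sketch. It is not COMET itself, which operates on object slots extracted from images; it only assumes a winner-take-all rule in which the module with the lowest prediction error is the only one updated. The scalar modules, the two toy primitives and the function names `competition_step` and `train` are all hypothetical and chosen purely for illustration.

```python
import random


def competition_step(modules, x, y, lr=0.1):
    """Let the modules compete on one sample and update only the winner.

    Each hypothetical module is a single weight w that predicts y = w * x.
    """
    errors = [(w * x - y) ** 2 for w in modules]
    winner = min(range(len(modules)), key=errors.__getitem__)
    residual = modules[winner] * x - y
    # Gradient step on the squared error, applied to the winning module alone
    modules[winner] -= lr * 2.0 * residual * x
    return winner


def train(num_steps=2000, seed=0):
    rng = random.Random(seed)
    # Two assumed ground-truth "interaction primitives" (toy dynamics)
    primitives = [lambda x: 2.0 * x, lambda x: -1.0 * x]
    # Two independently parameterised modules with different starting points
    modules = [0.5, -0.5]
    for _ in range(num_steps):
        x = rng.uniform(-1.0, 1.0)
        y = rng.choice(primitives)(x)  # the sample's primitive is never revealed
        competition_step(modules, x, y)
    return modules


if __name__ == "__main__":
    print(train())
```

Under these assumptions each module specialises on one primitive, because only the winner receives a gradient step. Updating every module on every sample would instead pull them all towards an average of the two behaviours.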
The ORI team named this two-phase model COMET (COmpetitive Mechanisms for Efficient Transfer) and it was evaluated both quantitatively and qualitatively, within image-based environments. Firstly, COMET was evaluated against two competitive baselines - Contrastive learning of Structured World Models (C-SWM) [1]; and Neural Production Systems (NPS) [3]. C-SWM learns a world model from observation via contrastive learning with a graph neural network (GNN) based dynamics model, whilst NPS learns to capture object interactions as independent mechanisms.
To perform these evaluations three distinct datasets were used - Particle Interactions, Traffic, and Team Sports. For each of these datasets, they defined a set of environments where objects can exhibit different behaviours. These environments were designed to test whether COMET could extract meaningful mechanisms and adapt to unseen environments via composition. The Particle Interactions dataset consisted of coloured particles that can interact with each other in different ways such as attraction and repulsion - defined by a combination of rules such as "red particles repel each other". The Traffic dataset contained observation sequences of traffic scenarios defined by traffic rules that apply to different vehicles such as "blue cars do not need to stop at red lights". The Team Sports dataset consisted of a simulated generic hockey game where players can perform different actions such as moving towards the puck or dribbling the puck towards the opponent goal, where some players might tend to take more aggressive actions. The below diagram shows how COMET was able to learn disentangled mechanisms that correspond to ground-truth behaviours in all three datasets, as indicated by the fact that each interaction mode has one main corresponding learnt mechanism. In contrast, NPS did not exhibit the same structure.
It was then tested whether COMET could apply these learnt mechanisms in new, unseen environments. The diagram below uses coloured tabs to show the ‘winning’ mechanism, and it can be seen that across all environments the competition ‘winner’ changes as the underlying interaction mode changes. In the first row the particles repel each other when they are close (blue) and move independently when they are apart (green).
In the middle row the orange car obeys a slower speed limit and always picks the slow mechanism (orange), whereas the blue car approaches the red light with normal driving (pink), then slows down (orange) and comes to a stop (green). Finally, in the bottom row the player first waits to receive the ball (pink) and then moves towards the opponent goal when in possession of the ball (orange).
Project Results
The ORI team was able to show experimentally that the proposed two-phase COMET method could disentangle shared mechanisms across different environments from image observations, and thus enabled sample-efficient and interpretable adaptation to novel situations. This was further backed up by adaptation efficiency results, where rollout errors were measured to ascertain which approach (COMET, GNN-based C-SWM or NPS) produced the lowest error with the fewest adaptation episodes. The graphs below show rollout error (lower is better) and confirm that COMET outperformed the baselines in the low-data regime for the Particle Interactions and Traffic datasets, illustrating that explicitly reusing learnt mechanisms results in improved sample efficiency compared to gradient-based fine-tuning.
For the Team Sports dataset, GNN-based C-SWM performed better than both COMET and NPS. The team hypothesised that this was because COMET and NPS only model binary interactions between objects, whereas C-SWM uses a GNN-based transition model that can take the entire scene into account. COMET nevertheless still outperformed NPS.
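As a rough illustration of what a rollout error measures, the sketch below rolls a one-step model forward on its own predictions and averages the squared error against a reference trajectory. The source does not state the exact metric used, so the mean squared error in state space, the function name `rollout_error` and the toy position and velocity dynamics are all assumptions made for this example.

```python
def rollout_error(step_fn, initial_state, true_states):
    """Mean squared error between an open-loop rollout and the true trajectory."""
    state = list(initial_state)
    total = 0.0
    for target in true_states:
        # Feed the model's own prediction back in, so errors can compound
        state = step_fn(state)
        total += sum((p - t) ** 2 for p, t in zip(state, target)) / len(target)
    return total / len(true_states)


def true_step(state):
    """Assumed ground-truth dynamics: constant velocity, state = [position, velocity]."""
    position, velocity = state
    return [position + velocity, velocity]


def biased_step(state):
    """A hypothetical imperfect model that underestimates the displacement."""
    position, velocity = state
    return [position + 0.9 * velocity, velocity]


if __name__ == "__main__":
    start = [0.0, 1.0]
    # Build a three-step reference trajectory from the assumed true dynamics
    trajectory = []
    state = start
    for _ in range(3):
        state = true_step(state)
        trajectory.append(state)
    print(rollout_error(true_step, start, trajectory))
    print(rollout_error(biased_step, start, trajectory))
```

Because the model consumes its own output at every step, a small one-step bias grows over the horizon, which is why rollout error is a stricter test than single-step prediction error.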
Conclusion & Next Steps
The ORI team showed that COMET was able to learn discrete, abstract concepts from diverse observations and to reuse such concepts to predict the evolution of unseen environments. COMET’s ability to perceive the world through the lens of structured, high-level abstract concepts serves as an important step towards efficient generalisation and transfer across different task settings.
Currently COMET lacks the ability to update the mechanisms during composition, which limits the model’s ability to adapt to environments with completely new interactions. Designing mechanism-based models that can initiate new mechanisms as they encounter new data was beyond the scope of this investigation; however, the ORI team believe the method presented here will be instrumental in the development of such systems.
Further information on these experiments and their results can be found by reading the full research paper.
References
[1] Thomas Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. In International Conference on Learning Representations, 2020.
[2] Zhixuan Lin, Yi Fu Wu, Skand Peri, Bofeng Fu, Jindong Jiang, and Sungjin Ahn. Improving generative imagination in object-centric world models. In 37th International Conference on Machine Learning, ICML 2020, volume PartF16814, pages 6096–6105, 2020. ISBN 978-1-71382-112-0.
[3] Anirudh Goyal, Aniket Rajiv Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Curtis Mozer, and Yoshua Bengio. Neural production systems. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021.
The Scan Partnership
The Scan AI team have been supporting ORI research projects as an industrial member for the past three years. Scan provides a hardware cluster of NVIDIA DGX servers, a multi-GPU NVIDIA RTX 6000 server and AI-optimised PEAK:AIO NVMe software-defined storage, to further robotic knowledge and accelerate development. This cluster is overlaid with Run:ai cluster management software, which virtualises the GPU pool across the compute nodes to maximise utilisation and provides a mechanism for scheduling and allocating ORI workflows across the combined GPU resource. Access to this infrastructure is delivered via the Scan Cloud platform, hosted in a secure UK datacentre.
Project Wins
COMET was able to disentangle shared mechanisms across different environments from image observations
COMET was able to learn discrete, abstract concepts from diverse observations
COMET proved an important step towards efficient generalisation and transfer across different task settings
Speak to an expert
You’ve seen how Scan continues to help the Oxford Robotics Institute further its research into the development of truly useful autonomous machines. Contact our expert AI team to discuss your project requirements.