Publications

Conference paper

Kamthe S, Deisenroth MP, 2018,

Data-efficient reinforcement learning with probabilistic model predictive control

, Artificial Intelligence and Statistics, Publisher: PMLR, Pages: 1701-1710

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way for RL in constrained environments.

Journal article

Cully AHR, Demiris Y, 2018,

Quality and diversity optimization: a unifying modular framework

, IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026

The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, which contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms search for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperform all the other algorithms tested in this paper. Lastly, we present a new collection-management approach that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.
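
The collection described above covers the range of solution types and keeps the best solution found for each type. It can be illustrated with a minimal grid-based archive in the spirit of MAP-Elites. This is a sketch under simplifying assumptions: scalar solutions lie in [0, 1], a hypothetical one-dimensional descriptor is discretised into equal cells, and a uniformly selected elite receives Gaussian mutation. It is not the paper's unifying framework or its new selection mechanism.

```python
import random
from typing import Callable, Dict, Tuple

def map_elites(fitness: Callable[[float], float],
               descriptor: Callable[[float], float],
               n_cells: int = 10,
               iterations: int = 2000,
               seed: int = 0) -> Dict[int, Tuple[float, float]]:
    """Grid-based Quality-Diversity loop (MAP-Elites-style sketch).

    Solutions are scalars in [0, 1]; the archive maps a descriptor cell
    index to (fitness, solution) and keeps only the best solution per cell.
    """
    rng = random.Random(seed)
    archive: Dict[int, Tuple[float, float]] = {}

    def try_add(x: float) -> None:
        # Discretise the descriptor (assumed to lie in [0, 1]) into a cell index.
        cell = min(int(descriptor(x) * n_cells), n_cells - 1)
        f = fitness(x)
        # Insert if the cell is empty or the candidate beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a uniformly chosen elite, clipping back into [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = min(max(parent + rng.gauss(0.0, 0.1), 0.0), 1.0)
        else:
            # Otherwise sample a fresh random solution.
            child = rng.random()
        try_add(child)
    return archive

# Usage: quality peaks near x = 0.3, and the solution itself is the descriptor.
archive = map_elites(fitness=lambda x: -(x - 0.3) ** 2, descriptor=lambda x: x)
for cell in sorted(archive):
    print(cell, round(archive[cell][1], 3))
```

Each cell keeps its own elite, even where fitness is poor. The archive therefore stays diverse instead of collapsing onto the single global optimum.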

Conference paper

Saputra RP, Kormushev P, 2018,

ResQbot: A mobile rescue robot for casualty extraction

, 2018 ACM/IEEE International Conference on Human-Robot Interaction (HRI 2018), Publisher: Association for Computing Machinery, Pages: 239-240

Performing search and rescue missions in disaster-struck environments is challenging. Despite the advances in the robotic search phase of the rescue missions, few works have focused on the physical casualty extraction phase. In this work, we propose a mobile rescue robot that is capable of performing a safe casualty extraction routine. To perform this routine, this robot adopts a loco-manipulation approach. We have designed and built a mobile rescue robot platform called ResQbot as a proof of concept of the proposed system. We have conducted preliminary experiments using a sensorised human-sized dummy as a victim, to confirm that the platform is capable of performing a safe casualty extraction procedure.

Journal article

Herrero P, Bondia J, Giménez M, Oliver N, Georgiou P et al., 2018,

Automatic adaptation of Basal insulin using sensor-augmented pump therapy

, Journal of Diabetes Science and Technology, Vol: 12, Pages: 282-294, ISSN: 1932-2968

BACKGROUND: People with insulin-dependent diabetes rely on an intensified insulin regimen. Although several guidelines exist, they are usually impractical and fall short of achieving optimal glycemic outcomes. In this work, a novel technique for automatic adaptation of the basal insulin profile of people with diabetes on sensor-augmented pump therapy is presented. METHODS: The presented technique is based on a run-to-run control law that overcomes some of the limitations of previously proposed methods. To prove its validity, an in silico validation was performed. Finally, the artificial intelligence technique of case-based reasoning is proposed as a potential solution to deal with variability in basal insulin requirements. RESULTS: Over a period of 4 months, the proposed run-to-run control law successfully adapts the basal insulin profile of a virtual population (10 adults, 10 adolescents, and 10 children). In particular, average percentage time in target [70, 180] mg/dl was significantly improved over the evaluated period (first week versus last week): 70.9 ± 11.8 versus 91.1 ± 4.4 (adults), 46.5 ± 11.9 versus 80.1 ± 10.9 (adolescents), 49.4 ± 12.9 versus 73.7 ± 4.1 (children). Average percentage time in hypoglycemia (<70 mg/dl) was also significantly reduced: 9.7 ± 6.6 versus 0.9 ± 1.2 (adults), 10.5 ± 8.3 versus 0.83 ± 1.0 (adolescents), 10.9 ± 6.1 versus 3.2 ± 3.5 (children). When compared against an existing technique over the whole evaluated period, the presented approach achieved superior results on percentage of time in hypoglycemia: 3.9 ± 2.6 versus 2.6 ± 2.2 (adults), 2.9 ± 1.9 versus 2.0 ± 1.5 (adolescents), 4.6 ± 2.8 versus 3.5 ± 2.0 (children), without increasing the percentage time in hyperglycemia. CONCLUSION: The present study shows the potential of a novel technique to effectively adjust the basal insulin profile of a type 1 diab

Conference paper

Tavakoli A, Pardo F, Kormushev P, 2018,

Action branching architectures for deep reinforcement learning

, AAAI 2018, Publisher: AAAI

Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension. This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension. To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). We evaluate the performance of our agent on a set of challenging continuous control tasks. The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches. Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).
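
The scaling argument can be checked with simple arithmetic. Suppose each of N action dimensions is discretised into n sub-actions. A network with one output per joint action then needs n^N outputs, whereas one branch per dimension needs only N * n. The sketch below assumes the same n for every dimension (33 is an arbitrary illustrative value, not taken from the paper). It also shows greedy action selection done independently in each branch. It does not implement BDQ's shared decision module, dueling heads or training.

```python
from typing import List

def flat_outputs(n_dims: int, n_bins: int) -> int:
    """One Q-value per joint action: grows exponentially with n_dims."""
    return n_bins ** n_dims

def branching_outputs(n_dims: int, n_bins: int) -> int:
    """One branch of n_bins Q-values per action dimension: grows linearly."""
    return n_dims * n_bins

def greedy_branch_action(branch_q: List[List[float]]) -> List[int]:
    """Choose the arg-max sub-action independently within each branch."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q]

for dims in (1, 3, 6):
    print(dims, flat_outputs(dims, 33), branching_outputs(dims, 33))
# 1 33 33
# 3 35937 99
# 6 1291467969 198
```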

Journal article

Chamberlain B, Levy-Kramer J, Humby C, Deisenroth MP et al., 2018,

Real-time community detection in full social networks on a laptop

, PLoS ONE, Vol: 13, ISSN: 1932-6203

For a broad range of research and practical applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As global social networks (e.g., Facebook and Twitter) are very large, most approaches make use of distributed computing systems for this purpose. Distributing graph processing requires solving many difficult engineering problems, which has led some researchers to look at single-machine solutions that are faster and easier to maintain. In this article, we present an approach for analyzing full social networks on a standard laptop, allowing for interactive exploration of the communities in the locality of a set of user specified query vertices. The key idea is that the aggregate actions of large numbers of users can be compressed into a data structure that encapsulates the edge weights between vertices in a derived graph. Local communities can be constructed by selecting vertices that are connected to the query vertices with high edge weights in the derived graph. This compression is robust to noise and allows for interactive queries of local communities in real-time, which we define to be less than the average human reaction time of 0.25 s. We achieve single-machine real-time performance by compressing the neighborhood of each vertex using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines operating on the full graph to milliseconds on standard laptops. Our method allows exploration of strongly associated regions (i.e., communities) of large graphs in real-time on a laptop. It has been deployed in software that is actively used by social network analysts and offers another channel for media owners to monetize their data, helping them to continue to provide
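
The compression and query steps can be illustrated with a minimal MinHash and Locality Sensitive Hashing sketch. It rests on several assumptions. Each vertex's neighbourhood is a set of integer ids, the hash functions are random linear maps modulo a prime, and signatures are split into equal bands. The function names are hypothetical, and this is not the authors' deployed implementation. The fraction of matching signature positions estimates the Jaccard similarity of two neighbourhoods. Vertices that agree with the query on at least one whole band become candidate community members.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PRIME = 2_147_483_647  # 2**31 - 1; vertex ids are assumed to be smaller than this

def make_hashes(k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw k hash functions of the form h(v) = (a*v + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def minhash(neighbours: Set[int], hashes: List[Tuple[int, int]]) -> List[int]:
    """Signature: minimum hash value of the neighbour set under each function."""
    return [min((a * v + b) % PRIME for v in neighbours) for a, b in hashes]

def estimate_jaccard(s1: List[int], s2: List[int]) -> float:
    """Fraction of agreeing positions estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)

def lsh_candidates(query: str, sigs: Dict[str, List[int]], bands: int) -> Set[str]:
    """Vertices whose signature matches the query's on at least one whole band."""
    rows = len(sigs[query]) // bands
    # For brevity the band index is rebuilt per query; a real system would
    # build these buckets once and reuse them across queries.
    buckets: Dict[Tuple, Set[str]] = defaultdict(set)
    for vertex, sig in sigs.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(vertex)
    found: Set[str] = set()
    for b in range(bands):
        found |= buckets[(b, tuple(sigs[query][b * rows:(b + 1) * rows]))]
    found.discard(query)
    return found

# Usage: "u" and "v" share most neighbours, "w" shares none.
hashes = make_hashes(32)
sigs = {"u": minhash({1, 2, 3, 4}, hashes),
        "v": minhash({1, 2, 3, 4, 5}, hashes),
        "w": minhash({100, 200}, hashes)}
print(estimate_jaccard(sigs["u"], sigs["v"]), lsh_candidates("u", sigs, bands=16))
```

Short signatures replace full neighbour lists, so per-vertex storage stays fixed. A query touches only the query's own buckets instead of scanning the whole graph.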

Conference paper

Kanajar P, Caldwell DG, Kormushev P, 2017,

Climbing over large obstacles with a humanoid robot via multi-contact motion planning

, IEEE RO-MAN 2017: 26th IEEE International Symposium on Robot and Human Interactive Communication, Publisher: IEEE, Pages: 1202-1209

Incremental progress in humanoid robot locomotion over the years has achieved important capabilities such as navigation over flat or uneven terrain, stepping over small obstacles and climbing stairs. However, the locomotion research has mostly been limited to using only bipedal gait and only foot contacts with the environment, using the upper body for balancing without considering additional external contacts. As a result, challenging locomotion tasks like climbing over large obstacles relative to the size of the robot have remained unsolved. In this paper, we address this class of open problems with an approach based on multi-body contact motion planning guided through physical human demonstrations. Our goal is to make the humanoid locomotion problem more tractable by taking advantage of objects in the surrounding environment instead of avoiding them. We propose a multi-contact motion planning algorithm for humanoid robot locomotion which exploits the whole-body motion and multi-body contacts including both the upper and lower body limbs. The proposed motion planning algorithm is applied to a challenging task of climbing over a large obstacle. We demonstrate successful execution of the climbing task in simulation using our multi-contact motion planning algorithm initialized via a transfer from real-world human demonstrations of the task and further optimized.

Conference paper

Zhang F, Cully A, Demiris Y, 2017,

Personalized Robot-assisted Dressing using User Modeling in Latent Spaces

, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance view dressing as a trajectory planning problem. However, user movements during the dressing process are rarely taken into account, which often leads to failures of the planned trajectory and may put the user at risk. The main difficulty in taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.

Conference paper

Rakicevic N, Kormushev P, 2017,

Efficient Robot Task Learning and Transfer via Informed Search in Movement Parameter Space

, Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning, 31st Conference on Neural Information Processing Systems (NIPS 2017)

Conference paper

Tavakoli A, Pardo F, Kormushev P, 2017,

Action Branching Architectures for Deep Reinforcement Learning

, Deep Reinforcement Learning Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017)

Conference paper

Salimbeni H, Deisenroth M, 2017,

Doubly stochastic variational inference for deep Gaussian processes

, NIPS 2017, Publisher: Advances in Neural Information Processing Systems (NIPS), Pages: 4589-4600, ISSN: 1049-5258

Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.

Conference paper

Eleftheriadis S, Nicholson TFW, Deisenroth MP, Hensman J et al., 2017,

Identification of Gaussian Process State Space Models

, Advances in Neural Information Processing Systems (NIPS) 2017, Publisher: Neural Information Processing Systems Foundation, Inc., Pages: 5310-5320, ISSN: 1049-5258

The Gaussian process state space model (GPSSM) is a non-linear dynamical system, where unknown transition and/or measurement mappings are described by GPs. Most research in GPSSMs has focussed on the state estimation problem. However, the key challenge in GPSSMs has not been satisfactorily addressed yet: system identification. To address this challenge, we impose a structured Gaussian variational posterior distribution over the latent states, which is parameterised by a recognition model in the form of a bi-directional recurrent neural network. Inference with this structure allows us to recover a posterior smoothed over the entire sequence(s) of data. We provide a practical algorithm for efficiently computing a lower bound on the marginal likelihood using the reparameterisation trick. This additionally allows arbitrary kernels to be used within the GPSSM. We demonstrate that we can efficiently generate plausible future trajectories of the system we seek to model with the GPSSM, requiring only a small number of interactions with the true system.

Conference paper

Rafiq Y, Dickens L, Russo A, Bandara AK, Yang M, Stuart A, Levine M, Calikli G, Price BA, Nuseibeh B et al., 2017,

Learning to share: engineering adaptive decision-support for online social networks

, 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Publisher: IEEE, Pages: 280-285, ISSN: 1527-1366

Some online social networks (OSNs) allow users to define friendship-groups as reusable shortcuts for sharing information with multiple contacts. Posting exclusively to a friendship-group gives some privacy control, while supporting communication with (and within) this group. However, recipients of such posts may want to reuse content for their own social advantage, and can bypass existing controls by copy-pasting into a new post; this cross-posting poses privacy risks. This paper presents a learning to share approach that enables the incorporation of more nuanced privacy controls into OSNs. Specifically, we propose a reusable, adaptive software architecture that uses rigorous runtime analysis to help OSN users to make informed decisions about suitable audiences for their posts. This is achieved by supporting dynamic formation of recipient-groups that benefit social interactions while reducing privacy risks. We exemplify the use of our approach in the context of Facebook.

Conference paper

Kamthe S, Deisenroth MP, 2017,

Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

, International Conference on Artificial Intelligence and Statistics

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms either rely on engineered features or a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications. For example, robots are subject to wear and tear and, hence, millions of interactions may change or damage the system. Moreover, practical systems have limitations in the form of the maximum torque that can be safely applied. To reduce the number of system interactions while naturally handling constraints, we propose a model-based RL framework based on Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainties into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for the first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. The proposed framework demonstrates superior data efficiency and learning rates compared to the current state of the art.

Journal article

Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA et al., 2017,

A brief survey of deep reinforcement learning

, IEEE Signal Processing Magazine, Vol: 34, Pages: 26-38, ISSN: 1053-5888

Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.

Conference paper

Chabierski P, Russo A, Law M, Broda K et al., 2017,

Machine comprehension of text using combinatory categorial grammar and answer set programs

, COMMONSENSE 2017, Publisher: CEUR Workshop Proceedings, ISSN: 1613-0073

We present an automated method for generating Answer Set Programs from narratives written in English and demonstrate how such a representation can be used to answer questions about text. The proposed approach relies on a transparent interface between the syntax and semantics of natural language provided by Combinatory Categorial Grammars to translate text into Answer Set Programs, hence creating a knowledge base that, together with background knowledge, can be queried.

Conference paper

Baroni P, Comini G, Rago A, Toni F et al., 2017,

Abstract Games of Argumentation Strategy and Game-Theoretical Argument Strength

, PRIMA, Publisher: Springer, Pages: 403-419, ISSN: 0302-9743

We define a generic notion of abstract games of argumentation strategy for (attack-only and bipolar) argumentation frameworks, which are zero-sum games whereby two players put forward sets of arguments and get a reward for their combined choices. The value of these games, in the classical game-theoretic sense, can be used to define measures of (quantitative) game-theoretic strength of arguments, which are different depending on whether either or both players have an “agenda” (i.e. an argument they want to be accepted). We show that this general scheme captures as a special instance a previous proposal in the literature (single agenda, attack-only frameworks), and seamlessly supports the definition of a spectrum of novel measures of game-theoretic strength where both players have an agenda and/or bipolar frameworks are considered. We then discuss the applicability of these instances of game-theoretic strength in different contexts and analyse their basic properties.

Conference paper

Rago A, Toni F, 2017,

Quantitative Argumentation Debates with Votes for Opinion Polling

, PRIMA, Publisher: Springer, Pages: 369-385, ISSN: 0302-9743

Opinion polls are used in a variety of settings to assess the opinions of a population, but they mostly conceal the reasoning behind these opinions. Argumentation, as understood in AI, can be used to evaluate opinions in dialectical exchanges, transparently articulating the reasoning behind the opinions. We give a method integrating argumentation within opinion polling to empower voters to add new statements that render their opinions in the polls individually rational while at the same time justifying them. We then show how these poll results can be amalgamated to give a collectively rational set of voters in an argumentation framework. Our method relies upon Quantitative Argumentation Debate for Voting (QuAD-V) frameworks, which extend QuAD frameworks (a form of bipolar argumentation frameworks in which arguments have an intrinsic strength) with votes expressing individuals’ opinions on arguments.

Journal article

Zheng JX, Pawar S, Goodman DFM, 2017,

Graph Drawing by Stochastic Gradient Descent

A popular method of force-directed graph drawing is multidimensional scaling using graph-theoretic distances as input. We present an algorithm to minimize its energy function, known as stress, by using stochastic gradient descent (SGD) to move a single pair of vertices at a time. Our results show that SGD can reach lower stress levels faster and more consistently than majorization, without needing help from a good initialization. We then show how the unique properties of SGD make it easier to produce constrained layouts than previous approaches. We also show how SGD can be directly applied within the sparse stress approximation of Ortmann et al. [1], making the algorithm scalable up to large graphs.
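
The pairwise update can be sketched as follows. Stress is taken as the sum over vertex pairs of w_ij (||x_i - x_j|| - d_ij)^2, where d_ij is the graph-theoretic distance. The sketch relies on several assumptions: the common weighting w_ij = d_ij^-2, unweighted shortest-path distances from breadth-first search, a connected graph, and an exponentially decaying step size eta. The schedule constants are illustrative choices. For each pair, both vertices move along the line joining them by a fraction mu = min(w_ij * eta, 1) of half the distance error. Capping mu at 1 means a single update never overshoots; with mu = 1 the pair ends up exactly d_ij apart.

```python
import math
import random
from collections import deque
from typing import Dict, List, Tuple

def bfs_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Graph-theoretic (unweighted shortest-path) distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def stress(pos, pairs) -> float:
    """Sum over pairs of w_ij * (||x_i - x_j|| - d_ij)^2 with w_ij = d_ij^-2."""
    return sum((math.dist(pos[i], pos[j]) - d) ** 2 / d ** 2 for i, j, d in pairs)

def sgd_layout(adj, iterations: int = 30, eps: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    nodes = sorted(adj)
    pairs: List[Tuple[int, int, int]] = []
    for i in nodes:
        dist = bfs_distances(adj, i)  # assumes the graph is connected
        pairs += [(i, j, dist[j]) for j in nodes if j > i]
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    # Step size decays exponentially from 1/w_min to eps/w_max.
    d_max = max(d for _, _, d in pairs)
    d_min = min(d for _, _, d in pairs)
    eta_max, eta_min = d_max ** 2, eps * d_min ** 2
    decay = math.log(eta_min / eta_max) / max(iterations - 1, 1)
    for t in range(iterations):
        eta = eta_max * math.exp(decay * t)
        rng.shuffle(pairs)  # visit pairs in a random order each sweep
        for i, j, d in pairs:
            mu = min(eta / d ** 2, 1.0)  # mu = min(w_ij * eta, 1) avoids overshoot
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            norm = math.hypot(dx, dy) or 1e-9  # guard against coincident vertices
            r = (norm - d) / 2
            rx, ry = r * dx / norm, r * dy / norm
            # Move both vertices symmetrically toward separation d_ij.
            pos[i][0] -= mu * rx
            pos[i][1] -= mu * ry
            pos[j][0] += mu * rx
            pos[j][1] += mu * ry
    return pos, pairs

# Usage: a 4-vertex path can be drawn with zero stress on a straight line.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pos, pairs = sgd_layout(path)
print(round(stress(pos, pairs), 4))
```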

Conference paper

Bao Z, Cyras K, Toni F, 2017,

ABAplus: Attack Reversal in Abstract and Structured Argumentation with Preferences

, PRIMA 2017: The 20th International Conference on Principles and Practice of Multi-Agent Systems, Publisher: Springer Verlag, ISSN: 0302-9743

We present ABAplus, a system that implements reasoning with the argumentation formalism ABA+. ABA+ is a structured argumentation formalism that extends Assumption-Based Argumentation (ABA) with preferences and accounts for preferences via attack reversal. ABA+ also admits as instance Preference-based Argumentation which accounts for preferences by reversing attacks in abstract argumentation (AA). ABAplus readily implements attack reversal in both AA and ABA-style structured argumentation. ABAplus affords computation, visualisation and comparison of extensions under five argumentation semantics. It is available both as a stand-alone system and as a web application.
