Learning to Plan and Planning to Learn – Learning Club talk by Aviv Tamar

April 18, 2021 @ 12:00 pm - 1:00 pm IDT

Learning Club BIU – Aviv Tamar

Title: Learning to Plan and Planning to Learn

Abstract: Two main paradigms for solving sequential decision-making problems are planning – searching through possible future outcomes to achieve a goal, and reinforcement learning (RL) – learning reactive policies through trial and error. This talk focuses on algorithmic interfaces between planning and RL.

We start by asking about the capability of deep networks to learn planning computations, and present the Value Iteration Network, a type of differentiable planner that can be used within model-free RL.
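For readers unfamiliar with the planning computation in question, here is a minimal sketch of plain tabular value iteration on a small grid. It is not the Value Iteration Network itself, which expresses a similar recurrence with learned convolution and max-pooling layers. The grid world, the reward layout, and the function name `value_iteration` are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def value_iteration(reward, gamma=0.9, iters=50):
    """Tabular value iteration on a deterministic grid (illustrative only).

    reward[i, j] is the reward for entering (or staying in) cell (i, j).
    Actions: stay, up, down, left, right. Moves off the grid are disallowed.
    """
    v = np.zeros_like(reward, dtype=float)
    for _ in range(iters):
        # Value of arriving in each cell: immediate reward plus discounted value.
        q = reward + gamma * v
        # Pad with -inf so that off-grid moves never win the max.
        p = np.pad(q, 1, constant_values=-np.inf)
        candidates = np.stack([
            p[1:-1, 1:-1],  # stay
            p[:-2, 1:-1],   # move up
            p[2:, 1:-1],    # move down
            p[1:-1, :-2],   # move left
            p[1:-1, 2:],    # move right
        ])
        # Bellman backup: take the best action in every cell.
        v = candidates.max(axis=0)
    return v

# Hypothetical 4x4 grid with a single rewarding goal cell in the corner.
reward = np.zeros((4, 4))
reward[3, 3] = 1.0
values = value_iteration(reward)
```

In this toy setting, values decay with distance from the goal cell, so acting greedily with respect to `values` moves the agent toward the goal.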

Planning problems, however, are goal-based, and a planner must provide a solution for every possible goal. Standard RL, on the other hand, is based on a single-goal formulation (the reward function), and making RL work in a multi-goal setting is challenging. We introduce Sub-Goal Trees (SGTs) – a new RL formulation based on a different first principle – the all-pairs shortest path problem on graphs. We show that for deterministic multi-goal problems, SGTs are provably better at handling approximation errors than conventional RL (O(N log N) vs. O(N^2)), and we demonstrate empirical results on learning motion planning for a 7-DoF robot using deep neural networks.
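To make the all-pairs shortest path idea concrete, the following sketch runs an exact, tabular version of a midpoint-style recursion on a tiny graph: the cost between two states at one level is the best cost of going through some intermediate state at the previous level, and a trajectory is rebuilt by recursively picking midpoints. This is only an assumed illustration of the principle; the function names, the example graph, and the tabular setting are hypothetical, and the talk's method learns such predictions with neural networks rather than enumerating them.

```python
import numpy as np

def apsp_by_midpoints(cost):
    """All-pairs shortest paths by repeated midpoint splitting.

    cost[i, j] is the edge cost from i to j (np.inf if there is no edge,
    0 on the diagonal). levels[k][s, g] is the cheapest cost from s to g
    using at most 2**k edges.
    """
    n = cost.shape[0]
    levels = [cost.astype(float)]
    k = 0
    while 2 ** k < n - 1:  # a shortest path needs at most n - 1 edges
        prev = levels[-1]
        # new[s, g] = min over midpoints m of prev[s, m] + prev[m, g]
        new = np.min(prev[:, :, None] + prev[None, :, :], axis=1)
        levels.append(new)
        k += 1
    return levels

def subgoal_tree_path(levels, s, g, k=None):
    """Rebuild a path from s to g (assumed reachable) by predicting midpoints."""
    if k is None:
        k = len(levels) - 1
    if s == g:
        return [s]
    if k == 0:
        return [s, g]  # a single direct edge
    prev = levels[k - 1]
    m = int(np.argmin(prev[s] + prev[:, g]))  # best intermediate state
    left = subgoal_tree_path(levels, s, m, k - 1)
    right = subgoal_tree_path(levels, m, g, k - 1)
    return left + right[1:]  # drop the duplicated midpoint

# Hypothetical graph: a chain 0 -> 1 -> 2 -> 3 -> 4 with unit costs,
# plus one expensive shortcut 0 -> 4.
cost = np.full((5, 5), np.inf)
np.fill_diagonal(cost, 0.0)
for i in range(4):
    cost[i, i + 1] = 1.0
cost[0, 4] = 10.0

levels = apsp_by_midpoints(cost)
path = subgoal_tree_path(levels, 0, 4)  # [0, 1, 2, 3, 4]
```

Note that the trajectory is built by first choosing state 2 as the midpoint between 0 and 4 and then refining each half, rather than by stepping forward one state at a time.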

Finally, we ask how an RL agent can plan to best explore/exploit its environment, as cast in the following problem: given the training logs of N conventional RL agents, trained on N different tasks, train an agent that can quickly maximize reward in a new, unseen task from the same task distribution. We take a Bayesian RL view, and seek to learn a Bayes-optimal policy from the offline data. However, the offline nature of the problem entails identifiability challenges, for which we propose several solutions and a practical algorithm. On a range of challenging tasks, we demonstrate learning of near-optimal exploration/exploitation behavior. Remarkably, the learned behavior can be qualitatively different from the behavior of any RL agent in the data.


