Learning to Plan and Planning to Learn – Learning Club talk by Aviv Tamar


Title: Learning to Plan and Planning to Learn

Abstract: Two main paradigms for solving sequential decision making problems are planning – searching through possible future outcomes to achieve a goal, and reinforcement learning (RL) – learning reactive policies through trial and error. This talk focuses on algorithmic interfaces between planning and RL.

We start by asking about the capability of deep networks to learn planning computations, and present the Value Iteration Network, a type of differentiable planner that can be used within model-free RL.
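To make the idea concrete, below is a minimal sketch of the value-iteration module at the heart of a VIN, assuming a grid-world whose reward map is an image-like tensor; the class and parameter names (VIModule, n_iters) are illustrative choices, not code from the talk.

```python
# Minimal sketch of a Value Iteration Network (VIN) module, assuming a
# grid-world where the reward map is an image-like tensor. Names here
# (VIModule, n_iters, q_conv) are illustrative, not from the talk.
import torch
import torch.nn as nn

class VIModule(nn.Module):
    def __init__(self, n_actions: int, n_iters: int = 20):
        super().__init__()
        # One conv filter per action approximates P(s'|s,a) as a local kernel.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)
        self.n_iters = n_iters

    def forward(self, reward_map: torch.Tensor) -> torch.Tensor:
        # reward_map: (batch, 1, H, W) learned reward image.
        v = torch.zeros_like(reward_map)                        # V_0 = 0
        for _ in range(self.n_iters):                           # unrolled value iteration
            q = self.q_conv(torch.cat([reward_map, v], dim=1))  # Q(s, a) per pixel
            v, _ = torch.max(q, dim=1, keepdim=True)            # V = max_a Q
        return v  # a differentiable value map, usable inside model-free RL
```

The point of the construction is that the max over action channels keeps the whole unrolled value iteration differentiable, so it can be trained end-to-end with a model-free RL loss.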

Planning problems, however, are goal-based, and a planner must provide a solution for every possible goal. Standard RL, on the other hand, is based on a single-goal formulation (the reward function), and making RL work in a multi-goal setting is challenging. We introduce Sub-Goal Trees (SGTs) – a new RL formulation based on a different first principle – the all-pairs shortest-path problem on graphs. We show that for deterministic multi-goal problems, SGTs are provably better at handling approximation errors than conventional RL (O(N log N) vs. O(N^2)), and we demonstrate empirical results on learning motion planning for a 7-DoF robot using deep neural networks.
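As a rough illustration of the divide-and-conquer structure behind SGTs, the sketch below recursively predicts a midpoint sub-goal between start and goal, halving the horizon at each level. Here `midpoint_net` is a hypothetical learned predictor, and the 2-D toy state stands in for the 7-DoF joint vector; none of this is the talk's exact algorithm.

```python
# Toy sketch of sub-goal tree trajectory generation: instead of rolling
# out states one step at a time, recursively predict a midpoint between
# start and goal, halving the horizon at each level of the tree.
import torch
import torch.nn as nn

def subgoal_tree(start, goal, midpoint_net: nn.Module, depth: int):
    """Return a trajectory of 2**depth + 1 states from start to goal."""
    if depth == 0:
        return [start, goal]
    mid = midpoint_net(torch.cat([start, goal], dim=-1))  # predicted sub-goal
    left = subgoal_tree(start, mid, midpoint_net, depth - 1)
    right = subgoal_tree(mid, goal, midpoint_net, depth - 1)
    return left[:-1] + right  # drop the duplicated midpoint

# Example with a toy 2-D state and an untrained predictor, just to show shapes;
# for a 7-DoF arm the state would be a 7-dim joint vector.
net = nn.Linear(4, 2)
traj = subgoal_tree(torch.zeros(2), torch.ones(2), net, depth=3)
assert len(traj) == 2**3 + 1
```

Each prediction only has to be accurate relative to a sub-problem half as long, which is the intuition behind the improved error accumulation compared to step-by-step rollouts.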

Finally, we ask how an RL agent can plan to best explore/exploit its environment, as cast in the following problem: given the training logs of N conventional RL agents, trained on N different tasks, train an agent that can quickly maximize reward in a new, unseen task from the same task distribution. We take a Bayesian RL view, and seek to learn a Bayes-optimal policy from the offline data. However, the offline nature of the problem entails identifiability challenges, for which we propose several solutions and a practical algorithm. On a range of challenging tasks, we demonstrate learning of near-optimal exploration/exploitation behavior. Remarkably, the learned behavior can be qualitatively different from the behavior of any RL agent in the data.
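For intuition about what acting Bayes-optimally requires, here is a minimal sketch of a history-conditioned policy in the Bayesian-RL spirit: an RNN summarizes the interaction history into an approximate task belief, and actions condition on that belief. The architecture and names are assumptions for illustration only; the algorithm in the talk must additionally handle the offline identifiability issues mentioned above.

```python
# Sketch of a belief-conditioned policy for Bayesian RL: a GRU encodes
# the (state, action, reward) history into an approximate posterior over
# tasks, and the policy head conditions on it. Illustrative assumption,
# not the talk's exact method.
import torch
import torch.nn as nn

class BeliefPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, belief_dim: int = 32):
        super().__init__()
        # GRU over transitions (s, a, r) -> approximate task belief.
        self.encoder = nn.GRU(state_dim + action_dim + 1, belief_dim, batch_first=True)
        self.policy = nn.Sequential(
            nn.Linear(state_dim + belief_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, history: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # history: (batch, T, state_dim + action_dim + 1); state: (batch, state_dim)
        _, h = self.encoder(history)   # h: (1, batch, belief_dim)
        belief = h.squeeze(0)          # the agent's "belief" over tasks
        return self.policy(torch.cat([state, belief], dim=-1))
```

Because the action depends on the full history through the belief, the same network can first act to reduce task uncertainty (explore) and then act to collect reward (exploit).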

 

Learning Club talk by Roi Livni

Time: Sunday, Apr 11th, 2021, 12:00–13:00.

On Sunday 11.4.2021 at 12:00, we will host Roi Livni from Tel-Aviv University.
Roi will present his work on “Regularization, what is it good for?”.
Meeting ID: 833 2534 2820
Passcode: 449581

Title: Regularization, what is it good for?

Abstract: 
Regularization is considered a key concept in the explanation and analysis of successful learning algorithms. In contrast, modern machine learning practice often suggests invoking highly expressive models that can completely interpolate the data with far more free parameters than examples. To resolve this alleged contradiction, the notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. In this talk, we will revisit this paradigm in one of the most well-studied and well-understood models for theoretical machine learning: Stochastic Convex Optimization (SCO).
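For readers outside the area, a minimal formalization of the SCO objects the abstract refers to; the notation below is standard and is ours, not taken from the talk:

```latex
% Population and empirical losses over a convex loss f(.; z) and an
% i.i.d. sample S = (z_1, ..., z_n) from distribution D:
\[
  F(w) = \mathbb{E}_{z \sim \mathcal{D}}\left[ f(w; z) \right],
  \qquad
  \hat{F}_S(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i).
\]
% An algorithm A outputting w_A generalizes when its gap is small:
\[
  \mathrm{gap}(A) = \mathbb{E}\left[ F(w_A) - \hat{F}_S(w_A) \right],
\]
% so two algorithms can drive \hat{F}_S down at the same rate while
% having very different gaps -- the separation discussed below.
```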
We begin by discussing new results that highlight the role of the optimization algorithm in learning. We provide a new result that separates the generalization performance of stochastic gradient descent (SGD) from that of full-batch gradient descent (GD), as well as regularized GD. We show that while all of these algorithms optimize the empirical loss at the same rate, their generalization performance can be significantly different. We next discuss the implicit bias of SGD in this context and ask whether the implicit bias accounts for SGD’s ability to generalize. We provide several constructions that point to significant difficulties in comprehensively explaining an algorithm’s generalization performance solely by arguing about its implicit regularization properties.

On the one hand, these results demonstrate the importance of the optimization algorithm for generalization. On the other hand, they also hint that the reasons for these performance differences may not necessarily be explained or understood via investigations of the algorithm’s bias.

Based on joint works with: Idan Amir, Assaf Dauber, Meir Feder, Tomer Koren.

New Capabilities in Unsupervised Image to Image Translation by Sagie Benaim, TAU (DSI Learning Club talk)

Sunday, Feb. 24th, 2019, 12:00: Sagie Benaim.

PhD student, Tel-Aviv University.

Location: Gonda Building (901), Room 101.

New Capabilities in Unsupervised Image to Image Translation

Abstract:

In Unsupervised Image to Image Translation, we are given an unmatched set of images from domain A and domain B, and our task is to generate, given an image from domain A, its analogous image in domain B.

In the first part of the talk, I’ll describe a new capability which allows us to perform such translation when only a single image is present in domain A. Specifically, given a single image x from domain A and a set of images from domain B, our task is to generate the analog of x in B. We argue that this task could be a key AI capability that underlies the ability of cognitive agents to act in the world, and we present empirical evidence that existing unsupervised domain translation methods fail on this task.

In the second part of the talk, I’ll describe a new capability which allows us to disentangle the “common” and “domain-specific” information of domains A and B. This allows us to generate, given a sample a in A and a sample b in B, an image in domain B which contains the “common” information of a and the “domain-specific” information of b. For example, ignoring occlusions, B can be “people with glasses” and A can be “people without glasses”. The “common” information is “faces”, while the “domain-specific” information of B is “glasses”. At test time, we add the glasses of a person from domain B to any person from domain A.
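As a rough sketch of how such disentanglement can be wired up: a common encoder is shared across domains, a specific encoder sees only domain B, and B’s decoder consumes both codes. The encoder/decoder details below are placeholder assumptions, not the talk’s architecture.

```python
# Sketch of shared/specific disentanglement for image-to-image
# translation; layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

def make_encoder(out_ch: int) -> nn.Module:
    # 3-channel image -> (out_ch, H/4, W/4) feature map.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 4, stride=2, padding=1),
    )

class Disentangler(nn.Module):
    def __init__(self, common_dim: int = 64, specific_dim: int = 16):
        super().__init__()
        self.enc_common = make_encoder(common_dim)      # shared content ("faces")
        self.enc_specific = make_encoder(specific_dim)  # B-only content ("glasses")
        self.dec_b = nn.Sequential(                     # decoder for domain B
            nn.ConvTranspose2d(common_dim + specific_dim, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def translate(self, a_img: torch.Tensor, b_img: torch.Tensor) -> torch.Tensor:
        # Combine a's common code with b's specific code: e.g., put b's
        # glasses on the face from a.
        z = torch.cat([self.enc_common(a_img), self.enc_specific(b_img)], dim=1)
        return self.dec_b(z)
```

Training would constrain the common code to be recoverable from either domain while the specific code is only available for B, which is what forces the split described above.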

Lastly, time permitting, I’ll describe the application of these techniques in the context of Singing Voice Separation, where the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music.