On Sunday 27.6.2021 at 12:00, we will host Sagie Benaim from Tel-Aviv University.
Please see the details below.
———————————–
Hey everyone,
On Sunday 30.5.2021, we will host Yedid Hoshen from the Hebrew University.
Please see the details below.
See you then,
Roni
——
Zoom:
https://us02web.zoom.us/j/
Meeting ID: 847 5832 0608
Passcode: 942440
Title:
Scaling-up Disentanglement
Abstract:
Disentangling data into attributes that are meaningful to humans is a key task in machine learning. Here, we tackle the case where supervision is available for some of the attributes but not for others. Most large-scale disentanglement methods use adversarial training, which typically yields imperfect results. We will first describe a simple, generic non-adversarial method, LORD, which outperforms the corresponding GAN-based methods. The secret sauce of this method is a latent-optimization-based information bottleneck. Unfortunately, LORD is unable to scale up to high-dimensional, multi-modal disentanglement tasks such as image translation. We therefore present an advanced framework, OverLORD, which overcomes these issues. We show that this flexible framework covers multiple image-translation settings, e.g. attribute manipulation, pose-appearance translation, segmentation-guided synthesis, and shape-texture transfer. In an extensive evaluation, we present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.
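To make the latent-optimization idea concrete, here is a minimal, hypothetical sketch in the spirit of LORD (not the authors' code): every training image gets its own learnable content code, images of the same class share a class code, and fixed-variance Gaussian noise on the content codes acts as the information bottleneck that discourages class information from leaking into them. The decoder, dimensions, and noise level `sigma` below are illustrative assumptions.

```python
# Hypothetical sketch of latent-optimization disentanglement (LORD-style).
import torch
import torch.nn as nn

n_images, n_classes = 10_000, 100
content_dim, class_dim = 128, 256

generator = nn.Sequential(          # stand-in for a convolutional decoder
    nn.Linear(content_dim + class_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 64 * 64 * 3),
)
# Latent optimization: the codes themselves are learnable parameters.
content_codes = nn.Parameter(torch.randn(n_images, content_dim) * 0.01)
class_codes = nn.Parameter(torch.randn(n_classes, class_dim) * 0.01)

opt = torch.optim.Adam([content_codes, class_codes, *generator.parameters()], lr=1e-3)

def step(image_ids, class_ids, images, sigma=1.0):
    # Information bottleneck: perturb per-image codes with Gaussian noise.
    noisy_content = content_codes[image_ids] + sigma * torch.randn(len(image_ids), content_dim)
    z = torch.cat([noisy_content, class_codes[class_ids]], dim=1)
    recon = generator(z)
    loss = ((recon - images.view(len(images), -1)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```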
Hey everyone,
After the holiday, on Sunday 4.4.2021 at 12:00, we will host Ronen Basri from the Weizmann Institute.
Ronen will present his work on the connection between deep neural networks and kernel methods.
Meeting ID: 880 0879 7212
Passcode: 416934
Title:
On the Connection between Deep Neural Networks and Kernel Methods
Abstract:
Recent theoretical work has shown that massively overparameterized neural networks are equivalent to kernel regressors that use Neural Tangent Kernels (NTKs). Experiments indicate that these kernel methods perform similarly to real neural networks. My work on this subject aims to better understand the properties of NTK and relate them to properties of real neural networks. In particular, I will argue that for input data distributed uniformly on the sphere, NTK favors low-frequency predictions over high-frequency ones, potentially explaining why overparameterized networks do not overfit their training data. I will further discuss the behavior of NTK when data is distributed nonuniformly, and finally show that NTK is tightly related to the classical Laplace kernel, which has a simple closed form. Our results suggest that much insight about neural networks can be obtained from analysis of NTK.
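As a concrete illustration of the Laplace-kernel connection mentioned in the abstract, here is a small sketch (not the speaker's code) of kernel regression with the classical Laplace kernel on inputs normalized to the sphere; the data, target function, bandwidth `sigma`, and ridge term are arbitrary choices for illustration.

```python
# Illustrative kernel regression with the classical Laplace kernel.
import numpy as np

def laplace_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y|| / sigma), the kernel's simple closed form.
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-d / sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # data on the unit sphere
y = np.sin(3 * X[:, 0])                             # toy regression target

K = laplace_kernel(X, X)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)   # near-"ridgeless" fit

X_test = rng.normal(size=(50, 5))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_pred = laplace_kernel(X_test, X) @ alpha          # kernel regression prediction
```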
Learning Club – Jonathan Berant
Learning Club – Roi Reichart
On Sunday 13.6.2021 at 12:00, we will host Roi Reichart from the Technion.
Zoom:
https://us02web.zoom.us/j/
Meeting ID: 838 6634 9257
Passcode: 754963
Title:
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Abstract:
Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. Like all ML-based methods, they are only as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem in estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.
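A minimal, hypothetical sketch of the adversarial fine-tuning idea (not the authors' implementation): a gradient-reversal layer lets a probe learn to predict the treated concept while the reversed gradients push the encoder toward a representation that carries no information about it. The encoder, heads, and dimensions below are placeholders standing in for a BERT-style model.

```python
# Hypothetical gradient-reversal sketch of adversarial concept removal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad  # reversed gradients make the encoder *remove* the concept

encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh())  # stand-in for BERT
task_head = nn.Linear(768, 2)                            # main task (e.g., sentiment)
concept_head = nn.Linear(768, 2)                         # adversarial concept probe

def loss_fn(x, y_task, y_concept):
    h = encoder(x)
    task_loss = nn.functional.cross_entropy(task_head(h), y_task)
    # The probe learns to predict the concept; through the reversal, the
    # encoder is trained to make that prediction impossible.
    adv_loss = nn.functional.cross_entropy(concept_head(GradReverse.apply(h)), y_concept)
    return task_loss + adv_loss
```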
Learning Club – Yair Carmon
Zoom:
https://us02web.zoom.us/j/
Meeting ID: 848 4708 6771
Passcode: 607548
Title:
Accuracy on the line: on the predictability of out-of-distribution generalization
Abstract:
To make machine learning reliable, we must understand generalization to out-of-distribution environments (unseen during training) in addition to the in-distribution generalization measured by standard test sets. We show empirically that, to a very good approximation, out-of-distribution performance is in fact a simple function of in-distribution performance; our experiments span a wide range of models, computer vision datasets, and types of distribution shifts. We also present a number of exceptions to this relationship and test hypotheses for their causes. Finally, we show how a simple Gaussian generative model exhibits several of the phenomena we observe, and discuss the scope and implications of our findings.
Joint work with John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang and Ludwig Schmidt.
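An illustrative sketch (not the paper's code) of how such a relationship can be used in practice: after a probit transform of the accuracies, fit a line from in-distribution to out-of-distribution accuracy across models, then predict OOD performance for a new model from its ID accuracy alone. The accuracy arrays below are hypothetical per-model evaluation results.

```python
# Illustrative "accuracy on the line" fit with a probit transform.
import numpy as np
from scipy.stats import norm

id_acc = np.array([0.70, 0.75, 0.82, 0.88, 0.93])   # hypothetical ID accuracies
ood_acc = np.array([0.45, 0.52, 0.61, 0.70, 0.79])  # matching OOD accuracies

# The probit transform tends to linearize the ID-vs-OOD relationship.
x, y = norm.ppf(id_acc), norm.ppf(ood_acc)
slope, intercept = np.polyfit(x, y, deg=1)

def predict_ood(new_id_acc):
    return norm.cdf(slope * norm.ppf(new_id_acc) + intercept)

print(predict_ood(0.90))  # predicted OOD accuracy for a model at 90% ID accuracy
```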
Learning Club – Yonathan Efroni
On Sunday 23.5.2021, we will host Yonathan Efroni from Microsoft Research (Israel/New York).
Zoom:
https://us02web.zoom.us/j/84891223876?pwd=UGcyZU8rZHY2NGpTMFNuUFIzajVRUT09
Meeting ID: 848 9122 3876
Passcode: 750812
Title:
Confidence-Budget Matching for Sequential Budgeted Learning
Abstract:
A core element in decision-making under uncertainty is the feedback on the quality of the performed actions. However, in many applications, such feedback is restricted. For example, in recommendation systems, repeatedly asking the user to provide feedback on the quality of recommendations will annoy them. In this work, we formalize decision-making problems with a querying budget, where there is a (possibly time-dependent) hard limit on the number of reward queries allowed. Specifically, we consider multi-armed bandits, linear bandits, and reinforcement learning problems. We start by analyzing the performance of 'greedy' algorithms that query a reward whenever they can. We show that in fully stochastic settings, doing so performs surprisingly well, but in the presence of any adversity it might lead to linear regret. To overcome this issue, we propose the Confidence-Budget Matching (CBM) principle, which queries rewards when the confidence intervals are wider than the inverse square root of the available budget. We analyze the performance of CBM-based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.
Joint work with Nadav Merlis, Aadirupa Saha and Shie Mannor (to appear at ICML 2021).
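A schematic sketch of the CBM rule for multi-armed bandits, following only the description in the abstract (not the authors' code): query the pulled arm's reward only when its confidence interval is wider than the inverse square root of the remaining query budget. The UCB arm selection and all constants here are illustrative assumptions; `pull_arm` is a placeholder for the environment.

```python
# Schematic Confidence-Budget Matching (CBM) for multi-armed bandits.
import numpy as np

def cbm_bandit(pull_arm, n_arms, horizon, budget, delta=0.05):
    counts = np.zeros(n_arms)   # number of *queried* pulls per arm
    means = np.zeros(n_arms)    # empirical means from queried rewards
    for t in range(horizon):
        # Optimistic (UCB-style) arm selection; unqueried pulls give no feedback.
        ucb = means + np.sqrt(2 * np.log(1 / delta) / np.maximum(counts, 1))
        ucb[counts == 0] = np.inf
        a = int(np.argmax(ucb))
        width = 2 * np.sqrt(2 * np.log(1 / delta) / max(counts[a], 1))
        if budget > 0 and width > 1 / np.sqrt(budget):  # the CBM condition
            reward = pull_arm(a)                        # spend one reward query
            budget -= 1
            counts[a] += 1
            means[a] += (reward - means[a]) / counts[a]
        else:
            pull_arm(a)                                 # pull without feedback
    return means
```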
On Sunday 2.5.2021 at 12:00, we will host Raja Giryes from Tel-Aviv University.
Zoom:
https://us02web.zoom.us/j/
Meeting ID: 850 9895 5112
Passcode: 109415
Title:
Learning Club – Asaf Noy
Title:
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Abstract:
Deep neural networks’ remarkable ability to correctly fit training data when optimized by gradient-based algorithms is yet to be fully understood. Recent theoretical results explain the convergence for ReLU networks that are wider than those used in practice by orders of magnitude. In this work, we take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time. We show that convergence to a global minimum is guaranteed for networks with widths quadratic in the sample size and linear in their depth at a time logarithmic in both. Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size. This construction can be viewed as a novel technique to accelerate training, while its tight finite-width equivalence to Neural Tangent Kernel (NTK) suggests it can be utilized to study generalization as well.
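For readers who want the claimed bounds at a glance, here is a schematic restatement in symbols of what the abstract says in words (constants and logarithmic factors omitted; the notation is ours, not the authors'): with n the sample size, L the depth, m the layer width, and T the number of gradient iterations to reach a global minimum,

```latex
\[
  m \;=\; \tilde{\Omega}\!\left(n^{2} L\right)
  \qquad\text{and}\qquad
  T \;=\; O\!\left(\log(n L)\right).
\]
```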