Analyzing Optimization and Generalization in Deep Learning via Trajectories of Gradient Descent – Talk by Nadav Cohen (TAU)
November 24 @ 12:00 pm - 1:00 pm IST
Speaker: Nadav Cohen (TAU)
Location: Gonda building (901), room 102.
Time: Sunday Nov 24th, 12:00 PM - 1:00 PM.
Title: Analyzing Optimization and Generalization in Deep Learning via Trajectories of Gradient Descent
Abstract: Understanding deep learning calls for addressing two questions: (i) optimization, namely the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization, namely the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details of the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the specific trajectories taken by the optimizer. I will then demonstrate a manifestation of this approach, analyzing the trajectories of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to a global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that adding (redundant) linear layers to a classic linear model can sometimes significantly accelerate gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank, and thereby explain generalization of deep linear neural networks for the classic problem of low-rank matrix recovery.
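To make "adding (redundant) linear layers to a classic linear model" concrete, here is a small illustrative toy that is not taken from the talk. A linear regression vector w is trained by plain gradient descent, and so is the same model written as a product W_3 W_2 W_1 of three matrices. The extra layers do not change what the model can express (the end-to-end product is still a linear map), but they make the loss non-convex in the layer weights. The layer gradients come from the chain rule: if the end-to-end matrix is A W_j B, then the gradient with respect to W_j is A^T G B^T, where G is the gradient with respect to the end-to-end matrix. The dataset, the identity initialization of the hidden layers, the step size and all helper names (`chain`, `end_to_end_grad`, `train`) are assumptions made for this sketch. It does not reproduce the acceleration or low-rank results discussed in the talk.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def chain(mats, n):
    """Return mats[-1] @ ... @ mats[0], or the n x n identity if mats is empty."""
    P = identity(n)
    for W in mats:
        P = matmul(W, P)
    return P


def end_to_end_grad(W_e, X, y):
    """Gradient of L(W) = 1/(2m) * sum_i (W x_i - y_i)^2 at the 1 x d matrix W_e."""
    m, d = len(X), len(X[0])
    residuals = [sum(w * xk for w, xk in zip(W_e[0], x)) - t for x, t in zip(X, y)]
    return [[sum(r * x[k] for r, x in zip(residuals, X)) / m for k in range(d)]]


def train(layers, X, y, lr=0.05, steps=2000):
    """Gradient descent on layers [W_1, ..., W_N]; the model is x -> W_N ... W_1 x."""
    d0 = len(layers[0][0])
    for _ in range(steps):
        G = end_to_end_grad(chain(layers, d0), X, y)
        grads = []
        for j, W in enumerate(layers):
            A = chain(layers[j + 1:], len(W))  # W_N ... W_{j+1}
            B = chain(layers[:j], d0)          # W_{j-1} ... W_1
            # Chain rule: dL/dW_j = A^T G B^T
            grads.append(matmul(matmul(transpose(A), G), transpose(B)))
        # Update all layers simultaneously with the same step size.
        layers = [[[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, Gj)]
                  for W, Gj in zip(layers, grads)]
    return layers


# Toy data (assumed): inputs are the standard basis of R^3, targets come from w_true.
w_true = [1.0, -2.0, 0.5]
X = identity(3)
y = [sum(w * xk for w, xk in zip(w_true, x)) for x in X]

# Classic linear model: a single 1 x 3 layer, initialized at zero.
linear = train([[[0.0, 0.0, 0.0]]], X, y)
# Same model with two redundant 3 x 3 linear layers, initialized to the identity.
deep = train([identity(3), identity(3), [[0.0, 0.0, 0.0]]], X, y)

print("linear end-to-end:", chain(linear, 3)[0])
print("deep end-to-end:  ", chain(deep, 3)[0])
```

Both runs use the identical update loop and both end-to-end vectors approach w_true. The only difference is the parameterization: for the deep model, the loss as a function of (W_1, W_2, W_3) is non-convex, even though the set of functions it can represent is unchanged. Whether and when such extra layers speed up gradient descent is the question the talk addresses, and this toy makes no claim about it.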
The work covered in this talk was done in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu and Yuping Luo.