BIU learning club – Students’ talks
March 26, 2023 @ 12:00 pm - 2:00 pm IDT
On Sunday, 26.03.23, at 12:00 PM, we will hold our third session of students’ presentations. In this session, four students from BIU will present their work. Note that, unlike regular learning club meetings, this meeting will last two hours and will include lunch. It will take place in the Engineering building (1103), room 329. Please see the schedule below.
12:00 – 12:15
Presenter: Aviv Navon
Lab Head: Gal Chechik & Ethan Fetaya
Title: Equivariant Architectures for Learning in Deep Weight Spaces
Abstract: Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. If successful, such architectures would be capable of performing a wide range of intriguing tasks, from adapting a pre-trained network to a new domain to editing objects represented as functions (INRs or NeRFs). As a first step towards this goal, we present here a novel network architecture for learning in deep weight spaces. It takes as input a concatenation of weights and biases of a pre-trained MLP and processes it using a composition of layers that are equivariant to the natural permutation symmetry of the MLP’s weights: Changing the order of neurons in intermediate layers of the MLP does not affect the function it represents. We provide a full characterization of all affine equivariant and invariant layers for these symmetries and show how these layers can be implemented using three basic operations: pooling, broadcasting, and fully connected layers applied to the input in an appropriate manner. We demonstrate the effectiveness of our architecture and its advantages over natural baselines in a variety of learning tasks.
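A minimal, illustrative sketch of the kind of permutation-equivariant building block the abstract describes, built only from the three operations it names (pooling, broadcasting, and fully connected maps). This is not the authors’ implementation; the input shapes and the way a weight matrix is fed in are assumptions for illustration.

```python
# Sketch only: a DeepSets-style permutation-equivariant linear layer assembled from
# per-element fully connected maps, pooling, and broadcasting. Shapes and usage are
# illustrative assumptions, not the architecture presented in the talk.
import torch
import torch.nn as nn

class PermEquivariantLinear(nn.Module):
    """Maps a set of n feature vectors (n, d_in) -> (n, d_out) so that permuting
    the n input rows permutes the output rows in the same way."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.per_element = nn.Linear(d_in, d_out)   # applied to each row independently
        self.from_pooled = nn.Linear(d_in, d_out)   # applied to the pooled (invariant) summary

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, d_in), where the order of the n rows carries no meaning
        pooled = x.mean(dim=0, keepdim=True)        # pooling: permutation-invariant summary
        return self.per_element(x) + self.from_pooled(pooled)  # broadcast summary back to all rows

# Toy usage: treat rows of a hidden-layer weight matrix as the "set" elements.
if __name__ == "__main__":
    W = torch.randn(16, 32)                 # e.g. 16 hidden neurons, 32 incoming weights each
    layer = PermEquivariantLinear(32, 8)
    out = layer(W)
    perm = torch.randperm(16)
    assert torch.allclose(layer(W[perm]), out[perm], atol=1e-5)  # equivariance check
```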
12:15 – 12:30
Presenter: Yochai Yemini
Lab Head: Sharon Gannot & Ethan Fetaya
Title: Generating Speech from Silent Videos with Diffusion Models
Abstract: In this work, we propose a novel lip-to-speech generative model. Given a silent talking-face video, we use classifier-free guidance to condition a diffusion model on the video frames to generate a corresponding mel-spectrogram. Since syllables uttered in a silent talking-face video can be ambiguous, a lip-reading model is deployed to infer the words likely to be spoken in the video. An automatic speech recognition (ASR) model is then utilised to ground the generated speech in the extracted text using classifier guidance. We show that while previous techniques managed to generate speech signals only for datasets with a limited vocabulary and number of speakers, the proposed model achieves excellent results on LRS3, an in-the-wild dataset with 4,000 identities and an unrestricted vocabulary.
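For intuition, here is a short sketch of how classifier-free guidance is typically applied at sampling time: the same diffusion model is queried with and without the conditioning, and the two noise predictions are combined. The model interface (`eps_model` and its arguments) is a hypothetical placeholder, not the actual lip-to-speech network from the talk.

```python
# Sketch of classifier-free guidance at sampling time. eps_model is a hypothetical
# noise-prediction network; cond=None stands for the dropped (unconditional) case.
import torch

def cfg_noise_estimate(eps_model, x_t, t, video_cond, guidance_scale: float = 2.0):
    """Combine conditional and unconditional noise predictions (classifier-free guidance)."""
    eps_cond = eps_model(x_t, t, video_cond)   # conditioned on the silent-video frames
    eps_uncond = eps_model(x_t, t, None)       # condition dropped, as during CFG training
    # Move the estimate away from the unconditional prediction, toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```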
12:30 – 12:45
Presenter: Yaniv Zimmer
Lab Head: Oren Glickman
Title: Investigating Hyperspectral Band Selection Methods for Deep Learning Applications
Abstract: In this lecture, I will talk about hyperspectral band selection methods for deep learning applications. Band selection is the process of finding an optimal subset of spectral bands from the full hyperspectral data with respect to downstream task performance. I will survey prominent band selection methods, known gaps in their performance evaluation, their connection to deep learning applications, and future research directions in our study.
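To make the task concrete, here is a tiny, naive example of band selection: rank the spectral bands of a hyperspectral cube by a simple filter criterion (per-band variance) and keep the top k. This baseline is for intuition only and is not one of the methods surveyed in the talk.

```python
# Naive filter-style band selection for illustration: keep the k most variable bands.
import numpy as np

def select_bands_by_variance(cube: np.ndarray, k: int) -> np.ndarray:
    """cube: (height, width, n_bands) hyperspectral image; returns indices of k selected bands."""
    band_variance = cube.reshape(-1, cube.shape[-1]).var(axis=0)  # variance of each spectral band
    return np.argsort(band_variance)[::-1][:k]                    # indices of the k most variable bands

# Usage on a random toy cube with 200 spectral bands.
cube = np.random.rand(64, 64, 200)
selected = select_bands_by_variance(cube, k=10)
```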
12:45 – 13:00
Presenter: Aviad Eisenberg
Lab Head: Sharon Gannot
Title: A two-stage speaker extraction algorithm under adverse acoustic conditions using a single microphone
Abstract: In this work, we present a two-stage method for speaker extraction under reverberant and noisy conditions. Given a reference signal of the desired speaker, the clean, but still reverberant, desired speaker is first extracted from the noisy mixed signal. In the second stage, the extracted signal is further enhanced by joint dereverberation and residual noise and interference reduction. The proposed architecture comprises two sub-networks, one for the extraction task and the second for the dereverberation task. We present a training strategy for this architecture and show that the performance of the proposed method is on par with other state-of-the-art (SOTA) methods when applied to the WHAMR! dataset. Furthermore, we present a new dataset with more realistic adverse acoustic conditions and show that our method outperforms the competing methods when applied to this dataset as well.
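The two-stage structure described above can be sketched as a simple composition of two sub-networks: a reference-guided extraction stage followed by a dereverberation and residual-noise reduction stage. Both networks and their interfaces below are hypothetical placeholders, not the system presented in the talk.

```python
# Sketch of the two-stage pipeline: stage 1 extracts the (still reverberant) target
# speaker given a reference signal; stage 2 jointly dereverberates and suppresses
# residual noise/interference. extraction_net and dereverb_net are placeholders.
import torch
import torch.nn as nn

class TwoStageExtractor(nn.Module):
    def __init__(self, extraction_net: nn.Module, dereverb_net: nn.Module):
        super().__init__()
        self.extraction_net = extraction_net    # stage 1: reference-guided speaker extraction
        self.dereverb_net = dereverb_net        # stage 2: dereverberation + residual noise reduction

    def forward(self, mixture: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        reverberant_target = self.extraction_net(mixture, reference)  # clean but still reverberant
        return self.dereverb_net(reverberant_target)                  # final enhanced signal
```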
13:00 – 14:00
Lunch & Mingling