BIU learning club – Yuval Pinter – When Language Models Meet Words

April 23, 2023 @ 12:00 pm - 1:00 pm IDT


Engineering building (1103), room 329


When Language Models Meet Words


Over the last few years, deep neural models have taken over the field of natural language processing (NLP), brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up, or how lexical changes in register and domain can affect the untested aspects of such representations, or which phenomena can be modeled by units smaller than the word.

In this talk, I will present NYTWIT, a dataset created to challenge large language models (LLMs) at the lexical level. It tasks them with identifying the processes that lead to the formation of novel English words, as well as with segmenting and recovering the specific subclass of lexical blends, and it demonstrates the ways in which subword-tokenized LLMs fail to analyze them. I will then present XRayEmb, a method that alleviates the difficulty of processing these novelties by fitting a character-level encoder to existing models’ subword tokenizers, and SaGe, a subword tokenizer that incorporates context into the vocabulary creation objective.

Short Bio:

Yuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev, focusing on natural language processing as PI of the MeLeL lab. Yuval received his PhD from the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Prior to this, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software, and obtained an MA in Linguistics and a BSc in CS and Mathematics, both from Tel Aviv University. Yuval blogs (in Hebrew) about language matters on Dagesh Kal.

