What’s wrong with Hebrew NLP? And How to Make it Right – Talk by Prof. Reut Tsarfaty
March 5, 2020 @ 11:00 am - 12:00 pm IST
Talk at AI For Human Languages (free student tickets available for BIU students – talk to Reut)
Title: What’s wrong with Hebrew NLP? And How to Make it Right
Speaker: Prof. Reut Tsarfaty
The ability to automatically process massive volumes of unstructured texts and turn them into structured information is key to many success stories in AI, BI and Data Science (DS). Furthermore, day-to-day applications routinely used for, e.g., information extraction, machine translation, question answering, dialogue systems, digital assistants and more, are strictly based on a Natural Language Processing (NLP) core.
NLP engines nowadays focus mainly on English. Processing Modern Hebrew texts presents serious challenges to standard NLP methods, in turn undermining the capacity to break any new boundaries in DS, BI or AI targeting Hebrew. Hebrew violates a basic assumption of the standard NLP pipeline, namely that the input words are known in advance: in Hebrew the input signal is extremely ambiguous, and the words are essentially unknown in advance. For example, take the input word “הקפה”; it is unknown whether it should be processed as the word “orbit”, as the sequence “the coffee”, as “her perimeter”, etc. This extreme ambiguity of the input signal causes serious performance degradation in any downstream AI/BI/DS task.
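The ambiguity of “הקפה” can be made concrete with a small sketch. It assumes a hypothetical three-entry lexicon plus one prefix and one suffix, all invented here for illustration; the names `LEXICON`, `BOUND_STEMS`, `PREFIXES`, `SUFFIXES` and `analyses` come from no real Hebrew analyzer, and a real morphological lexicon would be far larger.

```python
# Toy illustration only: a hypothetical mini-lexicon, not a real Hebrew analyzer.
PREFIXES = {"ה": "the"}   # definite article attached to the following word
SUFFIXES = {"ה": "her"}   # possessive suffix attached to the preceding stem

# Free-standing words and their English glosses.
LEXICON = {
    "הקפה": ["orbit"],
    "קפה": ["coffee"],
}

# Stems that only appear before a suffix (hypothetical entry for "perimeter").
BOUND_STEMS = {
    "הקפ": ["perimeter"],
}


def analyses(token):
    """Enumerate every reading of `token` licensed by the toy lexicon."""
    readings = []
    # Reading type 1: the whole token is a single word.
    for gloss in LEXICON.get(token, []):
        readings.append(gloss)
    # Reading type 2: prefix + free-standing word.
    for prefix, prefix_gloss in PREFIXES.items():
        if token.startswith(prefix):
            for gloss in LEXICON.get(token[len(prefix):], []):
                readings.append(f"{prefix_gloss} {gloss}")
    # Reading type 3: bound stem + suffix.
    for suffix, suffix_gloss in SUFFIXES.items():
        if token.endswith(suffix):
            for gloss in BOUND_STEMS.get(token[:-len(suffix)], []):
                readings.append(f"{suffix_gloss} {gloss}")
    return readings


print(analyses("הקפה"))
```

Even with this tiny lexicon, one surface token yields three competing readings, each with a different segmentation. A component that expects a single known word per token has no basis for choosing among them without further context.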
In this talk I introduce novel algorithms to tackle this extreme-ambiguity problem. Our solution is based on a joint model that interleaves the different tasks in the NLP pipeline, using a single objective function, joint training and joint inference to solve them all at once. Our experiments show that the resulting parsing algorithm obtains state-of-the-art performance on key NLP tasks (segmentation, tagging, lemmatization, dependency parsing), better than the previous state of the art for each of the individual components. I will demo our open-source Hebrew parsing suite and present our additional efforts to expand Hebrew NLP capacities further.
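The difference between a staged pipeline and joint inference can be illustrated with a toy sketch. This is not the algorithm presented in the talk: the candidate readings, the field names and all scores below are made up purely for illustration, and the "joint objective" is assumed to be a plain sum of two per-task scores.

```python
# Toy sketch only: the scores are invented and do not come from any real model.
# Each candidate reading of one ambiguous token has a segmentation score and a
# score for how well it fits the surrounding syntax.
CANDIDATES = [
    {"gloss": "orbit", "seg_score": 0.5, "syntax_score": 0.1},
    {"gloss": "the coffee", "seg_score": 0.4, "syntax_score": 0.8},
    {"gloss": "her perimeter", "seg_score": 0.1, "syntax_score": 0.3},
]


def pipeline_decode(candidates):
    """Commit to the best segmentation first; later stages cannot revise it."""
    return max(candidates, key=lambda c: c["seg_score"])


def joint_decode(candidates):
    """Pick the reading that maximizes one combined objective over both tasks."""
    return max(candidates, key=lambda c: c["seg_score"] + c["syntax_score"])


print("pipeline:", pipeline_decode(CANDIDATES)["gloss"])
print("joint:   ", joint_decode(CANDIDATES)["gloss"])
```

With these invented numbers the pipeline locks in the reading that looks best to the segmenter alone, while the joint decoder lets the syntactic evidence overturn that early decision. That is the kind of error propagation a single objective is meant to avoid.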