Open Master Thesis Position in Networks and NLP

We are seeking a CS or Math Master's student to write a thesis as part of a funded project on the network properties of word embeddings.

The Network Properties of Word Embeddings project is funded by the Data Science Institute.

Participating Faculty members:

Prof. Reuven Cohen, Department of Mathematics

Prof. Yoav Goldberg, Department of Computer Science

Dr. Simcha (Simi) Haber, Department of Mathematics

Description of research work:

This work combines two strength areas of Bar-Ilan University – Network Science and Natural Language Processing – to study the network properties of word embeddings.

Word embedding (the mapping of words into numerical vector spaces) has seen tremendous success in numerous NLP tasks in recent years ([1]). Multiple methods for learning word embeddings from textual corpora have also been proposed. The resulting representations typically preserve semantic, syntactic and other properties of words. Once words are represented as vectors in a Euclidean space of dimension n (typically n is in the range of 50 to 500), it is natural to consider the implied graph or network of the words. In such a graph, two words are adjacent based on their distance as vectors or on another similarity metric. Graphs and networks are widely used in Natural Language Processing, including word graphs (e.g. [2], [3]), and it has been shown that word graphs share statistical features with other complex networks ([5]). However, the network properties of word embedding graphs have not been investigated until now.
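
As a rough illustration of the implied word graph, the sketch below links each word to its k most similar words by cosine similarity. The toy 3-dimensional vectors, the function names, and the choice of a k-nearest-neighbour rule are assumptions made for this example only; the project description does not fix a particular adjacency rule.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(embeddings, k):
    """Build an undirected graph {word: set_of_neighbours}.

    Each word is linked to its k most similar words; the link is
    stored in both directions, so a node may end up with more than
    k neighbours.
    """
    graph = {word: set() for word in embeddings}
    for word, vec in embeddings.items():
        scored = sorted(
            (
                (cosine_similarity(vec, other_vec), other)
                for other, other_vec in embeddings.items()
                if other != word
            ),
            reverse=True,
        )
        for _, neighbour in scored[:k]:
            graph[word].add(neighbour)
            graph[neighbour].add(word)
    return graph


# Hypothetical toy vectors; real embeddings have 50 to 500 dimensions.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
    "bus": [0.1, 0.0, 0.8],
}
word_graph = knn_graph(toy_embeddings, k=1)
```

With these toy vectors the graph splits into two components, one for the animals and one for the vehicles. A distance threshold would be an equally plausible adjacency rule and would generally give a different graph.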

The work will involve analyzing and comparing the network properties of various word-embedding-based networks across embedding algorithms and parameters, corpora, and languages, and studying the relationship between these network properties and linguistic properties. When analyzing and comparing the word networks, we will look at common graph properties and algorithms, such as centrality measures, clustering algorithms, and PageRank ([4]). We will also investigate properties of individual nodes (words), edges (relationships between words) and components (groups of words).
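
To make the kinds of measures mentioned above concrete, here is a minimal sketch of degree centrality, the local clustering coefficient, and PageRank (by power iteration) on an undirected graph stored as a dictionary of neighbour sets. The toy graph and function names are hypothetical, and the PageRank routine assumes every node has at least one neighbour; in practice a graph library would likely be used instead.

```python
def degree_centrality(graph):
    """Fraction of the other nodes that each node is adjacent to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def clustering_coefficient(graph, node):
    """Share of a node's neighbour pairs that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected graph.

    Assumes no isolated nodes, so every neighbour has degree >= 1.
    """
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {}
        for node in graph:
            incoming = sum(rank[nbr] / len(graph[nbr]) for nbr in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
centrality = degree_centrality(toy_graph)
ranks = pagerank(toy_graph)
```

In this toy graph, node c has the highest degree centrality and the highest PageRank, while its clustering coefficient is lower than that of a and b because d is not linked to the triangle.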

There are multiple word embedding techniques commonly used for NLP applications – e.g. Word2Vec or GloVe. The techniques vary in the underlying algorithm, the objective function and the context used for words in the text. Other parameters such as window size or embedding size also have an effect on the resulting word vectors. We will compare the networks resulting from the different algorithms and parameters. 
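
One simple way to compare the networks produced by two embedding configurations is the Jaccard similarity of their edge sets, sketched below. This measure is only an illustrative assumption, not a method stated in the project description, and the two toy graphs (named after Word2Vec and GloVe purely for readability) are invented.

```python
def edge_set(graph):
    """Undirected edges of a {node: set_of_neighbours} graph."""
    return {frozenset((u, v)) for u, nbrs in graph.items() for v in nbrs}


def edge_jaccard(graph_a, graph_b):
    """Jaccard similarity of two edge sets: |A & B| / |A | B|."""
    edges_a, edges_b = edge_set(graph_a), edge_set(graph_b)
    union = edges_a | edges_b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(edges_a & edges_b) / len(union)


# Hypothetical graphs over the same vocabulary, e.g. built from two
# different embedding algorithms or window sizes.
graph_word2vec = {"cat": {"dog"}, "dog": {"cat", "wolf"}, "wolf": {"dog"}}
graph_glove = {"cat": {"dog", "wolf"}, "dog": {"cat"}, "wolf": {"cat"}}

overlap = edge_jaccard(graph_word2vec, graph_glove)
```

Here the two graphs share one edge out of three distinct edges, so the overlap is one third. Comparing distributions of node-level measures such as degree would be another reasonable option.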

Different textual datasets (corpora) result in different embedding vectors, as words have different usage patterns in, say, a general and fairly formal corpus such as Wikipedia than in Twitter tweets. Naturally, we will also compare network properties across languages. In all cases (algorithm, corpus and language), we will search for universal network properties (e.g., whether certain centrality or other measures are invariant to language or algorithm). For measures or properties that do differ across embedding graphs, we will investigate whether the differences can be attributed or tied to linguistic properties (e.g., whether a certain measure correlates with the morphological richness of a language, or whether languages from the same linguistic family have similar network properties).

This interdisciplinary work brings together experts from two different data science fields (NLP and Network Science) who will co-advise an MSc student. We expect to publish the results of this project in top venues of both fields of study.

[1] Speech and Language Processing (3rd ed.). Dan Jurafsky and James H. Martin. 2019. Chapter 6: Vector Semantics and Embeddings.

[2] Graph-Based Methods for Natural Language Processing and Understanding – A Survey and Analysis. Mills, Michael T. and Bourbakis, Nikolaos G. IEEE Transactions on Systems, Man, and Cybernetics: Systems. (2014).

[3] A survey of graphs in natural language processing. Nastase, V., Mihalcea, R., and Radev, D. Natural Language Engineering, 21(5), 665-698. (2015).

[4] Graph Theory. Adrian Bondy and U.S.R. Murty. Springer, 2017.

[5] The small world of human language. Ferrer i Cancho and Solé. Proceedings of the Royal Society B: Biological Sciences 268(1482):2261-5. 2001.

If interested, please contact Dr. Simcha (Simi) Haber.

MA Student Required for Research Project about Sign Languages

Seeking to recruit an MA student with Python and data-science skills for work on a funded project led by Dr. Rose Stamp in the English Literature & Linguistics department. The project will include analysis of sign language and motion capture data using machine learning and Python scripts.

50% position. Work may result in academic publications.

For details, please contact Dr. Rose Stamp.

CFP – Machine Lawyering Conference: Human Sovereignty and Machine Efficiency in the Law


14 Jan 2021 – 16 Jan 2021, The Chinese University of Hong Kong, Hong Kong (or virtual)

Automation is efficient: data is gathered, processed and used to do things people cannot. Yet we have lived in a world run by and for humans, in which human sovereignty has seemed to be more important than efficiency of performance. Sovereignty and efficiency are beginning to show strong potential for conflict. For the law, this presents questions at every stage: the nature and protection of data, the manner in which it is communicated and stored, the models and algorithms according to which it is processed, the authority it is given to act, and the people whose labor it replaces. Conceptually, this evolution presents legal academics with questions ranging from the nature of society (human centered or efficiency centered?) and data to the shape of technical rules for the construction of algorithms and the supervision of automated execution.

The CUHK Law Faculty’s Centre for Financial Regulation and Economic Development (CFRED) created Machine Lawyering in 2017 to offer an umbrella under which all of the above issues can be explored. Participants in the first Machine Lawyering conference, held in January 2020, presented papers on issues ranging from the consciousness of artificial intelligence to the ability of fintech applications to manipulate market prices and the use of automation in the practice of law. The conference produced an effective cross-pollination of ideas among participants. Machine Lawyering’s second conference in 2021 will be equally broad.

We hope to hold the conference physically in Hong Kong, but will also work through virtual presentations, so please do not let uncertainty regarding travel dissuade you from submitting a paper. A number of best paper prizes (round trip economy to Hong Kong and three nights’ accommodation) will be offered.

These are the RELEVANT DATES:
Abstract (max 350 words) submission: 14 September 2020
Notification of acceptance: 16 October 2020
Paper submission for best paper prize: 2 November 2020
Registration deadline for accepted authors: 7 December 2020
Draft paper submission for all authors: 8 January 2021

TOPICS: For the avoidance of doubt, papers on the following topics are within the conference scope:
– Algorithms and profiling (from KYC to oppression)
– Big Data and data analytics in finance, regulation, scholarship and society
– Competition law in the big data industries
– Cybersecurity
– Emerging alternative finance
– Facial recognition: special problems
– Intellectual property in automated systems
– Legal treatment of AI-driven operations and services
– Nature and regulation of payment systems (cryptoassets, stored value, mobile payment, non-bank systems)
– Personal data, including nature, collection, processing and use
– Social consequences of automation for the workforce
– Special problems of distributed ledger technology

SUBMISSION PROCEDURE: Please submit abstracts via email to Bonnie Leung. Please provide an attached file with an anonymous, blind copy of your abstract, but indicate your name, contact details, and the proposed paper title in your cover email. The subject line of the email should state “Submission for Machine Lawyering Conference.” Further information is available on the Machine Lawyering website.

Research Assistant – Open Position

Position closed


Seeking a research assistant for a project by Dr. Gabrielle Gayer, Prof. Offer Lieberman and Prof. Itzhak Gilboa: “Estimating Rules: Tree-Based Techniques as Cognitive Models”.

The project includes analysis of a large database of real-estate prices in Australia and the application of prediction models to estimate prices via machine learning techniques such as decision trees and random forests.
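
For readers unfamiliar with tree-based prediction, the sketch below fits a decision stump (a regression tree of depth one) on a single feature by choosing the split that minimizes the squared error. The floor areas and prices are invented for illustration and are not taken from the project's database. A full decision tree applies such splits recursively, and a random forest averages many trees grown on resampled data.

```python
def fit_stump(xs, ys):
    """Fit a depth-one regression tree on one numeric feature.

    Returns (threshold, left_mean, right_mean) for the split that
    minimizes the sum of squared errors.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Predict with the mean of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Invented data: floor area (square metres) and price (thousands).
areas = [50, 60, 70, 120, 130, 140]
prices = [300, 320, 310, 600, 620, 610]
stump = fit_stump(areas, prices)
```

On this toy data the stump splits between the small and the large properties and predicts the average price of each group.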

65%-75% position for up to three years.

Proper background in programming and data science required.

If interested, please contact Gabi.





Magshimim AI

Call for Mentors in Artificial Intelligence: Machine Learning and Data Science

TL;DR: We are looking for your help in professionally mentoring a pair of outstanding 12th-grade students on an AI project. One hour a week over Zoom. (Registration)

(We use masculine forms for brevity; the call is addressed to men and women alike.) Mentoring is suitable for master's and doctoral students, postdocs, and faculty members in machine learning and data science.

For ten years, a national program called Magshimim has operated in Israel to train high-ability high school students from the social periphery in computer science. Students who pass the screening are trained, in parallel with their high school studies, in computer science, networking, cybersecurity and programming. In 12th grade they carry out a large-scale practical project, for which they receive a matriculation grade. The project is accompanied by professional “mentors” who advise the students for about one hour a week. 70% of the program's graduates enlist in the cyber units of the defense establishment. In the past, most projects were in cybersecurity, but today about two thirds of the students are interested in projects in machine learning and artificial intelligence.

This growing demand for 12th-grade projects in AI is a special opportunity to train students. Therefore, together with Magshimim, we are building an extension of the program and its projects into this field. Israel is currently experiencing a surge in artificial intelligence, with enormous demand for expertise in machine learning and data science and a shortage of accessible knowledge. We have a rare opportunity that serves several goals: making academic knowledge accessible to students, developing research students' ability to supervise projects, advancing Israel's social periphery, and creating knowledge in Hebrew that is accessible to beginners in DS/ML. The Magshimim program has already proven that it has a winning process, including identifying and retaining students from difficult backgrounds, and it can be harnessed to advance AI in Israel, and Israel in AI.

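We are looking for mentors who can provide professional guidance to the students for about an hour a week over Zoom. The role includes professional guidance, characterizing the learning problem, and advising on which architectures and algorithms to use and how to evaluate the learned models.

For you, this is an opportunity to volunteer and contribute to Israeli society, share your knowledge, gain experience in supervising machine learning projects, raise a new generation in machine learning and artificial intelligence, and help in a place where your expertise has no substitute.

We will provide additional information and guidelines to those interested.

Detailed information about the Magshimim Cyber program can be found here, and the program website is here.

If this might interest you, this year or in the coming years, please fill out the form at the following link.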

Prof. Gal Chechik

Prof. Shie Mannor

Dr. Haggai Maron

Dr. Gal Dalal