Learning Club BIU — NLP Student Session 26.1
January 26 @ 12:00 pm - 1:00 pm IST
Title: Unsupervised Distillation of Syntactic Information From Contextualized Neural Representation.
Abstract: Contextualized word representations, such as ELMo and BERT, were shown to perform well on a variety of semantic and structural (syntactic) tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
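As a rough illustration of the metric-learning idea in this abstract, the sketch below uses a triplet loss, one common metric-learning objective. The abstract does not say which objective the authors use, so the loss, the linear map `W`, the margin, and the toy two-dimensional vectors (first coordinate standing in for "structure", second for "lexical content") are all assumptions made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def apply_linear(W, x):
    """Apply a linear map W (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def triplet_loss(W, anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss computed on the transformed vectors.

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin` in the transformed space.
    """
    a, p, n = (apply_linear(W, v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


# Toy vectors (hypothetical): [structure, lexical content].
anchor = [1.0, 0.0]
positive = [1.0, 5.0]   # same structure, different words
negative = [4.0, 0.0]   # different structure, same words

identity = [[1.0, 0.0], [0.0, 1.0]]        # leaves vectors unchanged
structure_only = [[1.0, 0.0], [0.0, 0.0]]  # zeroes out the lexical coordinate

print(triplet_loss(identity, anchor, positive, negative))
print(triplet_loss(structure_only, anchor, positive, negative))
```

With the identity map the structurally similar pair is farther apart than the structurally different pair, so the loss is positive. The map that drops the lexical coordinate brings the loss to zero, which is the kind of transformation the training is meant to find.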
Title: Automatically Identifying Gender Bias in Machine Translation using Perturbations
Abstract: Gender bias has been shown to affect many applications in NLU. In the setting of machine translation (MT), research has primarily focused on measuring bias via synthetic datasets. We present an automatic method for identifying gender biases in MT using a novel application of BERT-generated sentence perturbations. Using this method, we compile a dataset to serve as a benchmark for evaluating gender bias in MT across a diverse range of languages. Our dataset further serves to highlight the limitations of the current task definition, which requires that a single translation be produced even in the presence of underspecified input.
Title: Learning to Navigate in Real Urban Environments Using Natural Language Directions
Abstract: According to the Universal Postal Union (UPU), the majority of people in many developing countries do not have a set address. With very few alternatives, they often rely on natural language (NL) descriptions of the path to their house, e.g., “Turn right after bar and it will be the first house after the school”. What if we could automate the interpretation of such directions, allowing robots or autonomous vehicles to automatically navigate based on free NL descriptions of such routes?
The task of following NL navigation instructions requires composition of language and domain knowledge, and it raises several challenges including: grounding language to physical objects; resolving ambiguity; avoiding cascading errors; and many more. The main datasets collected for the NL navigation task so far (SAIL and HCRC) present a very simplistic, unrealistic depiction of the world, with a small fixed set of entities that are known to the navigator in advance. Such representations bypass the great complexity of navigation based on real urban maps, where an abundance of previously ungrounded and unseen entities are observed at test time.
In this work, we redefine the task of NL navigation by embracing the complexity and challenge presented by real urban environments, and we present RUN, a novel dataset with large real urban maps and richer information layers than ever before. We present an effective baseline architecture for the NL navigation task, which augments a standard encoder-decoder model with an entity abstraction layer, attention over words and worlds, and a constantly updating world-state. Our experiments show that this architecture is indeed better equipped to treat the grounding challenge in realistic urban settings than standard sequence-to-sequence architectures.
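To make the notion of entity abstraction concrete, here is a minimal sketch of one way such a step could work: known entity mentions in an instruction are replaced with generic placeholder tokens, so that a model sees the same token pattern regardless of which entities appear at test time. This is not the architecture's actual layer, which the abstract does not detail. The function name `abstract_entities`, the `ENTITY_n` placeholder format, and the assumption that entity names are given as a list are all hypothetical.

```python
import re


def abstract_entities(instruction, entity_names):
    """Replace whole-word entity mentions with numbered placeholders.

    Returns the abstracted instruction and a mapping from each
    placeholder back to the entity name it replaced.
    """
    abstracted = instruction
    mapping = {}
    # Handle longer names first so a long name is not split by a shorter one.
    for name in sorted(entity_names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, abstracted):
            placeholder = f"ENTITY_{len(mapping) + 1}"
            abstracted = re.sub(pattern, placeholder, abstracted)
            mapping[placeholder] = name
    return abstracted, mapping


instruction = "Turn right after the bar and stop at the first house after the school"
abstracted, mapping = abstract_entities(instruction, ["bar", "school"])
print(abstracted)
print(mapping)
```

The returned mapping lets a downstream component translate placeholders back to concrete map entities after the instruction has been interpreted.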