Natural Language Processing (NLP)

This repository provides some of the NLP techniques developed by Huawei Noah's Ark Lab.

Directory structure

  • FreeTransfer-X enables safe and label-free cross-lingual transfer from off-the-shelf models; published in NAACL Findings 2022.
  • XeroAlign allows for efficient SOTA zero-shot cross-lingual transfer via a simple and lightweight auxiliary loss over machine-translated pairs; originally published in ACL Findings 2021. (A minimal sketch of the alignment loss appears after this list.)
  • CrossAligner is an extension of XeroAlign (above) with a more effective NER (slot tagging) alignment based on machine-translated pairs, new labels/objectives derived from the English labels, and a SOTA weighted combination of losses. Additional analysis is provided in the appendix; please read our ACL Findings 2022 paper.
  • DyLex
  • Generate&Rank is a multi-task framework for math word problems (MWP) built on a generative pre-trained language model. Through joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions. (A hedged sketch of such a joint objective follows this list.) Please find more details in the EMNLP Findings 2021 paper.
  • UniMS is a unified multimodal summarization framework with an encoder-decoder multi-task architecture on top of BART, which simultaneously outputs extractive summaries, abstractive summaries, and image selection results. Our framework adopts knowledge distillation to improve image selection without any requirement on the existence or quality of image captions. We further introduce an extractive objective in the encoder and visual guided attention in the decoder to better integrate the textual and visual modalities in conditional text generation. Our unified method achieves a new state-of-the-art result on multimodal summarization; more details can be found in the AAAI 2022 paper.
  • SumTitles
  • Conversation Graph allows for effective data augmentation, training loss 'augmentation' and a fairer evaluation of dialogue management in a modular conversational agent. We introduce the novel idea of a conversation graph (ConvGraph) to achieve all of this. Read more in our TACL 2021 paper.
  • FreeGBDT investigates whether it is feasible (or superior) to replace the conventional MLP classifier head used with pretrained transformers with a gradient-boosted decision tree. (A minimal sketch of the idea follows this list.) Want to know if it worked? Take a look at the ACL Findings 2021 paper!
  • Dual Transfer
  • MultiUAT
  • PERT is a transformer-based solution for the pinyin-to-character conversion task, which is core to Chinese input methods.
  • Maha_OOD is an unsupervised method for out-of-domain detection of intents, with code to reproduce the results from the AAAI 2021 paper. (A minimal sketch of Mahalanobis-distance OOD scoring follows this list.)
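
The sketch below illustrates an XeroAlign-style auxiliary alignment loss: the embedding of an English utterance is pulled towards the embedding of its machine translation, and this term is added to the ordinary task loss. The encoder name, mean pooling and loss weighting are assumptions for illustration and may differ from the actual implementation in the XeroAlign directory.

```python
# Illustrative sketch of an XeroAlign-style auxiliary alignment loss.
# The encoder, pooling and weighting choices below are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def sentence_embedding(texts):
    # Mean-pool the last hidden states (pooling choice is an assumption).
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state            # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)

def xeroalign_style_loss(task_loss, english_texts, translated_texts, weight=1.0):
    # Align each English utterance with its machine translation and add
    # the alignment term to the ordinary task loss.
    src = sentence_embedding(english_texts)
    tgt = sentence_embedding(translated_texts)
    return task_loss + weight * F.mse_loss(src, tgt)
```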
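Next is a hedged sketch of a Generate&Rank-style joint objective: a seq2seq generation loss combined with a margin ranking term that pushes the gold expression above a wrong candidate, for instance one previously generated by the model itself. Scoring candidates by their average per-token log-likelihood is a simplification made here for self-containment; the paper's actual ranker may differ, as may the model choice.

```python
# Hedged sketch of a Generate & Rank style joint objective for math word
# problems (model name and the scoring/ranking formulation are assumptions).
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def generation_loss(problem: str, expression: str) -> torch.Tensor:
    # Standard seq2seq cross-entropy for generating the expression.
    inputs = tokenizer(problem, return_tensors="pt")
    labels = tokenizer(expression, return_tensors="pt").input_ids
    return model(**inputs, labels=labels).loss

def expression_score(problem: str, expression: str) -> torch.Tensor:
    # Score a candidate by its average per-token log-likelihood under the
    # model (a simplification; the paper uses a dedicated ranker).
    return -generation_loss(problem, expression)

def joint_loss(problem: str, gold: str, wrong: str,
               alpha: float = 1.0, margin: float = 1.0) -> torch.Tensor:
    # Generation loss plus a margin ranking term that pushes the gold
    # expression above a wrong candidate (e.g. one the model produced).
    rank = F.relu(margin - expression_score(problem, gold)
                  + expression_score(problem, wrong))
    return generation_loss(problem, gold) + alpha * rank
```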
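The following sketch illustrates the FreeGBDT idea: fit a gradient-boosted decision tree on the classifier-head input features of a transformer instead of relying on the MLP head. The model name, [CLS] pooling, GBDT library and toy data below are assumptions; in the paper the features come from a model already fine-tuned on the task.

```python
# Illustrative FreeGBDT-style sketch: train a GBDT on transformer
# classifier-head input features (all names and data are illustrative).
import numpy as np
import torch
from sklearn.ensemble import GradientBoostingClassifier
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def cls_features(texts):
    # Use the [CLS] token representation as the head-input feature vector.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**enc).last_hidden_state[:, 0, :].numpy()

# Toy data; in practice these would be the task's training examples.
train_texts = ["the movie was great", "loved every minute",
               "terrible plot", "a complete waste of time"]
train_labels = np.array([1, 1, 0, 0])

gbdt = GradientBoostingClassifier(n_estimators=100)
gbdt.fit(cls_features(train_texts), train_labels)
print(gbdt.predict(cls_features(["what a wonderful film"])))
```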
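Finally, a minimal sketch of Mahalanobis-distance OOD scoring for intents: fit per-intent class means and a shared covariance on in-domain features, then score a new utterance by its distance to the nearest intent centroid. The feature choice, covariance estimation and toy data below are assumptions and may differ from the Maha_OOD implementation.

```python
# Illustrative Mahalanobis-distance OOD scoring (details are assumptions).
import numpy as np

def fit_mahalanobis(features, labels):
    # Per-intent class means and a shared (tied) covariance matrix.
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x, means, cov_inv):
    # Distance to the nearest intent centroid; larger means more likely OOD.
    dists = [float((x - m) @ cov_inv @ (x - m)) for m in means.values()]
    return min(dists)

# Toy usage: 2-D features for two in-domain intents, plus a far-away query.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
means, cov_inv = fit_mahalanobis(feats, labels)
print(ood_score(np.array([0.1, 0.2]), means, cov_inv))     # small: in-domain
print(ood_score(np.array([20.0, -20.0]), means, cov_inv))  # large: OOD
```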