Subword Neural Machine Translation
Updated Jun 20, 2017 - Python
Effective Subword Segmentation for Text Comprehension (TASLP 2019)
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
Unsupervised Word Segmentation using Minimum Description Length for Neural Machine Translation (NMT)
Keyword Search Recipe for Subword ASR
An implementation of the subword division algorithm proposed by T. Mikolov (2012).
Korean text normalization and language preparation package for language models (LM) in a Kaldi-based ASR system
This repository contains the source code for assignments in NTU's MSAI course AI6127, Deep Neural Networks for Natural Language Processing (2019 Sem 2).
johnny - a neural network graph based DEPendency Parser
Tokenization separates a piece of text into smaller units called tokens, which can be words, characters, or subwords. Tokenization can therefore be broadly classified into three types: word, character, and subword (character n-gram) tokenization.
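The three tokenization types described above can be illustrated with a minimal Python sketch. The function names (`word_tokens`, `char_tokens`, `char_ngram_tokens`) and the choice of n = 3 are assumptions made for illustration; they are not taken from the repository, and real tokenizers handle punctuation, casing, and learned vocabularies that this sketch ignores.

```python
def word_tokens(text):
    # Word tokenization: split on whitespace only (no punctuation handling).
    return text.split()


def char_tokens(text):
    # Character tokenization: every non-whitespace character is a token.
    return [ch for ch in text if not ch.isspace()]


def char_ngram_tokens(word, n=3):
    # Subword tokenization via overlapping character n-grams of one word.
    # Words no longer than n are returned whole.
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


text = "subword units"
print(word_tokens(text))             # ['subword', 'units']
print(char_tokens(text)[:4])         # ['s', 'u', 'b', 'w']
print(char_ngram_tokens("subword"))  # ['sub', 'ubw', 'bwo', 'wor', 'ord']
```

Overlapping character n-grams are only one simple way to form subwords; methods such as byte-pair encoding learn the units from data instead.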
A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.
The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.
A causal intervention framework to learn robust and interpretable character representations inside subword-based language models
Simple-to-use scoring function for arbitrarily tokenized texts.