Template Code: BERT-for-Sequence-Labeling-and-Text-Classification

  • Template code for using BERT for sequence labeling and text classification, so that BERT can be applied to more tasks with minimal new code. You are welcome to use this template to solve more NLP tasks and then share your results and code here.

How to Use the Template Code

  1. Place Google's BERT code in the bert directory (a copy is already included in this repository);
  2. Download a Google BERT pretrained model and unzip it into the pretrained_model directory (a download sketch follows the example command below);
  3. Run the code! You can change task_name and output_dir.
python run_text_classification.py \
--task_name=Snips \
--do_train=true \
--do_eval=true \
--data_dir=data/snips_Intent_Detection_and_Slot_Filling \
--vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=pretrained_model/uncased_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=./output/snips_Intent_Detection/
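
For step 2, a minimal Python sketch that downloads and extracts BERT-Base (uncased); the checkpoint URL is the one published in the google-research/bert README, and the target directory matches the file structure below:

import os
import urllib.request
import zipfile

# BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads
MODEL_URL = ("https://storage.googleapis.com/bert_models/2018_10_18/"
             "uncased_L-12_H-768_A-12.zip")
TARGET_DIR = "pretrained_model"

os.makedirs(TARGET_DIR, exist_ok=True)
archive_path = os.path.join(TARGET_DIR, "uncased_L-12_H-768_A-12.zip")
urllib.request.urlretrieve(MODEL_URL, archive_path)  # ~400 MB download
with zipfile.ZipFile(archive_path) as zf:
    # Produces pretrained_model/uncased_L-12_H-768_A-12/ containing vocab.txt,
    # bert_config.json, and the bert_model.ckpt files referenced by the flags above.
    zf.extractall(TARGET_DIR)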

File Structure

   BERT-for-Sequence-Labeling-and-Text-Classification
  |____ bert stores Google's [BERT code](https://github.com/google-research/bert)
  |____ calculating_model_score stores model test reports
  |____ data stores the task data sets
  |____ output stores model output
  |____ pretrained_model stores the [BERT pretrained model](https://github.com/google-research/bert)
  |____ run_sequence_labeling.py for the sequence labeling task
  |____ run_text_classification.py for the text classification task
  |____ run_sequence_labeling_and_text_classification.py for the joint task (coming soon!)
  |____ tf_metrics.py for model evaluation

How to add a new task

Just write a small piece of code following the existing template!

Data

For example, suppose you have a new classification task, QQP (Quora Question Pairs).

Before running this example you must download the GLUE data with the GLUE download script. QQP's train.tsv and dev.tsv have a header row plus six tab-separated columns (id, qid1, qid2, question1, question2, is_duplicate), which is why the processor below reads columns 3, 4, and 5.
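
A sketch of the download step, using the download_glue_data.py script that the original BERT README links to (the flags are as documented there; placing the output under data keeps it consistent with the run command below):

python download_glue_data.py --data_dir data --tasks QQP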

Code

Now, write code!

class QqpProcessor(DataProcessor):
    """Processor for the QQP data set."""

    def get_train_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_test_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

    def get_labels(self):
        """See base class."""
        return ["0", "1"]

    def _create_examples(self, lines, set_type):
        """Creates examples for the training, dev, and test sets."""
        examples = []
        for (i, line) in enumerate(lines):
            # Skip the header row and any malformed line that does not have
            # the expected six tab-separated columns.
            if i == 0 or len(line) != 6:
                continue
            guid = "%s-%s" % (set_type, i)
            text_a = tokenization.convert_to_unicode(line[3])  # question1
            text_b = tokenization.convert_to_unicode(line[4])  # question2
            if set_type == "test":
                # Test data is unlabeled; use a dummy label.
                label = "1"
            else:
                label = tokenization.convert_to_unicode(line[5])  # is_duplicate
            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples
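
To sanity-check the new processor, a quick sketch (run it where QqpProcessor and its imports are in scope; data/QQP is the assumed data location from the download step above):

processor = QqpProcessor()
examples = processor.get_train_examples("data/QQP")
print(len(examples), processor.get_labels())  # pair count and ["0", "1"]
first = examples[0]
print(first.guid, first.label)
print(first.text_a)
print(first.text_b)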

Register the task

Add the new processor to the processors dictionary in main(). The key must be lowercase, since --task_name is lowercased before the lookup:

def main(_):
    tf.logging.set_verbosity(tf.logging.INFO)
    processors = {
        "qqp": QqpProcessor,
    }

Run

python run_text_classification.py \
--task_name=qqp \
--do_train=true \
--do_eval=true \
--data_dir=data/QQP \
--vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=pretrained_model/uncased_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=./output/qqp/

Task

New tasks are welcome!

| Task name | Explanation | Data source |
| --- | --- | --- |
| CoNLL-2003 | named entity recognition (NER) | |
| ATIS | joint slot filling and intent prediction | https://github.com/MiuLab/SlotGated-SLU/tree/master/data/atis |
| Snips | joint slot filling and intent prediction | https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips |
| GLUE | text classification (e.g. QQP) | |
