
Bug in TinyBert data augmentation? #141

Closed
gowtham1997 opened this issue Aug 27, 2021 · 2 comments · Fixed by #147
gowtham1997 commented Aug 27, 2021

Hello,

The following snippet is from the data augmentation script:

```python
tokenized_text = self.tokenizer.tokenize(sent)
tokenized_text = ['[CLS]'] + tokenized_text
tokenized_len = len(tokenized_text)
tokenized_text = word_pieces + ['[SEP]'] + tokenized_text[1:] + ['[SEP]']
if len(tokenized_text) > 512:
    tokenized_text = tokenized_text[:512]
```

In line 154, tokenized_text is sliced to at most 512 tokens when it exceeds that length, but the corresponding tokenized_len computed in line 149 is not updated.

The segment_ids built in the subsequent lines still use the stale tokenized_len, which causes errors in the forward pass:

 File "/lusnlsas/paramsiddhi/iitm/vinodg/glue_data_generation/plm/TinyBERT/transformer/modeling.py", line 361, in forward
    embeddings = words_embeddings + position_embeddings + token_type_embeddings
RuntimeError: The size of tensor a (512) must match the size of tensor b (763) at non-singleton dimension 1

This error occurs when I try to generate data augmentations using the bert-base-cased model.
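
For reference, here is a minimal standalone sketch of the kind of fix I have in mind. The helper name and the exact segment-id formula are illustrative assumptions (the real segment_ids construction sits a few lines below the quoted snippet); the point is only that segment ids should be sized from the token list after truncation, never from the stale tokenized_len:

```python
# Hypothetical standalone sketch, not the repo's exact code: build the
# combined sequence, truncate it, and only then size the segment ids,
# so they can never be longer than the tokens they describe.
def build_inputs(word_pieces, tokenized_text, max_len=512):
    # tokenized_text is ['[CLS]'] + tokens, as in the quoted snippet
    combined = word_pieces + ['[SEP]'] + tokenized_text[1:] + ['[SEP]']
    if len(combined) > max_len:
        combined = combined[:max_len]
    # segment 0 covers word_pieces + '[SEP]'; clamp the boundary in case
    # the truncation cut into the first segment itself
    boundary = min(len(word_pieces) + 1, len(combined))
    segment_ids = [0] * boundary + [1] * (len(combined) - boundary)
    assert len(segment_ids) == len(combined)  # lengths always match now
    return combined, segment_ids
```

With this, the token tensor and the token-type tensor passed to the model always have the same length at dimension 1, which is exactly the invariant the traceback above shows being violated.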

zwjyyc commented Sep 17, 2021

Thanks! We agree with your comment and will fix this bug. You are also welcome to submit a pull request.

gowtham1997 (Author) commented

@zwjyyc I've submitted a pull request for this. Could you please review it?
