
Bug in TinyBert data augmentation? #141

Closed
gowtham1997 opened this issue Aug 27, 2021 · 2 comments · Fixed by #147
gowtham1997 commented Aug 27, 2021

Hello,

The following snippet is from the data augmentation script:

```python
tokenized_text = self.tokenizer.tokenize(sent)
tokenized_text = ['[CLS]'] + tokenized_text
tokenized_len = len(tokenized_text)
tokenized_text = word_pieces + ['[SEP]'] + tokenized_text[1:] + ['[SEP]']
if len(tokenized_text) > 512:
    tokenized_text = tokenized_text[:512]
```

In line 154, tokenized_text is sliced to at most 512 tokens when it exceeds that length, but the corresponding tokenized_len computed in line 149 is not updated.

The segment_ids built in the subsequent lines still use the stale tokenized_len, which causes errors in the forward pass:

 File "/lusnlsas/paramsiddhi/iitm/vinodg/glue_data_generation/plm/TinyBERT/transformer/modeling.py", line 361, in forward
    embeddings = words_embeddings + position_embeddings + token_type_embeddings
RuntimeError: The size of tensor a (512) must match the size of tensor b (763) at non-singleton dimension 1

This error occurs when I try to generate data augmentations using the bert-base-cased model.
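
For reference, here is a minimal standalone sketch of the kind of fix I have in mind. The helper name and the exact segment-id formula are illustrative assumptions (the real segment_ids construction sits a few lines below the quoted snippet); the point is only that segment ids should be sized from the token list after truncation, never from the stale tokenized_len:

```python
# Hypothetical standalone sketch, not the repo's exact code: build the
# combined sequence, truncate it, and only then size the segment ids,
# so they can never be longer than the tokens they describe.
def build_inputs(word_pieces, tokenized_text, max_len=512):
    # tokenized_text is ['[CLS]'] + tokens, as in the quoted snippet
    combined = word_pieces + ['[SEP]'] + tokenized_text[1:] + ['[SEP]']
    if len(combined) > max_len:
        combined = combined[:max_len]
    # segment 0 covers word_pieces + '[SEP]'; clamp the boundary in case
    # the truncation cut into the first segment itself
    boundary = min(len(word_pieces) + 1, len(combined))
    segment_ids = [0] * boundary + [1] * (len(combined) - boundary)
    assert len(segment_ids) == len(combined)  # lengths always match now
    return combined, segment_ids
```

With this, the token tensor and the token-type tensor passed to the model always have the same length at dimension 1, which is exactly the invariant the traceback above shows being violated.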

zwjyyc commented Sep 17, 2021

Thanks! We agree with your comment and will fix this bug. You are also welcome to submit a pull request.

gowtham1997 (Author) commented

@zwjyyc I've submitted a pull request for this. Could you please review it?
