Fix convert_token_type_ids_from_sequences for fast tokenizers #4503

Merged: 1 commit into master from fix-convert-typeids-fast on May 22, 2020

n1t0 (Member) commented May 21, 2020

Before this fix, calling convert_token_type_ids_from_sequences on a PreTrainedTokenizerFast fell through to the generic implementation in tokenizer_utils, so the type IDs for the special tokens were not included.
The Rust tokenizers currently offer no way to get this information, so we reuse the implementation from the original Python tokenizers. Tests are added as well.
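
To illustrate the difference described above, here is a minimal, self-contained sketch that does not call the library. The function names and the `CLS_ID`/`SEP_ID` values are hypothetical placeholders, and the "generic" variant is only an assumption about what a special-token-unaware implementation produces. The BERT-style variant follows the usual `[CLS] A [SEP] B [SEP]` layout, where the first segment (including `[CLS]` and its `[SEP]`) gets type 0 and the second segment (including the final `[SEP]`) gets type 1.

```python
from typing import List, Optional

# Hypothetical placeholder IDs, for illustration only; real values come from the vocabulary.
CLS_ID = 101
SEP_ID = 102


def generic_token_type_ids(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Assumed special-token-unaware behavior: one type ID per input token only."""
    if ids_b is None:
        return [0] * len(ids_a)
    return [0] * len(ids_a) + [1] * len(ids_b)


def bert_style_token_type_ids(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Type IDs for `[CLS] A [SEP]` or `[CLS] A [SEP] B [SEP]`, special tokens included."""
    first = [CLS_ID] + ids_a + [SEP_ID]
    if ids_b is None:
        return [0] * len(first)
    second = ids_b + [SEP_ID]
    return [0] * len(first) + [1] * len(second)


if __name__ == "__main__":
    a, b = [7, 8], [9]
    print(generic_token_type_ids(a, b))     # shorter than the final input sequence
    print(bert_style_token_type_ids(a, b))  # matches the length of the final input sequence
```

With the generic variant, the returned list is shorter than the sequence that actually reaches the model once special tokens are added, which is the mismatch this PR addresses for fast tokenizers.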

Thanks @dirkgr for reporting this.

codecov-commenter commented May 21, 2020

Codecov Report

Merging #4503 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4503      +/-   ##
==========================================
+ Coverage   77.83%   77.86%   +0.02%     
==========================================
  Files         123      123              
  Lines       20514    20526      +12     
==========================================
+ Hits        15968    15982      +14     
+ Misses       4546     4544       -2     

| Impacted Files | Coverage Δ |
| --- | --- |
| src/transformers/tokenization_bert.py | 95.00% <100.00%> (+0.12%) ⬆️ |
| src/transformers/tokenization_roberta.py | 94.52% <100.00%> (+0.49%) ⬆️ |
| src/transformers/modeling_tf_utils.py | 88.66% <0.00%> (+0.16%) ⬆️ |
| src/transformers/file_utils.py | 73.85% <0.00%> (+0.41%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a086527...795f44a.

LysandreJik (Member) left a comment

LGTM, thanks @n1t0

LysandreJik merged commit 35df911 into master on May 22, 2020
LysandreJik deleted the fix-convert-typeids-fast branch on May 22, 2020 16:45

4 participants