
Commit

Truncate sentences longer than the maximum size
Pligabue committed Jan 7, 2024
1 parent dcf0bbc commit bad0ef8
Showing 1 changed file with 1 addition and 1 deletion.
@@ -30,7 +30,7 @@ def split_on_predicate(self, sentence) -> list[str]:
         return trimmed_split
 
     def format_input(self, sentence: str) -> SentenceInput:
-        return bert.tokenizer.encode(sentence, padding="max_length", max_length=self.sentence_size)
+        return bert.tokenizer.encode(sentence, padding="max_length", truncation=True, max_length=self.sentence_size)
 
     def format_inputs(self, sentences: list[str]) -> SentenceInputs:
         return [self.format_input(sentence) for sentence in sentences]
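The change adds `truncation=True` so that `format_input` always yields exactly `self.sentence_size` ids. Padding to `"max_length"` only ever lengthens a sequence, so without truncation a sentence longer than `sentence_size` comes back longer than the others and the batch built by `format_inputs` is no longer rectangular. Below is a minimal pure-Python sketch of the pad-plus-truncate behaviour, not the Hugging Face implementation. It assumes BERT-style `[CLS] ... [SEP]` wrapping, right-side truncation of the content tokens, and hypothetical token ids (`PAD_ID=0`, `CLS_ID=101`, `SEP_ID=102`). `encode_fixed_length` and `encode_batch` are hypothetical names used only for illustration.

```python
from typing import List

# Hypothetical special-token ids, assumed BERT-style for illustration only.
PAD_ID = 0
CLS_ID = 101
SEP_ID = 102


def encode_fixed_length(token_ids: List[int], max_length: int, truncation: bool = True) -> List[int]:
    """Wrap content ids in [CLS] ... [SEP] and pad with PAD_ID up to max_length.

    With truncation=True, content is cut from the right so both special tokens
    still fit. With truncation=False, long inputs come back longer than max_length,
    because padding never shortens a sequence.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    budget = max_length - 2  # slots left for content after the two special tokens
    if truncation and len(token_ids) > budget:
        token_ids = token_ids[:budget]
    ids = [CLS_ID] + list(token_ids) + [SEP_ID]
    # Pad only if short; an over-long (untruncated) sequence is returned as-is.
    return ids + [PAD_ID] * max(0, max_length - len(ids))


def encode_batch(sentences: List[List[int]], max_length: int, truncation: bool = True) -> List[List[int]]:
    """Encode every sentence; with truncation on, all rows share one length."""
    return [encode_fixed_length(s, max_length, truncation) for s in sentences]


if __name__ == "__main__":
    batch = [[7, 8], list(range(1000, 1010))]
    print([len(row) for row in encode_batch(batch, max_length=6)])  # [6, 6]
    print([len(row) for row in encode_batch(batch, max_length=6, truncation=False)])  # [6, 12]
```

With truncation disabled, the second row has 12 ids against a `max_length` of 6. This is the ragged-batch situation the commit avoids.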
