
Generalize T5 modules (#5166)
* initial commit

* general self attn

* fixing bugs, adding tests, adding docs

* updating other modules

* refactor

* bug fix

* update changelog

* fix shape

* fix format

* address feedback

* small doc fix

* Update allennlp/modules/transformer/transformer_stack.py

Co-authored-by: Pete <petew@allenai.org>

* remove old file

Co-authored-by: epwalsh <epwalsh10@gmail.com>
Co-authored-by: Pete <petew@allenai.org>
3 people authored Jun 2, 2021
1 parent 5b111d0 commit b0aa1d4
Showing 13 changed files with 861 additions and 472 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added a `min_steps` parameter to `BeamSearch` to set a minimum length for the predicted sequences.
 - Added the `FinalSequenceScorer` abstraction to calculate the final scores of the generated sequences in `BeamSearch`.
 - Added `shuffle` argument to `BucketBatchSampler` which allows for disabling shuffling.
+- Added `allennlp.modules.transformer.attention_module` which contains a generalized `AttentionModule`. `SelfAttention` and `T5Attention` both inherit from this.
 - Added a `Constraint` abstract class to `BeamSearch`, which allows for incorporating constraints on the predictions found by `BeamSearch`,
   along with a `RepeatedNGramBlockingConstraint` constraint implementation, which allows for preventing repeated n-grams in the output from `BeamSearch`.
 - Added `DataCollator` for dynamic operations for each batch.
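A minimal sketch of what the new changelog entry describes; the import path and the inheritance relationship are taken from this commit, and nothing beyond that is shown because constructor signatures are not part of this diff:

```python
# Sketch only: the module path and the base-class relationship come from this commit.
from allennlp.modules.transformer.attention_module import (
    AttentionModule,
    SelfAttention,
    T5Attention,
)

# Both generalized attention variants inherit from the shared AttentionModule base class.
assert issubclass(SelfAttention, AttentionModule)
assert issubclass(T5Attention, AttentionModule)
```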
2 changes: 1 addition & 1 deletion allennlp/modules/transformer/__init__.py
@@ -131,7 +131,7 @@ def forward(self, token_ids: torch.LongTensor, mask: torch.BoolTensor):
     TransformerEmbeddings,
     ImageFeatureEmbeddings,
 )
-from allennlp.modules.transformer.self_attention import SelfAttention
+from allennlp.modules.transformer.attention_module import SelfAttention, T5Attention
 from allennlp.modules.transformer.activation_layer import ActivationLayer
 from allennlp.modules.transformer.transformer_layer import AttentionLayer, TransformerLayer
 from allennlp.modules.transformer.transformer_stack import TransformerStack
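For downstream code, the practical effect of this `__init__.py` change (together with the "remove old file" step in the commit message) is an import-path migration. The sketch below is hypothetical user code, not code from this commit:

```python
# Hypothetical downstream usage, shown for illustration only.

# Before this commit (the self_attention module was removed):
# from allennlp.modules.transformer.self_attention import SelfAttention

# After this commit, matching the updated __init__.py:
from allennlp.modules.transformer.attention_module import SelfAttention, T5Attention

# The same names are also re-exported at the package level:
# from allennlp.modules.transformer import SelfAttention, T5Attention
```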
