Changed how checklist balance is used in action prediction #1204
Conversation
I like the simplification to the code; how does this actually affect performance?
```
the current hidden state, attended encoder input and the current checklist balance into the
action space. The size of the checklist balance vector is the same as the number of
terminals. This is not needed if we are training the parser using target action sequences.
use_coverage : ``bool``
```
It'd be nice to mention this is optional here, and give the default value.
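As a rough illustration of what that docstring describes, the checklist balance can enter the action query like this (the names, sizes, and single-`Linear` projection below are assumptions for the sketch, not the parser's actual code):

```python
import torch
import torch.nn as nn

group_size, hidden_dim, num_terminals, action_dim = 4, 10, 6, 8

hidden_state = torch.randn(group_size, hidden_dim)
attended_input = torch.randn(group_size, hidden_dim)
# One balance entry per terminal production, matching the docstring.
checklist_balance = torch.randn(group_size, num_terminals)

# A single projection over all three inputs into the action space.
projection = nn.Linear(2 * hidden_dim + num_terminals, action_dim)
action_query = torch.cat([hidden_state, attended_input, checklist_balance], dim=-1)
predicted_action_embedding = projection(action_query)
```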
```diff
@@ -132,10 +127,12 @@ def take_step(self,  # type: ignore
         # action_mask: (group_size, num_embedded_actions)
         action_embeddings, embedded_action_mask = self._get_action_embeddings(state,
                                                                               global_actions_to_embed)
-        action_query = self._get_action_query(state, hidden_state, attended_sentence)
+        action_query = torch.cat([hidden_state, attended_sentence], dim=-1)
         # (group_size, action_embedding_dim)
         predicted_action_embedding = self._output_projection_layer(action_query)
```
Not related to this PR, but you probably want a non-linearity here. See #1150, where I added this to the wikitables model. There's another spot where you don't have one but probably want one, too.
And, now that I think about it, because this is getting a dot product with action embeddings, I wonder if relu isn't the best non-linearity. We're basically saying that any embedding dimension with a negative value is entirely ignored, and thus artificially constraining the space that's available to the dot product... Maybe tanh would be better?
Did you find that adding relus worked better? In my other PR I added dropout and a relu here to match the wikitables parser, and that seemed to give slightly better results.
Added a non-linearity at the decoder input as well, and made both of them tanh.
I ran it with tanh instead of relu last night, and it didn't really change anything. It looked like it was learning faster - epoch 1 performance was a bit higher - but in the end, final performance was within the normal variance that I've seen across runs. So, it probably doesn't matter.
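For concreteness, here is a minimal sketch of the projection-plus-dot-product scoring this thread is about; the shapes, the `tanh` placement, and the final `bmm` scoring step are assumptions based on the snippet above, not the exact model code:

```python
import torch
import torch.nn as nn

group_size, encoder_dim, action_dim, num_actions = 4, 10, 8, 20

hidden_state = torch.randn(group_size, encoder_dim)
attended_sentence = torch.randn(group_size, encoder_dim)
action_embeddings = torch.randn(group_size, num_actions, action_dim)

output_projection_layer = nn.Linear(2 * encoder_dim, action_dim)

action_query = torch.cat([hidden_state, attended_sentence], dim=-1)
# tanh keeps negative embedding dimensions usable in the dot product
# below; relu would zero them out, constraining the space the dot
# product can cover.
predicted_action_embedding = torch.tanh(output_projection_layer(action_query))

# (group_size, num_actions): score each candidate action by its dot
# product with the predicted action embedding.
action_logits = action_embeddings.bmm(predicted_action_embedding.unsqueeze(-1)).squeeze(-1)
print(action_logits.shape)  # torch.Size([4, 20])
```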
```python
if state.checklist_state[0] is not None:
    embedding_addition = self._get_predicted_embedding_addition(state)
    predicted_action_embedding += self._checklist_embedding_multiplier * embedding_addition
```
Don't ever use `+=` on a tensor. It doesn't do what you expect. Use `x = x + y` instead.
Yes, I figured out that `+=` causes in-place updates, messing up the computation graph, while `x = x + y` does not. I changed the other places I did in-place updates, but I guess I missed these two. Thanks, fixed them.
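A small self-contained demonstration of the failure mode, using standard PyTorch behavior (not code from this repo):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)   # exp saves its output for use in the backward pass

y += 1             # in-place update: bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as err:
    print(err)     # "... has been modified by an inplace operation"

# Out-of-place form: builds a new tensor, leaving the graph intact.
x = torch.ones(3, requires_grad=True)
y = torch.exp(x)
y = y + 1
y.sum().backward()
print(x.grad)      # tensor([2.7183, 2.7183, 2.7183])
```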
```python
embedding_addition = self._get_predicted_embedding_addition(state,
                                                            self._unlinked_terminal_indices,
                                                            unlinked_balance)
predicted_action_embedding += self._unlinked_checklist_multiplier * embedding_addition
```
Don't use `+=`.
Done.
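For reference, a sketch of the out-of-place form the fix amounts to (the tensors and the multiplier value below are stand-ins, not the model's actual values):

```python
import torch

# Stand-in tensors; in the model these come from the decoder state.
predicted_action_embedding = torch.randn(4, 8)
embedding_addition = torch.randn(4, 8)
unlinked_checklist_multiplier = 0.3  # hypothetical scalar

# New tensor on the left-hand side; nothing in the graph is mutated.
predicted_action_embedding = (predicted_action_embedding
                              + unlinked_checklist_multiplier * embedding_addition)
```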
Force-pushed from 22fd0f2 to 7ef6b24.
@matt-gardner On NLVR, the variant with this change is at least slightly better (about 0.5 pp) than the run without this change. The experiment is still running, though.
Force-pushed from 7ef6b24 to ada48af.
Summary of changes: